I was drawn to my senior capstone project on fire risk prediction largely because of my interest in ML. After I finish my degree, I’m excited to be joining a team that relies heavily on big data and ML algorithms for customer insights, and it’s been really interesting to learn through my project a little more about what ML, well, actually is.
I thought it’d be fun in the next few entries to walk through basic ML modeling in Python with Jupyter Notebook. I’ve had a little exposure to this before, but I’m basically re-learning as I go, and it’s been a fun and educational process.
The dataset I am working with is from Kaggle, which is a great resource for learning ML and finding ML datasets. My capstone project uses proprietary data, so as a substitute for this exercise I am using the Kaggle dataset on US Wages. Here are the variables in my dataset, the first few rows, and the commands to display them in Jupyter.
data:image/s3,"s3://crabby-images/c3e1e/c3e1ef91076b940d3b692957d1cebccd7c2f54da" alt=""
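For anyone following along without the screenshot, here is a minimal sketch of the kind of commands involved. The column names (`earn`, `height`, `sex`, `race`, `ed`, `age`), the file name in the comment, and the handful of rows are all assumptions made up for illustration, so the sketch runs without downloading anything.

```python
import pandas as pd

# Made-up stand-in rows (NOT real Kaggle data) so this runs on its own.
# Column names are assumptions about how the wages data is laid out.
wages = pd.DataFrame({
    "earn":   [50000.0, 60000.0, 30000.0, 25000.0, 45000.0, 62000.0],
    "height": [74, 66, 64, 65, 63, 68],
    "sex":    ["male", "female", "female", "female", "female", "male"],
    "race":   ["white", "white", "white", "black", "other", "white"],
    "ed":     [16, 16, 16, 17, 12, 18],
    "age":    [45, 58, 29, 33, 41, 50],
})

# With the downloaded file you would load it instead, for example:
# wages = pd.read_csv("wages.csv")  # hypothetical file name

# List the variables (columns) in the dataset.
print(wages.columns.tolist())

# Show the first five rows; in Jupyter, a bare `wages.head()` as the last
# line of a cell renders the same thing as a formatted table.
print(wages.head())
```

`head()` defaults to five rows; passing a number, as in `wages.head(10)`, shows more.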
We can begin to do some basic data visualization with scatterplots. For example, the scatterplot here shows a clear relationship between educational level and earnings.
data:image/s3,"s3://crabby-images/d29fb/d29fb8919805472ddaf24d57cb1037d480d52bf8" alt=""
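A scatterplot like this one can be sketched with the plotting helpers built into pandas (which use matplotlib underneath). As before, the rows are made up for illustration, and the column names `ed` (assumed to be years of education) and `earn` are assumptions.

```python
import pandas as pd

# Made-up stand-in rows (NOT real data); column names are assumptions.
wages = pd.DataFrame({
    "ed":   [12, 12, 14, 16, 16, 18],
    "earn": [25000.0, 31000.0, 38000.0, 50000.0, 47000.0, 62000.0],
})

# One point per person: education on the x-axis, earnings on the y-axis.
ax = wages.plot.scatter(x="ed", y="earn")
ax.set_xlabel("Years of education")
ax.set_ylabel("Annual earnings")

# In Jupyter the figure renders inline at the end of the cell; in a plain
# script you would also import matplotlib.pyplot as plt and call plt.show().
```

Swapping `x="ed"` for another numeric column, such as height or age, is a quick way to eyeball other relationships.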
You can see that these are the kinds of variables we may be able to use to estimate wages: height, gender, educational level, age, etc. Before we can fit a model, notice that some of our variables need to be transformed. You can’t plug “white” or “female” into an equation! We do this by breaking the categorical variables down into dummy variables using the following command.
data:image/s3,"s3://crabby-images/cc202/cc202b03ca5ddd8e3b1fe9a6e181e66845707ab7" alt=""
data:image/s3,"s3://crabby-images/cce39/cce3959fb04513405818ce7519b04bec31f3657a" alt=""
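One common way to create dummy variables in pandas is `pd.get_dummies`. This is a minimal sketch under the same assumptions as before (made-up rows, assumed column names `sex` and `race`), and not necessarily the exact command in the screenshot.

```python
import pandas as pd

# Made-up stand-in rows (NOT real data); column names are assumptions.
wages = pd.DataFrame({
    "earn": [50000.0, 60000.0, 30000.0, 25000.0],
    "ed":   [16, 16, 12, 17],
    "sex":  ["male", "female", "female", "male"],
    "race": ["white", "black", "white", "other"],
})

# Replace each text column with one indicator column per category,
# e.g. "sex" becomes "sex_female" and "sex_male".
encoded = pd.get_dummies(wages, columns=["sex", "race"])

print(encoded.columns.tolist())
print(encoded.head())
```

Numeric columns such as `earn` and `ed` pass through unchanged. For a linear model it is common to pass `drop_first=True` as well, which drops one category per variable so the indicator columns are not perfectly redundant with each other.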
Thanks for joining me as I explored and learned about basic data loading and visualization in Jupyter Notebook. Please continue to follow me in the upcoming weeks as I start implementing some basic ML tools!