{"id":95,"date":"2022-04-29T06:41:41","date_gmt":"2022-04-29T06:41:41","guid":{"rendered":"https:\/\/blogs.oregonstate.edu\/prettycode\/?p=95"},"modified":"2022-04-29T06:41:41","modified_gmt":"2022-04-29T06:41:41","slug":"intro-to-ml-part-1-data-exploration","status":"publish","type":"post","link":"https:\/\/blogs.oregonstate.edu\/prettycode\/2022\/04\/29\/intro-to-ml-part-1-data-exploration\/","title":{"rendered":"Intro to ML &#8211; Part 1 &#8211; Data Exploration"},"content":{"rendered":"\n<p>I was really drawn to the senior capstone project I chose on fire risk prediction largely due to my interest in ML. I&#8217;m excited to be joining a team after I finish my degree which works heavily in leveraging big data and ML algorithms for customer insights and it&#8217;s been really interesting getting to learn through my project a little more about what ML, well, actually is. <\/p>\n\n\n\n<p>I thought it&#8217;d be fun in the next few entries if I walk through basic ML modeling in Python with Jupyter notebook. I&#8217;ve had a little exposure to this before but I&#8217;m basically re-learning as I go, and it&#8217;s been a fun and educational process.<\/p>\n\n\n\n<p>The dataset that I am working with is from Kaggle. This is a great resource for learning ML and finding ML datasets. In my capstone project, we are working on proprietary data so as a substitute for this exercise, I am using the Kaggle dataset on US Wages. 
These are the variables in my dataset, the first few rows, and the commands to display them in Jupyter.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"432\" height=\"627\" src=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/2022-04-28-22_19_01-Untitled-Jupyter-Notebook.png\" alt=\"\" class=\"wp-image-97\" srcset=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/2022-04-28-22_19_01-Untitled-Jupyter-Notebook.png 432w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/2022-04-28-22_19_01-Untitled-Jupyter-Notebook-207x300.png 207w\" sizes=\"auto, (max-width: 432px) 100vw, 432px\" \/><\/figure>\n\n\n\n<p>We can begin to do some basic data visualization by running scatterplots. For example, the scatterplot here suggests a clear relationship between educational level and earnings. <\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"795\" height=\"455\" src=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/2022-04-28-22_23_30-Untitled-Jupyter-Notebook.png\" alt=\"\" class=\"wp-image-98\" srcset=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/2022-04-28-22_23_30-Untitled-Jupyter-Notebook.png 795w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/2022-04-28-22_23_30-Untitled-Jupyter-Notebook-300x172.png 300w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/2022-04-28-22_23_30-Untitled-Jupyter-Notebook-768x440.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<p>You can see that the variables are the type that we may be able to use to estimate wages &#8211; height, gender, educational level, age, etc. 
Before we can fit a model, notice that some of our variables need to be transformed &#8212; you can&#8217;t plug &#8220;white&#8221; or &#8220;female&#8221; into an equation! We do this by converting the categorical variables into dummy variables using the following command. <\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"63\" src=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/image.png\" alt=\"\" class=\"wp-image-99\" srcset=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/image.png 768w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/image-300x25.png 300w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"798\" height=\"383\" src=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/image-2.png\" alt=\"\" class=\"wp-image-101\" srcset=\"https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/image-2.png 798w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/image-2-300x144.png 300w, https:\/\/osu-wams-blogs-uploads.s3.amazonaws.com\/blogs.dir\/5374\/files\/2022\/04\/image-2-768x369.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<p>Thanks for joining me as I explored and learned about basic data loading and visualization in Jupyter Notebook. Please continue to follow me in the upcoming weeks as I start implementing some basic ML tools!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was really drawn to the senior capstone project I chose on fire risk prediction largely due to my interest in ML. 
I&#8217;m excited to be joining a team after I finish my degree which works heavily in leveraging big data and ML algorithms for customer insights and it&#8217;s been really interesting getting to learn &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/blogs.oregonstate.edu\/prettycode\/2022\/04\/29\/intro-to-ml-part-1-data-exploration\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Intro to ML &#8211; Part 1 &#8211; Data Exploration&#8221;<\/span><\/a><\/p>\n","protected":false},"author":12224,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-95","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/posts\/95","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/users\/12224"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/comments?post=95"}],"version-history":[{"count":1,"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/posts\/95\/revisions"}],"predecessor-version":[{"id":102,"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/posts\/95\/revisions\/102"}],"wp:attachment":[{"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/media?parent=95"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/categories?post=95"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/prettycode\/wp-json\/wp\/v2\/tags
?post=95"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}