The last few weeks I got to explore a large part of Japan and have finally made it to Taiwan. It’s gone by so fast. I’ve gotten to try so much good food and had a great time checking out all the different areas!
Jumping into my blog post… Last week we met with the subject matter expert for our capstone project, “Algorithmic Stock Market Trading”. He gave us a lot of useful information and talked about his experience with this topic. We came up with a game plan, but we still have to get into the details of the project to really figure out what direction to head.
https://www.w3schools.com/python/pandas/pandas_csv.asp
After hearing about the technical requirements for the application, I think a CSV file will be more than sufficient for our project. Once we get our data, we can export it as a plain CSV file and use that for any data processing needs. The pandas read_csv() function takes a file path (or just a file name if the file is in the working directory) and returns a DataFrame. You can pass additional arguments for things like the separator (sep, with delimiter as an alias for the same thing) and whitespace handling (for example skipinitialspace).
import pandas as pd
df = pd.read_csv('data.csv')
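To make the extra arguments a bit more concrete, here is a rough sketch of reading a file that uses semicolons and has stray spaces after each separator. The sample data and the column names (date, ticker, close) are made up for illustration and are not from our project; io.StringIO just stands in for a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for an exported file.
# The column names (date, ticker, close) are invented for this example.
raw = io.StringIO(
    "date; ticker; close\n"
    "2023-01-03; AAA; 5.25\n"
    "2023-01-04; AAA; 6.50\n"
    "2023-01-05; BBB; 7.75\n"
)

prices = pd.read_csv(
    raw,
    sep=";",                # fields are separated by semicolons, not commas
    skipinitialspace=True,  # drop the space that follows each separator
    parse_dates=["date"],   # convert the date column to datetime values
)

print(prices)
print(prices.dtypes)
```

With a real export you would pass the file path in place of raw and keep only the arguments your file actually needs.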
https://stackoverflow.com/questions/17071871/how-do-i-select-rows-from-a-dataframe-based-on-column-values
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
Once you have the data in a dataframe, pandas allows almost endless data manipulation. A good starting point is selecting certain rows based on a column value. The Stack Overflow link has some good examples. We will mainly be using df.loc, which is an indexer (used with square brackets rather than called like a function) that lets you select rows by label or by a condition, such as values over a threshold. It’s common to write something like df.loc[df['column_name'] > 6].
df = df.loc[df['column_name'] > 6]
If you write this, it will overwrite your current dataframe, keeping only the rows where column_name > 6.
df_column_filtered = df.loc[df['column_name'] > 6]
You can do something like this to create a new dataframe and leave the original one unchanged.
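Here is a small runnable sketch of the filtering described above, using a tiny made-up dataframe. The ticker column and all the values are hypothetical; column_name is kept from the examples above. The last part shows one way to combine two conditions, which is an extra that the examples above do not cover.

```python
import pandas as pd

# Made-up data for illustration; "ticker" is a hypothetical extra column.
df = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "CCC", "DDD"],
        "column_name": [4.0, 6.0, 7.5, 9.0],
    }
)

# Keep only rows above the threshold in a new dataframe; df itself is unchanged.
df_column_filtered = df.loc[df["column_name"] > 6]

# Combine conditions with & (and) or | (or). Each condition needs its own
# parentheses because & binds more tightly than the comparison operators.
df_two_conditions = df.loc[(df["column_name"] > 6) & (df["ticker"] != "DDD")]

print(df_column_filtered)
print(df_two_conditions)
```

Note that 6.0 is not included in the result, since the comparison is strictly greater than 6.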
This is just a small example of one feature of the pandas library that is very useful for filtering data. Later on you can use these filtered dataframes to create specific tables or plots.