Coding stories, tips, and tricks

Clara Bird¹ and Karen Lohman²

¹Master’s Student in Wildlife Science, Geospatial Ecology of Marine Megafauna Lab

²Master’s Student in Wildlife Science, Cetacean Conservation and Genomics Laboratory

In a departure from my typical science-focused blog, this week I thought I would share more about myself. I was inspired by International Women’s Day and, after reflecting on my last eight months as a graduate student, I decided to look back on the role that coding has played in my life. We often hear that coding can be empowering, so I thought it might be cool to talk about my personal experience of feeling empowered by coding. I’ve also invited a fellow grad student in the Marine Mammal Institute, Karen Lohman, to co-author this post. We’re going to briefly talk about our experiences with coding and then finish with advice for getting started with coding and with coding for data analysis.

Our Stories

Clara

I’ve only been coding for a little over two and a half years. In summer 2017 I did an NSF REU (Research Experience for Undergraduates) at Bigelow Laboratory for Ocean Sciences, and for my data analysis project I taught myself Python (with the support of a post-doc). During those 10 weeks, I coded all day, every workday. From that experience, I not only acquired the hard skill of programming, but I also gained a good amount of confidence in myself, and here’s why: for the first three years of my undergraduate career, coding was a daunting skill that I knew I would eventually need but did not know how to start learning. So, I essentially ended up learning by jumping in at the deep end. I found the immersion experience to be the most effective learning method for me. With coding, you find out if you got something right (or wrong) almost instantaneously. I’ve found that this is a double-edged sword. It means that you can easily have days where everything goes wrong. But the feeling when it finally works is what I think of when I hear the term empowerment. I’m not quite sure how to put it into words, but it’s a combination of independence, confidence, and success.

Aside from learning the fundamentals, I finished that summer with confidence in my ability to teach myself not just new coding skills, but other skills as well. I think that feeling confident in my ability to learn something new has been the most helpful thing for hitting the ground running in grad school and for keeping the ‘imposter syndrome’ at bay (most of the time).

Clara’s Favorite Command: groupby (Python, pandas) – Say you have a column of measurements and a second column with the field site where each measurement was taken. If you wanted the mean of the measurements at each location, you could use groupby to get it. It would look like this: dataframe.groupby('Location')['Measurement'].mean().reset_index()
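To see that one-liner in action, here is a minimal sketch. The dataframe contents and the name site_means are made up for illustration; only the 'Location' and 'Measurement' column names come from the command above.

```python
import pandas as pd

# Hypothetical example data: one measurement per row, tagged with its field site
dataframe = pd.DataFrame({
    "Location": ["Site A", "Site A", "Site B", "Site B", "Site B"],
    "Measurement": [2.0, 4.0, 1.0, 2.0, 3.0],
})

# Mean measurement per location; reset_index() turns the group labels
# back into an ordinary 'Location' column
site_means = dataframe.groupby("Location")["Measurement"].mean().reset_index()
print(site_means)
```

The result is a small dataframe with one row per location, which is usually the shape you want for plotting or for merging back onto other data.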

Karen

I’m quite new to coding, but once I started learning I was completely enchanted! I was first introduced to coding while working as a field assistant for a PhD student (a true R wizard who has since developed deep learning computer vision packages for automated camera trap image analysis) in the cloud forest of the Ecuadorian Andes. This remote jungle was where I first saw how useful coding can be for data management and analysis. It was a strange juxtaposition between being fully immersed in nature for remote field work and learning to think along the lines of coding syntax. It wasn’t the typical introduction to R most people have, but it was an effective hook. We were able to produce preliminary figures and analysis as we collected data, which made a tough field season more rewarding. Coding gave us instant results and motivation.

I committed to fully learning how to code during my first year of graduate school. I first learned Linux/command line and Python, and then I started working in R the following summer. My graduate research uses population genetics/genomics to better understand the migratory connections of humpback whales. This research means I spend a great deal of time developing bioinformatics and big data skills, which are essential for this area of research and a goal for my career. For me, coding is a skill that only returns what you put in; you can learn to code quite quickly if you devote the time. After a year of intense learning and struggle, I am writing better code every day.

In grad school, research progress can be nebulous, but for me coding has become a concrete way to measure success. If my code ran, I have a win for the week. If not, then I have a clear place to start working the next day. These “tiny wins” are adding up, and coding has become a huge confidence boost.

Karen’s Favorite Command: grep (Linux) – Searches for a string pattern and prints all lines containing a match to the screen. Grep has a variety of flags, making it a versatile command that I use every time I’m working in Linux.
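If you are more at home in Python than on the command line, the core idea of grep can be sketched in a few lines. This is only an illustration of the concept, not a replacement for the real tool; the function name grep and the example lines are hypothetical. The second search mimics grep’s -i (ignore case) flag.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical field notes, one entry per line
log = [
    "humpback sighted at site A",
    "no sightings",
    "Humpback sighted at site B",
]

# Case-sensitive search, like plain grep
matches = grep(r"humpback", log)

# Case-insensitive search, similar in spirit to grep -i
matches_ignore_case = [line for line in log if re.search(r"humpback", line, re.IGNORECASE)]

for line in matches:
    print(line)
```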

Advice

Getting Started

  • Be kind to yourself and think of it as learning a foreign language. It takes a long time and a lot of practice.
  • Once you know the fundamental concepts in any language, learning another will be easier (we promise!).
  • Ask for help! The chances that you have run into a unique error are quite small. Someone out there has already solved your problem, whether it’s a lab mate or another researcher you find on Google!

Coding Tips

1. Set yourself up for success by formatting your datasheets properly

  • Instead of making your spreadsheet easy to read, try to think about how you want to use the data in the analysis.
  • Avoid formatting (merged cells, wrapped text) and spaces in headers.
  • Try to think ahead when formatting your spreadsheet.
    • Maybe chat with someone who has experience and get their advice!
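If you inherit a spreadsheet whose headers already contain spaces or punctuation, one option is to clean them up in code as soon as you read the file. Below is a minimal sketch using only the Python standard library. The helper clean_header, the column names, and the one-row dataset are all hypothetical.

```python
import csv
import io
import re

def clean_header(name):
    """Turn a header like ' Body Length (m) ' into 'body_length_m'."""
    name = name.strip().lower()
    # Replace each run of spaces or punctuation with a single underscore
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")

# Stand-in for a CSV file exported from a spreadsheet
raw_csv = "Field Site, Body Length (m) ,Date Collected\nSite A,12.5,2019-07-01\n"

reader = csv.reader(io.StringIO(raw_csv))
headers = [clean_header(h) for h in next(reader)]
rows = [dict(zip(headers, row)) for row in reader]

print(headers)
print(rows[0])
```

Code-friendly headers like these can be typed without quoting or escaping, which saves small frustrations later on.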

2. Start with a plan, start on paper

This low-tech solution saves countless hours of code confusion. It can be especially helpful when manipulating large data frames or in multistep analyses. Drawing out the structure of your data and checking it frequently in your code (with ‘head’ in R/Linux) after each manipulation can keep you on track. It is easy to code yourself in circles when you don’t have a clear understanding of what you’re trying to do in each step. Or worse, you could end up with code that runs but doesn’t conduct the analysis you intended or needed to do.
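In Python, the pandas counterpart of ‘head’ is the head() method. The sketch below shows one way to check a manipulation against the paper plan. The sightings data and the expected shape are invented for the example.

```python
import pandas as pd

# Hypothetical data: one row per sighting
sightings = pd.DataFrame({
    "site": ["A", "A", "B", "C"],
    "count": [3, 5, 2, 7],
})

# Planned step from the paper sketch: keep only sites A and B
subset = sightings[sightings["site"].isin(["A", "B"])]

# Look at the result right after the manipulation
print(subset.head())   # first few rows
print(subset.shape)    # (number of rows, number of columns)

# The paper plan says 3 rows and 2 columns should remain; fail loudly if not
assert subset.shape == (3, 2)
```

Turning an expectation from the paper plan into an assert is optional, but it catches the "runs but does the wrong thing" case early.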

3. Good organization and habits will get you far

There is an excellent blog post by Nice R Code on project organization and file structure. I highly recommend reading and implementing their self-contained scripting suggestions. The further you get into your data analysis, the more object, directory, and function names you have to remember. Develop a naming scheme that makes sense for your project (e.g., flexible, number-based) and stick with it. Temporary object names in functions or code blocks can be a good way to distinguish code-in-progress from the code result.

Figure 1. An example of project-based workflow directory organization from Nice R Code (https://nicercode.github.io/blog/2013-04-05-projects/)

4. Annotate. Then annotate some more.

Make comments in your code so you can remember what each section or line is for. This makes debugging much easier! Annotation is also a good way to stay on track as you code, because you’ll be describing the goal of every line (remember tip 2?). If you’re following a tutorial (or Stack Overflow answer), copy the web address into your annotation so you can find it later. At the end of a coding session, make a quick note of your thought process so it’s easier to pick up when you come back. It’s also a good habit to add some ‘metadata’ details to the top of your script describing what the script is intended for, what the input files are, the expected outputs, and any other pertinent details for that script. Your future self will thank you!

Figure 2. Example code with comments explaining the purpose of each line.
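As a text version of the same idea, here is a short Python sketch with a ‘metadata’ header and a comment for each step. The script name, the records, and the variable names are all hypothetical.

```python
"""
Script:  summarize_dive_durations.py (hypothetical example)
Purpose: Compute the mean dive duration per whale.
Input:   A list of (whale_id, duration_in_seconds) records, standing in for a data file.
Output:  A dictionary mapping whale_id to mean duration in seconds.
"""
from collections import defaultdict

# Made-up records so the example runs without an input file
records = [("W1", 60), ("W1", 90), ("W2", 120)]

# Group durations by whale so each whale can be averaged separately
durations_by_whale = defaultdict(list)
for whale_id, duration in records:
    durations_by_whale[whale_id].append(duration)

# Average each whale's durations
mean_duration = {
    whale_id: sum(durations) / len(durations)
    for whale_id, durations in durations_by_whale.items()
}
print(mean_duration)
```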

5. Get with Git/GitHub already

GitHub is a great way to manage version control. Remember how life-changing the advent of Dropbox was? This is like that, but for code! It’s also become a great open-source repository for newly developed code and packages. In addition to backing up and storing your code, GitHub has become a ‘coding CV’ that other researchers look to when hiring.

Wondering how to get started with GitHub? Check out this guide: https://guides.github.com/activities/hello-world/

Looking for a good text/code editor? Check out Atom (https://atom.io/); you can push your edits straight to Git from there.

6. You don’t have to learn everything, but you should probably learn the R Tidyverse ASAP

Tidyverse is a collection of data manipulation packages that make data wrangling a breeze. It also includes ggplot2, an incredibly versatile data visualization package. For Python users hesitant to start working in R, Tidyverse is a great place to start. The syntax will feel more familiar to Python users, and it has wonderful documentation online. It’s also similar to the awk/sed tools from Linux, as dplyr removes much of the need to write loops. Loops in any language are awful: learn how to do them, and then how to avoid them.

7. Functions!

Break your code out into blocks that can be run as functions! This allows easier repetition of data analysis, in a more readable format. If you need to call your functions across multiple scripts, put them all into one ‘function.R’ script and source it in your working scripts. This approach ensures that all the scripts can access the same function, without copying and pasting it into multiple scripts. Then if you edit the function, it is changed in one place and the change is passed to all dependent scripts.
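The same habit carries over to Python, where the rough equivalent of sourcing a ‘function.R’ script is importing from a shared module. In the sketch below the function is defined inline so the example runs by itself. The function standardize and the two datasets are made up for illustration.

```python
def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population SD)."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]

# Two hypothetical datasets that need the same treatment
lengths = [10.0, 12.0, 14.0]
depths = [100.0, 300.0, 500.0]

# One definition, reused; a fix to standardize() reaches both analyses
standardized_lengths = standardize(lengths)
standardized_depths = standardize(depths)

print(standardized_lengths)
print(standardized_depths)
```

In a real project, the function would live in its own file and each analysis script would import it, so there is only ever one copy to edit.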

8. Don’t take error messages personally

  • Repeat after me: Everyone googles for every other line of code, everyone forgets the command some (...er, every) time.
  • Debugging is a lifestyle, not a task item.
  • One way to make it less painful is to keep a list of fixes that you find yourself needing multiple times. And ask for help when you’re stuck!

9. Troubleshooting

  • Know that you’re supposed to Google something but not sure what?
    • Start by copying and pasting the error message.
  • When I started, it was hard to know how to phrase what I wanted. These are some common terms:
    • A dataframe is the coding equivalent of a spreadsheet/table.
    • Do you want to combine two dataframes side by side? That’s a merge.
    • Do you want to stack one dataframe on top of another? That’s concatenating.
    • Do you want to get the average (or some other statistic) of values in a column that are all from one group or category? Check out group by or aggregate.
    • A loop is when you go through every value in a column or list and do something with it (use it in an equation, use it in an if/else statement, etc.).
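To connect those search terms to actual code, here is a small sketch in Python with pandas. The dataframes, column names, and the 25 m threshold are all invented for the example.

```python
import pandas as pd

# Two hypothetical dataframes that share a 'whale_id' column
sightings = pd.DataFrame({"whale_id": ["W1", "W2", "W1"], "depth_m": [10.0, 30.0, 20.0]})
whales = pd.DataFrame({"whale_id": ["W1", "W2"], "sex": ["F", "M"]})

# Merge: combine side by side, matching rows on a shared key column
merged = sightings.merge(whales, on="whale_id", how="left")

# Concatenate: stack one dataframe on top of another
more_sightings = pd.DataFrame({"whale_id": ["W2"], "depth_m": [50.0]})
stacked = pd.concat([sightings, more_sightings], ignore_index=True)

# Group by: one statistic per group or category
mean_depth = stacked.groupby("whale_id")["depth_m"].mean()

# Loop: go through every value and do something with it (here, an if/else)
labels = []
for depth in stacked["depth_m"]:
    labels.append("deep" if depth > 25 else "shallow")

print(merged)
print(mean_depth)
print(labels)
```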

Favorite Coding Resources (other than GitHub...)

  • Learnxinyminutes.com
    • This is great ‘one stop googling’ for coding in almost any language! I frequently switch between coding languages, and as a result almost always have this open to check syntax.
  • https://swirlstats.com/
    • This is a really good resource for getting an introduction to R

Parting Thoughts

We hope that our stories and advice have been helpful! As with many skills, you tend to see people only after they have made it over the learning curve. But, as you’ve read, Karen and I both started recently and felt intimidated at the beginning. So, be patient, be kind to yourself, believe in yourself, and good luck!

A few things I’ve learned while writing a thesis

By: Amanda Holdman, MS student, Geospatial Ecology of Marine Megafauna Lab & Oregon State Research Collective for Applied Acoustics, MMI

“Never use the passive where you can use the active.” I recently received this comment in a draft of my thesis. While this pertained to a particular edit, it has since become my motto for writing in general – to stay active in writing. I knew before beginning this process, from my peers, that it takes time to write a thesis or dissertation, and usually much longer than anticipated, resulting in late caffeinated hours. My roommates have recently moved out, making it a perfect opportunity to convert my home into a great evening office. I needed fewer distractions so I unplugged the TV and set up a desk with ideal conditions for writing. I’m in a race against time with my defense set for only a month away, and getting into good writing habits has helped me smooth out a lot of the writing stress, so I figured I could share those tips.

  1. Write sooner

The writing process can be daunting because of a thesis’s size and importance. In the beginning, I tended to wait until I thought I had researched enough about the topic. But I have now learned not to wait until all the data are in and the results are clear to start writing. Some researchers might argue that results are needed before one can put the proper spin on the introduction, but spin isn’t really needed for a first draft. Most of the writing can actually be done before all the data have arrived. For example, I didn’t need to know the results of my observations before writing the manuscript about them; the rationale for having done the research doesn’t change with the results, so a draft of the introduction can be written without knowing them. The methodology also doesn’t depend on the results, nor does the analysis that will be performed on the data, so a good framework for the results section can be written before all of the statistical tests are run. And before I know it, I have almost a full draft, just with quite a few gaps.

  2. Write continually

Productivity begets productivity, so don’t stop writing. It keeps my mind working and my project moving. I try to write a little every day or set a word-count goal (500 words a day is easily attainable, and you feel proud at the end of the day). Writing as frequently as possible has helped to reveal gaps in my knowledge or understanding. Vague and disorganized writing tends to reflect vague and disorganized thinking, which sends me digging through the literature for more clarity.

  3. Figure out how you write and edit

Some people are better writers when they first put their thoughts on paper and plan to go back and fix awkward sentences, poor word choices, or illogical sentences later. My perfectionism has always plagued me, so I always edit as I write, with one goal only: to make sure I’ve expressed the idea in my head clearly on the page. I don’t move on until the sentence (or thought) makes sense with no ambiguity in the meaning. Clarity of thought is always the aim in writing a manuscript, yet it is very difficult to come back to a section of writing days or weeks later and sort out a mess of thought if I didn’t clarify my writing while the thought was still fresh in my head. This means I am constantly re-reading and revising what I’ve written, but it also hopefully means that when I submit something to my advisor or committee it only needs simple revisions, thereby saving time by getting as “close to right” as I could the first time around.

  4. Develop a routine

It’s important to learn when we are productive and what makes us productive. For me, writing in several short bursts is more efficient than writing in a few long periods. When I try to write for long hours, I notice my concentration diminishing around the hour mark, so I try to take frequent 15-minute breaks. For me, the most productive parts of the day are the beginning and the end. It’s important to build momentum early and to have a routine for ending the day too. At the end of each day, I always leave myself something easy to get started with the next day, so I wake up knowing exactly where I am going to start.

  5. Find a template

Usually, when we set a date and deadlines for the final draft of our thesis, we’re so frantic and pressed for time trying to get all the content in that we forget about the time it takes to make a draft pretty. My last HUGE time-saving tip is to find a colleague who has recently turned in their thesis or dissertation and still has their final Word document. You can save time by reusing their document as a template for margins, page number position, and other formatting guidelines. Everything you’ve written can easily be pasted into a formatted template.

  6. Keep your motivation near

Finally, always try to keep the end result in mind. Whether it be holding a beautifully bound version of your thesis or a first-author publication, keeping motivated is important. Publishing is not a requirement for completing a thesis, but it is an ultimate goal for me. I know I owe it to myself, the people I have worked with along the way, those who have supported me in some way (e.g., my committee), and the funders that have helped pay for the research. Plus, to have a competitive edge in the next job I apply for, and to get the most leverage possible from my master’s training, it is important for me to finish strong with a publication or two. Visualizing the end result helps me to take action to finish my thesis and advance my career.

Now, I think it’s about time to stop writing about writing a thesis and get back to actually writing my thesis.