Feed from the scientific network: the digital library of a millennial student

Solène Derville, Entropie Lab, Institute of Research for Development, Nouméa, New Caledonia (Ph.D. student under the co-supervision of Dr. Leigh Torres)

If you follow our blog, you may have noticed that bioinformatics and statistics play a very important role in the everyday life of the GEMM Lab. While good old field observations remain essential to the study of animal behaviour and ecosystems, the field of ecology has greatly benefited from advances in information technology. In fact, data analysis is now a discipline in itself, as innovative solutions must continuously be developed to cope with the challenges of ever-increasing dataset size and complexity.

Artist’s impression of a complex network. ©iStock.com/Vertigo3d

So how does a poor biology student find her/his way in this digital and mathematical world? Most ecology departments offer classes on the basics of statistical modelling and data analysis, but there is only so much you can learn through formal education. In practice, we always end up running into a problem, an exception we have never heard of, and we have to figure it out on our own. As my initial training was in fundamental biology, teaching myself other disciplines (statistics and bioinformatics) took up a lot of my time as a Master’s student and still does now that I am a PhD student. At times this made me feel lonely and a bit lost, whenever I ran into challenges that seemed too big for me. But in the end, there is nothing more rewarding than solving problems by yourself after long hours of mind-scrambling.

Oh, sorry, did I say by myself? Nothing could be more wrong and more true at the same time! The place where I find the answers to all my questions is in fact born from the contributions of thousands of scientists who, despite not knowing each other, work together to develop innovative solutions to modern scientific challenges. The online scientific network has been my best colleague over the past few years, and here I would like to share my enthusiasm for some of its best features that have helped me in my research.

If you look at my Firefox toolbar you will find two types of websites: let’s call them the “practical” and the “reflectional”.

The practical websites:

These are the websites I consult when I have a specific, practical question. Many forums exist where people exchange their experience of solving a great variety of problems. But sometimes conversations get lost in never-ending exchanges of opinions, some of which are not scientifically well-founded. By contrast, the StackExchange platform, launched in 2009, has a strict policy on how questions should be asked (as precise and focused as possible) and how they should be answered (in an objective, opinion-free way). This makes it a very powerful tool for finding quick, practical solutions to your everyday problems. The platform includes 136 different websites, each dedicated to a different topic. In my field, I mostly use CrossValidated for statistical issues (e.g., Why does including latitude and longitude in a GAM account for spatial autocorrelation?) and StackOverflow for programming (e.g., plotting pie graphs on map in ggplot).

StackOverflow will usually provide you with code in the programming language of your choice (R, Python, Java, SQL, etc.). Interestingly, although Python drew more queries than R on StackOverflow in 2015, R was the fastest-growing language on the platform between 2013 and 2015. If you haven’t decided which language you want to “speak” yet, check out this fun infographic. But always remember that these tools, and their popularity, keep evolving.

Academia, another StackExchange site, can also be useful for questions about publishing. For instance: How to reference multiple authors of a chapter from a book [APA]? Why might a journal editor reject a submission, but suggest submission to a sister journal? Or, how to best kill a manuscript as a peer reviewer?

And finally, if you’ve always wondered, “Why don’t we remove door handles and let doors open both ways (inwards and outwards)?”, you’ll be pleased to know that other out-of-the-box thinkers are sharing their opinions on the web…

Coming back to serious matters, it is important to recognize that you need the right keywords to access this gold mine of knowledge and sharing. The answers you find will only be as good as the question you ask. In R, for instance, if you keep googling “table” instead of “dataframe”, “list” instead of “vector”, or “size” instead of “dimensions”, you will quickly drown in Google limbo. One way to make your searches more efficient is to master the basics. Most of the programming languages used in ecology (e.g., R, Python, Matlab) share a similar vocabulary and structure, but before you start running all sorts of crazy statistical analyses, it is important to know what types of objects you are working with and how you want to format them. In R, I have found Hadley Wickham’s book, Advanced R, particularly useful for understanding what happens backstage.

Another good reference in the spatial ecology field is ZevRoss’s “Technical Tidbits From Spatial Analysis & Data Science”. This regularly updated blog is a great resource for data processing and visualization in R.

More generally, I regularly check R-bloggers or simply the Comprehensive R Archive Network (CRAN). A note on the latter: I know it doesn’t look pretty and the reference manuals for R packages are rather intimidating, but it is still the number one reference to check when you encounter a problem with a given function. Some authors make a special effort to write more user-friendly tutorials, called vignettes, for their packages. Look for these on the CRAN page of a given package, in the “downloads” section, “vignettes” subsection (e.g., the adehabitatLT package vignette).


The reflectional websites:

The web is also an amazing medium for reflecting on our scientific practices, learning about current ecological theories, and acquiring general knowledge across disciplines. In the scientific network, many blogs and forums exist where scientists can converse and debate ideas without the pressure of publication requirements. As a student trying to find my way in the great world of statistical modelling, I find these discussions and blog posts most useful for putting my methodological choices in perspective and gradually forming my own opinion (still rather vague, I’ll admit). Some of my most recent discoveries are Dynamic Ecology, Multa novit vulpes, and From the bottom of the heap: the musings of a geographer. I am sure each of you has your own “rock star of the web”, so please share your favorite sites with us in the comments below.

Science no longer needs to wait for publication to be shared among peers and with the general public. The web offers us a new space to communicate not only about the small part of our work that led to positive results, but also about our negative results, frustrations and failures, which can at times be as informative and useful to the scientific community as our successes. So, wherever you stand, tell us about your ideas and the challenges you have encountered, where you failed and where you succeeded. Because this is what ecology is all about: sharing knowledge across borders and cultures to understand the planet we live on and, together, take better care of it.