Mapmaking: Part 3

In the first two parts of this series, I introduced Lightroom, the Lightroom plugins LR/Transporter and FTP Publisher, and the programming languages AWK and R. With those tools, I organized my photos and got some of their metadata into a format that I can easily manipulate with R code.

After getting the photo information organized, I had a few more pieces of metadata to pull together. In particular, I wanted to organize the map based on the taxonomy of the corals, and I wanted to include some information about the collection sites that wasn’t in my sample metadata file. We keep this information in separate files for a couple of reasons. Over the course of the project, multiple people have collected replicates of the same coral species in different locations. Every time we collect a coral, we fill in a line of data in the sample metadata table. Right now, that table has 57 columns, meaning we have to manually enter 57 pieces of information for each sample. On a whirlwind trip where we collect 50 samples, that quickly adds up to 2850 values, or 2850 opportunities to make a typo or some other error.

If any two columns in our table are highly repetitive and are dependent on each other, we should be able to allow the computer to fill one in based on the other. For example, we could create seven columns in the sample metadata file that detail each sample’s species, genus, family, order, phylogenetic clade, NCBI taxonomy ID number, and perhaps some published physiological data. However, all of these pieces of information are dependent on the first value: the species of coral sampled. If we collect the same species, say, Porites lobata, 25 times throughout the project, all the information associated with that species is going to be repeated again and again in our metadata sheet. However, if instead we create a single column in our sample metadata table for the species ID, we can then create a separate table for all the other information, with only one row per species. We cut down on the amount of manual data entry we have to do by 144 values for that species alone!* Not only does that save time; it helps to avoid errors. The same general principle applies to each site we’ve visited: certain values are consistent and prone to repetition and error, such as various scales of geographical information, measurements of water temperature and visibility, and locally relevant collaborators. So we created another table for ‘sites’. **
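As a toy illustration of that split-and-merge principle (the values below are made up, not our real tables), R’s merge() re-expands a one-row-per-species table against the per-sample records:

```r
# Hypothetical per-sample records: only the species ID is entered by hand.
samples <- data.frame(
  sample_name   = c('S1', 'S2', 'S3'),
  genus_species = c('Porites lobata', 'Porites lobata', 'Acropora palmata')
)

# Species-level facts live in a separate table, one row per species
# (IDs here are placeholders, not real NCBI taxon IDs).
species <- data.frame(
  genus_species = c('Porites lobata', 'Acropora palmata'),
  family        = c('Poritidae', 'Acroporidae'),
  TAXON_ID      = c(101, 202)
)

# merge() repeats the single species row for every matching sample, so
# family and TAXON_ID are typed once no matter how often we resample.
merged <- merge(samples, species)
```

However many times a species is collected, its taxonomy is entered exactly once.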

Excerpt from 'species' metadata table:

| genus_species         | genus       | species     | family          | clade    | TAXON_ID | NCBI_blast_name |
|-----------------------|-------------|-------------|-----------------|----------|----------|-----------------|
| Tubastrea coccinea    | Tubastrea   | coccinea    | Dendrophyllidae | II       | 46700    | stony corals    |
| Turbinaria reniformis | Turbinaria  | reniformis  | Dendrophyllidae | II       | 1381352  | stony corals    |
| Porites astreoides    | Porites     | astreoides  | Poritidae       | III      | 104758   | stony corals    |
| Acropora palmata      | Acropora    | palmata     | Acroporidae     | VI       | 6131     | stony corals    |
| Pavona maldivensis    | Pavona      | maldivensis | Agaricidae      | VII      | 1387077  | stony corals    |
| Herpolitha limax      | Herpolitha  | limax       | Fungiidae       | XI       | 371667   | stony corals    |
| Diploastrea heliopora | Diploastrea | heliopora   | Diploastreidae  | XV       | 214969   | stony corals    |
| Symphyllia erythraea  | Symphyllia  | erythraea   | Lobophyllidae   | XIX      | 1328287  | stony corals    |
| Heliopora coerulea    | Heliopora   | coerulea    | Helioporaceae   | Outgroup | 86515    | blue corals     |
| Stylaster roseous     | Stylaster   | roseous     | Stylasteridae   | Outgroup | 520406   | stony corals    |
Excerpt from 'sites' metadata table:

| reef_name          | date     | reef_type             | site_name         | country          | collected_by                             | relevant_collaborators                                                              | visibility |
|--------------------|----------|-----------------------|-------------------|------------------|------------------------------------------|-------------------------------------------------------------------------------------|------------|
| Big Vickie         | 20140728 | Midshelf inshore reef | Lizard Island     | Australia        | Ryan McMinds                             | David Bourne, Katia Nicolet, Kathy Morrow, and many others at JCU, AIMS, and LIRS   | 12         |
| Horseshoe          | 20140731 | Midshelf inshore reef | Lizard Island     | Australia        | Ryan McMinds                             | David Bourne, Katia Nicolet, Kathy Morrow, and many others at JCU, AIMS, and LIRS   | 15         |
| Al Fahal           | 20150311 | Offshore reef         | KAUST House Reefs | Saudi Arabia     | Ryan McMinds, Jesse Zaneveld             | Chris Voolstra, Maren Ziegler, Anna Roik, and many others at KAUST                  | Unknown    |
| Far Flats          | 20150630 | Fringing Reef         | Lord Howe Island  | Australia        | Joe Pollock                              |                                                                                     | 15         |
| Raffles Lighthouse | 20150723 | Inshore Reef          | Singapore         | Singapore        | Jesse Zaneveld, Monica Medina            | Danwei Huang                                                                        | 4.5        |
| Trou d'Eau         | 20150817 | Lagoon Patch Reef     | Reunion West      | France           | Ryan McMinds, Amelia Foster, Jerome Payet | Le Club de Plongee Suwan Macha, Jean-Pascal Quod                                   | 10         |
| LTER_1_Fringing    | 20151109 | Fringing Reef         | Moorea            | French Polynesia | Ryan McMinds, Becky Vega Thurber         | the Burkepile Lab                                                                   | >35        |

Thus, after loading and processing the sample and photo metadata files as in the last post, I needed to load these two extra files and merge them with our sample table. This is almost trivial, using commands that are essentially in English:

# load the site metadata and merge it into the sample table
sites <- read.table('sites_metadata_file.txt',header=T,sep='\t',quote="\"")
data <- merge(samples,sites)

# load the species metadata and merge it into the combined table
species_data <- read.table('species_metadata_file.txt',header=T,sep='\t',quote="\"")
data <- merge(data,species_data)

And we now have a fully expanded table.
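One caveat worth knowing about merge(): by default it joins on every column name the two tables share, which is exactly what we want here. But if two tables happen to share an incidental column, the join keys should be named explicitly with the by argument. A sketch with made-up column names and values:

```r
# If both tables carried, say, a 'notes' column, the default merge would
# silently require the notes to match too. Naming the key avoids that.
samples <- data.frame(sample_name = c('S1', 'S2'),
                      reef_name   = c('Big Vickie', 'Horseshoe'),
                      notes       = c('a', 'b'))
sites   <- data.frame(reef_name = c('Big Vickie', 'Horseshoe'),
                      country   = 'Australia',
                      notes     = c('x', 'y'))

# Join only on reef_name; the clashing columns come through
# disambiguated as notes.x and notes.y.
data <- merge(samples, sites, by = 'reef_name')
```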

A couple of commands are needed to account for empty values that are awaiting completion when we get the time:

data$relevant_collaborators[is.na(data$relevant_collaborators)] <- 'many collaborators'
data$photo_name[is.na(data$photo_name)] <- 'no_image'

These commands subset the table to just rows that had empty values for collaborators and photos, and assign to the subset a consistent and useful value. Empty collaborator cells aren’t accurate – we’ve gotten lots of help everywhere we’ve gone, and just haven’t pulled all the information from all the teams together yet! As for samples without images, I created a default image with the filename ‘no_image.jpg’ and uploaded it to the server as a stand-in.
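The same subset-and-assign idiom works on any vector, which makes it easy to see in miniature (toy values here, not our actual tables):

```r
# is.na() returns TRUE for the empty cells; indexing with that logical
# mask and assigning fills in only those positions, leaving real values
# untouched.
collaborators <- c('Danwei Huang', NA, 'the Burkepile Lab', NA)
collaborators[is.na(collaborators)] <- 'many collaborators'
# collaborators now contains no NA values
```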

Default image shown when a sample has no pictures.

Now I need to introduce the R package that I used to build my map: Leaflet for R. Leaflet is actually an extensive Javascript package, but the R wrapper makes it convenient to integrate my data. The package allows considerable control of the map within R, but the final product can be saved as an HTML file that sources the online Javascript libraries. Once it’s created, I just upload it to our webpage and direct you there!

Note that although I usually use R from the Terminal, it’s very convenient to use the application RStudio with this package, because you can see the product progress as it’s built, and then easily export it at the end.
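To give a feel for the workflow, here is a minimal sketch (the coordinates and label are placeholders, not our real data) using the leaflet and htmlwidgets packages:

```r
library(leaflet)
library(htmlwidgets)

# Build the map widget in R: a basemap plus one marker with an HTML popup.
m <- leaflet() %>%
  addTiles() %>%                           # default OpenStreetMap tiles
  addMarkers(lng = 145.45, lat = -14.68,   # roughly Lizard Island
             popup = '<b>Big Vickie</b>')  # popups accept arbitrary HTML

# Export to an HTML file; selfcontained = FALSE writes the supporting
# Javascript libraries alongside rather than requiring pandoc to inline them.
saveWidget(m, 'map.html', selfcontained = FALSE)
```

The resulting map.html is what gets uploaded to the webpage.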

To make my map more interesting, I took advantage of the fact that each marker on the Leaflet map can have a popup with its own arbitrary HTML-coded content. Thus, for each sample I integrated all my selected metadata into an organized graphical format. The potential uses for this are exciting to me; it means I could put more markers on the map, with tables, charts, interactive media, or lots of other things that can be specified with HTML. For now, though, I decided I wanted the popups to look like this, with just some organized text, links, and a photo:



So, I wrote the HTML and then used R’s paste0() function to plug in the sample-specific data in between HTML strings.

data$html <- paste0('<div style="max-width:300px; overflow:auto;">',
'<div width="100%" style="clear:both;">',
'<p>',
'<a href="https://www.flickr.com/search/?text=GCMP%20AND%20',data$genus_species,'" target="_blank">',data$genus_species,'</a>: ',
'<a href="https://www.flickr.com/search/?text=',gsub('.','',data$sample_name,fixed=T),'" target="_blank">',data$sample_name,'</a>',
'</p>',
'</div>',
'<div width="100%" style="float:left;clear:both;">',
'<img src="http://files.cgrb.oregonstate.edu/Thurber_Lab/GCMP/photos/sample_photos/processed/small/',data$photo_name,'.jpg" width="50%" style="float:left;">',
'<div width="50%" style="float:left; margin-left:10px; max-width:140px;">',
'Site: <a href="https://www.flickr.com/search/?text=GCMP%20AND%20',data$reef_name,'" target="_blank">',data$reef_name,'</a>',
'<p>Date: <a href="https://www.flickr.com/search/?text=GCMP%20AND%20',data$date,'" target="_blank">',data$date,'</a></p>',
'<p>Country: <a href="https://www.flickr.com/search/?text=GCMP%20AND%20',data$country,'" target="_blank">',data$country,'</a></p>',
'</div>',
'</div>',
'<div width="100%" style="float:left;">',
'<p>',
'Collected by <a href="https://www.flickr.com/search/?text=GCMP%20AND%20(',gsub(', ','%20OR%20',data$collected_by,fixed=T),')" target="_blank">',data$collected_by,'</a>',
' with the help of ',data$relevant_collaborators,'.',
'</p>',
'</div>',
'<div style="clear:both;"></div>',
'</div>')

Yeesh! I hate HTML. It definitely makes it uglier having to build the code within an R function, but hey, it works. If you want, we can go over that rat’s nest in more detail another time, but for now, the basics: I’ve created another column in our sample metadata table (data$html) that contains a unique string of HTML code on each row. The first section creates a container for the top line of the popup, which holds the species name and sample name, each stitched into a link to its photos on Flickr. Next, I paste together a source call to the sample’s photo on our server. After that comes a container with metadata information (and links to all photos associated with that metadata on Flickr), which sits beside the image. Finally, I stitch together some text and links to acknowledge the people who worked to collect that particular sample. Looking at that code right now, I’m marveling at how much nicer it looks now that I’ve cleaned it up for presentation…
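Two details make that one paste0() call work across the whole table. First, paste0() is vectorized: the constant HTML fragments are recycled across every row of the data columns, so one call yields one popup string per sample. Second, gsub() with fixed=TRUE treats its pattern as a literal string rather than a regular expression, which matters when stripping the '.' from sample names. In miniature (toy values):

```r
# paste0() recycles the single-string arguments along the full column,
# producing one HTML string per row.
genus_species <- c('Porites astreoides', 'Acropora palmata')
html <- paste0('<p>', genus_species, '</p>')
# html is c('<p>Porites astreoides</p>', '<p>Acropora palmata</p>')

# fixed = TRUE makes '.' a literal period; without it, '.' is a regex
# that matches every character.
literal <- gsub('.', '', 'P.astreoides', fixed = TRUE)  # 'Pastreoides'
regex   <- gsub('.', '', 'P.astreoides')                # '' (all chars matched)
```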

And now that I’ve gotten all the metadata together and prepared the popups, the only thing left to do is create the map itself. However, I’ll leave that for just one more post in the series.


*math not thoroughly verified.

**edit: My father points out that we are essentially building a relational database of our metadata. In fact, I did initially intend to do that explicitly by loading these separate tables into a MySQL database. For now, however, our data isn’t all that complex or extensive, and separate tables that can be merged with simple R or Python code are working just fine. I’m sure someday we will return to a discussion of databases, but that day is not today.

  • Dan McMinds

    You’re building a relational database from scratch. Not all data needs to be in every table; you relate the tables to each other using a factor that is common to all data, i.e. species. MS Access is a fairly simple program that does this easily, but to get the most out of the program you need to code in SQL. Access has another problem: it doesn’t play well with others. When more than one person is trying to use the same table, it doesn’t behave well. SQL Server has no such limits; any number of people may use the same table.

    • Yes, actually, when I first began splitting our table up, my intent was to load them into a MySQL database. We still might do that in the future, but for now, we only have about 6 tables, and it seems to be easier to save them as tab-delimited text, so nobody has to learn anything new, they’re infinitely shareable, and we can open them in Excel to do quick edits and searches visually. I might edit the post to make it clear that I’m aware that that’s basically what we’re doing 😉