{"id":201,"date":"2020-09-04T00:29:57","date_gmt":"2020-09-04T00:29:57","guid":{"rendered":"https:\/\/blogs.oregonstate.edu\/earthmotes\/?p=201"},"modified":"2021-04-28T21:45:05","modified_gmt":"2021-04-28T21:45:05","slug":"16s-miseq-analysis-tutorial-part-1-nmds-and-environmental-vectors","status":"publish","type":"post","link":"https:\/\/blogs.oregonstate.edu\/earthmotes\/2020\/09\/04\/16s-miseq-analysis-tutorial-part-1-nmds-and-environmental-vectors\/","title":{"rendered":"16S MiSeq Analysis Tutorial Part 1: NMDS and Environmental Vectors"},"content":{"rendered":"\n<p>This tutorial aims to guide the user through a NMDS analysis of 16S abundance data using R, starting with a &#8216;sample x taxa&#8217; distance matrix and corresponding metadata.  <\/p>\n\n\n\n<p>My final code here represents a wide conglomeration of script from previous Colwell Lab graduate students and my own invention.  <\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Set Up your Environment<\/h2>\n\n\n\n<p>Load in the necessary libraries and set your working directory.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">library(vegan)\nlibrary(ggplot2)\nlibrary(gridExtra)\nlibrary(grid)\nlibrary(ape)\n\nsetwd(\"Your working directory here\")<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Create and Visualize a Dissimilarity Matrix<\/h2>\n\n\n\n<p>Read in your abundance matrix and metadata, ensuring that row order is the same on the two datasets.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">data &lt;- read.table(\"sv-asv-abund.txt\", header=T, row.names = 1, sep=\"\\t\")\n\n#Rotate to match metadata if necessary:\n#rownames(data) &lt;- factor(rownames(data), levels = rownames(data))\n#data &lt;- as.data.frame(t(data[rowSums(data)!=0, ]))\n\nmetadata &lt;- read.table(\"metadata.txt\", header = T, row.names = 1, sep=\"\\t\")<\/pre>\n\n\n\n<p>After importing your abundance matrix as a &#8216;data&#8217; object, apply the statistic Bray-Curtis Dissimilarity to quantify the dissimilarity of taxa counts at each site\/sample, then convert to a PCoA object.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">dist &lt;- vegdist(data, \"bray\")\npcoa &lt;- pcoa(dist)<\/pre>\n\n\n\n<p>Visualize the first ten principal components to identify principal component contribution to data variability.  <\/p>\n\n\n\n<pre class=\"wp-block-verse\">barplot(pcoa$values$Relative_eig[1:10], names = paste ('PCoA, 1:10'), ylab = 'eigenvalues')<\/pre>\n\n\n\n<p>Plot the principal components to identify any data outliers.  Any extreme outliers may be considered for removal in downstream analyses due to their effect on NMDS.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">biplot.pcoa(pcoa)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">NMDS Distance and Selection of Dimensionality<\/h2>\n\n\n\n<p>Visualize the dimensionality of your dissimilarity distance matrix by creating a scree plot in real time (kinda fun).  A scree plot will show the eigenvlaues of your principal components.  At the &#8220;elbow&#8221; of the plot, choose the number of dimensions your dataset posses (usually between 2 and 4). <\/p>\n\n\n\n<pre class=\"wp-block-verse\">NMDS.scree &lt;- function(x) { #where x is the name of the data frame variable\nplot(rep(1, 10), replicate(10, metaMDS(x, autotransform = F, k = 1)$stress),\nxlim = c(1, 10),ylim = c(0, 0.30), xlab = \"# of Dimensions\", ylab = \"Stress\",\nmain = \"NMDS stress plot\")\nfor (i in 1:10) {\npoints(rep(i + 1,10),replicate(10, metaMDS(x, autotransform = F, k = i + 1)$stress))\n}\n}\nNMDS.scree(dist)<\/pre>\n\n\n\n<p>The dimensional value you choose from the scree plot will be the &#8216;k&#8217; value for your ordination.  Now, preform an ordination using your Bray-Curtis dissimilarity distance matrix.  <strong>It is crucial you set the seed for reproducibility<\/strong> and record the stress of your NMDS found in the output of &#8216;ord&#8217; for future reference.  Stress greater than 0.2 is incredibly suspect, while &gt; 0.3 shows complete nonsensical relationships.  Acceptable stress is &lt; 0.2, with ideal stress &lt; 0.1. <\/p>\n\n\n\n<pre class=\"wp-block-verse\">set.seed(2)\nord &lt;- metaMDS(dist, k = 2, trymax = 100, trace = FALSE)\nord<\/pre>\n\n\n\n<p>Visualize the stress using a Shepard scatter plot.  You should expect a linear or non-metric fit close to .90 or greater.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Addition of Environmental Variables<\/h2>\n\n\n\n<p>Now, identify correlations between your environmental\/metadata and data shape.  This will create vectors overlaid on the NMDS plot to show biogeochemical relevance to the taxa distance matrix.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">env_fit &lt;- envfit(ord$points, metadata, perm=999)\nenv_fit_df &lt;- as.data.frame(env_fit$vectors$arrows*sqrt(env_fit$vectors$r))\nenv_fit_df$vector &lt;- rownames(env_fit_df)\nenv_fit<\/pre>\n\n\n\n<p>Extract the scaling scores and sample names from your NMDS ordination and convert to a data frame.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">data.scores &lt;- as.data.frame(scores(ord))\ndata.scores$sample &lt;- rownames(data.scores)<\/pre>\n\n\n\n<p>Add any information from your metadata table to the score dataframe.  &#8220;Pond&#8221; in this example refers to the geographic location of the samples.  <\/p>\n\n\n\n<pre class=\"wp-block-verse\">data.scores &lt;- data.scores[order(data.scores$sample),]\ndata.scores$Pond &lt;- metadata$pond<\/pre>\n\n\n\n<p>Clean up the score dataframe strings (make it nice and pretty).<\/p>\n\n\n\n<pre class=\"wp-block-verse\">data.scores &lt;- as.data.frame(data.scores, stringsAsFactors = TRUE)\ndata.scores$sample &lt;- factor(data.scores$sample, levels = data.scores$sample)<\/pre>\n\n\n\n<p>In addition to environmental data correlation, find any dispersion in the dissimilarity distance matrix (same as the one used for NMDS). <\/p>\n\n\n\n<pre class=\"wp-block-verse\">dist &lt;- vegdist(data, method=\"bray\")\ndisper_pond &lt;- betadisper(dist, data.scores$Pond)\ndisper_pond<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Visualization of NMDS<\/h2>\n\n\n\n<p>The moment we&#8217;ve all been waiting for!  This will create an NMDS plot containing environmental vectors and ellipses showing significance based on NMDS groupings.  Tweak away to create the NMDS of your dreams.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">p.nmds &lt;- ggplot(data = data.scores, aes(x = NMDS1, y = NMDS2, fill=Pond)) +\ngeom_point(aes(), colour=\"black\", pch=21, size = 3) +\nxlab(paste0(\"NMDS1\")) +\nylab(paste0(\"NMDS2\")) +\nstat_ellipse(aes(x = NMDS1, y = NMDS2, color = Pond), level = 0.05) +\ncoord_fixed() +\ngeom_segment(data=env_fit_df, aes(x=0, xend=env_fit_df$MDS1, y=0, yend=env_fit_df$MDS2), inherit.aes = FALSE,\narrow = arrow(length = unit(0.25, \"cm\")), colour = \"grey\") +\ntheme_linedraw(base_size = 18) +\ntheme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),\nstrip.text = element_text(face = \"bold\")) +\ntheme(text=element_text(size = 20)) +\ngeom_text(data=env_fit_df, aes(x=env_fit_df$MDS1, y=env_fit_df$MDS2, label=vector),\ninherit.aes = FALSE, size=3) +\nggtitle(\"NMDS of ____\")\nprint(p.nmds)<\/pre>\n\n\n\n<p>And of course, to wrap up the part 1 of this analysis, finish with PERMANOVA comparisons to identify significance based on environmental variables.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">perm_pond &lt;- adonis(data ~ pond, data=metadata, permutations = 10000, method = \"bray\")\nperm_pond$aov.tab<\/pre>\n\n\n\n<p>Analysis of similarities (ANOSIM):<\/p>\n\n\n\n<pre class=\"wp-block-verse\">anosim_pond &lt;- anosim(dist, metadata$pond, permutations = 999, distance = \"bray\")\nanosim_pond$signif<\/pre>\n\n\n\n<p>And finally, similarity percentages (SIMPER) for identifying the driving variable in a community matrix.  In this example, &#8220;6.4&#8221; and &#8220;16.8&#8221; are distances in km from proglacial lake, and &#8220;rhm&#8221; and &#8220;ts&#8221; are ponds.<\/p>\n\n\n\n<pre class=\"wp-block-verse\">sim &lt;- simper(comm = data, group = metadata$glacialpath, permutations = 100)\n\nsim.rhm.ts &lt;- as.data.frame(sim$<code>6.4_16.8<\/code>$species)\nnames(sim.rhm.ts)[1] &lt;- \"taxa\"\nsim.rhm.ts$cumsum &lt;- as.data.frame(sim$<code>6.4_16.8<\/code>$cusum)\nnames(sim.rhm.ts)[2] &lt;- \"cumsum\"<\/pre>\n\n\n\n<figure class=\"wp-block-pullquote\"><blockquote><p>Up next: Centroid Visualization, Alpha Diversity Analyses, Canonical Analyses of Principal Components, Relative Abundance Visualization, and Phylogenetic Analysis<\/p><\/blockquote><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial aims to guide the user through a NMDS analysis of 16S abundance data using R, starting with a &#8216;sample x taxa&#8217; distance matrix and corresponding metadata. My final code here represents a wide conglomeration of script from previous Colwell Lab graduate students and my own invention.<\/p>\n","protected":false},"author":10317,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-201","post","type-post","status-publish","format-standard","hentry","category-tutorials"],"_links":{"self":[{"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/posts\/201","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/users\/10317"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/comments?post=201"}],"version-history":[{"count":14,"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/posts\/201\/revisions"}],"predecessor-version":[{"id":323,"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/posts\/201\/revisions\/323"}],"wp:attachment":[{"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/media?parent=201"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/categories?post=201"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.oregonstate.edu\/earthmotes\/wp-json\/wp\/v2\/tags?post=201"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}