We focused on natural selection as an evolutionary-genetic force during the Tuesday lectures of Weeks 9 and 10. Natural selection can be quantified using relative fitness (w) and the selection coefficient (s), the latter being more widely used by geneticists. Relative fitness values are generally obtained by comparing some measure of fitness (for example, virus doubling time, or fecundity in an animal species) in a reference genotype (often ‘wild-type’) to that observed in a genetic variant (often ‘mutant’). The selection coefficient is then simply s = 1 − w, and can be thought of as the ‘percent disadvantage (or advantage)’ of the mutant relative to wild-type. We also spent time discussing one molecular test for the effects of natural selection on protein-coding sequences: the Ka/Ks test. Ka is the frequency (or rate) of replacement substitutions per replacement site; Ks is the frequency (or rate) of silent substitutions per silent site. At its core this is a very simple test that is also very broadly applicable – all you need is two homologous protein-coding sequences to compare. The test is widely used today on genome-scale data to examine the average effects of different selective forces across many gene sequences. The genome-wide distributions of Ka/Ks values for Drosophila orthologs were all very small (most <0.3), suggesting pervasive purifying selection. A similar trend was observed in the gene-duplicate data for six different model organisms, though in this case some of the younger duplicates (low Ks values) showed evidence for positive selection.
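The fitness arithmetic and the Ka/Ks interpretation above can be sketched in a few lines of code. This is just an illustration – the fitness and substitution-rate values here are hypothetical, not from the studies discussed.

```python
# Selection coefficient from relative fitness: s = 1 - w.
# The fitness value below is hypothetical, for illustration only.
def selection_coefficient(w_mutant: float) -> float:
    """s > 0 means the mutant is at a disadvantage relative to wild-type (w = 1)."""
    return 1.0 - w_mutant

# A mutant with 90% of wild-type fitness has a 10% disadvantage.
s = selection_coefficient(0.90)
print(f"s = {s:.2f}")  # s = 0.10

# Ka/Ks interpretation: <1 purifying, ~1 neutral, >1 positive selection.
def interpret_ka_ks(ka: float, ks: float) -> str:
    ratio = ka / ks
    if ratio < 1:
        return "purifying selection"
    if ratio > 1:
        return "positive selection"
    return "neutral evolution"

# Most Drosophila orthologs in the study had Ka/Ks < 0.3, e.g.:
print(interpret_ka_ks(0.05, 0.40))  # purifying selection
```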
We also revisited recombination, from an evolutionary perspective, this week. Ultimately, recombination can be described as a ‘natural selection facilitator’ because it makes natural selection more efficient by generating many different genotype combinations for selection to act upon. We next progressed to study relationships between the effective population size (Ne) and genome size and content. This work (like the gene duplicate work) was done by Mike Lynch and John Conery in the early 2000s. Lynch and Conery looked for correlations between Ne and genomic traits such as genome size, numbers and sizes of introns, transposon abundances, etc. They found strong and significant correlations – as Ne decreases moving from prokaryotes to unicellular eukaryotes to multicellular eukaryotes, genome size increases. Further, intron abundances and lengths increase, and transposon abundances increase. The big (and still somewhat controversial) conclusion from this study is that in species evolving with relatively small Ne (for example, trees and humans), genetic drift is too strong and natural selection is too weak to keep deleterious genomic features (such as transposons and other forms of ‘extra genome space’) out of the genome. In other words, the underlying cause of our remarkably complex and transposon-laden genomes is an inability of natural selection to maintain a ‘tidy and streamlined’ genome similar to those observed in prokaryotes.
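The drift-versus-selection logic behind the Lynch and Conery argument can be made concrete with Kimura's classic fixation-probability formula from population genetics (standard theory, though not derived in lecture). The Ne and s values below are hypothetical, chosen only to show the direction of the effect.

```python
import math

def fixation_probability(s: float, ne: float) -> float:
    """Kimura's approximation for the fixation probability of a new mutation
    with selection coefficient s in a population of effective size ne.
    For s = 0 this reduces to the neutral expectation 1 / (2*ne)."""
    if s == 0:
        return 1.0 / (2.0 * ne)
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * ne * s))

# A mildly deleterious insertion (s = -0.001, e.g. a new transposon copy):
small = fixation_probability(-0.001, ne=1_000)    # small-Ne lineage (hypothetical)
large = fixation_probability(-0.001, ne=100_000)  # large-Ne lineage (hypothetical)
print(small > large)  # True: drift fixes deleterious DNA far more easily when Ne is small
```

The same slightly deleterious insertion that selection would almost certainly purge in a large-Ne population has an appreciable chance of drifting to fixation when Ne is small, which is the heart of the ‘untidy genome’ conclusion.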
On Thursday, we discussed the exciting topic of next-generation DNA sequencing technologies. We focused mostly on the Illumina approach. This method starts with the random fragmentation of your starting target genome (what you want sequenced) into smaller pieces (usually 200-1,000 bp). ‘Adapter’ molecules are then ligated onto the ends of the fragmented genome pieces, with a different specific adapter on each end of the molecule. These adapters have DNA sequences that are known to the experimenter – these known sequences are essential for the downstream methods, for example by providing known priming sites. Next, adapter-ligated genome pieces are washed over a ‘flow cell’ (= fancy microscope slide) which contains a lawn of oligonucleotides (‘oligos’) bound to the surface that are complementary to the adapter sequences. The adapter sequences hybridize with the lawn oligos, thereby initiating a ‘bridge amplification’ process involving nearby oligos as primers, which results in an amplified ‘cluster’ of DNA molecules (all identical to the molecule that started the amplification process) at a particular spot on the slide. Many, many millions of these clusters are created across the flow cell simultaneously. After cluster generation, sequencing occurs with reversible dye-labeled terminators, similar to the Sanger technology (see textbook section 6.8). However, with Illumina ALL of the nucleotides in the reactions are dye-labeled terminators. In the first ‘cycle’, all four dye-labeled terminator nucleotides (each with a different particular dye attached) are washed over the slide and the appropriate nucleotide incorporates at the first position. Then, a fancy fluorescence microscope scans along the slide and ‘reads’ what dye (= base) is present for each cluster. After the first cycle, an enzyme cleaves off the dye and the reversible terminator group, freeing the incorporated nucleotide to accept another base at position 2 in the next cycle.
This process continues on for up to ~250 cycles, yielding 250-bp reads for many hundreds of millions of different DNA sequences.
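The cycle-by-cycle read-out described above can be caricatured in a short sketch. This is my own toy illustration, not a real base-caller: one cluster, one base incorporated and imaged per cycle, and the dye-color assignments are entirely made up.

```python
# Toy sketch of sequencing-by-synthesis: one cluster, one base per cycle.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}  # hypothetical colors

def sequence_cluster(template: str, cycles: int) -> str:
    """Return the read: each cycle, the base complementary to the template
    is incorporated, its dye is imaged, then the dye/terminator is cleaved."""
    read = []
    for base in template[:cycles]:
        incorporated = COMPLEMENT[base]   # reversible dye-labeled terminator pairs with template
        _color = DYE[incorporated]        # fluorescence microscope images the cluster's dye
        read.append(incorporated)         # base call for this cycle
        # ...cleavage then removes the dye and terminator before the next cycle
    return "".join(read)

print(sequence_cluster("ATGC", cycles=4))  # TACG
```

Note the read length is capped by the number of cycles, which is why ~250 cycles yields reads of up to ~250 bp.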
We also discussed Pacific Biosciences single-molecule real-time (SMRT) DNA sequencing technology. With this newer approach, longer reads are achieved (~5,000 bp) and sequencing is done without terminators! A DNA polymerase is anchored to the bottom of a sample well, and a single DNA molecule is then replicated in the well by that polymerase. The nucleotides used have fluorescent dyes attached to the terminal phosphate; when a nucleotide is incorporated by the polymerase, a fluorescence pulse is emitted, which is detected below the plate.