Week 9 & 10 Reflections

We focused on natural selection as an evolutionary-genetic force during the Tuesday lectures of Weeks 9 and 10. Natural selection can be quantified using relative fitness (w) and the selection coefficient (s), which is more widely used by geneticists. Relative fitness values are generally obtained by comparing some measure of fitness (for example, virus doubling time, or fecundity in an animal species) in a reference genotype (often ‘wild-type’) to that observed in a genetic variant (often ‘mutant’). The selection coefficient is calculated by simply subtracting the value for w from 1 (s = 1 - w), and can be thought of as the proportional disadvantage (or advantage, when s is negative) of the mutant relative to wild-type; for example, w = 0.9 gives s = 0.1, a 10% disadvantage. We spent time discussing one molecular test for the effects of natural selection on protein-coding sequences: the Ka/Ks test. Ka is the frequency (or rate) of replacement substitutions per replacement site; Ks is the frequency (or rate) of silent substitutions per silent site. A ratio well below 1 indicates purifying selection, a ratio near 1 is consistent with neutrality, and a ratio above 1 suggests positive selection. This is at its core a very simple test that is also very broadly applicable – all you need is two homologous protein-coding sequences to compare to each other. This test is widely used today, applied to genome-scale data, to look at the average effects of different selective forces across many gene sequences. The genome-wide Ka/Ks values for Drosophila orthologs were nearly all very small (most <0.3), suggesting pervasive purifying selection. A similar trend was observed in the gene-duplicate data for six different model organisms, though in this case some of the younger duplicates (low Ks values) showed evidence for positive selection.
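
The two quantities above are simple enough to sketch in a few lines of Python. This is only an illustration, not something we used in class: the function names and the `tolerance` band around 1 are assumptions made for the example, and real Ka/Ks analyses rely on statistical tests rather than a fixed cutoff.

```python
def selection_coefficient(w):
    """Return s = 1 - w, where w is fitness relative to the reference genotype."""
    return 1.0 - w


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks from substitutions per replacement site and per silent site."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def interpret_ka_ks(ratio, tolerance=0.1):
    """Rough reading of a Ka/Ks ratio; `tolerance` is an illustrative assumption."""
    if ratio < 1.0 - tolerance:
        return "purifying (negative) selection"
    if ratio > 1.0 + tolerance:
        return "positive selection"
    return "consistent with neutrality"


if __name__ == "__main__":
    print(selection_coefficient(0.9))
    print(interpret_ka_ks(ka_ks_ratio(0.02, 0.10)))
```

Keeping the ratio and its interpretation in separate functions makes it easy to swap the crude cutoff for a proper test later.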

We also revisited recombination, from an evolutionary perspective, this week. Ultimately, recombination can be described as a ‘natural selection facilitator’ because it makes natural selection more efficient by providing many different types of genotype combinations for selection to act upon. We next progressed to study relationships between the effective population size (Ne) and genome size and content. This work (like the gene duplicate work) was done by Mike Lynch and John Conery in the early 2000s. Lynch and Conery looked for correlations between Ne and genomic traits such as genome size, numbers and sizes of introns, transposon abundances, etc. They found strong and significant correlations: as Ne decreases, moving from prokaryotes to unicellular eukaryotes to multicellular eukaryotes, genome size increases. Further, intron abundances and lengths increase, and transposon abundances increase. The big (and still somewhat controversial) conclusion from this study is that for species evolving with relatively small Ne (for example, trees and humans), genetic drift is too strong and natural selection is too weak to keep deleterious genomic features (such as transposons and other forms of ‘extra genome space’) out of the genome. In other words, the underlying cause of our remarkably complex and transposon-laden genomes is an inability of natural selection to maintain a ‘tidy and streamlined’ genome similar to those observed in prokaryotes.

On Thursday, we discussed the exciting topic of next-generation DNA sequencing technologies. We focused mostly on the Illumina approach. This method starts with the random fragmentation of your starting target genome (what you want sequenced) into smaller pieces (usually 200-1,000 bp). ‘Adapter’ molecules are then ligated onto the ends of the fragmented genome pieces, with a different specific adapter on each end of the molecule. These adapters have DNA sequences that are known to the experimenter – these known sequences are essential for the downstream methods, for example by providing known priming sites. Next, adapter-ligated genome pieces are washed over a ‘flow cell’ (= fancy microscope slide) which contains a lawn of oligonucleotides (‘oligos’) bound to the surface that are complementary to the adapter sequences. The adapter sequences hybridize with the lawn oligos, thereby initiating a ‘bridge amplification’ process that uses nearby oligos as primers and results in an amplified ‘cluster’ of DNA molecules (all identical to the molecule that started the amplification process) at a particular spot on the slide. Many, many millions of these clusters are created across the flow cell simultaneously. After cluster generation, sequencing occurs with reversible dye-labeled terminators, similar to the Sanger technology (see textbook section 6.8). However, with Illumina ALL of the nucleotides in the reactions are dye-labeled terminators. In the first ‘cycle’, all four dye-labeled terminator nucleotides (each with a different particular dye attached) are washed over the slide and the appropriate nucleotide incorporates in the first position. Then, a fancy fluorescence microscope scans along the slide and ‘reads’ what dye (= base) is present for each cluster. After the first cycle, the dye and the terminating group are cleaved off, making the incorporated nucleotide now free to accept another base in position 2 in the next cycle. This process continues on for up to ~250 cycles, yielding 250-bp reads for many hundreds of millions of different DNA sequences.

We also discussed Pacific Biosciences single-molecule real-time (SMRT) DNA sequencing technology. With this newer approach, longer reads are achieved (~5,000-bp) and sequencing is done without terminators! A DNA polymerase is anchored to the bottom of a sample well and then a single molecule is replicated in the well by the polymerase. The nucleotides used have fluorescent dyes attached to the terminal phosphate; when the nucleotide is incorporated by the polymerase, a fluorescence pulse is emitted which is detected below the plate.

Quiz #5 Answers

Correct answers in Bold.

#1. (5 points) When population size is large and mutation rates are low (~10^-8), as an evolutionary force mutation is generally characterized as:

A. Strong

B. Weak

C. Balancing

D. Beneficial


#2. (5 points) You sequenced an RNA polymerase gene from Caenorhabditis elegans, and an RNA polymerase gene from its sister species Caenorhabditis briggsae. You performed a Ka/Ks analysis on these genes and the calculated value was 0.005. What kind of selection is most likely influencing the evolution of these genes?

A. Negative (or, purifying) selection

B. Positive (or, directional) selection

C. Balancing selection

D. Neutrality (no selection)


#3. You are studying the population genetics of Mendel’s pea plants. You remember that they are diploid, and that the G allele is fully dominant and results in yellow peas; the g allele is recessive and results in green peas (when homozygous, of course). Upon visiting a field, you discover 200 pea plants. 160 of the plants produce yellow peas, and 40 of the plants produce green peas. Assume this population is in Hardy-Weinberg equilibrium.


Part A. (3 points) What is the genotype frequency for gg plants?

40/200 = 0.20


Part B. (3 points) What is the allele frequency for g?

q = 0.447


Part C. (4 points) What is the expected number of heterozygous plants in this population?

98.9 (or round to 99) plants
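
The arithmetic behind Parts A to C can be sketched in Python as follows. The function name and the returned dictionary keys are assumptions made for this illustration; the calculation itself assumes Hardy-Weinberg equilibrium, as stated in the question.

```python
import math


def hardy_weinberg_from_recessives(n_recessive, n_total):
    """Estimate allele frequencies from the count of homozygous recessives.

    Assumes Hardy-Weinberg equilibrium, so freq(gg) = q^2.
    """
    q2 = n_recessive / n_total      # Part A: genotype frequency of gg
    q = math.sqrt(q2)               # Part B: allele frequency of g
    p = 1.0 - q                     # allele frequency of G
    expected_het = 2 * p * q * n_total  # Part C: expected number of Gg plants
    return {"q2": q2, "q": q, "p": p, "expected_heterozygotes": expected_het}


if __name__ == "__main__":
    result = hardy_weinberg_from_recessives(40, 200)
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```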

Quiz #4 Answers

correct answers in bold

#1 (5 points) How can the abundance and function of a gene’s products (RNA, protein) be regulated inside the cell?

A. At the level of transcription, synthesis of an RNA molecule from coding DNA

B. At the level of translation, synthesis of a protein product from a template mRNA

C. RNA stability and structure, post-transcriptional regulation of RNA

D. All of the above

#2 (5 points) You have a lacIS (‘super repressor’) mutant of E. coli. The LacI protein of this mutant cannot bind lactose, but can bind the operator. The lac operon structural genes (lacZ, lacY, lacA) will be expressed:

A. Only when there is lactose in the growth media.

B. Only when there is not lactose in the growth media

C. Always, regardless of whether or not there is lactose in the growth media

D. Never, regardless of whether or not there is lactose in the growth media


#3 (5 points) You are a population geneticist studying a diploid species of salamander. The gene B influences salamander coloration. Two alleles of this gene exist, allele B (dominant) results in yellow salamanders, allele b (recessive) results in brown salamanders. The frequency of allele B is 0.07. What is the frequency of the b allele?

A. 0.93

B. 0.14

C. 0.07

D. 0.01


#4. (5 points) In which of the following populations is genetic drift expected to be strongest, in terms of its effects on change in allele frequencies over time?


A. 10,000 rabbits

B. 1,000 rabbits

C. 100 rabbits

D. 10 rabbits

Week 8 Reflections

We started the week discussing gene regulation – how cellular systems regulate the amounts of gene products (RNA, protein). The famous lac operon system was introduced; this set of genes in E. coli provided the foundation for much of our understanding of regulation at the transcriptional level. We then turned to discuss the similar gal system in yeast as a model for eukaryotes before considering some post-transcriptional forms of regulation such as alternative splicing and RNAi.

On Thursday we turned to population and evolutionary genetics. Much of the discussion centered on the Hardy-Weinberg Principle of population genetics, which provides important avenues for calculating genotype frequencies and allele frequencies. The class discussed some of the key underlying assumptions of the H-W Principle as well as two major implications of that principle for population genetic processes. Lecture material transitioned to a bit about evolutionary genetics with special emphasis on the forces of evolution. Mutation, although the most fundamental of the forces because it provides the key ‘variation substrate’ for other evolutionary forces to act upon, is also a very weak evolutionary force because it is not able to cause rapid change in allele frequencies (unless population sizes are very, very small). Genetic drift was also discussed – the role of ‘sampling error’ of gametes from one generation to the next. When population sizes are small, there is a high probability that one particular allele might be completely lost (or completely fixed) in the population simply because that allele happened not to be ‘sampled’ purely due to chance from one generation to the next. An analogy used to help make drift more understandable is flipping a coin – with one million coin flips, you are very likely to end up with something very close to 50% heads and 50% tails (and almost certainly heads and tails each getting sampled at least once), whereas with a much smaller number of coin flips (six, for example) there is a much greater chance of not getting a ‘50/50’ result, or even of getting no heads or no tails at all in the series of coin flips.
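
The coin-flip analogy and the effect of population size on drift can be explored with a short Python sketch. This was not part of the lecture: the function names are made up for the example, and the simulation uses the standard Wright-Fisher idea of resampling 2N gametes each generation, which is an assumption about how to model the ‘sampling error’ described above.

```python
import random


def prob_missing_one_side(n_flips):
    """Probability that n fair coin flips come up all heads or all tails."""
    return 2 * 0.5 ** n_flips


def simulate_drift(pop_size, p0=0.5, generations=50, seed=None):
    """Track one allele's frequency under drift alone (Wright-Fisher style sketch).

    Each generation, 2 * pop_size gametes are drawn at random from the
    current allele frequency; no selection, mutation or migration is modeled.
    """
    rng = random.Random(seed)
    n_gametes = 2 * pop_size
    p = p0
    freqs = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(n_gametes) if rng.random() < p)
        p = count / n_gametes
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    print("P(no heads or no tails in 6 flips):", prob_missing_one_side(6))
    for n in (10, 100, 1000):
        final = simulate_drift(n, generations=50, seed=1)[-1]
        print(f"N = {n}: allele frequency after 50 generations = {final:.3f}")
```

Running the loop with different seeds shows the pattern from lecture: the smaller the population, the further the allele frequency tends to wander from its starting value, and the more often it ends up lost or fixed.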

We also discussed Muller’s Ratchet – a theoretical evolutionary concept not covered in the book. The idea behind the ratchet is that if you have a small isolated population that reproduces asexually, you might expect to lose the most-fit class of individual in that population (the individuals with the fewest deleterious alleles) by drift. On top of this, because the population is small, mutation is a stronger force, and individuals keep accumulating new deleterious mutations. With no input of genetic variation from migration and no recombination, such populations would be expected to become less and less fit over time (individuals harboring increasing numbers of deleterious alleles) until they go extinct. How optimistic! This might seem rather unrealistic, but the theory has served as a model for helping evolutionary geneticists understand why other evolutionary forces (such as recombination) came about. Also, there are some asexual genetic components of endangered populations (e.g., mitochondrial DNA) that might be subject to the ratchet. Why ‘ratchet’? The idea here is that every time the population gets worse (loses its most-fit class of individual) due to drift, this is a turn of the ratchet toward extinction.


Week 9 Sneak Peek: We will finish up Thursday’s lecture material, and then continue our population/evolutionary-genetic unit by discussing natural selection – Darwin’s key premises and deduction leading to his theory, and the many different sub-types of selection that are possible. We will focus on one test for the effects of natural selection on protein-coding sequences. I will also be posting some supplemental reading on this topic – it is a primary research article that I will be covering. Reading this paper is not absolutely required, but might help you understand things a bit better. Remember: no class on Thursday, and no recitation all week!

Answers to HW #8

Chapter 9:

9.3: (a) One mutation that might cause the constitutive phenotype is a mutation in an operator region of the enzyme-coding gene that makes it insensitive to repression. The second is a mutation that impairs the structure of the repressor. Such mutations can range from deletion of the repressor gene to subtler mutations that impair binding of the repressor to arginine, to the DNA, or to both. (b) A mutation in an arginine biosynthetic enzyme, not sufficient to cause a requirement for arginine but enough to reduce the amount of arginine in the cell, could activate a regulatory response by a normal regulatory system and induce constitutive synthesis.

9.6: The mutant gene should bind more of the activator protein, or have more efficient binding of the activator. Thus, the mutant gene should be induced with lower levels of activator protein or expressed at higher levels, compared to wild-type.

9.10: (a) The phenotype of cells with mutant Gal4p would be non-inducible; the mutant gene would be recessive because the wild-type Gal4p would still function normally. (b) The phenotype of cells with mutant Gal80p would be constitutive; the mutant gene would be recessive because wild-type Gal80p would bind Gal4p in the normal way.

9.18: (a): Yes, the repressor is functional, and the presence of lactose activates transcription of the lac genes; (b) and (d): Yes, at 42°C, the repressor cannot bind the operator, which means that the lac operon is transcribed whether or not the inducer is present. (c): No, at this temperature, the repressor functions normally – because lactose is absent, the lac operon is in a repressed state.

Chapter 14:

14.1. The frequency of A1 equals 0.35.

14.2. The expected frequency of A2 in the next generation is the same as its frequency in the current generation: 0.65.

14.4. (A) no; (B) yes.

14.8. The frequency of homozygous recessives is q^2 = 0.16; this implies that q = 0.4. The frequency of the dominant allele is therefore p = 0.6.

14.9. Two issues need to be considered. First, recessive alleles are maintained in heterozygous individuals and so are not exposed to selection. Second, new mutations in each generation replenish the number eliminated by selection in the homozygous recessives.

14.16. The numbers of alleles are as follows: A, 8+10+2 = 20; B, 10+48+20=78; C, 20+20+2 = 42. The total number of alleles is 140, so the allele frequencies are as follows: A, 20/140 = 0.14; B, 78/140 = 0.56; C, 42/140 = 0.30. The expected numbers among the 70 plants are:

AA: 1.42, AB: 11.14, BB: 21.72, BC: 23.40, CC: 6.30, AC: 6.00.
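
The expected numbers for 14.16 follow from the multi-allele Hardy-Weinberg expectations (p^2 for each homozygote, 2pq for each heterozygote). A minimal Python sketch, with a function name chosen just for this example, is given below. It works from unrounded allele frequencies, so the last digit of a value can differ slightly from the figures quoted above.

```python
from itertools import combinations_with_replacement


def expected_genotype_counts(allele_counts, n_individuals):
    """Expected genotype counts under Hardy-Weinberg for any number of alleles.

    allele_counts maps allele name -> number of copies observed.
    """
    total = sum(allele_counts.values())
    freqs = {allele: count / total for allele, count in allele_counts.items()}
    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Homozygotes: p^2; heterozygotes: 2pq
        prob = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected[a + b] = prob * n_individuals
    return expected


if __name__ == "__main__":
    counts = expected_genotype_counts({"A": 20, "B": 78, "C": 42}, 70)
    for genotype, value in counts.items():
        print(f"{genotype}: {value:.2f}")
```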

Answers to HW #7

8.4: MALISASY (in single-letter amino acid code)


8.9: 5’-AUU = Isoleucine at amino terminus of protein; UUA-3’ = Leucine at carboxy terminus


8.10: This encodes the alternating polypeptide Cys-Val


8.18: There are three possible reading frames, each encoding a different repeating polymer. One has a repeating Val (GUC), one has a repeating Ser (UCG), and one has a repeating Arg (CGU).
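
The reading-frame logic in 8.10 and 8.18 can be checked with a small Python sketch. Two assumptions are made here because the question sequences are not reproduced above: that the 8.18 polymer is a repeat of GUC (as the codons in the answer suggest) and that the 8.10 polymer is a repeat of UG. The codon table is deliberately partial and covers only the codons these examples need.

```python
# Partial codon table: only the codons needed for these two examples.
CODONS = {
    "GUC": "Val",
    "UCG": "Ser",
    "CGU": "Arg",
    "UGU": "Cys",
    "GUG": "Val",
}


def translate_frames(rna):
    """Translate an RNA string in each of the three forward reading frames."""
    frames = []
    for offset in range(3):
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        frames.append([CODONS.get(codon, "???") for codon in codons])
    return frames


if __name__ == "__main__":
    for label, rna in (("poly(GUC)", "GUC" * 4), ("poly(UG)", "UG" * 6)):
        print(label)
        for offset, peptide in enumerate(translate_frames(rna)):
            print(f"  frame {offset + 1}: {'-'.join(peptide)}")
```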

Answers for questions from last year’s Midterm #2


D, dosage compensation


D, 0.0001


Cyclins tether specific target proteins and bring them to the complex; CDKs phosphorylate those target proteins to change their structure/function.


(0.10 x 0.005) x 10,000 = 5; coeff. coinc. = 2/5 = 0.4

i = [1-(0.4)] = 0.6
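
The last calculation can be written out as a short Python sketch. The function and parameter names are assumptions made for this example, and it reads 0.10 and 0.005 as the recombination frequencies of the two adjacent intervals, 10,000 as the number of progeny, and 2 as the observed number of double crossovers.

```python
def interference(rf_1, rf_2, n_progeny, observed_doubles):
    """Return (expected doubles, coefficient of coincidence, interference)."""
    expected_doubles = rf_1 * rf_2 * n_progeny
    coincidence = observed_doubles / expected_doubles
    return expected_doubles, coincidence, 1 - coincidence


if __name__ == "__main__":
    expected, coc, i = interference(0.10, 0.005, 10_000, 2)
    print(f"expected doubles = {expected:.1f}, c.o.c. = {coc:.2f}, i = {i:.2f}")
```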