NOVEMBER 12, 2020

Photo courtesy of The Corvallis Advocate

From The Corvallis Advocate: “Oregon State University brought its TRACE Community COVID-19 testing program to Eugene, sending three-member teams – one OSU student, one UO student and one professional – to city neighborhoods to collect nasal-swab samples from hundreds of residents and sewage samples from around Eugene and Springfield. This will further expand TRACE’s coverage, which includes five similar sweeps in Corvallis, as well as some study in Bend, Hermiston and Newport. TRACE will be working in tandem with UO’s Monitoring and Assessment Program (MAP).” See the full article for more information.

Fall Term 2020
September 30, 2020: Liang Huang
Oregon State University
Fighting COVID-19 with RNA Folding and RNA Design
October 14, 2020: Mak Saito
Woods Hole Oceanographic Institution
Host: Steve Giovannoni
Exploring the use of metals in biogeochemically important enzymes in the oceans, and development of the Biogeochemical AUV Clio and the Ocean Protein Portal
October 28, 2020: Scott Doney
University of Virginia
Host: Kim Halsey
Developing models of marine planktonic systems
November 18, 2020: A. Murat Eren
University of Chicago
Host: Maude David and Steve Giovannoni
High-resolution insights into the genomic dynamism of closely related gut microbial populations in unrelated humans
December 2, 2020: Katherine Amato
Northwestern University
Host: Tom Sharpton
A case for comparative research: Using primates to gain insight into the human gut microbiome
Winter Term 2021
January 20, 2021: Maria Clara Franco (Maca)
Oregon State University
Host: Michael Freitag
The relevance of oxidatively-modified proteins as therapeutic tumor-directed targets
February 3, 2021: Bruce Hungate
Northern Arizona University
Host: David Myrold
Frontiers in ecosystem science: microbial ecology to biogeochemistry
February 17, 2021: Yuanchao Wang
Nanjing Agricultural University
Host: Brett Tyler
The story of XEG1: From a core effector to broad spectrum resistance
March 3, 2021: Francesca Marassi
Sanford Burnham Prebys Medical Discovery Institute
Host: Elisar Barbar
Spring Term 2021
March 31, 2021: Martin Egan
University of Arkansas
Host: Weihong Qiu
Forging the Rings of Power – Formation and remodeling of higher-order septin structures for plant invasion by the blast fungus
April 14, 2021: Clare Bird
University of Stirling
Host: Jennifer Fehrenbacher
The microbiomes of single cells
April 28, 2021: Zachary Lippman
Cold Spring Harbor Laboratory
Host: Steven Strauss
Dissecting and exploiting mechanisms of quantitative trait variation in pl
May 12, 2021: Xiangshu Xiao
Oregon Health & Science University
Host: Siva Kolluri
Cancer Drug Discovery Targeting Transcription and DNA Repair
May 26, 2021: Peter Ralph
University of Oregon
Host: Aaron Liston

Keynote Speaker

Charisse Madlock-Brown
University of Tennessee Health Science Center

Charisse Madlock-Brown is a faculty member in Health Informatics and Information Management at the University of Tennessee Health Science Center. She received her Master’s in Library Science and Ph.D. in Health Informatics from the University of Iowa. She has expertise in data management, data mining, and visualization. She has a broad background in health informatics, with a current focus on obesity trends and multimorbidity. Other areas of interest are network analysis and emerging topic detection in biomedicine. She has authored several book chapters and journal articles and continues to keep up-to-date on data integration, data architecture, database management, and analytic methods. She runs the UTHSC Research Pipelines labs, which provide online interfaces for distributed computing and storage systems. Her lab can manage projects from data extraction and transformation to modeling and visualization for small-scale and big data projects. 

Introductions from DSPG Leaders
1:00 PM Introduction to the Oregon State University Data Science for the Public Good Project – Brett Tyler, Director, Center for Quantitative Life Sciences, Oregon State University
1:05 PM Tri-state Data Science for the Public Good Project – Sallie Keller, Director, Social and Decision Analytics Division, University of Virginia’s Biocomplexity Institute
1:10 PM Data Science Knowledge and Resources for Extension Professionals – Lindsey Shirley, Associate Provost, Oregon State University Outreach and Engagement
Keynote Speaker
1:15 PM Social determinants of health related to COVID-19: disparities between urban and rural communities – Charisse Madlock-Brown, University of Tennessee Health Science Center
1:45 PM Break
Presentations
2:00 PM Measuring Economic and Social Infrastructure: Intergenerational Poverty in Page County, VA
2:15 PM Wintertime air quality health impacts in Oakridge and Westfir
2:30 PM Mapping Iowa’s Substance Use Care Infrastructure
2:45 PM Impacts of dam water release policy on Deschutes River health and recreation
3:00 PM Barriers to Health Care Access and Use in Patrick County, VA
3:15 PM Forecasting tools for cost analysis of water and wastewater facilities in Oregon small towns and cities
3:30 PM Economic Mobility Baseline and Comparative Analysis for the South Wasco County School District Area, Oregon
3:45 PM Regulatory impacts on economic development in the Eastern Oregon border region
4:00 PM Water quality requirements for fresh produce growers

DSPG is a coalition of five public universities across three states: Oregon State University, Iowa State University, Virginia Tech, University of Virginia, and Virginia State University.

Brent Kronmiller, Edward Davis, David Hendrix, Thomas Sharpton, Clinton Epps, Pankaj Jaiswal, Stephen Ramsey et al. – Pan-tissue transcriptome analysis of long noncoding RNAs in the American beaver Castor canadensis

Another great term of the CGRB’s Bioinformatics User Group (BUG) is in the books!

This term we had a wide range of presenters, from graduate students to Principal Investigators. It was nice to get the perspective of folks who are at different stages of their careers.

A special thanks to all of our presenters:

Sept 25: Christopher Sullivan and Ken Lett (Center for Genome Research & Biocomputing)

  • Title: CGRB’s new DFS for one and all!, i.e., Don’t know what a Distributed File System is? Come find out!
  • Abstract: The CGRB works with researchers to provide the most robust computational infrastructure available today. Many groups rely on file services at the heart of their research computing needs, and the CGRB has worked for over two decades to provide redundant, high-speed file services. Over the years, users have grown to expect the best solution at a very cheap price. Because of this model, the CGRB spends a great deal of time evaluating the available systems to ensure we always have the best at the lowest price. In the past year, the CGRB has worked to evaluate and purchase new file service hardware that will replace our existing setups. We will explain the path taken to bring the new service online and some of its exciting new features.

Oct 9: Lillian Padgitt-Cobb (David Hendrix Lab, Biochemistry & Biophysics)

  • Title: A phased, diploid assembly of the hop (Humulus lupulus) genome reveals patterns of selection and haplotype variation, i.e., Resolving functional and evolutionary mysteries of a large, complex plant genome with genomic data science
  • Abstract: Hop (Humulus lupulus) is a plant valued for its use in brewing and traditional medicine. Efforts to determine how biosynthetic pathways in hop are regulated have been challenged by its complex genomic landscape. The diploid hop genome is large, repetitive, and heterozygous, which challenged early attempts at sequencing with short-reads. Advances in long-read sequencing have improved detection of repeats and heterozygous regions, revealing that the genome is nearly 78% repetitive. For our assembly, PacBio long-read sequences were assembled with FALCON and phased into haplotype assemblies with FALCON-Unzip. Using the phased, diploid assembly to assess haplotype variation, we discovered genes under positive selection enriched for stress-response, growth, and flowering functions. Comparative analysis of haplotypes provides insight into large-scale structural variation and the selective pressures that have driven hop evolution. The approaches we developed to analyze the phased, diploid assembly of hop have broader applicability to the study of other large, complex genomes.
  • Lillian’s GitHub: https://github.com/padgittl/CascadeHopAssembly
  • Hop Genome Browser: http://hopbase.org/

Oct 23: Kelly Vining (Kelly Vining Lab, Horticulture)

  • Title: R/qtl, i.e., Applications and methods for analysis of quantitative traits
  • Abstract: R/qtl is an R package that is used for genetic mapping and marker-trait association. This presentation will explore specific features of R/qtl applied to plant breeding populations. Data types, functions, and interpretation of results will be explored.

Nov 6: Ed Davis (Center for Genome Research & Biocomputing)

  • Title: Introductory microbiome analysis using phyloseq, i.e., How to generate exploratory diversity plots and what they mean
  • Abstract: Generating high quality, publication ready figures for a microbiome study can be somewhat difficult. An understanding of both the statistical tests and how to effectively use R to produce figures is required, so the learning curve can be somewhat steep. Fortunately, there are several easy-to-use packages in R that facilitate the analysis of microbiome studies using 16S amplicon data, including the phyloseq package that will be the focus of my talk. I will cover the basics of analyzing alpha and beta diversity and provide some code and example images to show how to generate publication ready figures starting from the base phyloseq output. I will also generate some exploratory charts and graphs such that one would be able to form and later test hypotheses using microbiome data. I will be happy to share the examples and code as well, so that I might catalyze the analysis of your own microbiome studies.
  • Follow up blog post: https://tips.cgrb.oregonstate.edu/posts/phyloseq-bug-meeting-presentation-fall-2019/

Nov 20: Cedar Warman (John Fowler Lab, Botany & Plant Pathology)

  • Title: High-throughput maize ear phenotyping with a custom-built scanner and machine learning seed detection, i.e., Computer counts corn, correctly.
  • Abstract: Near-incomprehensible amounts of maize are produced each year, but our understanding of the dominant North American crop is fundamentally incomplete. Of particular interest is the seed-producing structure of maize, the ear. Here, we present a novel maize ear phenotyping system. Our system captures a video of a rotating ear, which is subsequently flattened into a projection of the ear’s surface. Seed positions and genetic markers can be quantified manually from this projection. To increase throughput, we applied deep learning-based computer vision approaches to seed and marker quantification. Our progress towards a completely automated phenotyping system will be described, in addition to challenges we continue to face adapting computer vision technology to maize ears.
  • Links from Cedar’s presentation:
  • Movie flattening: github.com/fowler-lab-osu/flatten_all_videos_in_pwd
  • Seed distribution analysis: github.com/vischulisem/Maize_Scanner
  • Also here’s a preprint describing the scanner: https://www.biorxiv.org/content/10.1101/780650v2
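
The abstract above describes flattening a video of a rotating ear into a projection of its surface. The abstract does not say how the scanner software does this, so the following Python sketch only illustrates one common approach (a slit-scan style projection): take the centre column of pixels from each frame and place those columns side by side. The function name and the nested-list image representation are assumptions made for this example, not part of the linked repositories.

```python
def flatten_frames(frames):
    """Build a flat projection from frames of a rotating object.

    Each frame is a 2-D grid (a list of rows) of pixel values. The centre
    column of every frame becomes one column of the output, so the result
    has one row per image row and one column per frame.
    """
    if not frames:
        return []
    height = len(frames[0])
    centre = len(frames[0][0]) // 2  # index of the middle pixel column
    return [[frame[row][centre] for frame in frames] for row in range(height)]


# Three tiny 2x3 "frames" standing in for video frames (made-up values).
frames = [
    [[0, 1, 2], [3, 4, 5]],
    [[10, 11, 12], [13, 14, 15]],
    [[20, 21, 22], [23, 24, 25]],
]
projection = flatten_frames(frames)
```

Here `projection` has two rows and three columns, one column per frame. A real pipeline would read frames from the video file and would likely need to account for rotation speed and ear shape.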

Dec 4: Christina Mulch (Kelly Vining Lab, Horticulture)

  • Title: IsoSeq pooling and HiSeq multiplexing comparison for Rubus occidentalis samples to explore Aphid resistance, i.e., Utilizing RNA to find differences between Aphid Resistant and Susceptible plants.
  • Abstract: Black raspberry (Rubus occidentalis L.) is a small specialty crop produced primarily in the Pacific Northwest of the U.S. A major challenge for its success is Black raspberry necrosis virus vectored by the Large Raspberry Aphid (Amphorophora agathonica A.). We used Pacific Biosciences IsoSeq long read sequencing technology to study the gene expression patterns in leaves following aphid inoculation. We collected samples from a population segregating for resistance to the pest. High-quality RNA was extracted from 20 samples, 10 resistant (R) and 10 susceptible (S), using a modified RNA extraction protocol. Data processing was performed using the IsoSeq3 pipeline. Each R and S pool was aligned to the latest chromosome-level black raspberry reference genome with minimap2, using the options recommended for IsoSeq. Reads were filtered based on mapping quality, alignment length, and presence or absence in multiple samples. This study seeks to reveal the genetic underpinning of aphid resistance with the ultimate goal of enabling marker-assisted selection.
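
The read-filtering step mentioned in the abstract (mapping quality, alignment length, presence in multiple samples) can be sketched in Python as follows. This is a minimal illustration, not the study's code: the `Alignment` record, its field names, the thresholds, and the reading of "presence in multiple samples" as "seen in at least two distinct samples" are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One aligned read; fields are illustrative, not the study's schema."""
    sample: str       # sample the read came from
    transcript: str   # identifier of the aligned transcript or locus
    mapq: int         # mapping quality reported by the aligner
    aligned_len: int  # number of aligned bases


def filter_alignments(alignments, min_mapq=20, min_len=500, min_samples=2):
    """Keep alignments that pass the quality and length cutoffs and whose
    transcript is seen in at least `min_samples` distinct samples.

    The default thresholds are placeholders chosen for this example.
    """
    passing = [a for a in alignments
               if a.mapq >= min_mapq and a.aligned_len >= min_len]

    # Record the distinct samples in which each transcript survives.
    samples_per_transcript = defaultdict(set)
    for a in passing:
        samples_per_transcript[a.transcript].add(a.sample)

    return [a for a in passing
            if len(samples_per_transcript[a.transcript]) >= min_samples]


# Made-up records: only the two good tx1 alignments should survive.
reads = [
    Alignment("R1", "tx1", 60, 1200),
    Alignment("R2", "tx1", 45, 900),
    Alignment("S1", "tx2", 60, 1500),  # transcript seen in one sample only
    Alignment("S2", "tx1", 5, 1100),   # low mapping quality
    Alignment("S3", "tx3", 60, 200),   # alignment too short
]
kept = filter_alignments(reads)
```

Filtering on quality and length before counting samples means a transcript supported only by poor alignments in a second sample is not treated as shared.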

Thank you for attending and we look forward to seeing you in 2020!

All of the slots for winter 2020 are full, but please contact us if you’re interested in presenting in the future.

Aaron Trippe discusses the changes and challenges of working with the PacBio Sequel since 2016, describes how the technology has improved, and offers advice for users who would like to use this service.

Aaron Trippe, our long-time PacBio technician, stands next to the CGRB’s Pacific Biosciences Sequel.

Q1: How long have you been running the PacBio sequencing service at the CGRB?

The CGRB was one of the early adopters of the Sequel, the second phase of long read genomic sequencing technology from Pacific Biosciences. It arrived here on campus in August of 2016. Since then, the technology has seen significant improvements to the user interface and tremendous increases in read length and output.

Q2: You started up the PacBio sequencing service at the CGRB. What has been the most challenging aspect about developing this service?

Aside from the continually changing and evolving technology, one of the most challenging aspects of the service is getting everything you feed the machine to produce optimal results.  One of the advantages of the technology is that you are sequencing native DNA, but that also makes it challenging when working with an organism that traditionally is difficult to work with and considered problematic.  Finding ways to produce super clean and high molecular weight DNA from just about everything is probably the largest hurdle to working with the technology as a service provider.  The keys to success are definitely within the sample quality.  Having pure, high molecular weight DNA is essential to take advantage of the long read aspect of the technology, and is directly correlated to the quality of the sequencing output.

Q3: What type(s) of project(s) would you recommend to use PacBio’s long read technology?

The technology is great for just about any sequencing application. With the long reads, you have access to regions of DNA that were not previously accessible due to repetitive regions in genomic DNA. There is enough output to multiplex several microbial genomes on a single SMRT Cell. Multiplexed amplicons can be sequenced completely using Circular Consensus Sequencing, which gives high-fidelity reads of shorter inserts. With read lengths exceeding the length of RNA transcripts, isoform sequencing using the Iso-seq application is also available for obtaining complete transcripts.

Q4: Favorite or most interesting project you’ve worked on?

Since managing the PacBio Sequel, I’ve gotten to work with plants, animals (vertebrates/invertebrates), fungi, bacteria, and insects for the local scientific community, and beyond.  I can’t say that I have had a favorite organism, and they have all been interesting projects, but overcoming challenges with successful results always feels rewarding.

For more information please visit the CGRB website: https://cgrb.oregonstate.edu/core/pacbio

Note: We wish Aaron the best as he pursues a new opportunity and are grateful he was able to develop a successful PacBio Service at the CGRB! For future sequencing inquiries, please contact Katie Carter.

Close up of a PacBio SMRT cell.

Friday, September 20, 2019 CH2M HILL Alumni Center

Congratulations to our winners! To all the participants and presenters, another thank you for your contributions. We look forward to welcoming the CGRB Community to the 2020 Spring Conference on April 24, 2020.

Undergraduate Poster: Kelsey Shimoda “Altering metabolic gene expression in fruit flies: effects on longevity and brain aging”
Graduate Poster:
  • Benjamin Americus “Elegant Infection Machines: Nematocyst diversification within Myxozoa”
  • Daniel Schneck “Phenotypic and transcriptional responses to different light regimes in allopatric populations of Tigriopus californicus”
  • Manoj Gurung “Lactobacilli ameliorate western diet induced diabetes by preventing hepatic mitochondrial damage”
  • Miranda Leek “A Biochemical and Biophysical Investigation into the Pathological Gain-of-Function of Nitrated Hsp90”
  • Rebecca Veitch “Exposure to Light at Night Alters DNA Methylation and Expression of Proliferative and DNA Damage Repair Genes”
Post-Doc Poster: Allie Graham “Independent losses of the Hypoxia-Inducible Factor (HIF) Pathway within Crustacea”
Lightning Talks: Eileen Chow “Daily blue light exposure accelerates aging in Drosophila melanogaster”

Raffle drawing for voting in all categories:
  • 1st place – Miranda Leek
  • 2nd place – Alexandre Sathler

8:00 Registration & refreshments (Poster & sponsor setup)
8:50 Brett Tyler, Director, CGRB
Introduction, CGRB update
9:15 Andrew Annalora, Environmental and Molecular Toxicology
Exploring Splice Variant Biology in Nuclear Receptor and Cytochrome P450 Genes
9:40 Ed Kelly, University of Washington
Tissue Chips for Human Disease Modeling
10:20 Break
10:50 Lightning Talks
Moderated by Jeff Anderson
Featuring: Martin Pearce, Stephanie Bollmann, Benjamin Americus, Evan Carpenter, Eileen Chow, Lauren Chan, Nolan Newman, Alexandra Weisberg
11:35 Felipe Barreto, Integrative Biology
Genomics in the Tidepool: Functional and Population Genetics of Adaptation and Speciation in a Tiny Crustacean
12:00 Kevin Brown, Pharmaceutical Sciences and Chemical, Biological, and Environmental Engineering
Adventures in Complex Systems
12:25 Lunch
1:25 Afua Nyarko, Biochemistry & Biophysics
Selectivity and Specificity in Cancer Regulatory Proteins
1:50 Daniel Liefwalker, Oregon Health and Science University
Therapeutic strategies targeting c-MYC
2:30 Lightning Talks
Moderated by Jeff Anderson
Featuring: Armando Alcazar Magana, Christine Tataru, Sarah Alto, Anh Ha, Heather Forsythe, Kayla Jara, Rebecca France, Rachel Franklin
3:15 Break
3:45 Morgan Giers, Chemical, Biological, and Environmental Engineering
Regenerating the Intervertebral Disc: Developing Effective Therapies in a Nutrient Limited Environment
4:10 Doris Taylor, Texas Heart Institute
Building Solutions for Heart Disease: A 2019 Update
5:00-7:30 Poster Session Reception, Sponsor Displays

THANK YOU TO OUR 2019 FALL CONFERENCE COMMITTEE:

Jaga Giebultowicz, Department of Integrative Biology
Craig Marcus, Environmental and Molecular Toxicology
Jeff Anderson, Department of Botany and Plant Pathology
Viviana Perez, Department of Biochemistry and Biophysics

Fall Term 2019
October 9, 2019: Marilyn Roossinck
The Pennsylvania State University
Lessons in Virus Ecology from Forty Years of Research
Host: Jerri Bartholomew
October 23, 2019: Ran Blekhman
The University of Minnesota
Population and Functional Genomics of Host-Microbiome Interactions
Host: Tom Sharpton
November 6, 2019: Mark Farman
UC Davis
Telomeric transposons: major drivers of fungal genome evolution and guards against genome change
Host: Michael Freitag
December 4, 2019: Carolina Tropini
The University of British Columbia
Physical perturbations to the gut microbiota during health and disease
Host: Natalia Shulzhenko
Winter Term 2020
January 8, 2020: Chris Hittinger
University of Wisconsin-Madison
Host: Joey Spatafora
Genomic and metabolic evolution across budding yeasts
January 22, 2020: Audrey Gasch
University of Wisconsin-Madison
Host: Michael Freitag
The genetic basis of aneuploidy tolerance in wild yeast
February 19, 2020: Jill Banfield
UC Berkeley
Host: Steve Giovannoni
March 4, 2020: Josh Cuperus
University of Washington
Host: Molly Megraw, John Fowler
Spring Term 2020
April 1, 2020: Pankaj Kapahi
USC Leonard Davis
Host: Jaga Giebultowicz
April 15, 2020: Jose Dinneny
Stanford University
Host: John Fowler
May 13, 2020: TBD
May 27, 2020: TBD

The CGRB will be offering three different workshops this fall. For more information and to register, see the CGRB website.

All workshops are available to students for credit and to non-students as non-credit workshops.

To give prospective students better insight into each course, we’ve conducted short interviews with the instructors.

See course descriptions and the interviews with the instructors below!


Courses Offered:

Introduction to Unix/Linux and Command-Line Data Analysis (2 modules x 5 weeks @ 2 hrs per week)

Instructor: Matthew Peterson

Course Description:

Introduction to Unix/Linux (5 weeks @ 2 hrs per week)

Logistics:
Date & Time: Sep 25 – Oct 23, Mon/Wed 2:00pm – 2:50pm
For credit: BDS 599, CRN 20579
Workshop Cost: $250


This module introduces the natural environment of bioinformatics: the Linux command line. Material will cover logging into remote machines, filesystem organization and file manipulation, and installing and using software (including examples such as HMMER, BLAST, and MUSCLE). Finally, we introduce the CGRB research infrastructure (including submitting batch jobs) and concepts for data analysis on the command line with tools such as grep and wc.

Command-Line Data Analysis (5 weeks @ 2 hrs per week)

Logistics:
Date & Time: Nov 4 – Dec 4, Mon/Wed 2:00pm – 2:50pm
For credit: BDS 599, CRN 20580
Workshop Cost: $250


The Linux command-line environment has long been used for analyzing text-based and scientific data, and there are a large number of tools pre-installed for data analysis. These can be chained together to form powerful pipelines. Material will cover these and related tools (including grep, sort, awk, sed, etc.) driven by examples of biological data in a problem-solving context that introduces programmatic thinking. This module also covers regular expressions, a useful syntax for matching and substituting string and sequence data.
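
For readers who want to see what "matching and substituting sequence data" with regular expressions looks like, here is a minimal sketch in Python rather than in the shell tools the module teaches. It is only an analogy to a grep/sort/uniq-style pipeline, not course material; the FASTA records, the `sample=` header tag, and the function names are invented for the example. The pattern `GAATTC` is the EcoRI recognition site.

```python
import re
from collections import Counter

# Hypothetical FASTA-style records, invented for this example.
FASTA = """>seq1 sample=leaf
ATGGCGAATTCGTTAGAATTCTAA
>seq2 sample=root
ATGCCCGGGTTTAAA
>seq3 sample=leaf
ATGGAATTCGGGTGA
"""

ECORI = re.compile(r"GAATTC")         # EcoRI recognition site
SAMPLE = re.compile(r"sample=(\w+)")  # capture group pulls out the sample tag


def fasta_records(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def site_counts_by_sample(text):
    """Count EcoRI sites per sample tag (matching, then tallying)."""
    counts = Counter()
    for header, seq in fasta_records(text):
        match = SAMPLE.search(header)
        if match:
            counts[match.group(1)] += len(ECORI.findall(seq))
    return counts


def mask_sites(seq):
    """Replace every EcoRI site with Ns (substitution, in the spirit of sed)."""
    return ECORI.sub("N" * 6, seq)


counts = site_counts_by_sample(FASTA)
masked = mask_sites("ATGGAATTCGGGTGA")
```

Each function plays the role of one stage in a pipeline: parse records, match a pattern, tally, and substitute.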

Matthew Peterson in the CGRB server room

Q1: What do you hope students gain from this workshop?

My hope is that students come to appreciate the power and flexibility of using the text-based command-line interface to interact with (Linux) computational infrastructures. With practice, students will become self-sufficient in using the infrastructure to conduct their own research.

Q2: Favorite topic in your course?

Pipelines! The ability to chain the inputs and outputs of multiple commands to filter data is immensely powerful.

Q3: Who should register for this course?

From the first page of the course syllabus: “Linux/Unix Commands, Bioinformatics Utilities, Computational Infrastructure: If you know nothing about the above, then you are exactly in the right course! WELCOME!”

Q4: Advice for users new to bioinformatics and/or programming?

Practice, practice, practice! Learning how to use the command line effectively is like making a clay pot: you need to get your hands dirty!


RNA-Sequencing (10 weeks @ 2 hrs per week)

Instructor: Dr. Andrew Black

Course Description:

Logistics:
Date & Time: Sept 25 – Dec 5, Tue/Thur 11:00am – 11:50am
For credit: BDS 599, CRN 20581
Workshop Cost: $500

This course provides an introduction to, and practical experience with, the computational component of bulk RNA sequencing. After a general overview, participants will obtain a working introduction to the command line, RStudio, and accessing and using a computing infrastructure. Students will then work through a series of exercises: cleaning raw FASTQ files, aligning reads to a reference genome, and quasi-mapping reads to a transcriptome / de novo assembly, followed by data visualization and differential gene expression analysis.
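
As a rough idea of what "cleaning raw FASTQ files" involves, the Python sketch below drops reads that are too short or whose mean base quality is low. It is an illustration under stated assumptions, not the workshop's method: the course description does not name its tools, real workflows typically use dedicated trimming software, and the function names, thresholds, Phred+33 encoding, and example reads here are all chosen for the example.

```python
from statistics import mean

# Three made-up reads in the common four-line FASTQ layout.
EXAMPLE_FASTQ = """@read1
ACGTACGTACGT
+
IIIIIIIIIIII
@read2
ACGTACGTACGT
+
!!!!!!!!!!!!
@read3
ACGT
+
IIII
"""


def parse_fastq(lines):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(rows) - 3, 4):
        header, seq, _plus, qual = rows[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' from the name


def phred_scores(qual, offset=33):
    """Convert a quality string to integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def clean_reads(records, min_mean_q=20, min_len=10):
    """Keep reads that are long enough and have a high enough mean quality.

    Both cutoffs are placeholders chosen for this example.
    """
    for name, seq, qual in records:
        if not qual or len(seq) < min_len:
            continue
        if mean(phred_scores(qual)) >= min_mean_q:
            yield name, seq, qual


kept = list(clean_reads(parse_fastq(EXAMPLE_FASTQ.splitlines())))
kept_names = [name for name, _seq, _qual in kept]
```

In this example `read2` fails the quality cutoff and `read3` fails the length cutoff, so only `read1` is kept.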

Dr. Andrew Black will teach the RNA-seq workshop this term.

Q1: What do you hope students gain from this workshop?

I hope that students gain an understanding of the computational workflow involved with RNA-seq and an appreciation of the methodology! My overarching goal with this course is that people can use material from this course as scaffolding for analyzing their own data on the CGRB infrastructure.

Q2: Favorite topic in your course?

I added a Lord of the Rings theme to my course; students are looking for differentially expressed genes between hobbits and golems. I’m a dork, I know, but I had fun spiking different genes into the data, and I enjoy having students visualize this.

Q3: Who should register for this course?

Graduate students, postdocs, faculty, or anyone outside of OSU who is interested in an introduction to RNA-seq or who needs to learn the workflow for their own project(s).

Q4: Advice for users new to bioinformatics and/or programming?

Take it one step at a time and get comfortable with several commands before expanding your scope. Also, record your commands / code in a text document, because if you aren’t using it on a daily basis, you’ll forget it!


Data Programming in R (6 weeks @ 3 hrs per week)

Instructor: Dr. Shawn O’Neil

Logistics:
Date & Time: Sept. 25 – Nov. 6, Mon/Weds/Fri 9:00am – 9:50am
For credit: ST 599, CRN 17196
Workshop Cost: $500


The R programming language is widely used for the analysis of statistical data sets. This course introduces the language from a computer science perspective, covering topics such as basic data types (e.g. integers, numerics, characters, vectors, lists, matrices, and data frames), importing and manipulating data (in particular, vector and data-frame indexing), control flow (loops, conditionals, and functions), and good practices for producing readable, reusable, and efficient R code. We’ll also explore functional programming concepts and the powerful data manipulation and visualization packages dplyr, tidyr, and ggplot2.

Q1: What do you hope students gain from this workshop?

I really hope that students gain an appreciation for programming as a creative activity. It’s not just a means to an end, even with a statistical language like R; there’s a lot of room for play and exploration. Simulation, for example, is a great way to explore complex systems and ask ‘what if’ questions. Many languages (including R) support programmatic drawing and data visualization which can be quite fun.

Q2: Favorite topic in your course?

I always enjoy the point when we first start scaling analyses to thousands of statistical tests. It’s an eye-opening moment, and doing so in R introduces ‘functional programming,’ a powerful and increasingly important paradigm for software design. 

Q3: Who should register for this course?

Anyone who is interested in doing data analysis, especially of a statistical sort. For those interested in learning programming in a broader sense, our winter Intro to Python series is an excellent overview of fundamental concepts. Although we cover the same topics in the R course, R organizes its features differently than most mainstream programming languages like Python, Java, and C++. Learning both Python and R provides a solid foundation for data science!

Q4: Advice for users new to bioinformatics and/or programming?

I do recommend learning more than one programming language, eventually, as this helps separate deeper concepts from syntax. Find what motivates you and explore it via programming — this could be your primary research project, some field you’ve been wanting to learn more about, or even a hobby.