Data science is a strategic opportunity for the College of Science. Through targeted investments in mathematics, statistics and life sciences faculty, the College has extended the impact of data science on transdisciplinary research. In a science-without-borders approach, the College is deepening engagement between data science and other sciences, engineering, education, arts and business. Cluster hiring in bioinformatics across disciplines has brought expertise in mathematical biology; ecological, evolutionary, and functional properties of the microbiome; and deep sequencing data.
Read more about data science in the College of Science in our iMPACT magazine.
Charlotte Wickham, Statistics
“Our visual system is one of the fastest ways for us to consume information. The goal of my research is to harness this strength, not only to help scientists make discoveries, but also to engage and communicate with the public at large.
“The object of visualization is very often not raw data. Particularly in the era of big data, summarization or modeling is an essential precursor to making sense of the data. Visualization becomes crucial to understanding how decisions at this stage propagate to conclusions, and good visualization tools encourage experimentation with alternate approaches. We have methods for propagating statistical uncertainty through a data pipeline, but we are still learning how best to communicate uncertainty visually.
“There are interesting technical challenges along the way. For example, where should the data live? Can analyses be run on the fly, or do they require lengthy distributed computing? Can an approximate answer be achieved in a quicker manner? Is an approximate answer good enough for visualization purposes? Answering these questions requires close collaboration between computer scientists, statisticians and domain experts.”
Bringing data science to the non-data scientist. Wickham recently won first place in an international competition sponsored by EMC² and hosted by Crowdanalytix. The contest was designed to visually reveal differences between a professional and an amateur motorcycle rider, based on millisecond-level data collected from sensors on the bike, engine and rider during six laps of racing. Simply separating the data into laps posed a data exploration challenge. Iterating between data preparation and visualization was the key to separating the interesting data from the uninteresting.
Duo Jiang, Statistics
“My research aims at developing statistical and computational methods to address challenges posed by the growing amount, dimensionality and complexity of data in biological and biomedical research. A recent focus has been on correlated data methods in genetic association studies, functional enrichment analysis and biological network inference.
“Through interdisciplinary research and collaborations, I hope to make statistical innovations that not only provide improved data analysis, but also enable new ways of leveraging data to answer biological questions and transform study design considerations for researchers at OSU and in the broader scientific community.”
Debashis Mondal, Statistics
Mondal focuses on research applications in agriculture, geographical epidemiology and environmental sciences.
“Advances in the field of spatial statistics are important because they can be used to answer scientific questions in agriculture, astronomy, biomedical imaging, computer vision, climate and environmental sciences, epidemiology and geology.
“I seek to enhance scientific understanding of environmental bioassays, arsenic contamination of groundwater and geographic variations in cancer risk. My statistical and computational work addresses questions relevant to environmental or global change and to health studies. I am also interested in Markov chain Monte Carlo computations, time series, ranking and selection, and random graphs and trees.”
Sharmodeep Bhattacharyya, Statistics
“I work on developing statistical methods for network and high-dimensional data. Large network data sets are becoming increasingly common in scientific fields ranging from biology to the social sciences. My work focuses on network and high-dimensional data from large-scale -omics studies, neuroscience studies and social interaction studies.
“The development of statistical methods to analyze large-scale data coming from several different experimental sources helps our understanding of complex systems, such as the human brain, which has so far remained highly elusive.”
Davide Lazzati, Physics
Lazzati’s research focuses on the physics of cosmic dust and of gamma-ray bursts, the brightest and most mysterious explosions in the present-day universe. He also studies theoretical high-energy astrophysics, quantum chemistry, soft condensed matter and numerical methods. He was among the first to recognize the importance of time-dependent effects in the interaction of burst radiation with interstellar material.
Patrick De Leenheer, Mathematics and Integrative Biology – Bioinformatics hire
De Leenheer’s research interests include mathematical biology, differential equations and control theory. He brings extensive experience in developing instructional and scholarly bridges between mathematicians and biologists. Prior to joining OSU, he was on the mathematics faculty at the University of Florida for nearly 10 years.
De Leenheer earned a master of science in electromechanical engineering and a Ph.D. in applied sciences from Ghent University in Belgium.
David Hendrix, Biochemistry & Biophysics – Bioinformatics hire
Hendrix’s lab focuses on understanding the structure, function and mechanisms of action of non-coding RNAs. Although numerous non-coding RNAs have been discovered in the past decade, their functions remain largely unknown. Hendrix uses structure prediction, genome-wide sequence analysis and deep sequencing data to explore the roles these molecules play in gene regulation. His team also develops algorithms to address problems in other areas of computational biology.
Thomas Sharpton, Microbiology and Statistics – Bioinformatics hire
Sharpton is developing the quantitative biology curricula and is teaching courses in bioinformatics and microbial genomics. His research team focuses on characterizing the ecological, evolutionary, and functional properties of the microbiome—the vast collection of microorganisms that live on our bodies.
The team seeks to better understand how the physiologies of our body and our microbiome interact. Their work is interdisciplinary, relying heavily on microbiology, bioinformatics and systems biology, and borrowing from molecular biology, computer science, and statistics.
David Koslicki, Mathematics – Bioinformatics hire
“My research is mainly data-driven, as I primarily develop new mathematical techniques to answer biological questions in genomics. Studying metagenomics in particular, I routinely analyze DNA sequencing data sets ranging in size from tens of gigabytes to tens of terabytes. Thankfully, Oregon State is well equipped to facilitate analyzing this sort of data, particularly through the Center for Genomics Research and Biocomputing.
“The recent discoveries regarding the human microbiome make it an exciting time to be at the interface of biology, mathematics, and computer science.”
Koslicki’s research focuses on bioinformatics and the application of tools from the mathematical theory of symbolic dynamical systems to problems in genomics. He is currently interested in problems stemming from the field of metagenomics: the study of bacterial communities through their sampled DNA. He uses a variety of big data techniques, including compressed sensing, probabilistic data structures, and high-performance computing.
Joe Beckman, Biochemistry and Biophysics
“Researchers increasingly collaborate across OSU and around the world to better understand what we are exposed to in everyday life, what the cellular actions of these exposures are and how we respond biochemically to these exposures. This involves measuring thousands of chemicals, tens of thousands of genes that are changing, and hundreds of thousands of biochemical molecules.
“The integration and management of these data have become a major challenge, as has learning how to make the results comprehensible to the public and to decision makers.”
Juan Restrepo, Mathematics
Restrepo’s research focuses on uncertainty quantification, ocean dynamics, climate, oil/pollution transport and acoustics. He has also worked on bone dynamics, voting theory and climate dynamics, and, as a visiting professor at Los Alamos National Laboratory, on bio-related homeland security problems.
“Elucidating whether a present or future extreme event has low probability and/or is the result of a changing world is fundamental to developing risk analyses. Finding ways to improve the chances of a fast and cheap recovery after a disaster (rather than of avoiding it) is of great social interest. Producing better predictions from complex dynamic models by combining data and models, while taking into account their inherent uncertainties, has high practical engineering and scientific impact.
“The two aspects that distinguish our research, which focuses on extremely high-dimensional problems, are 1) we work with time dependent processes, in which classical equilibrium notions are not applicable, and 2) we work with processes that generate outcomes which are not simply characterized by their mean and their variance.
“My group combines data/observations and methods from probability and statistics, statistical physics, machine learning, and dynamics in order to propose new methods for answering questions in climate, ocean processes, disaster recovery and resilience in natural and man-made systems.”
Benjamin Dalziel, Mathematics and Integrative Biology
Dalziel is a population biologist working at the interface of theory and data. He uses mathematical models to uncover causal connections among different types of time-series data, including high-resolution data on animal movement patterns, population density, and the incidence of infectious disease.
“I want to know how populations work: Why do epidemics of infectious diseases happen more often in some cities than in others? What leads migratory animals to ‘flock’ over long distances each year, and how does this affect their vulnerability in a changing world?
“To me, data science is about integrating diverse sources of information, such as environmental measurements and behavioral and genetic data, to predict how complex adaptive systems like a group of interacting animals will respond. This is part of a systems-based approach to understanding nature, and it’s made possible by recent increases in the volume and quality of available data.
“But big data is noisy, and a challenge now is how to develop rigorous approaches for extracting ‘signals’ from all the noise. This isn’t the statistics you learned in school; it’s new, and it’s a bit wild. In a way, data science is about approaching wilderness: that which defies the mind’s attempts at appropriation, as the poet Don McKay says.”