Led by biochemist David Hendrix, Oregon State researchers have developed a computer program that represents a key step toward better understanding the connections between mutant genetic material and disease.
Known as bpRNA, the software is a big-data annotation tool for secondary structures in ribonucleic acids. A paper describing it was published this month in Nucleic Acids Research.
“It’s capable of parsing RNA structures, including complex pseudoknot-containing RNAs, so you end up with an objective, precise, easily interpretable description of all loops, stems and pseudoknots,” said corresponding author Hendrix, an assistant professor with joint appointments in biochemistry and biophysics and in computer science.
“You also get the positions, sequence and flanking base pairs of each structural feature, which enables us to study RNA structure en masse at a large scale.”
RNA works with DNA, the other nucleic acid – so named because they were first discovered in the cell nuclei of living things – to produce the proteins needed throughout the body. DNA contains a person’s hereditary information, and RNA delivers the information’s coded instructions to the protein-manufacturing sites within the cells. Many RNA molecules do not encode a protein, and these are known as noncoding RNAs.
“There are plenty of examples of disease-associated mutations in noncoding RNAs that probably affect their structure, and in order to statistically analyze why those mutations are linked to disease we have to automate the analysis of RNA structure,” said Hendrix. “RNA is one of the fundamental, essential molecules for life, and we need to understand RNAs’ structure to understand how they function.”
Secondary structures are the base-pairing interactions within a single nucleic acid polymer or between two polymers. DNA mainly forms fully base-paired double helices, whereas RNA is single-stranded and can fold back on itself to form complicated interactions.
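To make the idea of base-pairing interactions concrete, here is a minimal Python sketch that reads a structure written in dot-bracket notation, a common text format in which matching parentheses mark paired bases and dots mark unpaired ones. It is an illustration only and is not bpRNA's code or algorithm: the function names are hypothetical, and it assumes a pseudoknot-free structure, whereas bpRNA is described as handling pseudoknots as well.

```python
def parse_dot_bracket(structure):
    """Return the sorted (i, j) base pairs (0-based) of a dot-bracket string.

    Assumption: only '(', ')' and '.' appear, so pseudoknots are not handled.
    """
    stack = []   # positions of opening brackets still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def find_stems(pairs):
    """Group directly stacked pairs, (i, j) followed by (i+1, j-1), into stems."""
    stems = []
    for i, j in pairs:  # pairs must be sorted by their first index
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))
        else:
            stems.append([(i, j)])
    return stems


def find_hairpin_loops(structure, pairs):
    """Return the pairs that close a hairpin: only unpaired bases lie between them."""
    return [
        (i, j)
        for i, j in pairs
        if j - i > 1 and set(structure[i + 1:j]) == {"."}
    ]


# A simple hairpin: three stacked base pairs closing a four-base loop.
hairpin = "(((....)))"
hairpin_pairs = parse_dot_bracket(hairpin)
print(hairpin_pairs)                               # [(0, 9), (1, 8), (2, 7)]
print(len(find_stems(hairpin_pairs)))              # 1
print(find_hairpin_loops(hairpin, hairpin_pairs))  # [(2, 7)]
```

Even this toy version yields the kind of information the article mentions, namely the positions of stems and loops, though a real annotation tool must also cover bulges, internal loops, multiloops and pseudoknots.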
Hendrix says bpRNA features the largest and most detailed database of RNA secondary structures to date.
“To be fair it’s a meta-database, but our special sauce is the tool to annotate everything,” said Hendrix.
“Before, there was no way of saying where all the structural features were in an automated way. We provide a color-coded map of where everything is. These annotations will enable us to identify statistical trends that may shed light on RNA structure formation and may open the door for machine learning algorithms to predict secondary RNA structure in ways that haven’t been possible.”
Researchers have successfully tested the tool on more than 100,000 structures, “many of which are very complex, with lots of complex pseudoknots.”
“Every day new RNAs are discovered and researchers are making huge progress in understanding their function,” Hendrix said. “We’re starting to appreciate that the genome is full of noncoding RNAs in addition to messenger RNAs, and they’re important biological molecules with big effects on human health and disease.”
Hendrix collaborated with OSU researchers Padideh Danaee, Mason Rouches, Michelle Wiley, Dezhong Deng and Liang Huang.
The National Institutes of Health, the National Science Foundation and the Medical Research Foundation supported this research.