Biochemist and biophysicist David Hendrix and collaborators in the College of Science and College of Engineering have used deep learning to decipher which ribonucleic acids have the potential to encode proteins.
The gated recurrent neural network Hendrix and team developed is an important step toward better understanding RNA, one of life’s fundamental molecules.
Unlocking the mysteries of RNA means understanding its connections to human health and disease.
Deep learning, a type of machine learning not based on task-specific algorithms, is a powerful tool for solving the puzzle.
“Deep learning may seem scary to some people, but at the end of the day, it’s just crunching numbers,” said Hendrix, the study’s lead author, who has joint appointments in biochemistry/biophysics and computer science. “It’s a tool just like calculus or linear algebra, one that we can use to learn biological patterns. The amount of sequencing data we have now is huge, and deep learning is well suited to face the challenges associated with the vast amount of data and to learn new biological rules that characterize the function of these molecules.”
RNA is transcribed from DNA, the other nucleic acid, to produce the proteins needed throughout the body. Both are so named because they were first discovered in the cell nuclei of living things.
DNA contains a person’s hereditary information, and RNA acts as the messenger that delivers the information’s coded instructions to the protein-manufacturing sites within the cells.
Some RNAs transcribed from DNA are functional molecules that are never translated into proteins. These are known as non-coding RNAs.
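For readers who want to see the coding versus non-coding distinction in concrete terms, here is a minimal Python sketch. It is not the gated recurrent neural network the team developed. It is a classic, much cruder heuristic that transcribes a DNA coding strand into RNA and measures the longest open reading frame, meaning a start codon followed by an uninterrupted run of codons up to a stop codon. The function names and the toy sequences are hypothetical and chosen only for illustration.

```python
# Illustrative only: a naive open-reading-frame check, NOT the study's method.
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"


def transcribe(dna_coding_strand: str) -> str:
    """Return the RNA sequence for a DNA coding strand (T becomes U)."""
    return dna_coding_strand.upper().replace("T", "U")


def longest_orf_codons(rna: str) -> int:
    """Length, in codons, of the longest start-to-stop run in the three
    forward reading frames (start codon counted, stop codon not)."""
    best = 0
    for frame in range(3):
        # Split this reading frame into complete three-letter codons.
        codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
        start = None
        for idx, codon in enumerate(codons):
            if start is None and codon == START_CODON:
                start = idx
            elif start is not None and codon in STOP_CODONS:
                best = max(best, idx - start)
                start = None
    return best


rna = transcribe("ATGGCCTAA")
print(rna, longest_orf_codons(rna))
```

A long open reading frame hints that a transcript might encode a protein, but it is only a hint: short reading frames can be translated, and long ones can occur by chance in non-coding RNAs. That ambiguity is part of why learned models are attractive for this problem.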
Every day, new RNAs are discovered, and gene sequencing technology has advanced to the point that molecular biologists are facing a “torrent” of new transcript annotations to glean information from, Hendrix said. “These vast datasets require new approaches.”