How Deep Learning is Transforming Bioinformatics

Introduction

I was introduced to bioinformatics in 2008, at the start of grad school, when I moved from astrophysics, where I had been fortunate enough to work in a lab curious about dry, distant asteroids, to biology. I didn't just switch fields; I switched data, trading nice, clean asteroid orbits for a field defined by incomplete and uncertain measurements. The biological world is small and numerous, and data about it comes from instruments that stretch the limits of our detection. This means biological data analysis (bioinformatics) must handle big datasets in which each individual data point might not carry much meaning, but amazing insights can be gathered from the wider patterns in the data.

Much of my professional life since then has been devoted to bioinformatics and analyzing messy, noisy biological data. Bioinformatics makes heavy use of sophisticated machine learning tools to tackle the Three Horsemen of Biological Data:

1. Size
2. Randomness
3. Complexity


Like much of the rest of the world, bioinformatics is shifting from classical ML—the stiff, rigid algorithms of a decade or so ago—to deep learning.


Classical AI Can Handle Size and Randomness, but It’s Not Very Good at Complexity

I picked up classical ML during my PhD. Fifteen years ago, I worked in a lab with a goal similar to modern-day Neuralink's: helping paralyzed people use brain-computer interfaces. My job was analyzing the lab's data, and there was a visual metaphor I liked that helped me picture the problem we were trying to tackle.

Imagine airdropping 1,000 microphones onto a city. Some would break on impact, get run over, or be stepped on, but some subset would survive and pick up the noises around them: road traffic, conversations, birds, etc. Now imagine taking the recordings from those 1,000 microphones and using them to understand that city, to try to answer questions like:

“Is it morning or night?”
“Is rush hour very busy there?”
“How does its economy function?”


The lab I was in researched brain-machine interfaces, and instead of air-dropping microphones, we placed electrodes in a brain, but the analogy holds. The wires listened to the brain's electrical activity, and, as with an airdrop, we couldn't control exactly where they ended up. Instead of answering questions about a noisy city, we were answering questions about an electrically dynamic brain. By putting the wires in the parietal cortex, an area of the brain somewhat mysteriously involved in how we move our limbs, we hoped to "listen in" on a brain at work and glean information about what it was up to. If we could interpret those brain signals, we could pass that information on to a computer and, voilà, have a brain-computer interface.

In 2008, we tackled this problem with what is now called "classical ML," although back then we just called it machine learning or, if we were being fancy, "supervised" machine learning. In our lab, we applied classifiers like random forests and support vector machines to the electrical signals from those wires to guess whether a brain was thinking "right" or "left." These algorithms were cutting-edge, not only because they were new, but because they could handle two of the Biological Horsemen:


Size

Classical AI algorithms like classifiers are capable of handling big data. Many are inherently multivariate, i.e., able to take in many dimensions of data at once. Even for these early ML models, size was an asset rather than an obstacle.

Randomness

Since classical supervised machine learning can look at all the data, it can average out pockets of local randomness. Randomness in data, or noise, usually lives at the small scale: a single data point or a slice of the data. As long as this local jitter isn't so large that it swamps the dataset, trends and patterns can still be picked out of the whole. We might not know exactly what one muffled microphone is saying, but if every microphone picks up a roar at the same moment, we can guess that something is indeed happening.
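
To make that concrete, here is a minimal NumPy sketch of why aggregation beats noise. Everything in it is made up (the trend, the noise level, the thousand channels); the point is that a shared pattern invisible in any single channel emerges cleanly in the average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1,000 microphones all hear the same slow,
# city-wide rhythm, but each one is drowned in its own local noise.
t = np.linspace(0, 24, 1_000)                      # hours in a day
city_signal = np.sin(2 * np.pi * t / 24)           # the shared trend
recordings = city_signal + rng.normal(0, 5, size=(1_000, t.size))

one_mic = recordings[0]                            # mostly noise
consensus = recordings.mean(axis=0)                # noise averages out

# Correlation with the true trend: near zero for a single mic,
# near one for the average across all of them.
print(np.corrcoef(one_mic, city_signal)[0, 1])     # ~0.2
print(np.corrcoef(consensus, city_signal)[0, 1])   # ~0.99
```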


A limit of supervised machine learning, and one I definitely felt as I struggled with my data, is that these classical algorithms still required a lot of human handholding. For them to make guesses about what was going on in that brain, the neural signals had to be vastly simplified using heuristics, rubrics, and educated guesses. For example, instead of giving the algorithms the full "sound" of the wires, we watched for the times a neuron was "on" (i.e., firing an action potential), recorded them, and fed those times into the algorithms.

We assumed—probably rightly so—that the information was encoded in the timing of neurons, not in the electric signal itself, and so we extracted what we thought was meaningful from the data. That is, classical AI had a human upstream of it, sending it only the data the human thought was relevant. The supervised machine learning tools we used required carefully curated, simplified data, and the kinds of complexity they could pull out of datasets were often limited by that upstream human.
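
As a rough illustration of that upstream human, here is a hedged Python sketch using scikit-learn. The data, shapes, and threshold are all hypothetical; what matters is that the classifier never sees the raw signal, only the spike counts a human decided were the meaningful part.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def spike_counts(voltage, threshold=-4.0):
    """Hand-crafted feature: count downward threshold crossings
    ("spikes") per channel and throw the rest of the waveform away."""
    crossed = (voltage[..., 1:] < threshold) & (voltage[..., :-1] >= threshold)
    return crossed.sum(axis=-1)

# Hypothetical data: 200 trials x 32 electrodes x 1,000 samples,
# labeled by intended movement (0 = "left", 1 = "right").
X_raw = rng.normal(0.0, 2.0, size=(200, 32, 1_000))
y = rng.integers(0, 2, size=200)

# The human decides what matters; the classifier only ever sees
# a 32-number summary of each trial, not the signal itself.
X_features = spike_counts(X_raw)                   # shape: (200, 32)
clf = RandomForestClassifier(n_estimators=100).fit(X_features, y)
```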

Yet, these rigid algorithms struggled hard with the last Biological Horseman: complexity.


Modern Deep Learning in Bioinformatics

Fast-forward sixteen years, and while supervised ML is still widely used in bioinformatics, it is increasingly being replaced by a new tool: deep learning. Deep learning can handle complexity because it doesn't require the researcher to craft features. Neural networks learn features directly from the data, allowing them to model highly complex patterns and interactions without a human doing the feature engineering. This makes deep learning particularly well suited to bioinformatics, where data is often high-dimensional and full of intricate relationships.
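
For contrast with the hand-fed pipeline above, here is a minimal PyTorch sketch (the architecture and shapes are illustrative assumptions, not any particular published model) of a network that consumes the raw multichannel signal and learns its own features:

```python
import torch
import torch.nn as nn

# A small 1D convolutional network that takes the *raw* recording
# (batch x electrodes x samples) rather than hand-extracted spike times.
model = nn.Sequential(
    nn.Conv1d(in_channels=32, out_channels=64, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),          # pool over time
    nn.Flatten(),
    nn.Linear(64, 2),                 # "left" vs. "right"
)

raw = torch.randn(8, 32, 1_000)       # hypothetical raw trials
logits = model(raw)                   # no human feature engineering upstream
print(logits.shape)                   # torch.Size([8, 2])
```

Nothing upstream decides what counts as a spike; the convolutional filters are free to discover whatever structure in the waveform predicts the label.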

A prime example of deep learning in bioinformatics is AlphaFold, developed by DeepMind, which has made remarkable strides in predicting a protein's 3D structure from its amino acid sequence. Deep learning is also making a significant impact in proteomics. Tools like DIA-NN (data-independent acquisition by neural networks) use deep learning to analyze mass spectrometry data, a technique used to study the proteome, i.e., the entire set of proteins expressed in a cell, tissue, or organism.

Image analysis is another domain where deep learning has brought transformative changes to bioinformatics. High-throughput microscopy and medical imaging generate vast amounts of visual data, requiring sophisticated algorithms for effective analysis. Convolutional neural networks (CNNs), a type of deep learning model, have become the standard for image analysis due to their ability to automatically learn spatial hierarchies of features.
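
A common pattern in this space, sketched below with made-up shapes and a hypothetical three-class task, is transfer learning: start from a CNN pretrained on natural images and swap in a new output layer for the microscopy problem at hand.

```python
import torch
from torchvision import models

# Reuse a CNN pretrained on natural images as a starting point for
# microscopy; only the final layer is replaced for the new task.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 3)  # e.g., 3 cell phenotypes

micrographs = torch.randn(4, 3, 224, 224)   # batch of hypothetical images
print(backbone(micrographs).shape)          # torch.Size([4, 3])
```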

Unsurprisingly, modern brain-computer interface tools are also deep learning-based. Advanced neural networks are now used to decode brain signals more accurately and efficiently. For example, research teams are developing deep learning models that can interpret complex neural data in real time, enabling more responsive and reliable brain-computer interfaces (BCIs).

Moreover, deep learning is improving the robustness and adaptability of BCIs. By continuously learning from new data, these systems can adapt to changes in the user's neural activity over time, maintaining consistent performance and usability. This adaptability is crucial for long-term BCI applications, as it addresses the challenge of neural signal variability.
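
One simple way to get that kind of adaptation (a sketch under assumed conditions, not a description of any specific BCI system) is online learning: nudge the decoder with each new batch of labeled signal windows instead of freezing it at calibration time.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)

# Hypothetical online-learning loop: the decoder is updated on each
# session's data, so it can track slow drift in the neural signal.
decoder = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])                       # "left" vs. "right"

drift = 0.0
for session in range(10):
    drift += 0.3                                 # the signal slowly shifts
    X = rng.normal(drift, 1.0, size=(50, 32))    # 50 windows x 32 features
    y = rng.integers(0, 2, size=50)
    decoder.partial_fit(X, y, classes=classes)   # adapt, don't retrain
```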


Conclusion

The shift from classical AI to deep learning in bioinformatics marks a significant change in the field. While classical AI laid the foundation by handling large datasets and mitigating randomness, it struggled with the inherent complexity of biological data. Deep learning, with its ability to automatically learn complex patterns and interactions, has overcome these limitations, driving advancements across genomics, proteomics, image analysis, and brain-computer interfaces.

Want to learn more about how we’re leveraging cutting-edge AI and deep learning to transform bioinformatics? Explore our bioinformatics capabilities or contact us to discuss how we can support your next project.
