Briefings in Bioinformatics

sincFold: end-to-end learning of short- and long-range interactions in RNA secondary structure

Article


Motivation

Coding and noncoding RNA molecules participate in many important biological processes. Noncoding RNAs fold into well-defined secondary structures to exert their functions. However, the computational prediction of the secondary structure from a raw RNA sequence is a long-standing unsolved problem which, after decades of almost unchanged performance, has re-emerged thanks to deep learning. Traditional RNA secondary structure prediction algorithms have mostly been based on thermodynamic models and dynamic programming for free energy minimization. More recently, deep learning methods have shown performance competitive with the classical ones, but there is still a wide margin for improvement.

Results

In this work we present sincFold, an end-to-end deep learning approach that predicts the nucleotide contact matrix using only the RNA sequence as input. The model is based on 1D and 2D residual neural networks that can learn short- and long-range interaction patterns. We show that structures can be accurately predicted with minimal physical assumptions. Extensive experiments were conducted on several benchmark datasets, considering sequence homology and cross-family validation. sincFold was compared with classical methods and recent deep learning models, showing that it can outperform state-of-the-art methods.
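To make the input and output representations concrete, the following minimal Python sketch encodes an RNA sequence as a one-hot matrix and builds a binary contact matrix from a known structure. It is an illustration only and is not taken from the sincFold source code: the function names `one_hot` and `contact_matrix` are hypothetical, and the use of dot-bracket notation (without pseudoknots) to describe the structure is an assumption made for this example.

```python
def one_hot(sequence):
    """Encode an RNA sequence as L rows of 4 values, in the order A, C, G, U.

    Any symbol outside the alphabet (e.g. N) becomes an all-zero row.
    """
    alphabet = "ACGU"
    return [[1 if base == symbol else 0 for symbol in alphabet]
            for base in sequence.upper()]


def contact_matrix(dot_bracket):
    """Build a symmetric L x L binary matrix from a dot-bracket string.

    Entry [i][j] is 1 when nucleotides i and j are paired, 0 otherwise.
    Only '(' , ')' and '.' are interpreted (no pseudoknot brackets).
    """
    n = len(dot_bracket)
    matrix = [[0] * n for _ in range(n)]
    stack = []  # positions of opening brackets still waiting for a partner
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            i = stack.pop()
            matrix[i][j] = 1
            matrix[j][i] = 1
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return matrix


if __name__ == "__main__":
    x = one_hot("GGAACC")            # hypothetical toy sequence
    y = contact_matrix("((..))")     # hypothetical toy structure
    for row in y:
        print(row)
```

In this framing, a sequence-only predictor receives something like `x` and is trained to output a matrix like `y`.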

The source code is available at https://github.com/sinc-lab/sincFold (v0.16), and web access is provided at https://sinc.unl.edu.ar/web-demo/sincFold

Bugnon LA, Di Persia L, Gerard M, Raad J, Prochetto S, Fenoy E, Chorostecki U, Ariel F, Stegmayer G, Milone DH. sincFold: end-to-end learning of short- and long-range interactions in RNA secondary structure. Brief Bioinform. 2024 May 23;25(4):bbae271. doi: 10.1093/bib/bbae271.


Key Points
  • sincFold is an end-to-end deep learning (DL) model that can accurately predict the secondary structure from an RNA sequence.

  • Local and distant relationships can be learnt effectively using a sequential 1D-2D architecture based on residual networks.

  • sincFold learns internal representations in 1D and converts them to a 2D representation with a tensor product, so that the following layers can learn the long-range interactions.

  • Experimental setup includes random folds, low homology partitions and inter-family cross-validation.

  • sincFold performed better than other state-of-the-art DL approaches on several datasets.
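The 1D-to-2D step mentioned in the key points can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the sincFold implementation: the function name `pairwise_map` is hypothetical, and the example assumes that the product is taken between the per-nucleotide feature matrix and its transpose, which gives one score per pair of positions. The exact operation used by the model should be checked in its source code.

```python
def pairwise_map(features):
    """Lift per-position features (L rows of C values) to an L x L map.

    Entry [i][j] is the dot product of the feature vectors at positions
    i and j, i.e. the product of the feature matrix with its transpose.
    The result is symmetric, like a contact matrix.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


if __name__ == "__main__":
    # Hypothetical features for 3 nucleotides with 2 channels each,
    # standing in for the output of the 1D layers.
    feats = [[1.0, 0.0],
             [0.0, 1.0],
             [1.0, 1.0]]
    for row in pairwise_map(feats):
        print(row)
```

Because every entry relates two positions regardless of how far apart they are in the sequence, layers that operate on such a map can pick up long-range interactions that a purely local 1D filter would miss.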