Data‐driven Guideposts for Targeting RNA: Sequence, Structure and Interactions

August 1, 2018
Chris Burge, PhD, Professor of Biology and Biological Engineering at Massachusetts Institute of Technology

RNA’s fundamental role in biology has made it a burgeoning area for technology development and application, including as an intervention point for drug development. Virtually every step in the expression of a human gene involves RNA, from transcription through processing, localization and, ultimately, translation – the final step on the way from DNA to protein.

Recently, the generation and analysis of large‐scale datasets have led to new insights for therapeutically targeting RNA, including with new modalities such as RNA‐targeted small molecule (rSM) drugs. The RNA field is now empowered with new experimental and computational analysis tools that pave the way for rSM development. In this piece, I will describe some of the ways that our understanding of RNA is growing based on emerging data‐driven knowledge in areas such as RNA splicing, transcript targetability, and interactions of RNA structure and RNA‐binding proteins.

Homing in on intervention points in RNA splicing and beyond

Splicing of primary transcripts is essential for expression of most human genes, and intervening in the splicing process with oligonucleotide or small molecule drugs is emerging as a new approach for RNA‐targeted medicines. As a quick foundation, the initial RNA transcript from a gene undergoes a series of cutting and pasting steps before it is translated. This process, known as RNA splicing, removes intervening sequences (introns) and joins together the remaining RNA segments (exons) to form the mature mRNA, which codes for the protein.
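
The cut‐and‐paste logic described above can be illustrated with a minimal Python sketch. It is a toy model only: the sequence, the exon coordinates and the function name `splice` are invented for illustration, and real splicing depends on splice‐site recognition by the spliceosome rather than on known coordinates.

```python
from __future__ import annotations


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments, given as 0-based half-open (start, end) intervals."""
    ordered = sorted(exons)
    prev_end = 0
    for start, end in ordered:
        # Exons must lie inside the transcript and must not overlap.
        if not (prev_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"invalid exon interval: {(start, end)}")
        prev_end = end
    # Everything between consecutive exons (the introns) is dropped.
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Hypothetical toy transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaagcccuuag" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]

mature = splice(pre_mrna, exons)                   # all three exons included
skipped = splice(pre_mrna, [exons[0], exons[2]])   # middle exon skipped

print(mature)
print(skipped)
```

Passing a different exon list to the same function yields a different isoform from the same primary transcript, which is the sense in which a splicing change can alter the mRNA without any change to the gene.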

The opportunity to therapeutically target RNA relies on understanding how and where to intervene in the splicing process. Perturbing splicing can shift it from productive to non‐productive mRNA isoforms or vice versa, which can have therapeutic value. Indeed, an antisense molecule developed by Ionis that perturbs splicing of the SMN2 gene, promoting inclusion of an exon that is mostly skipped by default, has been highly successful in treating spinal muscular atrophy (SMA), and a small molecule drug for SMA developed by Novartis works by a similar mechanism. More general inhibitors of splicing, like those being developed by H3 Biomedicine, may have potential in cancer. The 5’ and 3’ untranslated regions (UTRs) at the beginning and end of each mRNA offer equally attractive targets for intervention. Human 3’ UTRs contain as many evolutionarily conserved bases as promoters [Xie et al], and contribute to mRNA stability, localization, and translation. The 5’ UTR impacts a similar spectrum of mRNA functions and must be scanned by the small ribosomal subunit every time the message is translated, making its structure particularly dynamic and offering opportunities for perturbing protein output.

Transcript targetability is likely orthogonal to protein targetability

Most of a human primary transcript is non‐coding – intronic or untranslated region – and has no or minimal sequence relationship to the encoded protein. Thus, the transcript sequence properties that make a particular gene attractive or unattractive as an rSM target are expected to be entirely independent of the amino acid sequence properties that make the encoded protein well or poorly suited as a target of a conventional protein‐directed SM. This is fundamental to the promise of the rSM approach: the fact that a disease‐associated gene encodes a transcription factor or other traditionally undruggable category of protein says nothing about the structural properties of its primary transcript or messenger RNA, and so presents no intrinsic barrier to its potential targetability with an rSM.

RNA‐binding proteins are ubiquitous and are being mapped at large scale

In vitro, essentially any RNA more than ten or twenty bases long will interact with itself, forming a specific secondary structure (set of RNA helices) and tertiary (3D) structure, or sometimes multiple structures that interconvert. Structure appears to be fairly widespread in vivo as well, though precisely how much is still being worked out. In the cell, each RNA is escorted by RNA‐binding proteins (RBPs) from birth to death. RNA-RBP complexes are so ubiquitous that many RNA biochemists prefer to talk about mRNP (messenger ribonucleoprotein) rather than mRNA, miRNP rather than microRNA, etc. RBPs sculpt RNA function, directing the course of its processing, modification, localization and translation.

Methods to map protein‐RNA interactions in vitro and in vivo have progressed rapidly over the past decade. Pioneering methods like SELEX, RIP and HiTS‐CLIP have spawned new variants with increased sensitivity and throughput, which are starting to be applied to RBPs at larger scales, including a project on RBPs as part of the most recent phase of the ENCODE project [Van Nostrand et al]. These larger efforts are starting to uncover general patterns in protein‐RNA interactions. Analyzing a diverse set of ~150 RBPs by their enhanced version of crosslinking/immunoprecipitation‐sequencing (eCLIP), the Yeo lab at UCSD detected peaks of CLIP signal overlapping about one‐third of the bases in expressed exons, confirming the expectation that transcripts interact with very large numbers of RBPs during their lifetimes.
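
To make the "fraction of exonic bases under a peak" statistic concrete, here is a minimal Python sketch of how such a number can be computed from interval data. It is not the eCLIP analysis pipeline: the coordinates are invented, the function names `merge_intervals` and `covered_fraction` are hypothetical, and intervals are assumed to be 0‐based and half‐open on a single transcript.

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching (start, end) intervals into a sorted union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(exons, peaks):
    """Fraction of exonic bases overlapped by at least one peak."""
    exon_union = merge_intervals(exons)
    peak_union = merge_intervals(peaks)  # peaks from different RBPs may overlap
    total = sum(end - start for start, end in exon_union)
    covered = 0
    for e_start, e_end in exon_union:
        for p_start, p_end in peak_union:
            # Length of the overlap between this exon and this merged peak.
            covered += max(0, min(e_end, p_end) - max(e_start, p_start))
    return covered / total if total else 0.0


# Hypothetical toy data: two exons and three peaks pooled across RBPs.
exons = [(0, 150), (200, 350)]
peaks = [(10, 40), (30, 60), (300, 380)]

fraction = covered_fraction(exons, peaks)
print(f"{fraction:.3f}")
```

Merging the peaks first means that a base bound by several RBPs is counted once, so the result is the fraction of bases covered rather than a count of binding events.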

In vitro analysis of RBP affinity has shown that most canonical RBPs favor a specific primary sequence motif or motifs, with preference for specific structures appearing less commonly [Dominguez et al]. And yet, even proteins that bind with high affinity and specificity to a particular sequence motif detectably interact with only a small subset of the occurrences of this motif in expressed transcripts, even in cells where the RBP is highly expressed. Recent studies have shed light on this puzzle: RNA secondary structure appears to occlude the binding of many cognate motifs in the transcriptome, both in vitro and in vivo [Taliaferro et al]. This observation supports the “structure is widespread in transcripts” camp, implying an ample search space of RNA features for targeting by rSMs. Furthermore, if these structures are dynamic, and structure fundamentally alters the complement of RBPs bound to an RNA, as seems common, then there is broad potential for use of rSMs to stabilize one RNA conformation relative to others. Artificially stabilizing a particular structure may impact the processing or translation of the RNA directly, or indirectly modulate RNA function by shifting the complement of RBPs bound.
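
The idea that secondary structure can hide otherwise valid motif occurrences can be sketched in a few lines of Python. This is a toy illustration, not the method of the cited studies: the sequence, the motif `GCAUG`, the hand‐written dot‐bracket structure and the function names are all assumptions made for the example. In dot‐bracket notation, `.` marks an unpaired base and `(` or `)` a paired one.

```python
def motif_sites(seq, motif):
    """Start positions of all (possibly overlapping) occurrences of motif in seq."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def accessible_sites(seq, structure, motif, max_paired=0):
    """Motif occurrences with at most max_paired base-paired positions."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    kept = []
    for i in motif_sites(seq, motif):
        window = structure[i:i + len(motif)]
        paired = sum(ch != "." for ch in window)
        if paired <= max_paired:
            kept.append(i)
    return kept


# Hypothetical toy RNA: one motif copy buried in a hairpin stem, one unpaired.
seq = "CUGCAUGG" + "GAAA" + "CCAUGCAG" + "AAUU" + "GCAUG" + "AA"
structure = "((((((((" + "...." + "))))))))" + "...." + "....." + ".."
motif = "GCAUG"

all_sites = motif_sites(seq, motif)
open_sites = accessible_sites(seq, structure, motif)

print(all_sites)
print(open_sites)
```

Both copies match at the sequence level, but only the unpaired copy passes the accessibility filter. A ligand that stabilized or destabilized the hairpin would, in this simplified picture, change which copies are available for binding.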

Advances in experimental and computational analysis enable exploration of RNA structure

Rational perturbation of RNA structure and function by rSMs is fundamentally enabled not only by the improved mapping of protein‐RNA interactions discussed above, but also by recent advances in experimental and computational analysis of RNA structure (e.g., SHAPE, DYNAMO) and by large‐scale analyses of RBP function and localization. For example, knockdown/RNA‐seq analyses of more than 200 RBPs have recently been reported by the Graveley lab at UCHC, facilitating identification of transcripts impacted at the level of RNA processing or decay/stabilization (refs). GFP tagging of hundreds of RBPs by the Lecuyer lab (U. Montreal) has helped to pinpoint subcellular localization of RBP activity, providing additional insights into function [Lecuyer].

In summary, a convergence of new technologies and data – many enabled or enhanced by next‐generation sequencing – has effectively built an outpost in the land of cellular RNA biology, from which campaigns can be launched toward exciting RNA targets.

References

Xie et al. Nature 434, 338‐45, 2005.
Van Nostrand ‐ https://www.biorxiv.org/content/early/2017/08/23/179648
Dominguez et al. Molecular Cell 70, 854‐867.e9, 2018.
Taliaferro et al. Molecular Cell 64, 294‐306, 2016.
Lecuyer ‐ http://rnabiology.ircm.qc.ca/RBPImage/