Introduction: Starting in the 1980s with the first catalytic RNAs (Kruger et
al. 1982; Guerrier-Takada et al. 1983), the discovery of novel biological functions
for RNA molecules has continued unabated to the present (Doudna & Cech 2002).
It is now well established that RNA molecules play central roles in all aspects
of gene expression, far beyond the passive informational roles first delineated
with the founding of molecular biology as a science in the mid-20th century.
In the ribosome, RNA is responsible for decoding the codon-anticodon interaction
(Carter et al. 2000; Ogle et al. 2003) and catalyzing peptide bond formation
(Nissen et al. 2000; Hansen et al. 2002). Structured mRNAs regulate their own
translation and direct translational reprogramming. An RNA-based machine, the
spliceosome, is responsible for splicing (Murray & Jarrell 1999). RNA aptamers – RNA
molecules selected to specifically bind diverse substrates – started out
as laboratory curiosities but are now recognized as key components of natural
riboswitches, providing new paradigms for regulation of gene expression at the
translational level (Bittker et al. 2002). Telomerase RNA is directly involved
in regulating the aging process (Chan & Blackburn 2004). RNA molecules, especially
ribosomal RNAs, have served as valuable tools to trace the evolutionary history
of life on Earth, while the discoveries of catalytic RNAs have fueled research
into plausible models for the origin of life itself.
The continuous discovery of new RNA molecules with novel biological functions
testifies to a major unsolved problem in the field: systematic identification
of all non-coding RNA genes in genomic sequences (Eddy 2002). In fact, it is
estimated that up to 98% of the transcriptional output of humans is non-coding
RNA (Mattick 2001). It appears that RNA-mediated gene regulation is widespread
in higher eukaryotes and that complex genetic phenomena like RNA interference,
co-suppression, transgene silencing, imprinting, methylation, and possibly position-effect
variegation and transvection, all involve intersecting RNA-containing pathways
(Mattick 2001; Mattick & Gagen 2001).
RNA science is playing an important role in understanding normal and abnormal
metabolism and physiology and in designing new strategies for intervention. For
example, novel methods have been proposed to redirect RNA mis-splicing (Sazani & Kole
2003) and to repair RNA sequences (Long et al. 2003). An imprinted snoRNA locus,
normally expressed in brain tissue, is not expressed in patients with Prader-Willi
syndrome and in mouse models of Prader-Willi (Nicholls & Knepper 2001; Gallagher
et al. 2002), suggesting that a defect in RNA modification may underlie the disease.
Evidence has also mounted that a transcribed RNA with long triplet repeats is
the agent that is toxic to muscle fibers in myotonic dystrophy (Mankodi & Thornton
2002; Ebralidze et al. 2003). In addition, the discovery of RNA interference
is opening whole new methodologies for targeting specific genes, which provide
powerful tools for doing wholesale molecular biology and also offer new ways
to develop novel therapeutic agents. However, resolving the issues of delivery, stability,
and specificity of RNA interference agents will depend on the integration of
different kinds of data. Accurately describing and predicting RNA structure are
important goals that will aid in the design of novel RNA-based pharmaceuticals
that either target RNA or are composed of synthetic RNA sequences.
RNA is also an important pharmaceutical target. Antisense drugs target mRNA (Dias & Stein
2002). Also, many antibiotics target rRNA by exploiting the structural differences
between bacterial and eukaryotic ribosomes (Recht et al. 1999; Lynch & Puglisi
2001; Recht & Puglisi 2001; Hansen et al. 2002; Hansen et al. 2003; Pfister
et al. 2003; Vicens & Westhof 2003; Vicens & Westhof 2003). All existing
ribosome-binding drugs were isolated or developed before the ribosomal crystal
structures were available, and these structures now reveal
significant new targets for drug development. Ribosomal RNA sequences are known
for many bacterial pathogens and it is easy to determine the rRNA sequence for
most target organisms, but it is unlikely that high-resolution 3D structures
will become available for the ribosomes of all target bacteria of scientific
or medical interest. Rational drug design efforts will therefore depend on sophisticated
three-dimensional modeling of the antibiotic target sites. The ability to build
high quality RNA 3D models will depend, in turn, on a much deeper understanding
of RNA structural motifs than is currently available. Moreover, there is still
a paucity of tools for refining and analyzing RNA models. For example, interpretation
of the subtle structural changes during the ribosomal translation cycle, as visualized
in cryo-EM, requires the ability to manipulate the crystal structures in ways
that correspond to the natural deformational modes of the RNA and protein components.
There exists, therefore, a need for new ways to analyze and manipulate the 3D
structural RNA motifs involved in biological functions.
Molecular science is rapidly advancing the Darwinian imperative of accurately
defining the phylogenetic relationships of all living organisms on Earth. RNA
science contributes fundamentally to this endeavor because RNA molecules comprise
some of the most conserved and ubiquitous families of homologous molecules. Thus,
comparison of ribosomal RNA sequences made it possible to determine the deepest
branchings of the tree of life – and to establish the Archaea as the third
major phylogenetic domain (Pace et al. 1986). To fully exploit RNA sequence information
for constructing accurate phylogenies it is essential to have accurate alignments,
which in turn require deep understanding of RNA 3D structure and evolution at
the level of motifs. The possibility of life on Mars – a notion no longer
relegated to science fiction given recent evidence that water once flowed on
the surface of the Red Planet – poses new challenges for phylogenetics
(Kennedy 2002). Should life be discovered on Mars, the fundamental phylogenetic
question will be whether life arose independently more than once within our solar
system.
In summary, the last few years have seen rapid growth of the databases of RNA
sequences and experimentally determined RNA structures, especially atomic resolution
X-ray structures (Ban et al. 2000; Schluenzen et al. 2000; Wimberly et al. 2000;
Adams et al. 2004; Ke et al. 2004). There is now a critical need to identify
and classify 3D RNA motifs and to create searchable, usable databases to relate
3D structure and sequence if we are to effectively use these data to promote
a host of diverse research agendas. Currently, several databases have been developed
that annotate aspects of RNA structure, among them the Nucleic Acid Database
(http://ndbserver.rutgers.edu), the Comparative RNA Web site (http://www.rna.icmb.utexas.edu),
the Non-canonical Interactions in RNA site (http://prion.bchs.uh.edu/bp_type),
and the Structural Classification of RNA (http://scor.lbl.gov). While these databases
often refer to identical regions of RNA structure, there are few direct links
between these sites. For example, no links exist between the motifs in SCOR and
the two-dimensional representations and sequence information in the Comparative
RNA Web site, or between the analysis of nucleic acid structures provided by
the NDB and the analysis provided by the Non-canonical Interactions in RNA site. This
is due to the lack of a standard approach to the distribution and annotation
of RNA structure.
The user community for integrated RNA databases and computational tools is very
large and includes: (1) molecular biologists who need to interpret RNA sequence
and probing data to produce plausible 3D models for functional RNAs they study;
(2) biologists seeking to catalogue and understand the diversity of life and
the inter-relationships of living things; (3) biochemists and nanotechnologists
seeking to understand the mechanisms of the most ancient “molecular machines” – RNA-containing
supramolecular structures such as the ribosome and spliceosome; (4) genomicists
seeking to discover non-coding RNAs in genomes; and (5) academic, government,
and industry scientists who research and develop RNA pharmaceuticals or drugs
that target RNA.