Introduction: Beginning with the first catalytic RNAs in the early 1980s (Kruger et al. 1982; Guerrier-Takada et al. 1983), the discovery of novel biological functions for RNA molecules has continued unabated to the present (Doudna & Cech 2002). It is now well established that RNA molecules play central roles in all aspects of gene expression, far beyond the passive informational roles first delineated with the founding of molecular biology as a science in the mid-20th century. In the ribosome, RNA is responsible for decoding the codon-anticodon interaction (Carter et al. 2000; Ogle et al. 2003) and catalyzing peptide bond formation (Nissen et al. 2000; Hansen et al. 2002). Structured mRNAs regulate their own translation and direct translational reprogramming. An RNA-based machine, the spliceosome, is responsible for splicing (Murray & Jarrell 1999). RNA aptamers – RNA molecules selected to specifically bind diverse substrates – started out as laboratory curiosities but are now recognized as key components of natural riboswitches, providing new paradigms for regulation of gene expression at the translational level (Bittker et al. 2002). Telomerase RNA is directly involved in regulating the aging process (Chan & Blackburn 2004). RNA molecules, especially ribosomal RNAs, have served as valuable tools to trace the evolutionary history of life on Earth, while the discoveries of catalytic RNAs have fueled research into plausible models for the origin of life itself.
      The continuous discovery of new RNA molecules with novel biological functions testifies to a major unsolved problem in the field: systematic identification of all non-coding RNA genes in genomic sequences (Eddy 2002). In fact, it is estimated that up to 98% of the transcriptional output of humans is non-coding RNA (Mattick 2001). It appears that RNA-mediated gene regulation is widespread in higher eukaryotes and that complex genetic phenomena like RNA interference, co-suppression, transgene silencing, imprinting, methylation, and possibly position-effect variegation and transvection, all involve intersecting RNA-containing pathways (Mattick 2001; Mattick & Gagen 2001).
      RNA science is playing an important role in understanding normal and abnormal metabolism and physiology and in designing new strategies for intervention. For example, novel methods have been proposed to redirect RNA mis-splicing (Sazani & Kole 2003) and to repair RNA sequences (Long et al. 2003). An imprinted snoRNA locus, normally expressed in brain tissue, is not expressed in patients with Prader-Willi syndrome or in mouse models of Prader-Willi (Nicholls & Knepper 2001; Gallagher et al. 2002), suggesting that a defect in RNA modification may underlie the disease. Evidence has also mounted that a transcribed RNA with long triplet repeats is the agent that is toxic to muscle fibers in myotonic dystrophy (Mankodi & Thornton 2002; Ebralidze et al. 2003). In addition, the discovery of RNA interference is opening whole new methodologies for targeting specific genes, which provide powerful tools for large-scale molecular biology and also offer new ways to develop novel therapeutic agents. However, resolving the issues of delivery, stability, and specificity of RNA interference agents will depend on the integration of different kinds of data. Accurately describing and predicting RNA structure are important goals that will aid in the design of novel RNA-based pharmaceuticals that either target RNA or are composed of synthetic RNA sequences.
      RNA is also an important pharmaceutical target. Antisense drugs target mRNA (Dias & Stein 2002). Also, many antibiotics target rRNA by exploiting the structural differences between bacterial and eukaryotic ribosomes (Recht et al. 1999; Lynch & Puglisi 2001; Recht & Puglisi 2001; Hansen et al. 2002; Hansen et al. 2003; Pfister et al. 2003; Vicens & Westhof 2003; Vicens & Westhof 2003). All existing ribosome-binding drugs were isolated or developed before the ribosomal crystal structures were available, and the availability of crystal structures offers significant new targets for drug development. Ribosomal RNA sequences are known for many bacterial pathogens, and it is easy to determine the rRNA sequence for most target organisms, but it is unlikely that high-resolution 3D structures will become available for the ribosomes of all target bacteria of scientific or medical interest. Rational drug design efforts will therefore depend on sophisticated three-dimensional modeling of the antibiotic target sites. The ability to build high-quality RNA 3D models will depend, in turn, on a much deeper understanding of RNA structural motifs than is currently available. Moreover, there is still a paucity of tools for refining and analyzing RNA models. For example, interpretation of the subtle structural changes during the ribosomal translation cycle, as visualized by cryo-EM, requires the ability to manipulate the crystal structures in ways that correspond to the natural deformational modes of the RNA and protein components. There exists, therefore, a need for new ways to analyze and manipulate the 3D structural RNA motifs involved in biological functions.
      Molecular science is rapidly advancing the Darwinian imperative of accurately defining the phylogenetic relationships of all living organisms on Earth. RNA science contributes fundamentally to this endeavor because RNA molecules comprise some of the most conserved and ubiquitous families of homologous molecules. Thus, comparison of ribosomal RNA sequences made it possible to determine the deepest branchings of the tree of life – and to establish the Archaea as the third major phylogenetic domain (Pace et al. 1986). To fully exploit RNA sequence information for constructing accurate phylogenies, it is essential to have accurate alignments, which in turn require a deep understanding of RNA 3D structure and evolution at the level of motifs. The possibility of life on Mars – a notion no longer relegated to science fiction given recent evidence that water once flowed on the surface of the Red Planet – poses new challenges for phylogenetics (Kennedy 2002). Should life be discovered on Mars, the fundamental phylogenetic question will be whether life arose independently more than once within our solar system.
      In summary, the last few years have seen rapid growth of the databases of RNA sequences and experimentally determined RNA structures, especially atomic-resolution X-ray structures (Ban et al. 2000; Schluenzen et al. 2000; Wimberly et al. 2000; Adams et al. 2004; Ke et al. 2004). There is now a critical need to identify and classify 3D RNA motifs and to create searchable, usable databases that relate 3D structure and sequence if we are to effectively use these data to promote a host of diverse research agendas. Currently, several databases have been developed that annotate aspects of RNA structure, among them the Nucleic Acid Database (NDB; http://ndbserver.rutgers.edu), the Comparative RNA Web site (http://www.rna.icmb.utexas.edu), the Non-Canonical Interactions in RNA site (http://prion.bchs.uh.edu/bp_type), and the Structural Classification of RNA (SCOR; http://scor.lbl.gov). While these databases often refer to identical regions of RNA structure, there are few direct links between these sites. For example, no links exist between the motifs in SCOR and the two-dimensional representations and sequence information in the Comparative RNA Web site, or between the analysis of nucleic acid structures provided by the NDB and the analysis of the Non-Canonical Interactions in RNA site. This is due to the lack of a standard approach to the distribution and annotation of RNA structure.
      The user community for integrated RNA databases and computational tools is very large and includes: (1) molecular biologists who need to interpret RNA sequence and probing data to produce plausible 3D models for the functional RNAs they study; (2) biologists seeking to catalogue and understand the diversity of life and the inter-relationships of living things; (3) biochemists and nanotechnologists seeking to understand the mechanisms of the most ancient “molecular machines” – RNA-containing supramolecular structures such as the ribosome and spliceosome; (4) genomicists seeking to discover non-coding RNAs in genomes; and (5) academic, government, and industry scientists who research and develop RNA pharmaceuticals or drugs that target RNA.