Hawaii Notes Day 1

 

01/08/10

ROC meeting notes


Attendees:

Rob Knight: rna structure prediction, distribution of functional rnas in random sequences.

Jesse Stombaugh: (Rob) analyzing rna 3d structure

Jerry Kennedy: (Rob) DB

Lauren Neulander: (Alain) chemical mapping experiments

Alain Laederach: looking at unstructured RNAs with chemical mapping.  Generate a lot of data; how do we organize it?

Susanna Sansone: EBI; OBI; minimal information checklists

Phillipe Rocca-Serra: EBI; OBI, OBO foundry

Dave Mathews: RNA computational biology—best practices for using experimental data to improve structure prediction

Stas Bellaousov: (Dave) secondary structure prediction (pseudoknots)

Amanda Birmingham: MIARE & associated ontology


OBI & RNAO connections

What kinds of things should go into OBI vs RNAO vs CHEBI for chemical/enzymatic mapping experiments on secondary structure?

OBI: 3 entities: material entities, processes (data transformation, material transformations, etc), dependent entities (used to qualify processes or material entities)

Running blast on sequences is a data transformation (input sequences, output result set)

Try to rely on the sequence ontology, but there are some development issues; the information artifact ontology, hopefully available next year, should address them (it differentiates between info about a sequence and the physical sequence itself)

Reagents should be submitted to chebi

Role of a chemical compound (eg, alkylating agent) goes to chebi rather than obi

Role of that alkylating agent as an agent for determining secondary structure would go into OBI, w/reference to chebi id

Colin Batchelor is maintaining both chebi and rnao.

OBI could define an rna structure mapping experiment, input an rna and a chemical agent, output is a vector of per nucleotide reactivity
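A minimal sketch of that high-level pattern in Python. The class and field names (MappingExperiment, rna_sequence, probe_chebi_id) are hypothetical and are not OBI terms, and the ChEBI identifier is a placeholder rather than a real ID.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MappingExperiment:
    """High-level view: RNA plus chemical agent in, per-nucleotide reactivity out."""
    rna_sequence: str          # input RNA, 5' to 3'
    probe_name: str            # input chemical agent, e.g. "NMIA"
    probe_chebi_id: str        # placeholder; the real ID would come from ChEBI
    reactivity: List[float]    # output: one value per nucleotide

    def __post_init__(self) -> None:
        # The output vector must have exactly one entry per nucleotide.
        if len(self.reactivity) != len(self.rna_sequence):
            raise ValueError("need one reactivity value per nucleotide")


# Illustrative values only, not real data.
experiment = MappingExperiment(
    rna_sequence="GGAUC",
    probe_name="NMIA",
    probe_chebi_id="CHEBI:PLACEHOLDER",
    reactivity=[0.1, 0.0, 1.2, 0.8, 0.3],
)
```

This deliberately skips the intermediate protocol steps (RNA to DNA, PCR, etc); those could be added later as separate records if finer granularity is wanted.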


Do we want this high level, or do we want the detail of each of the steps and their inputs and outputs (RNA to DNA, PCR, etc)?

Can have diff levels of granularity

Chemical mapping has synonym chemical probing—that info goes into obi, as does the list of kind of mapping experiments that there are … but probably not every step of every protocol. 

OBI records definition, source of definition, who submitted it originally, etc so can acknowledge/track people contributing.


New material entities related to chemistry go into chebi

New terms related to RNA, such as bases, go into RNAO according to Colin.

Enzymes will have pubmed id for sequence, GO function terms to describe function.

We would provide info about function like “this enzyme cleaves at Gs” … should we provide this to GO as specialization of Nuclease Activity? Or to protein ontology? [Don’t know, would have to ask them]


Cross products: this protein has this GO function, cleaves this particular location—might depend on sequence ontology to describe site (no, prob RNAO because gets down to atomic level, not described in sequence ontology and too super-detailed for chebi).


Get function of protein from GO, add if not there

Link to rnao for specific position

Put in OBI what purpose of the experiment is (enzymatic probing or some specialization of that, under chemical or enzymatic probing heading)


Obi release schedule: monthly.  Just had 1.0 last month.


Talking about adding a couple of dozen terms: can this be automated?  Use protégé with plugins; can send batch terms in excel [Please send batch templates]; classic templates or quick term templates.


How do we search to find out whether term is already there or not?  Who checks consistency?  We’ve got a reasoner that will catch logical mistakes.  Also, protégé will let you search with partial queries directly on owl file.  Users who don’t want to learn protégé can search obi owl file at bioportal.


If you know what the synonyms are, search them first; put them in.  Have a feeling for what is the most commonly used term.  Preferred term vs synonyms.
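A rough sketch of what "search the synonyms first" could look like, assuming the terms have been exported to a simple mapping of preferred label to synonyms. The TERMS table and find_terms function are hypothetical; this is not how protégé or bioportal search works internally.

```python
from typing import Dict, List

# Hypothetical export: preferred label -> list of synonyms (illustrative only).
TERMS: Dict[str, List[str]] = {
    "chemical mapping": ["chemical probing"],
    "enzymatic mapping": ["enzymatic probing"],
}


def find_terms(query: str, terms: Dict[str, List[str]]) -> List[str]:
    """Return preferred labels whose label or any synonym contains the query."""
    needle = query.lower()
    hits = []
    for label, synonyms in terms.items():
        # Partial, case-insensitive match against the label and every synonym.
        if any(needle in name.lower() for name in [label, *synonyms]):
            hits.append(label)
    return hits
```

An empty result from find_terms suggests the term may be new, but someone should still check the ontology itself before submitting.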


Competency questions are the questions your ontology should help to answer; when ontology is done, does it really help answer those questions?


Could be that first attempt at modeling could be wrong, would have to revisit and change.

Planning to try to define MI standards tomorrow.

Try to represent an experiment that we all do … how long would it take to go through?  Maybe try to do that next.


--break--


Shape protocol: what would we need to do to upload that protocol into obi?

Units may be managed differently in the future (instances instead of classes)

Need to define objective of an rna mapping assay—can the objective be to predict the structure?  Or to determine shape reactivity?


Analyte assay: have to specify molecular entity that plays role of analyte, another that plays role of evaluant, output is some kind of measurement data.


An assay is a class in obi.

Quantity would be shape reactivity; quantitated by peak analysis, so is an absorbance that has been normalized.  We can define our own units—call it counts?  (if done on capillary electrophoresis, will be counts of luminescence).


Probably counts is a proxy variable for flexibility.


Can evaluant id be a protein?  Yes.

But shape assay is not an analyte assay.  Need to work on how we represent an rna mapping assay, figure out the pattern, from then can fill out others.


Input would be an RNA molecule, some kind of reacting agent (NMIA in shape; would come from Chebi); bypass all the steps of protocol; output would be counts, a proxy for local backbone flexibility.

Identify all the qualities we are measuring (are there others besides local flexibility)?  Reactive accessibility to NMIA is what we’re measuring.


For each probe, reaction to the probe.  Then for each probe, we have a relationship for what that measures; does that go into RNAO or OBI?  In general, shape measures some notion of structure, but that’s not really what it is.


Can we identify all the things that really matter to us?  Flexibility, reactivity of specific locations, etc?  Measurements in between are just a means to an end.


Let’s say we want to measure NMIA reactivity: where do we find the terms that we need to define the inputs?  Things need to be in chebi to get this far?


Pattern for chemical mapping experiment: Take an RNA, take a probe, probe reacts selectively with different parts of RNA, you measure the counts.


Shape reacts on 4 kinds of nucleotides, another reacts only with G … maybe the evaluant here is not an RNA, but just a single nucleotide?


Shape experiment to measure reactivity (autogenerated name).

What about enzymes that really don’t act on nucleotides, they act on chains.  Could we say “acts on G and A in RNA” or is that too much?  Don’t want to say just acts on a nucleotide because if just have a single nucleotide it won’t react with it.


Shape reactivity of a, c, g, and u in RNA: this is more the objective of the assay

DMS: reactivity of G and A in RNA


Shape reactivity:

Input molecule: RNA (always?)

Input probe: NMIA (is this in chebi?)

Reactivity of probe: A, C, G, and U

Output measurement: counts of luminescence

Proxy for: shape reactivity of a, c, g and u in RNA

RNAO has A, C, G and U

What makes the difference between the mapping experiments is the different probes you’re using?

What we really want to capture is that each reagent (enzyme or a chemical entity) reacts with a specific nucleotide (and within that nucleotide, it can react with a specific atom).  To date, most of the modeling is at the sequence level, so have to model at least to nucleotide level.  Maybe put the atomic info in the RNAO.  The readout is always per nucleotide; never get atomic readout in a mapping reaction.


NMIA reacts with A, C, G, and U: how do we define that in OBI?

I have a chemical reagent that binds with a nucleotide and I can measure that reactivity.

Some probes have nucleotide specificity and some don’t.

Specificity of reactivity is defined as part of assay rather than part of probe (why?)


Right now, Phillipe can’t answer how we say that this assay surveys all four bases but will look into. 


Microarray is a northern blot times 10K; mapping is measuring a nucleotide times number of nucleotides. OBI characterizes microarrays, says output is a vector, doesn’t say what the types of vector elements are.


If you’re reporting high reactivity of a U to DMS, should tell you that something is wrong; if we can just even capture which probes hit which atoms, that will be a good thing.  Phillipe not sure how to represent that in OBI; gut would be that this is info about material entities, a union of all the nucleotides included, excluding the other ones.
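A small sketch of the sanity check this would enable, assuming probe specificity is available as a lookup table. The nucleotide sets are the ones recorded in these notes; the function name and the threshold are hypothetical.

```python
from typing import Dict, List, Set

# Nucleotides each probe is expected to react with, as recorded in these notes.
EXPECTED_TARGETS: Dict[str, Set[str]] = {
    "NMIA": {"A", "C", "G", "U"},
    "DMS": {"G", "A"},
}


def unexpected_hits(probe: str, sequence: str,
                    reactivity: List[float], threshold: float) -> List[int]:
    """Return 1-based positions with high reactivity at a non-target nucleotide."""
    targets = EXPECTED_TARGETS[probe]
    return [
        i + 1
        for i, (base, value) in enumerate(zip(sequence, reactivity))
        if value > threshold and base not in targets
    ]
```

For example, unexpected_hits("DMS", "GAUC", [0.9, 0.8, 0.95, 0.1], 0.5) flags position 3, the highly reactive U. What counts as "high" is a judgment call left to the caller.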


Shape can have different probes, but all probes are really measuring the same thing (have different times you use them for, different kinetics).


Note that Alain says that DMS is used for DNA by some people.  Shape can only be used for RNA.


Can have multiple proxies … proxy for base pairing?  Phillipe doesn’t like “base pairing” because it sounds like a process … how about “presence of base pairing”, or “base pair”, or something?  Is “base pair” in the RNAO?  Yes, can encapsulate both WC and nonWC; this is good because DMS would potentially react with that.


Don’t want to say “absence of base pair” because there could be a NONSTANDARD pair there.


Phillipe says go ahead and put down as many proxies as we want for now, but may end up representing them differently later.


OBI meeting in March in Vancouver; maybe send a representative?  Alain volunteers to go.

Action items: We fill in spreadsheet laid out by Phillipe, then iteratively talk with him—need to decide how to handle multiple proxies, the set of nucleotides with reactivity—so probably not ready to generate OBI terms immediately on completing spreadsheet.


Can have pubmed ids for a technique.  For the moment, OBI parser can only deal with one, but there’s no reason why can’t theoretically have more than one.


Eventually will need free text for each of the assays—2-5 lines.


--break--


Should reactivity inhere in the assay or the agent?

Should we say that something is just a shape reactivity assay, or do we need to have “shape reactivity with NMIA” vs “shape reactivity with <other probe> X”


Shape reactivity assay

Input: RNA mol

Input: probe assaying reactivity

Output: data about shape reactivity

If anyone annotates an assay where input is an RNA and a probe assaying reactivity and outputs data about reactivity, classifier can say that this is a shape assay.
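To make the classification idea concrete, here is a plain-Python imitation of what the classifier would infer. In OBI this would be done by an OWL reasoner over class definitions; the AssayAnnotation class and the string labels below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssayAnnotation:
    input_molecule: str   # e.g. "RNA"
    input_role: str       # role played by the second input
    output: str           # kind of data produced


def classify(annotation: AssayAnnotation) -> Optional[str]:
    """Return "shape reactivity assay" when the annotation matches the pattern."""
    if (annotation.input_molecule == "RNA"
            and annotation.input_role == "probe assaying reactivity"
            and annotation.output == "data about shape reactivity"):
        return "shape reactivity assay"
    # No match: this sketch knows about only one assay pattern.
    return None
```

The point is that nobody has to label the assay by hand: anything annotated with these inputs and this output is recognized from the definition.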


In spreadsheet, one row for each chemical probe used in assay.


Could, if we want, define a superset of “shape assay using a <material entity that’s a probe>” with sub-items of “shape assay using NMIA”, “shape assay using <other probe>X”, etc.  HOWEVER, that can be worked out after the spreadsheet is filled in and shouldn’t have to affect the format of the spreadsheet.


What methods are going to be covered?  And who will fill them in?

Let’s decide these as a group; then can do the work of filling them in in parallel/off-line.


Scope deliverables

Purpose of meeting: figure out …

How to capture structure mapping data

How to link up structure mapping data to alignments

How do you take info you have about different structures for different locations and combine them

But alignment people couldn’t make it to the meeting, so will handle this later at Boulder meeting

Decide when to have next meeting

Best practices recommendation on how people can improve on their structure prediction using experimental data (Dave’s interest)

Don’t just fold an rna in mfold, also include a multiple sequence alignment, etc

Should we get into 3-d aspects of chemical mapping? Does this fall under number 1?  What about under 3? (Dave says no: there will be so much to say about secondary structure).  Hydroxyl radical really only gives you info about 3d structure, not about 2d structure.  This is probably out of scope for *this* meeting, but will be in scope at some point.


For now, will describe agents like hydroxyl radical at a high level, but not go into details at this stage.  We should do what’s easy now, because a lot of what’s hard now will be easier *after* we get this stuff done.


Categories: Now (as much as can by Sunday, done by midnight Feb. 14); Later, Never

Focus on finishing a small number of things rather than starting a large number of things.


Action items (tentative):

1. Classification of type of struct probes (now)

    Distinguish stuff telling you a region is base-paired from stuff telling you about distance constraints, at a very high level

    Will be contributed to OBI

2. Classification of per-base modification agents + format for describing (now)

    Format is spreadsheet

    Go through list sent by Alain, see if any to be added

    Probably will go into RNAO, maybe into OBI

3. Minimal info for probe experiment (now/later)

    Only the experiments covered in #2

    Needs to be consistent with/compliant with MIBBI project

    AB thinks we can only get brainstorming on this done at this meeting

4. Classification of ways to improve on single-seq comp. struct. pred. (now)

    Comparative seq analysis vs incorporating chemical probing info vs ..?

    Can do as part of document produced for #5

5. Recommendations on what users should do for better structs (now: see above)

6. System for evaluating evidence for/against struct. hypotheses (global/local) (later)

7. Document describing how to contribute terms and where they go (now)

    What we discussed earlier this morning

8. System for marking up protocols (never)

    Input every chemical, every concentration, reagent (such as buffer, with many components), time period, etc, all linked to chebi, obi, etc

9. Database of struct map info (later)

    Biggest, most visible impact to experimentalists who care about this

10. Evidence codes for struct info (later)

    Is it experimentally determined?

    Does it include per-nucleotide structure info?


Things you need to record

Purpose

To determine secondary structure

To find folding pathways

Type of experiment: OBI id for …

Shape

DMS

Important protocol steps

Folding protocol (annealing is too specific?)

Solvent concentrations

Salt

pH

RNA preparation

Transcription

In vivo

In vitro

Reaction could be done in cell …

Concentration of modifying agent

Length of exposure to modifying agent

Signal detection

Capillary

Gel

High-throughput sequencing

Readout method

RT

Direct

Exogenous molecules

Proteins, mapping of riboswitch in the presence of salt, etc

Incubation with ligand in RNA prep

Signal processing

Software

Peak integration

--lunch--


Probe DMS reacts with N1 of G (with N1 and G referencing the RNAO terms) rather than DMS reacts with G


#10 (Evidence codes): explain what codes are available and ask that, when papers are published, people indicate the evidence code for what they publish?


Structure/basepair supported:

Only by computational prediction

By chemical probing

By enzymatic probing


By crystallographic data
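One possible encoding of the list above as a fixed set of codes, sketched in Python. The enum and function names are hypothetical; the final set of codes is still to be decided.

```python
from enum import Enum


class StructureEvidence(Enum):
    """Draft evidence codes, taken directly from the list in these notes."""
    COMPUTATIONAL_PREDICTION_ONLY = "only by computational prediction"
    CHEMICAL_PROBING = "by chemical probing"
    ENZYMATIC_PROBING = "by enzymatic probing"
    CRYSTALLOGRAPHIC_DATA = "by crystallographic data"


def is_experimental(evidence: StructureEvidence) -> bool:
    """Answer 'Is it experimentally determined?' for one evidence code."""
    return evidence is not StructureEvidence.COMPUTATIONAL_PREDICTION_ONLY
```

Keeping the codes in one closed list is what would let authors pick a code at publication time instead of writing free text.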


What kinds of evidence do we want to distinguish and which do we think are the same?  (eg, is it diff to have a prediction with mfold vs a prediction with Vienna package?  Prob not).

Proposed prioritization of tasks:

Classification of types of structure probes

Classification of per-base modification agents and format for describing them

Classification of diff ways to improve on single-seq comp. struct prediction/ recommendations on what users should do for better structures

Dave will outline this independently

Document describing how to contribute terms and where they go

AB and RK will work on this off-line


Classification of types of structure probes

Types of chemical and enzymatic probing that act per-nucleotide

Call this structure mapping?

Maybe nucleic acid structure mapping?


In-line probing (has primer/probes that either bind or not) … gives some info on structure, is Ron Breaker’s way of showing riboswitches and is extremely popular.

Action item: Alain (and maybe Dave) will research this


Nucleic acid structure mapping

Nucleic acid (backbone): DNA or RNA


Chemical and Enzyme at the top of the hierarchy, specificity below.

Is it true that all enzymes currently in use are nucleases?  Yes.  So should distinction instead be modification vs cleavage?  No, bc some of the chemical probes can cleave, so chem. vs enzyme is more important.  Why?  Chem are small agents, some can be done in vivo, while no enzymes can.


Counterargument: hydroxyl radical is more similar to enzyme cleavage in terms of how you do expt afterwards than enzyme vs structure mapping (??)

But you can let DMS run to cleavage with dye system, so that would mean it could be BOTH modification and cleavage: same agent in both categories.


In principle all combinations of these things are possible, don’t impose a hierarchy, just allow decoration with a variety of tags: 1 from set of {chem, enzyme}, 1 from set of {DNA, RNA}, 1 from set of {modification, cleavage}


Need to capture whether modification or cleavage because affects whether you do RT vs direct readout.  Alain says no: can do RT on hydroxyl radical and get same output as direct readout.  Is it same or are there biases?


Can we think of any use case where we care whether method is cleavage or modification?  No, but that doesn’t mean we won’t care in 10 years.  Of course, that argument applies to everything, so slippery.  But in this case, there are only two choices and everyone knows which their assay is, so just leave it in, drop it, go on.


Wouldn’t want to exclude nmr as a nucleic acid structure mapping method, so what should we call these methods to distinguish them?  High-throughput? No, bc what if someone figures out how to do NMR high-throughput?  Only distinguishable because it gives per-nucleotide mapping data.


Nucleic acid structure mapping

Single nucleotide resolution nucleic acid structure mapping (SNR NASM)

each includes:

1: {chem., enzymatic} (snrnasm method)

1: {cleavage, modification} (snrnasm activity)

1+ : {DNA, RNA} (snrnasm specificity [list])


Note: user specifies for particular assay whether they provided DNA or RNA.  However, some assays are specific for just a particular nucleic acid (shape only works on RNA) but some work on both DNA and RNA (like DMS), so specificity does need to also be given at the assay definition level.


If you have something that makes an abasic site, that is included under the modification activity.


Is there anything about the snrnasms that would group them together into subgroups?  Would we want to designate a whole bunch of related chemical snrnasms as shape, for example?  Chemists would argue that should be grouped by amidation, acylation, etc … but they will be notated with that by what they modify in chebi (?) so we don’t need to include them.


Do we want to indicate shape as subcategory of chemical methods that are all together? Yes.  Is there anything else like that, or can everything but shape be lumped together?  (Don’t want “shape” vs “traditional” bc what is “traditional” depends on how old you are.)  Not unless you want to say single-strand specific or double-strand specific enzymes, but that will be in specificity.


Where does lead cleavage go? Chemical that does backbone cleavage.


Is there a difference between activity and specificity, or are they the same thing?


Is there any group of enzymes we’d want to group?  No.  Attributes of enzymes you might want to record (what species it came from, gene ontology category, etc), but don’t want to create sub-categories for this.


I agree this is artificial—all this is artificial.  The goal is to produce artifacts that are useful.


Possible aspects of Specificity:

RNA/DNA

Single stranded/double stranded

Cleavage/mod

Backbone/base


Which base it hits

Which atom it hits, i.e. which location (bond or atom)


If you know which atom, then you know whether it is backbone or base, so should be able to get a lot of this automatically by annotating which atom it affects and then applying the RNAO.


Ss vs ds is independent of other specificity groups?  No, bc if reagent modifies a nucleotide involved in a base-pair, and a particular base is paired, then reagent will modify ds


Cannot figure out ss vs ds automatically from base, so always need to specify that.

Always need to specify cleavage vs modification, bc what part of RNA it affects is independent of how it affects the RNA.


If we have which atom, then we get for free whether backbone or base and (if base) which base.
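A sketch of that inference in Python. In practice the lookup would come from the RNAO; here it is a hard-coded table. Assumptions: sugar and phosphate atoms are all treated as "backbone", and base atoms are written as "<base>.<atom>" (e.g. "G.N1"), which is a convention invented for this example.

```python
from typing import NamedTuple, Optional

# Standard sugar-phosphate atom names; primes mark ribose atoms.
# Treating all of these as "backbone" is an assumption of this sketch.
BACKBONE_ATOMS = {
    "P", "OP1", "OP2", "O5'", "C5'", "C4'", "O4'",
    "C3'", "O3'", "C2'", "O2'", "C1'",
}


class Location(NamedTuple):
    part: str              # "backbone" or "base"
    base: Optional[str]    # which base, when part == "base"


def locate(atom_id: str) -> Location:
    """Derive backbone/base (and which base) from an atom identifier."""
    if atom_id in BACKBONE_ATOMS:
        return Location("backbone", None)
    # Base atoms use the assumed "<base>.<atom>" form, e.g. "G.N1".
    base, _, atom = atom_id.partition(".")
    if base in {"A", "C", "G", "U"} and atom:
        return Location("base", base)
    raise ValueError(f"unrecognized atom identifier: {atom_id!r}")
```

So annotating a probe with "O2'" or "G.N1" is enough to recover the backbone/base distinction without recording it separately.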


What about for enzymes that cleave a bond rather than affect a particular atom?  Think of shape as modifying 2’ O, but if you do that, you remove 2’ H …always making or breaking a bond—that’s chemistry.  But would be really annoying if recorded all as bonds.


Which atom represents the part of the molecule that is still unchanged after you do the work—but then you have to know what the convention is.  Is easier just to read that “this bond is cleaved” or “this atom is modified”.  But we can modify an atom and use that for cleavage (like using DMS for cleavage).


What about describing bond-breaking as affecting two atoms?  Hydroxyl radical cleaves a phosphodiester bond (on either side of the phosphorus).


Need to record DNA vs RNA specifically (can’t get from cleavage vs mod)


For hydroxyl radical, not ss or ds but solvent accessibility of backbone, which has nothing to do with ss vs ds


So the info we need to record about specificity (can reason the rest):

RNA/DNA [backbone type]

Ss/ds/solvent accessibility [Structural restraint]

Cleavage/modification [snrnasm activity]

Which location (record the atom or the bond) [location]


Are all these independent of each other?  As far as we know.


What about recording if value is a comparative or absolute value?  Is that related to the assay, or to a particular experimental purpose?  Latter, rather than something intrinsic to, say, DMS.


To summarize:

SNRNASM
Method:

Chem./enz


Activity:

Cleavage/mod


Backbone type:

RNA/DNA/both


Location:

Specific atom or specific bond


Structural restraint:

Ss/ds/solvent access
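The summary above maps naturally onto a small record type. A sketch in Python, with hypothetical class and field names. The hydroxyl radical values for method, activity, location and restraint follow the discussion in these notes; its backbone set is an assumption, since the notes do not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Method(Enum):
    CHEMICAL = "chemical"
    ENZYMATIC = "enzymatic"


class Activity(Enum):
    CLEAVAGE = "cleavage"
    MODIFICATION = "modification"


class Backbone(Enum):
    RNA = "RNA"
    DNA = "DNA"


class Restraint(Enum):
    SINGLE_STRANDED = "ss"
    DOUBLE_STRANDED = "ds"
    SOLVENT_ACCESSIBILITY = "solvent accessibility"


@dataclass(frozen=True)
class Snrnasm:
    name: str
    method: Method
    activity: Activity
    backbone_types: FrozenSet[Backbone]   # RNA, DNA, or both
    location: str                         # specific atom or bond (free text here)
    structural_restraint: Restraint

    def __post_init__(self) -> None:
        # "1+" in the notes: at least one backbone type must be given.
        if not self.backbone_types:
            raise ValueError("at least one backbone type is required")


hydroxyl_radical = Snrnasm(
    name="hydroxyl radical",
    method=Method.CHEMICAL,
    activity=Activity.CLEAVAGE,
    # Assumed: the notes do not state the backbone type for hydroxyl radical.
    backbone_types=frozenset({Backbone.RNA, Backbone.DNA}),
    location="phosphodiester bond",
    structural_restraint=Restraint.SOLVENT_ACCESSIBILITY,
)
```

Each field is an independent tag rather than a level in a hierarchy, which matches the decision not to impose one.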


What about range of experimental conditions under which you can do the experiment (in vivo vs in vitro, pH, etc)?

What about size of the molecule?  Some probes may not work on a small rna or a big rna … but this is actually more a property of the readout than the assay.

What about dynamic range?  Can you expect to get 5 bins of info, or 3 bins? Whether readout is binary or continuous?  All these are intrinsically continuous, it’s whether people use them to make judgment calls.


NOW WE HAVE DONE THE FIRST OF OUR DELIVERABLES: Classification of types of structure probes.


--break--

Next step: break up for individual work

Alain: fill in modified assay template

Dave: start drafting best practices guidelines

Rob and Amanda: draft guidance on which terms go where, have preliminary discussion of minimum information requirements

Jesse and Lauren: look into set-up and usage of isacreator, etc