Tasks

The overall goal of this contest is to produce visualizations or visualization tools that would help biologists develop hypotheses about which features of the RNA molecules described by the "observed" sequences are important to their function, and which features the non-observed sequences might lack that would explain their absence from the human population. To get you started, here are some suggestions for things to do, and questions that biologists would like answered:

  • Compare and contrast the specific 2D structural elements predicted for the observed sequences, and for the non-observed sequences. Maybe the key to understanding is some structural element that's conserved in all of the observed sequences and missing in the non-observed sequences.
  • Compare and contrast the 2D structure ensemble predicted for each sequence. Molecules move and undergo slight rearrangements constantly, and over time even rearrangements to quite distant conformations occur. The closer the energies of the structures in a predicted ensemble are to one another, the more evenly a population of molecules with identical sequence will be distributed among the different structural states. Maybe the important feature isn't the existence or absence of a particular structural feature; maybe it's how often that feature is present or absent in the ensemble of possible structures.
  • Consider trying to visualize these properties for the 3D structural predictions for the RNAs.
  • Consider the properties of the DNA. There's a huge assumption that the function is occurring in an RNA structure, and there's no concrete evidence that this is true. Different DNA sequences induce different intrinsic bends in the DNA, and these bends can change the presentation of important functional elements like regulatory sequences or splice sites. Maybe the functional difference has to do with how the different combinations of variation in the intron work in concert to produce the same, or different, bend angles. (We'll help you out with data about intrinsic bends.)
  • Also think about DNA melting. Just as different sequences produce functional effects by bending the DNA, different sequences can make the DNA easier or harder to "melt" (melting is the separation of double-stranded DNA into single strands so that the transcription machinery can make RNA). Maybe the different combinations of variation produce similar or dissimilar melting profiles across the affected intron region.
  • Finally, use your system/approach to see if you can guess whether the unknown sequences are sequences that actually exist, or are part of the non-observed population.
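
To make the ensemble suggestion above a little more concrete, here's a minimal Python sketch of how predicted energies could be turned into population fractions, and how often a given structural feature would then show up. It assumes things the task description doesn't state: that the ensemble energies are free energies in kcal/mol, that the states are weighted by a standard Boltzmann distribution at 37 °C, and that you've already marked which structures contain the feature. The function names are made up for illustration and don't come from the contest data or any particular folding tool.

```python
import math

# Gas constant in kcal/(mol*K); energies are ASSUMED to be in kcal/mol.
R_KCAL = 0.0019872


def boltzmann_weights(energies, temperature_k=310.15):
    """Return the fraction of molecules expected in each structural state.

    Hypothetical helper: assumes simple Boltzmann weighting of the
    predicted structures at the given temperature (default 37 C).
    """
    rt = R_KCAL * temperature_k
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def feature_frequency(energies, has_feature, temperature_k=310.15):
    """Return the population fraction of structures containing a feature.

    has_feature is a list of booleans, one per structure in the ensemble.
    """
    weights = boltzmann_weights(energies, temperature_k)
    return sum(w for w, present in zip(weights, has_feature) if present)


if __name__ == "__main__":
    # Made-up energies, purely for illustration.
    flat = [-10.0, -10.0, -10.0]   # equal energies: evenly spread population
    peaked = [-10.0, -5.0, -2.0]   # one structure dominates
    print(boltzmann_weights(flat))
    print(boltzmann_weights(peaked))
    print(feature_frequency(peaked, [False, True, True]))
```

Under these assumptions, an ensemble with equal energies splits the population evenly across states, while a gap of a few kcal/mol puts nearly every molecule in the lowest-energy structure. A feature that appears only in the higher-energy structures would then be present in a tiny fraction of molecules, and that fraction is the kind of quantity a visualization could compare between observed and non-observed sequences.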

This task list is neither necessary nor sufficient - if you have brilliant ideas about how to help biologists think about the similar and dissimilar features of these molecules in some other fashion, we're sure whatever you're thinking will make a great entry. If you have a brilliant idea about solving just one piece of the puzzle but can't tackle the whole thing, that's fine too - everything is welcome, from purely theoretical design studies, to well-considered visual prototypes for specific features of the data, to complete tools ready to tackle it all.