Understanding Visualization Authoring for Genomics Data through User Interviews
Moderator(s): Zeynep Gümüş
- Astrid van den Brandt, Eindhoven University of Technology, Netherlands
- Sehi L'Yi, Harvard Medical School, United States
- Huyen N. Nguyen, Harvard Medical School, United States
- Etowah Adams, Harvard Medical School, United States
- Nils Gehlenborg, Harvard Medical School, United States
Genomics experts use data visualization to extract and share insights from complex and large-scale
data sets. The complexity of genomics data analysis, the diversity of users, and the scope of
questions necessitate tools that support creating customized visualizations beyond off-the-shelf
visualizations for exploratory analysis. Despite a good understanding of visualization use for
genomics data exploration, research into these experts' specific authoring needs is limited.
Previous work in visualization research has explored various interactive authoring techniques—such
as template editing, shelf configuration, natural language input, and code editors—assessing their
usefulness and the trade-offs between expressiveness and learnability, primarily for statistical
visualizations. However, how genomics experts currently author visualizations and which techniques
best meet their needs remain unclear. To bridge this gap in understanding, we conducted two
connected user studies involving 33 genomics experts. The first consisted of semi-structured
interviews (n=20) to understand current visualization authoring practices, followed by exploratory
sessions with visual design probes (n=13) to gain insights about user intents and desired techniques
beyond those currently supported. We identified key tasks, user characteristics, and current
techniques used in tools, and discovered five personas to represent and discuss author diversity. By
integrating insights from interviews and exploratory sessions, we further pinpointed task- and
persona-specific patterns and used these to discuss design implications for genomics visualization
tools. Our findings highlight the need for visualization tools supporting multiple authoring
techniques to enhance both learnability and expressiveness, catering to the varied needs of users
aiming to create highly customized genomics data visualizations.
New BioCyc Visualization Tools for Genome Exploration and Comparison
Moderator(s): Zeynep Gümüş
- Suzanne Paley, SRI International, United States
- Markus Krummenacker, SRI International, United States
- James Herson, SRI International, United States
- Peter Karp, SRI International, United States
We introduce two new BioCyc visualization tools for exploring genomes, with the goal of accelerating
science by helping users to rapidly navigate and understand complex biological systems.
The Genome Explorer is an information-dense genome browser with high-speed semantic zooming. As
magnification level increases, features such as promoters, transcription-factor binding sites, and
nucleotide and amino-acid sequences become visible. By displaying such features inline, positional
relationships are easy to discern visually, and the display can wrap across multiple lines to
present a longer genome region in a single screen, providing greater genomic context. A comparative
mode displays chromosomes from multiple organisms aligned at orthologous genes. The Genome Explorer
includes the capability to select arbitrary sequence regions for export, and provides a tracks
mechanism for depicting large-scale datasets against the genome.
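As a rough illustration of semantic zooming, the sketch below picks which feature layers to draw from the current magnification. It is not the Genome Explorer's implementation: the layer names, the thresholds, and the `visible_layers` helper are all hypothetical.

```python
# Hypothetical thresholds: a layer is drawn once the view is zoomed in far
# enough, i.e. once the number of bases per screen pixel drops to its limit.
ZOOM_LAYERS = [
    (float("inf"), "genes"),         # assumed to be visible at every zoom level
    (50.0, "promoters"),
    (50.0, "tf_binding_sites"),
    (0.2, "sequence"),               # nucleotide / amino-acid letters
]


def visible_layers(bases_per_pixel: float) -> list[str]:
    """Return the feature layers to render at the given magnification."""
    return [name for limit, name in ZOOM_LAYERS if bases_per_pixel <= limit]


print(visible_layers(500.0))  # zoomed out
print(visible_layers(0.1))    # zoomed in
```

Keeping the thresholds in one table means that adding a new feature type only requires a new entry, not a change to the rendering logic.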
The Comparative Genome Dashboard summarizes and compares predicted functional capabilities across a
selected set of organisms, utilizing a hierarchical framework of cellular function based on selected
concepts from MetaCyc and GO. Users begin with a one-screen overview consisting of a set of panels
for high-level functions such as Biosynthesis, Transport or Central Dogma, with individual plots for
subsystems such as Carbohydrate Biosynthesis or DNA Metabolism. Each plot graphs either the numbers
of relevant compounds predicted to be synthesized/degraded/transported based on pathway prediction
algorithms or the numbers of genes annotated to that subsystem. Users can click on any plot for more
detailed comparisons of specific synthesized/degraded/transported compounds or GO annotations, and
thereby rapidly explore specific subsystems of interest.
Matreex: compact and interactive visualisation of large gene families
Moderator(s): Zeynep Gümüş
- Victor Rossier, Department of Computational Biology and Department of Ecology and Evolution,
University of Lausanne, Switzerland
- Clément Marie Train, Université de Lausanne, Switzerland
- Yannis Nevers, Université de Lausanne, Switzerland
- Marc Robinson-Rechavi, Université de Lausanne, Switzerland
- Christophe Dessimoz, Université de Lausanne, Switzerland
Studying gene family evolution strongly benefits from insightful visualisations. However, the
ever-growing number of sequenced genomes is leading to increasingly larger gene families, which
challenges existing gene tree visualisations. Indeed, most of them present users with a dilemma: to
display complete but intractable gene trees, or collapse subtrees, thereby hiding information.
We developed Matreex, a new dynamic tool to scale up the visualisation of gene families. Matreex’s
key novelty is the use of “phylogenetic” profiles, i.e. condensed representations of gene
repertoires, to minimize information loss when collapsing branches of the gene tree. Moreover, the
gene tree is paired with a complementary interactive species tree which facilitates manipulation and
exploration of large, heavily duplicated gene families.
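The idea of a phylogenetic profile can be sketched in a few lines: a collapsed subtree is summarised by the number of gene copies it contains per species. This is an illustration only, not Matreex's code or API. The `phylogenetic_profile` function and the `(gene_id, species)` leaf representation are assumptions made for the example.

```python
from collections import Counter


def phylogenetic_profile(leaves, species_order):
    """Summarise a collapsed subtree as gene copy numbers per species.

    leaves: iterable of (gene_id, species) pairs found under the collapsed node.
    species_order: species in the order they appear in the species tree.
    """
    counts = Counter(species for _gene_id, species in leaves)
    # A zero marks a species with no gene copy in this subtree.
    return [counts.get(species, 0) for species in species_order]


# Hypothetical leaves of one collapsed branch.
leaves = [("g1", "human"), ("g2", "human"), ("g3", "mouse")]
print(phylogenetic_profile(leaves, ["human", "mouse", "zebrafish"]))
```

The profile keeps the presence, absence and copy number of genes visible even though the individual leaves are hidden.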
Matreex is user-friendly and provides an appealing and intuitive display which makes it ideal for
creating easily interpretable figures for teaching and outreach. It can also be used to automate
large-scale analyses of presence-absence of multiple gene families. For example, we used it to
simultaneously display 22 gene families involved in intraflagellar transport across 622 species,
totalling 5,500 genes, in a compact way. In doing so, we were able to report for the first time the
complete loss of the intraflagellar transport machinery in the myxozoan Thelohanellus kitauei.
Matreex is available on the Python Package Index (pip install matreex), and source code and
documentation are available at https://github.com/DessimozLab/matreex.
Interactive visualisation of raw nanopore signal data with Squigualiser
Moderator(s): Zeynep Gümüş
- Hiruna Samarakoon, University of New South Wales, Sydney; Garvan
Institute of Medical Research, Sydney, Australia
- Kisaru Liyanage, University of New South Wales, Sydney; Garvan Institute of Medical Research,
Sydney, Australia
- James M. Ferguson, Garvan Institute of Medical Research, Sydney, Australia
- Sri Parameswaran, University of Sydney, Sydney, Australia
- Hasindu Gamaarachchi, University of New South Wales, Sydney; Garvan Institute of Medical
Research, Sydney, Australia
- Ira W. Deveson, Garvan Institute of Medical Research, Sydney, Australia
Nanopore sequencing measures ionic current during the translocation of DNA, RNA or protein molecules
through a nanoscale protein pore. This raw current signal data can be ‘basecalled’ into sequence
information and has the potential to identify other diverse molecular features, such as base
modifications and secondary structures. Despite the unique properties and potential utility of
nanopore signal data, there are currently limited options available for signal data visualisation.
To address this, we have developed Squigualiser, a toolkit for intuitive, interactive visualisation
of sequence-aligned signal data, which currently supports both DNA and RNA sequencing data from
Oxford Nanopore Technologies (ONT) instruments. A series of methodological innovations enable
efficient alignment of raw signal data to a reference genome/transcriptome with single-base
resolution. Squigualiser generates an interactive signal browser view (HTML file), in which the user
can navigate across a genome/transcriptome region and customise the display. Multiple independent
reads are integrated into a signal ‘pileup’ format and different datasets can be displayed as
parallel tracks to facilitate their comparison. Squigualiser provides the most sophisticated
framework for nanopore signal data visualisation to date and will catalyse new advances in signal
analysis. We provide Squigualiser as an open-source tool for the nanopore community:
https://github.com/hiruna72/squigualiser
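To make the signal pileup idea concrete, the sketch below stacks the signal of several reads by reference position once each read has been aligned at single-base resolution. It does not use Squigualiser's code or file formats. The input layout (a mapping from read ID to per-position current samples) and the `build_pileup` helper are assumptions for illustration.

```python
from collections import defaultdict


def build_pileup(reads):
    """Group signal samples from many reads by reference position.

    reads: {read_id: {reference_position: [current samples in pA]}}
    Returns {reference_position: {read_id: [samples]}}, sorted by position.
    """
    pileup = defaultdict(dict)
    for read_id, aligned in reads.items():
        for ref_pos, samples in aligned.items():
            pileup[ref_pos][read_id] = samples
    return dict(sorted(pileup.items()))


# Two hypothetical reads overlapping at reference position 101.
reads = {
    "read_1": {100: [80.1, 81.0], 101: [95.2]},
    "read_2": {101: [94.8, 96.0]},
}
for ref_pos, per_read in build_pileup(reads).items():
    print(ref_pos, per_read)
```

With the signal indexed by position, the reads covering a given base can be drawn one above another in a single track.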
Aggregate Annotated Single-Cell Heatmap Visualizations
Moderator(s): Zeynep Gümüş
- Devin Lange, University of Utah, United States
- Greg Finak, Ozette Technologies, United States
- Evan Greene, Ozette Technologies, United States
- Fritz Lekschas, Ozette Technologies, United States
Heatmaps are commonly used in the biomedical field to visualize tabular data. In the single-cell
domain, heatmaps are frequently used to visualize cell counts of phenotypes (rows) across samples
(columns). Clinical applications may involve thousands or even tens of thousands of samples. With
modern clustering techniques, several hundred to thousands of phenotypes can be detected.
Additionally, each sample and phenotype can be associated with various metadata properties that are
necessary for sense-making. As datasets grow, increasing the number of rows and columns in the
heatmap can become overwhelming due to limitations in screen size and human perception. Fortunately,
there is often redundancy within phenotypes and samples. For instance, two phenotypes that only
differ by a single marker (e.g., a surface protein or gene) could be considered the same if that
marker is immaterial. Similarly, two samples may be so alike that they could be combined without a
noticeable difference in the emerging pattern. In other words, the data can often be aggregated. We
introduce a visualization technique that supports displaying annotated heatmaps at different levels
of detail. This framework provides support for constructing complex interactive collapsible heatmaps
while allowing enough customization to support different use cases.
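The aggregation described above can be sketched as follows: phenotypes that differ only in one immaterial marker are merged and their cell counts summed per sample. This is a minimal sketch, not the presented framework. The label format (marker names followed by `+` or `-`) and the `drop_marker` and `aggregate_rows` helpers are assumptions.

```python
import re


def drop_marker(phenotype: str, marker: str) -> str:
    """Remove one marker and its +/- sign from a label such as 'CD3+CD4+CD8-'."""
    tokens = re.findall(r"[A-Za-z0-9]+[+-]", phenotype)
    return "".join(token for token in tokens if token[:-1] != marker)


def aggregate_rows(counts, marker):
    """Merge heatmap rows that differ only in `marker`, summing per-sample counts.

    counts: {phenotype_label: [cell count per sample]}
    """
    merged = {}
    for phenotype, row in counts.items():
        key = drop_marker(phenotype, marker)
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], row)]
        else:
            merged[key] = list(row)
    return merged


# Hypothetical counts for three phenotypes across two samples.
counts = {
    "CD3+CD4+CD8-": [10, 2],
    "CD3+CD4+CD8+": [1, 3],
    "CD3-CD4-CD8-": [5, 5],
}
print(aggregate_rows(counts, "CD8"))
```

The same pattern applies to columns: similar samples can be grouped under a shared key and their counts combined.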