BioVis@ISMB 2024 Program

July 14, 2024

Invited Speakers

Melanie Tory

Melanie Tory, The Roux Institute, Northeastern University

Speaker Biography: Melanie Tory is the director of human data interaction research at the Roux Institute. She is also a professor of the practice at Northeastern, with appointments in the Khoury College of Computer Sciences and the College of Arts, Media and Design. Tory’s research focuses on helping people and businesses do more with data through the design and evaluation of novel visualization techniques and human-data interactions. This is exactly the type of expertise the Roux Institute brings to the Portland area, the state of Maine, and the Northeast.

Prior to joining Northeastern and the Roux Institute, Tory worked at Tableau Software, an interactive data visualization software company that was acquired by Salesforce for over $15 billion. Her work at Tableau focused on enabling natural language interaction through visualizations, as well as on how people use analytics and business intelligence tools within organizations. Tory also served for nine years as a computer science faculty member at the University of Victoria.

She earned her PhD in computer science from Simon Fraser University and her Bachelor of Science from the University of British Columbia. She is an associate editor of several visualization journals and serves on the steering committee of the IEEE VIS Conference.

Tory lives in Portland’s East End. An outdoors enthusiast, she explores Maine’s many parks, trails, and islands.

Fritz Lekschas

Fritz Lekschas, Ozette Technologies

Speaker Biography: Fritz Lekschas is a computer scientist researching scalable visual exploration techniques for biomedical data. As the Head of Visualization Research at Ozette Technologies, he leads the development of ML-powered data visualization and exploration tools for analyzing high-dimensional single-cell data on the web. Fritz earned his PhD in computer science from Harvard University, where he was advised by Hanspeter Pfister and Nils Gehlenborg. Prior to his PhD, Fritz was a visiting postgraduate research fellow in the Department of Biomedical Informatics at Harvard Medical School and obtained bachelor's and master's degrees in bioinformatics from Freie Universität Berlin. Fritz has published more than twenty peer-reviewed papers in renowned biomedical journals and computer science conferences, and his work has been recognized with several prestigious awards.

Program

All times listed are in EDT.

10:40-10:45
Opening
  • Jan Byška
  • Qianwen Wang
10:45-11:40
Invited Presentation: When Visualization Meets AI: Exploring Opportunities
Moderator(s): Jan Byška
  • Melanie Tory, The Roux Institute, Northeastern University

Presentation Overview:

Visualization and AI are increasingly intertwined. AI is being employed to enhance visualization design and to simplify interaction with data. Meanwhile, visualization can help people interpret and debug AI models, whether they are AI modelers or end users. Yet opportunities at the intersection of AI and visualization are in their infancy. This talk will explore the opportunity space in two directions: AI-for-VIS and VIS-for-AI, drawing on existing examples including my own research on natural language interaction, visualization recommendations, and medical applications.

11:40-12:00
PRIMAVO: Precision Immune Monitoring Assay Visualization Online
Moderator(s): Jan Byška
  • Osho Rawal, Icahn School of Medicine at Mount Sinai, United States
  • Edgar Gonzalez-Kozlova, Icahn School of Medicine at Mount Sinai, United States
  • Sacha Gnjatic, Icahn School of Medicine at Mount Sinai, United States
  • Zeynep H. Gümüş, Icahn School of Medicine at Mount Sinai, United States

Presentation Overview:

Cancer immunotherapies are revolutionizing clinical practice, yet only a fraction of patients derive clinical benefit, and some experience adverse events. To understand immune markers that impact clinical care, comprehensive data analyses across assay types and patient cohorts are being performed by NCI-supported Cancer Immune Monitoring and Analysis Centers (CIMACs) and the Cancer Immunology Data Commons (CIDC). These represent 30+ cancer immunotherapy trials with longitudinal correlative data assayed using harmonized technology platforms. However, there is an unmet need for visual exploration of multi-scale multi-omic datasets. We introduce PRIMAVO, a unified platform that empowers users to visually explore, query, and subset the results of large-scale immune monitoring multi-omics datasets, spanning transcriptomics, proteomics, genomics, metagenomics, and multiplex immunohistochemistry (mIHC). Users can select specific clinical trials and patient subgroups based on key criteria, including demographics, tumor, treatment, response, and assay characteristics, by leveraging interactive bar plots, pie charts, and scatter plots. For user-specified subgroups, PRIMAVO offers tailored visualizations of their multi-assay datasets: CyTOF, Olink, serology, and RNA-seq data are represented within interactive heatmaps with on-the-fly filtering, querying, sorting, and clustering. From mIHC imaging data, user-selected cell subsets and multiple markers can be explored simultaneously. Interactive oncoprints visually summarize genomic mutations. Developed in direct collaboration with CIMAC-CIDC scientists and built with React, TypeScript, Python, and Django, PRIMAVO provides domain-specific, downloadable, resizable, zoomable, and scrollable charts and figures. Overall, PRIMAVO lowers the barrier between researchers and the complex immune monitoring data generated in immunotherapy trials by providing intuitive, fast, and high-quality access to longitudinal multi-scale multi-omics datasets, thereby facilitating research.
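
The subset-then-visualize workflow described above can be pictured with a minimal, stand-alone sketch. This is not PRIMAVO's code; the input file, trial identifier, and column names are hypothetical placeholders.

```python
# Minimal illustration of the subset-then-visualize workflow described above.
# Not PRIMAVO's implementation; the input file and column names are hypothetical.
import pandas as pd
import seaborn as sns

samples = pd.read_csv("olink_samples.csv")              # hypothetical assay table
subset = samples[(samples["trial"] == "TRIAL-01") &     # user-selected trial
                 (samples["response"] == "responder")]  # user-selected subgroup

marker_cols = [c for c in subset.columns if c.startswith("OLINK_")]
matrix = subset.set_index("sample_id")[marker_cols]

# Clustered, column-standardized heatmap of the selected samples and markers
grid = sns.clustermap(matrix, z_score=1, cmap="vlag", method="average")
grid.savefig("subset_heatmap.png", dpi=150)
```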

12:00-12:20
The Best of Both Worlds: Blending Mixed Reality and 2D Displays in a Hybrid Approach for Visual Analysis of 3D Tissue Maps
Moderator(s): Jan Byška
  • Eric Mörth, Harvard Medical School, United States
  • Cydney Nielsen, Independent Consultant, Canada
  • Morgan Turner, Harvard Medical School, United States
  • Liu Xianhao, Harvard University, United States
  • Mark S. Keller, Harvard Medical School, United States
  • Johanna Beyer, Harvard University, United States
  • Hanspeter Pfister, Harvard University, United States
  • Zhu-Tian Chen, Harvard University and University of Minnesota, United States
  • Nils Gehlenborg, Harvard Medical School, United States

Presentation Overview:

We introduce a novel hybrid system that combines a Mixed Reality (MR) stereoscopic view of 3D spatial data with linked views on a conventional 2D display. Our system addresses visualization challenges in spatial biology, which examines molecular components within a tissue’s native spatial context and produces complex, high-resolution 3D tissue maps. Given the diversity of techniques generating these maps, creating practical visualization tools is crucial. To tackle this, we collaborated closely with experts in spatial biology throughout the design process, conducting iterative development and testing through three case studies: 1) single-cell cyclic immunofluorescence (CyCIF) imaging to investigate early melanoma development; 2) lightsheet microscopy of kidney tissue to understand the function of glomeruli, showcasing the insights gained into spatial relationships within kidney structures; and 3) multiplexed immunofluorescence (MxIF) imaging to study various structures in kidney tissue, highlighting the benefits of the hybrid approach in controlling channel information and conducting distance measurements. By extending the web-based Vitessce (http://vitessce.io) framework for single-cell analysis, a tool already familiar to domain experts, with a WebXR spatial view, we ensured usability and integration with existing workflows. A qualitative evaluation of our prototype demonstrated widespread recognition of the hybrid system's value, even among those initially skeptical of MR technology. Insights gathered from user feedback sessions strongly advocate for combining direct hand interaction in MR with traditional mouse input on a 2D display, highlighting the effectiveness of this approach in enhancing user experience and interaction with complex spatial data.

14:20-15:20
Understanding Visualization Authoring for Genomics Data through User Interviews
Moderator(s): Zeynep Gümüş
  • Astrid van den Brandt, Eindhoven University of Technology, Netherlands
  • Sehi L'Yi, Harvard Medical School, United States
  • Huyen N. Nguyen, Harvard Medical School, United States
  • Etowah Adams, Harvard Medical School, United States
  • Nils Gehlenborg, Harvard Medical School, United States

Presentation Overview:

Genomics experts use data visualization to extract and share insights from complex and large-scale data sets. The complexity of genomics data analysis, the diversity of users, and the scope of questions necessitate tools that support creating customized visualizations beyond off-the-shelf visualizations for exploratory analysis. Despite a good understanding of visualization use for genomics data exploration, research into these experts' specific authoring needs is limited. Previous work in visualization research has explored various interactive authoring techniques, such as template editing, shelf configuration, natural language input, and code editors, assessing their usefulness and the trade-offs between expressiveness and learnability, primarily for statistical visualizations. However, how genomics experts currently author visualizations and which techniques best meet their needs remains unclear. To bridge this gap in understanding, we conducted two connected user studies involving 33 genomics experts. The first consisted of semi-structured interviews (n=20) to understand current visualization authoring practices, followed by exploratory sessions with visual design probes (n=13) to gain insights about user intents and desired techniques beyond those currently supported. We identified key tasks, user characteristics, and current techniques used in tools, and discovered five personas to represent and discuss author diversity. By integrating insights from interviews and exploratory sessions, we further pinpointed task- and persona-specific patterns and used these to discuss design implications for genomics visualization tools. Our findings highlight the need for visualization tools supporting multiple authoring techniques to enhance both learnability and expressiveness, catering to the varied needs of users aiming to create highly customized genomics data visualizations.

New BioCyc Visualization Tools for Genome Exploration and Comparison
Moderator(s): Zeynep Gümüş
  • Suzanne Paley, SRI International, United States
  • Markus Krummenacker, SRI International, United States
  • James Herson, SRI International, United States
  • Peter Karp, SRI International, United States

Presentation Overview:

We introduce two new BioCyc visualization tools for exploring genomes, with the goal of accelerating science by helping users to rapidly navigate and understand complex biological systems. The Genome Explorer is an information-dense genome browser with high-speed semantic zooming. As magnification level increases, features such as promoters, transcription-factor binding sites, and nucleotide and amino-acid sequences become visible. By displaying such features inline, positional relationships are easy to discern visually, and the display can wrap across multiple lines to present a longer genome region in a single screen, providing greater genomic context. A comparative mode displays chromosomes from multiple organisms aligned at orthologous genes. The Genome Explorer includes the capability to select arbitrary sequence regions for export, and provides a tracks mechanism for depicting large-scale datasets against the genome. The Comparative Genome Dashboard summarizes and compares predicted functional capabilities across a selected set of organisms, utilizing a hierarchical framework of cellular function based on selected concepts from MetaCyc and GO. Users begin with a one-screen overview consisting of a set of panels for high-level functions such as Biosynthesis, Transport or Central Dogma, with individual plots for subsystems such as Carbohydrate Biosynthesis or DNA Metabolism. Each plot graphs either the numbers of relevant compounds predicted to be synthesized/degraded/transported based on pathway prediction algorithms or the numbers of genes annotated to that subsystem. Users can click on any plot for more detailed comparisons of specific synthesized/degraded/transported compounds or GO annotations, and thereby rapidly explore specific subsystems of interest.
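
The semantic-zooming behaviour described above can be pictured with a small conceptual sketch. This is not the Genome Explorer's actual logic; the feature types and magnification thresholds below are hypothetical.

```python
# Conceptual sketch of semantic zooming: which feature types to render at a
# given magnification. Not the Genome Explorer's code; thresholds are hypothetical.
def visible_feature_types(bases_per_pixel: float) -> list[str]:
    types = ["gene"]                               # genes are always shown
    if bases_per_pixel < 100:                      # zoomed in far enough to
        types += ["promoter", "tf_binding_site"]   # place regulatory features
    if bases_per_pixel < 0.2:                      # several pixels per base:
        types += ["nucleotide_sequence", "amino_acid_sequence"]
    return types

print(visible_feature_types(50))    # ['gene', 'promoter', 'tf_binding_site']
print(visible_feature_types(0.1))   # adds sequence-level detail
```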

Matreex: compact and interactive visualisation of large gene families
Moderator(s): Zeynep Gümüş
  • Victor Rossier, Department of Computational Biology and Department of Ecology and Evolution, University of Lausanne, Switzerland
  • Clément Marie Train, Université de Lausanne, Switzerland
  • Yannis Nevers, Université de Lausanne, Switzerland
  • Marc Robinson-Rechavi, Université de Lausanne, Switzerland
  • Christophe Dessimoz, Université de Lausanne, Switzerland

Presentation Overview:

Studying gene family evolution strongly benefits from insightful visualisations. However, the ever-growing number of sequenced genomes is leading to increasingly larger gene families, which challenges existing gene tree visualisations. Indeed, most of them present users with a dilemma: to display complete but intractable gene trees, or to collapse subtrees, thereby hiding information. We developed Matreex, a new dynamic tool to scale up the visualisation of gene families. Matreex’s key novelty is the use of “phylogenetic” profiles, i.e., condensed representations of gene repertoires, to minimise information loss when collapsing branches of the gene tree. Moreover, the gene tree is paired with a complementary interactive species tree, which facilitates manipulation and exploration of large, heavily duplicated gene families. Matreex is user-friendly and provides an appealing and intuitive display, which makes it ideal for creating easily interpretable figures for teaching and outreach. It can also be used to automate large-scale analyses of presence-absence of multiple gene families. For example, we used it to simultaneously display 22 gene families involved in intraflagellar transport across 622 species, totalling 5,500 genes, in a compact way. In doing so, we were able to report for the first time the complete loss of the intraflagellar transport machinery in the myxozoan Thelohanellus kitauei. Matreex is available on the Python Package Index (pip install matreex), and source code and documentation are available at https://github.com/DessimozLab/matreex.

Interactive visualisation of raw nanopore signal data with Squigualiser
Moderator(s): Zeynep Gümüş
  • Hiruna Samarakoon, University of New South Wales, Sydney; Garvan Institute of Medical Research, Sydney, Australia
  • Kisaru Liyanage, University of New South Wales, Sydney; Garvan Institute of Medical Research, Sydney, Australia
  • James M. Ferguson, Garvan Institute of Medical Research, Sydney, Australia
  • Sri Parameswaran, University of Sydney, Sydney, Australia
  • Hasindu Gamaarachchi, University of New South Wales, Sydney; Garvan Institute of Medical Research, Sydney, Australia
  • Ira W. Deveson, Garvan Institute of Medical Research, Sydney, Australia

Presentation Overview:

Nanopore sequencing measures ionic current during the translocation of DNA, RNA or protein molecules through a nanoscale protein pore. This raw current signal data can be ‘basecalled’ into sequence information and has the potential to identify other diverse molecular features, such as base modifications and secondary structures. Despite the unique properties and potential utility of nanopore signal data, there are currently limited options available for signal data visualisation. To address this, we have developed Squigualiser, a toolkit for intuitive, interactive visualisation of sequence-aligned signal data, which currently supports both DNA and RNA sequencing data from Oxford Nanopore Technologies (ONT) instruments. A series of methodological innovations enable efficient alignment of raw signal data to a reference genome/transcriptome with single-base resolution. Squigualiser generates an interactive signal browser view (HTML file), in which the user can navigate across a genome/transcriptome region and customise the display. Multiple independent reads are integrated into a signal ‘pileup’ format, and different datasets can be displayed as parallel tracks to facilitate their comparison. Squigualiser provides the most sophisticated framework for nanopore signal data visualisation to date and will catalyse new advances in signal analysis. We provide Squigualiser as an open-source tool for the nanopore community: https://github.com/hiruna72/squigualiser
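
For readers unfamiliar with the data type, a signal ‘pileup’ simply overlays the raw current traces of several reads aligned to the same reference window. The toy sketch below illustrates the idea only; it uses synthetic data and is not Squigualiser's code.

```python
# Toy illustration of a signal "pileup": raw current traces from several reads
# overlaid on the same reference window. Synthetic data; not Squigualiser's code.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
positions = np.linspace(0, 40, num=200)                 # 40 bases, ~5 samples each
for _ in range(5):                                      # five aligned reads
    per_base_current = rng.normal(90, 10, size=40)      # one current level per base
    signal = np.repeat(per_base_current, 5)             # ~5 raw samples per base
    plt.plot(positions, signal, alpha=0.5, lw=0.8)
plt.xlabel("Reference position (bases)")
plt.ylabel("Current (pA)")
plt.savefig("signal_pileup.png", dpi=150)
```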

Aggregate Annotated Single-Cell Heatmap Visualizations
Moderator(s): Zeynep Gümüş
  • Devin Lange, University of Utah, United States
  • Greg Finak, Ozette Technologies, United States
  • Evan Greene, Ozette Technologies, United States
  • Fritz Lekschas, Ozette Technologies, United States

Presentation Overview:

Heatmaps are commonly used in the biomedical field to visualize tabular data. In the single-cell domain, heatmaps are frequently used to visualize cell counts of phenotypes (rows) across samples (columns). Clinical applications may involve thousands or even tens of thousands of samples. With modern clustering techniques, several hundred to thousands of phenotypes can be detected. Additionally, each sample and phenotype can be associated with various metadata properties that are necessary for sense-making. As datasets grow, increasing the number of rows and columns in the heatmap can become overwhelming due to limitations in screen size and human perception. Fortunately, there is often redundancy within phenotypes and samples. For instance, two phenotypes that only differ by a single marker (e.g., a surface protein or gene) could be considered the same if that marker is immaterial. Similarly, two samples may be so alike that they could be combined without a noticeable difference in the emerging pattern. In other words, the data can often be aggregated. We introduce a visualization technique that supports displaying annotated heatmaps at different levels of detail. This framework provides support for constructing complex interactive collapsible heatmaps while allowing enough customization to support different use cases.
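
The aggregation idea can be shown with a minimal sketch: merge phenotype rows that become indistinguishable once an immaterial marker is ignored, summing their cell counts. This is not the authors' implementation; the marker names, sample names, and counts are made up.

```python
# Minimal sketch of the aggregation idea described above; not the authors'
# implementation. Marker names, sample names, and counts are made up.
import pandas as pd

counts = pd.DataFrame({
    "CD3": ["+", "+", "+"],
    "CD4": ["+", "+", "-"],
    "CD8": ["-", "+", "+"],
    "sample_A": [120, 30, 45],
    "sample_B": [200, 25, 60],
})

def collapse(df: pd.DataFrame, ignore_marker: str) -> pd.DataFrame:
    markers = [m for m in ["CD3", "CD4", "CD8"] if m != ignore_marker]
    # Phenotypes sharing the remaining marker signature are merged,
    # and their per-sample cell counts are summed.
    return df.groupby(markers, as_index=False)[["sample_A", "sample_B"]].sum()

print(collapse(counts, ignore_marker="CD8"))   # three phenotype rows collapse to two
```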

15:20-15:40
Aardvark: Composite Visualizations of Trees, Time-Series, and Images
Moderator(s): Zeynep Gümüş
  • Devin Lange, University of Utah, United States
  • Robert Judson-Torres, University of Utah, United States
  • Thomas A. Zangle, University of Utah, United States
  • Alexander Lex, University of Utah, United States

Presentation Overview:

How do cancer cells grow, divide, proliferate, and die? How do drugs influence these processes? These are difficult questions that we can attempt to answer with a combination of time-series microscopy experiments, classification algorithms, and data visualization. However, collecting this type of data and applying algorithms to segment and track cells and construct lineages of proliferation is error-prone, and identifying the errors can be challenging since it often requires cross-checking multiple data types. Similarly, analyzing and communicating the results necessitates synthesizing different data types into a single narrative. State-of-the-art visualization methods for such data use independent line charts, tree diagrams, and images in separate views. However, this spatial separation requires the viewer of these charts to combine the relevant pieces of data in memory. To simplify this challenging task, we describe design principles for weaving cell images, time-series data, and tree data into a cohesive visualization. Our design principles are based on choosing a primary data type that drives the layout and integrating the other data types into that layout. We then introduce Aardvark, a system that uses these principles to implement novel visualization techniques. Using Aardvark, we demonstrate the utility of each of these approaches for discovery, communication, and data debugging in a series of case studies.

15:40-16:00
Boosting Data Interpretation with GIBOOST to Enhance Visualization of High-Dimensional Data
Moderator(s): Zeynep Gümüş
  • Komlan Atitey, National Institutes of Health (NIH), United States
  • Benedict Anchang, National Institutes of Health (NIH), United States

Presentation Overview:

Effective visualization of current biomedical data from high-throughput technologies is essential for proper interpretability of complex biological processes but challenging due to the data's high dimensionality, increasing volume, and underlying biological complexity. This requires advanced dimensionality reduction methods (DRMs) such as t-SNE, UMAP, and PHATE for optimal data reduction. However, different DRMs may produce varied outputs, hindering consistent interpretation and making benchmarking challenging. Recently, we published MIBCOVIS, a Bayesian framework integrating five robust metrics with different features to enhance visualization and interpretability of high-dimensional data without relying on ground truth. We demonstrated that each visualization tool uniquely optimizes specific features and that no tool was able to optimize all features jointly. Leveraging this observation, we propose GIBOOST, a new visualization tool aimed at synergizing information from disparate sources for optimal data reduction and visualization. Given high-dimensional data, GIBOOST uses an optimized integrative autoencoder to integrate and select the two best data reduction methods from a pool of methods with the maximum additive clustering-sensitivity effect, conditioned on other visualization features. We apply GIBOOST to enhance the clustering separability of four distinct dynamic biological processes: EMT, spermatogenesis, induced stem-cell pluripotency, and placenta development. Across all datasets, GIBOOST improved clustering sensitivity by 76% on average compared to the individual methods.
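
The method-selection step can be pictured with a simplified, self-contained sketch: score candidate reduction methods with a clustering-sensitivity proxy and keep the two best for integration. This is not GIBOOST itself; the silhouette score and the scikit-learn methods below stand in for the paper's metrics and its pool of DRMs (t-SNE, UMAP, PHATE).

```python
# Simplified stand-in for the method-selection step only; not GIBOOST.
# Silhouette score and the scikit-learn methods below substitute for the
# paper's clustering-sensitivity metric and its pool of DRMs.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, Isomap
from sklearn.metrics import silhouette_score

X, y = load_digits(return_X_y=True)
methods = {
    "PCA": PCA(n_components=2),
    "t-SNE": TSNE(n_components=2, random_state=0),
    "Isomap": Isomap(n_components=2),
}
# Score each 2D embedding by how well it separates the known clusters
scores = {name: silhouette_score(m.fit_transform(X), y) for name, m in methods.items()}
best_two = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores, "-> integrate:", best_two)   # the two best would then be integrated
```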

16:40-17:00
Proceedings Presentation: Unveil Cis-acting Combinatorial mRNA Motifs by Interpreting Deep Neural Network
Moderator(s): Qianwen Wang
  • Xiaocheng Zeng, Department of Automation, Tsinghua University, China
  • Zheng Wei, Tsinghua University, China
  • Qixiu Du, Department of Automation, Tsinghua University, China
  • Jiaqi Li, Tsinghua University, China
  • Zhen Xie, Tsinghua University, China
  • Xiaowo Wang, Tsinghua University, China

Presentation Overview:

Cis-acting mRNA elements play a key role in the regulation of mRNA stability and translation efficiency. Revealing the interactions of these elements and their impact is crucial for understanding the regulation of the mRNA translation process, which supports the development of mRNA-based medicines and vaccines. Deep neural networks (DNNs) can learn complex cis-regulatory codes from RNA sequences. However, extracting these cis-regulatory codes efficiently from DNNs remains a significant challenge. Here we propose a method based on our toolkit NeuronMotif and motif mutagenesis, which not only enables the discovery of diverse and high-quality motifs but also efficiently reveals motif interactions. By interpreting deep-learning models, we have discovered several crucial motifs that impact mRNA translation efficiency and stability, as well as some previously unknown motifs and motif syntax, offering novel insights for biologists. Furthermore, we note that it is challenging to enrich motif syntax in datasets composed of randomly generated sequences, as such datasets may not contain sufficient biological signal.
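
As a rough picture of motif mutagenesis (not the NeuronMotif pipeline itself), one scrambles a candidate motif within a sequence and compares the model's predictions before and after. The `predict` function below is a hypothetical stand-in for a trained DNN that scores a sequence.

```python
# Rough sketch of motif mutagenesis; not the NeuronMotif pipeline.
# `predict` is a hypothetical stand-in for a trained DNN that scores a sequence.
import random

def scramble_motif(seq: str, start: int, length: int, seed: int = 0) -> str:
    random.seed(seed)
    replacement = "".join(random.choices("ACGU", k=length))  # random bases
    return seq[:start] + replacement + seq[start + length:]

def motif_effect(seq: str, start: int, length: int, predict) -> float:
    # Positive effect: scrambling the motif lowers the predicted output,
    # i.e. the motif contributes to the predicted property.
    return predict(seq) - predict(scramble_motif(seq, start, length))

# Demonstration with a dummy predictor (fraction of "GCC" occurrences).
effect = motif_effect("AUGGCC" * 20, start=30, length=6,
                      predict=lambda s: s.count("GCC") / len(s))
print(round(effect, 4))
```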

17:00-17:50
Invited Presentation: The Insight's in the Details: Challenges and Opportunities for BioVis Software Tools
Moderator(s): Qianwen Wang
  • Fritz Lekschas, Ozette Technologies

Presentation Overview:

Biological data visualizations often deal with datasets resulting from complex analytical workflows or large-scale experiments. These aspects complicate visual exploration, as insights are frequently found in the details. To surface these insights effectively, BioVis software tools must address several key challenges: integrating closely with computation and data, being composable, scaling to the task, and offering bi-directional interactions for AI/ML guidance.

Fortunately, software best practices and new frameworks make it easier than ever to overcome these challenges. In this talk, Fritz Lekschas will discuss these practices and frameworks using examples from his research in genomics and single-cell biology and from the broader BioVis community.

17:50-18:00
Award Ceremony and Closing
  • Jan Byška
  • Qianwen Wang