The Visual Genome: An attempt to classify multi-omics visualization
Abstract:
Over the past decades, advances in biology and medicine—driven by
high-throughput and high-resolution experimental methods—have underscored the
critical role of visualization in interpreting and communicating complex
biological data. The interplay between life sciences and the visualization
domain has revealed a deep and natural synergy, where visual analytics has
become indispensable for discovery and insight.
In this talk, I will reflect on nearly 30 years of experience in developing
visual analytics solutions for large-scale biological data, with a particular
focus on multi-omics visualization. I will present a conceptual framework for
classifying multi-omics visualizations and illustrate it through selected
examples from tools developed by my research group. These range from
genome-level visualizations to tools for exploring quantitative omics and
epiproteomics data.
I will also introduce TueVis (https://tuevis.cs.uni-tuebingen.de), a web-based
resource developed and maintained by my group, offering interactive,
user-friendly visualization tools spanning multiple omics layers. Designed for
researchers in bioinformatics and the life sciences, TueVis aims to lower the
barrier to high-quality data exploration and interpretation. The talk will
conclude with a perspective on emerging challenges and opportunities in the
evolving field of multi-omics visualization.
Speaker Bio:
Kay Nieselt is a Professor of Bioinformatics at the University of Tübingen,
where she leads the research group Integrative Transcriptomics. She earned her
Ph.D. in Mathematics from the University of Bielefeld, Germany. During her
doctoral work on modeling virus evolution, she began developing visual analytics
methods for large-scale biological data—an area that would become a central
theme of her research.
Her work spans a broad range of bioinformatics domains, including integrative
analysis of genomics (with a focus on paleogenomics), transcriptomics, and other
omics data types. She is particularly recognized for her contributions to the
visualization of large-scale biological datasets and the application and
development of machine learning methods for omics data interpretation. In 2012,
her group was awarded the Illumina iDEA Challenge Award for the most creative
algorithm handling large-scale next-generation sequencing data. Over the years,
her team has developed numerous visual analytics tools tailored to multi-omics
analysis, with a consistent emphasis on creating innovative yet user-friendly
visualizations. These tools support diverse applications such as large-scale
gene expression profiling, multiple genome alignments, pan-genome exploration,
and integrative multi-omics data analysis.
Kay Nieselt has been actively involved in the BioVis community since its
inception in 2011, serving on both the program and steering committees. She
chaired the BioVis Special Interest Group (SIG) at ISMB in 2014 and 2015 and
subsequently served as the spokesperson for the BioVis COSI.
Visual Data Analysis Research in Biomedical Applications: Navigating the Line Between Scientific Novelty and Practical Impact
Abstract: Visualization has a long-standing tradition in biomedical
research, yet its potential as a tool for data exploration and analytical
reasoning remains underused. In this talk, I will share results and experiences
from recent interdisciplinary collaborations in this area, including projects
on molecular dynamics, electronic structure modeling, and hypothesis generation
in medicine. In addition to presenting results, I will reflect on the
challenges of working across domains, the sometimes slow but often rewarding
process of building trust, and the tension between scientific innovation in
both fields and real-world applicability. These reflections also raise broader
questions about research sustainability: When is a project complete, and when
is it time to move on?
Speaker bio: Ingrid Hotz is a professor of scientific visualization at Linköping
University in Sweden. She received her M.S. degree in theoretical physics from
Ludwig Maximilian University in Munich, Germany, and her Ph.D. in computer
science from TU Kaiserslautern, Germany. After a postdoctoral position at the
Institute for Data Analysis and Visualization (IDAV) at the University of
California, she started an Emmy Noether research group at the Zuse Institute in
Berlin. She then served for several years as the head of the scientific
visualization group at the German Aerospace Center (DLR). The main focus of her
research lies in the area of data analysis and scientific visualization,
encompassing both fundamental research questions and practical solutions to
visualization challenges in applications including physics, chemistry and
medical imaging, and mechanical engineering—from small- to large-scale
simulations. Her work draws on ideas and methods from various fields within
computer science and mathematics, including computer graphics, computer vision,
dynamical systems, computational geometry, and combinatorial topology.
Program
All times listed are in BST.
8:40-8:45
Opening
Qianwen Wang, Zeynep Gumus
8:45-9:40
Invited Presentation: The Visual Genome: An attempt to classify multi-omics
visualization
9:40-10:00
GENET: AI-Powered Interactive Visualization Workflows to Explore Biomedical Entity
Networks
Formulating experimental hypotheses that test the association between SNPs and diseases involves
logical reasoning derived from prior observations, followed by the labor-intensive process of
collecting and analyzing relevant literature to assess their scientific plausibility and viability. AI
models trained with previous association data (e.g., GWAS Catalog) can help infer potential
associations between SNPs and diseases, but scientists still need to manually collect and inspect
the evidence for such predictions from prior literature. To alleviate this burden, we introduce an
AI-enhanced, end-to-end visual analytics workflow called GENET, which aims to help scientists
discover the SNP-Target associations, collect evidence from scientific literature, extract knowledge
as biomedical entity networks, and interactively explore them using visualizations. The workflow
consists of the following four steps, where each step’s output serves as the input for the next
step: 1) biomedical network analysis: identify interesting genes/SNPs that are associated with a
target disease through indirectly connected genes/SNPs using a neural network; 2) literature
evidence mining pipeline: collect relevant literature on the target diseases or the inferred
genes/SNPs, and extract biomedical entities and their relations from the collection using large
language models; 3) clustering: cluster the extracted entities and relations by generating the
embeddings using pre-trained biomedical language models (e.g., BioBERT, BioLinkBERT); 4) interactive
visualizations: visualize the clusters of biomedical entities and their networks and provide
interactive handles for exploration. The workflow enables users to iteratively formulate and test
hypotheses involving SNPs/genes and diseases against evidence from scientific literature and
databases and gain novel insights.
11:20-11:40
Prostruc: an open-source tool for 3D structure prediction using homology modeling
Shivani Pawar, Department of Biotechnology and Bioinformatics, Deogiri College, Aurangabad,
Maharashtra, India
Wilson Sena Kwaku Banini, Department of Theoretical and Applied Biology, Kwame Nkrumah
University of Science and Technology, Ghana
Musa Muhammad Shamsuddeen, Department of Public Health, Faculty of Health Sciences, National
Open University of Nigeria, Abuja, Nigeria
Toheeb A Jumah, School of Collective Intelligence, University Mohammed VI Polytechnic, Rabat,
Morocco
Nigel N O Dolling, Department of Parasitology, Noguchi Memorial Institute for Medical Research,
University of Ghana, Accra, Ghana
Abdulwasiu Tiamiyu, School of Collective Intelligence, University Mohammed VI Polytechnic,
Rabat, Morocco
Olaitan I. Awe, African Society for Bioinformatics and
Computational Biology, Cape Town, South Africa
Homology modeling is a widely used computational technique for predicting the three-dimensional (3D)
structures of proteins from known templates and evolutionary relationships, providing structural
insights critical for understanding protein function, interactions, and potential therapeutic
targets. However, existing tools often require significant expertise and computational resources,
presenting a barrier for many researchers.
Prostruc is a Python-based homology modeling tool designed to simplify protein structure prediction
through an intuitive, automated pipeline. Integrating Biopython for sequence alignment, BLAST for
template identification, and ProMod3 for structure generation, Prostruc streamlines complex
workflows into a user-friendly interface. The tool enables researchers to input protein sequences,
identify homologous templates from databases such as the Protein Data Bank (PDB), and generate
high-quality 3D structures with minimal computational expertise. Prostruc implements a two-stage
validation process: first, it uses TM-align for structural comparison, assessing Root Mean Square
Deviation (RMSD) and TM-scores against reference models. Second, it evaluates model quality via
QMEANDisCo to ensure high accuracy.
The top five models are selected based on these metrics and provided to the user. Prostruc stands
out by offering scalability, flexibility, and ease of use. It is accessible via a cloud-based web
interface or as a Python package for local use, ensuring adaptability across research environments.
Benchmarking against existing tools like SWISS-MODEL, I-TASSER, and Phyre2 demonstrates Prostruc's
competitive performance in terms of structural accuracy and job runtime, while its open-source
nature encourages community-driven innovation.
Prostruc is positioned as a significant advancement in homology modeling, making high-quality
protein structure prediction more accessible to the scientific community.
11:40-12:00
Automatic Generation of Natural Language Descriptions of Genomics Data Visualizations for
Accessibility and Machine Learning
Thomas C. Smits, Harvard Medical School, United States
Sehi L'Yi, Harvard Medical School, United States
Andrew P. Mar, University of California, Berkeley, United States
Nils Gehlenborg, Harvard Medical School, United States
Availability of multimodal representations, i.e., visual and textual, is crucial for both
information accessibility and construction of retrieval systems and machine learning (ML) models.
Interactive data visualizations, omnipresent in data analysis tools and data portals, are key to
accessing biomedical knowledge and detecting patterns in large datasets. However, large-scale ML
models for generating descriptions of visualizations are limited and cannot handle the complexity of
data and visualizations in fields like genomics. Generating accurate descriptions of complex
interactive genomics visualizations remains an open challenge. This limits both access for blind and
visually impaired users, and the development of multimodal datasets for ML applications.
Grammar-based visualizations offer a unique opportunity. Since specifications of visualization
grammars contain structured information about visualizations, they can be used to generate text
directly, rather than interpreting the rendered visualization, potentially resulting in more precise
descriptions.
We present AltGosling, an automated description generation tool focused on interactive
visualizations of genome-mapped data, created with grammar-based toolkit Gosling. AltGosling uses a
logic-based algorithm to create descriptions in various forms, including a tree-structured navigable
panel for keyboard accessibility, and visualization-text pairs for ML training. We show that
AltGosling outperforms state-of-the-art large language models and image-based neural networks for
text generation of genomics data visualizations. AltGosling was adopted in our follow-up study to
construct a retrieval system for genomics visualizations combining different modalities
(specification, image, and text). As a first in genomics research, we lay the groundwork for
building multimodal resources, improving accessibility, and enabling integration of biomedical
visualizations and ML.
12:00-12:20
Can LLMs Bridge Domain and Visualization? A Case Study on High-Dimension Data Visualization in
Single-Cell Transcriptomics
Qianwen Wang, University of Minnesota, United States
Xinyi Liu, Northeastern University, United States
Nils Gehlenborg, Harvard Medical School, United States
While many visualizations are built for domain users (biologists), understanding how visualizations
are used in the domain has long been a challenging task. Previous research has relied on either
interviewing a limited number of domain users or reviewing relevant application papers in the
visualization community, neither of which provides comprehensive insight into visualizations in the
wild of a specific domain. This paper aims to fill this gap by examining the potential of using
Large Language Models (LLMs) to analyze visualization usage in domain literature. We use
high-dimension (HD) data visualization in single-cell transcriptomics as a test case, analyzing 1,203
papers that describe 2,056 HD visualizations with highly specialized domain terminologies (e.g.,
biomarkers, cell lineage). To facilitate this analysis, we introduce a human-in-the-loop LLM
workflow that can effectively analyze a large collection of papers and translate domain-specific
terminology into standardized data and task abstractions. Instead of relying solely on LLMs for
end-to-end analysis, our workflow enhances analytical quality through 1) integrating image
processing and traditional NLP methods to prepare well-structured inputs for three targeted LLM
subtasks (i.e., translating domain terminology, summarizing analysis tasks, and performing
categorization), and 2) establishing checkpoints for human involvement and validation throughout the
process.
The analysis results, validated with expert interviews and a test set, revealed three often
overlooked aspects in HD visualization: trajectories in HD spaces, inter-cluster relationships, and
dimension clustering.
This research provides a stepping stone for future studies seeking to use LLMs to bridge the gap
between visualization design and domain-specific usage.
12:20-12:40
ClusterChirp: A GPU-Accelerated Web Platform for AI-Supported Interactive Exploration of
High-Dimensional Omics Data
Osho Rawal, Icahn School Of Medicine At Mount Sinai, United States
Edgar Gonzalez-Kozlova, Icahn School Of Medicine At Mount Sinai, United States
Sacha Gnjatic, Icahn School Of Medicine At Mount Sinai, United States
Zeynep H. Gümüş, Icahn School Of Medicine At Mount Sinai, United
States
Modern omics technologies generate high-dimensional datasets that overwhelm traditional
visualization tools, requiring computational tradeoffs that risk losing important patterns.
Researchers without computational expertise face additional barriers when tools demand specialized
syntax or command-line proficiency, while connecting visual patterns to biological meaning typically
requires manual navigation across platforms. To address these challenges, we developed ClusterChirp,
a GPU-accelerated web platform for real-time exploration of data matrices containing up to 10
million values. The platform leverages deck.gl for hardware-accelerated rendering and optimized
multi-threaded clustering algorithms that significantly outperform conventional methods. Its
intuitive interface features interactive heatmaps and correlation networks that visualize
relationships between biomarkers, with capabilities to dynamically cluster or sort data by various
metrics, search for specific biomarkers, and adjust visualization parameters. Uniquely, ClusterChirp
includes a natural language interface powered by an Artificial Intelligence (AI)-supported Large
Language Model (LLM), enabling interactions through conversational commands. The platform connects
with biological knowledge-bases for pathway and ontology enrichment analyses. ClusterChirp is being
developed through iterative feedback from domain experts while adhering to FAIR principles, and will
be freely available upon publication. By uniting performance, usability, and biological context,
ClusterChirp empowers researchers to extract meaningful insights from complex omics data with
unprecedented ease.
Phylogenetic trees and networks play a central role in biology, bioinformatics, and mathematical
biology, and producing clear, informative visualizations of them is an important task. We present
new algorithms for visualizing rooted phylogenetic networks as either combining or transfer
networks, in both cladogram and phylogram style. In addition, we introduce a layout algorithm that
aims to improve clarity by minimizing the total stretch of reticulate edges. To address the common
issue that biological publications often omit machine-readable representations of depicted trees and
networks, we also provide an image-based algorithm for extracting their topology from figures. All
algorithms are implemented in our new PhyloSketch app, which is open source and freely available
at: https://github.com/husonlab/phylosketch2.
PhageExpressionAtlas - a comprehensive transcriptional atlas of phage infections of
bacteria
Maik Wolfram-Schauerte, University of Tübingen, Institute for
Bioinformatics and Medical Informatics, Germany
Caroline Trust, University of Tübingen, Institute for Bioinformatics and Medical Informatics,
Germany
Nils Waffenschmidt, University of Tübingen, Institute for Bioinformatics and Medical
Informatics, Germany
Kay Nieselt, University of Tübingen, Institute for Bioinformatics and Medical Informatics,
Germany
Bacteriophages (phages) are bacterial viruses that infect and lyse their hosts. Phages shape
microbial ecosystems and have contributed essential tools for biotechnology and applications in
medical research. Their enzymes, takeover mechanisms, and interactions with their bacterial hosts
are increasingly relevant, especially as phage therapy emerges to combat antibiotic resistance.
Therefore, a thorough understanding of phage-host interactions, especially on the transcriptional
level, is key to unlocking their full potential. Dual RNA sequencing (RNA-seq) enables such insight
by capturing gene expression in both phages and hosts across infection stages. While individual
studies have revealed host responses and phage takeover strategies, comprehensive and systematic
analyses remain scarce.
To fill this gap, we present the PhageExpressionAtlas, the first interactive resource for exploring
phage-host interactions at the transcriptome level. We developed a unified analysis pipeline to
process over 20 public dual RNA-seq datasets, covering diverse phage-host systems, including
therapeutic and model phages infecting ESKAPE pathogens like Staphylococcus aureus and Pseudomonas
aeruginosa. Users can visualize gene expression across infection phases, download datasets, and
classify phage genes as early, middle, or late expressed using customizable criteria. Expression
data can be explored via heat maps, profile plots, and in genome context, aiding functional gene
characterization and phage genome analysis.
The PhageExpressionAtlas will continue to grow, integrating new datasets and features, including
cross-phage/host comparisons and host transcriptome architecture analysis. We envision the
PhageExpressionAtlas to become a central resource for the phage research community, fostering
data-driven insights and interdisciplinary collaboration. The resource is available at
phageexpressionatlas.cs.uni-tuebingen.de.
14:00-14:40
SEAL: Spatially-resolved Embedding Analysis with Linked Imaging Data
Simon Warchol, Harvard School of Engineering and Applied Sciences,
Laboratory of Systems Pharmacology, Harvard Medical School, United States
Grace Guo, Harvard School of Engineering and Applied Sciences, Laboratory of Systems
Pharmacology, Harvard Medical School, United States
Johannes Knittel, Harvard John A. Paulson School of Engineering and Applied Sciences, United
States
Dan Freeman, Laboratory of Systems Pharmacology, Harvard Medical School, United States
Usha Bhalla, Harvard John A. Paulson School of Engineering and Applied Sciences, United States
Jeremy Muhlich, Laboratory of Systems Pharmacology, Harvard Medical School, United States
Peter K Sorger, Laboratory of Systems Pharmacology, Harvard Medical School, United States
Hanspeter Pfister, Harvard John A. Paulson School of Engineering and Applied Sciences, United
States
Dimensionality reduction techniques help analysts interpret complex, high-dimensional spatial
datasets by projecting data attributes into two-dimensional space. For instance, when investigating
multiplexed tissue imaging, these techniques help researchers identify and differentiate cell types
and states. However, they abstract away crucial spatial, positional, and morphological contexts,
complicating interpretation and limiting deeper biological insights. To address these limitations,
we present SEAL, an interactive visual analytics system designed to bridge the gap between abstract
2D embeddings and their rich spatial imaging context. SEAL introduces a novel hybrid-embedding
visualization that preserves morphological and positional information while integrating critical
high-dimensional feature data. By adapting set visualization methods, SEAL allows analysts to
identify, visualize, and compare selections—defined manually or algorithmically—in both the
embedding and original spatial views, enabling richer interpretation of the spatial arrangement and
morphological characteristics of entities of interest. To elucidate differences between selected
sets, SEAL employs a scalable surrogate model to calculate feature importance scores, identifying
the most influential features governing the position of objects within embeddings. These importance
scores are visually summarized across selections, with mathematical set operations enabling detailed
comparative analyses. We demonstrate SEAL’s effectiveness through two case studies with cancer
researchers: colorectal cancer analysis with a pharmacologist and melanoma investigation with a cell
biologist. We then illustrate broader cross-domain applicability by exploring multispectral
astronomical imaging data with an astronomer. Implemented as a standalone tool or integrated
seamlessly with computational notebooks, SEAL provides an interactive platform for spatially
informed exploration of high-dimensional datasets, significantly enhancing interpretability and
insight generation.
Nightingale - A collection of web components for visualizing protein-related data
Swaathi Kandasaamy, UniProt - EMBL-EBI, United Kingdom
Daniel Rice, UniProt - EMBL-EBI, United Kingdom
Aurélien Luciani, UniProt - EMBL-EBI, United Kingdom
Nightingale is an open-source web visualization library for rendering protein-related data including
domains, sites, variants, structures, and interactions using reusable web components. It employs a
track-based approach, where sequences are represented horizontally, and multiple tracks can be
stacked vertically to visualize different annotations at the same position, aiding in the discovery
of relationships across annotations. This intuitive approach enhances the exploration and
interpretation of complex biological data. It leverages the HTML5 Canvas API for improved
performance, handling large datasets efficiently in the most frequently used tracks, while layering
SVG on top of the canvas for interactivity where performance is not critical.
It is a collaborative effort by UniProt, InterPro, and PDBe to provide a unified set of components
for their websites, including UniProt’s ProtVista, while allowing flexibility for specific needs. As
a collection of standard web components, Nightingale integrates seamlessly into any web application,
ensuring compatibility with various frameworks and libraries. It utilizes standard DOM event
propagation and attribute-based communication to facilitate interoperability between Nightingale
components and other web components, irrespective of their internal implementation details. As an
evolving platform, we aim to engage with parallel visualization projects to identify and promote
best practices in the application of web standards, with a focus on advancing the adoption and
integration of web components within the domain of biological data visualization.
A Multimodal Search and Authoring System for Genomics Data Visualizations
Huyen N. Nguyen, Harvard Medical School, United States
Sehi L'Yi, Harvard Medical School, United States
Thomas C. Smits, Harvard Medical School, United States
Shanghua Gao, Harvard Medical School, United States
Marinka Zitnik, Harvard Medical School, United States
Nils Gehlenborg, Harvard Medical School, United States
We present a database system for retrieving interactive genomics visualizations through multimodal
search capabilities. Our system offers users flexibility through three query methods: example
images, natural language, or grammar-based queries, via a user interface. For each visualization in
our database, we generate three complementary representations: a declarative specification using the
Gosling visualization grammar, a pixel-based image, and a natural language description. To support
investigation of multiple embeddings and retrieval strategies, we implement three embedding methods
that capture different aspects of these visualizations: (1) Context-free grammar embeddings
specifically designed for genomics visualizations, addressing specialized features like genomic
tracks, views, and interactivity, (2) Multimodal embeddings derived from a state-of-the-art
biomedical vision-language foundation model, and (3) Textual embeddings generated by our fine-tuned
specification-to-text large language model. We evaluated the proposed embedding strategies across
different modality variations using top-k retrieval accuracy. Notably, our findings demonstrate that
context-free grammar embedding approaches achieve comparable retrieval results with lower
computational demands. Our current collection contains over three thousand visualization examples
spanning approximately 50 categories, from basic to scalable encodings, from single- to coordinated
multi-view visualizations, supporting diverse genomics applications including gene annotations and
single-cell epigenomics analysis. Retrieved visualizations serve as ready-to-use scaffolds for
authoring: they are templates that users can modify with their data and customize to their visual
preferences. This approach provides researchers with reusable examples, allowing them to concentrate
on meaningful data analysis and interpretation instead of the technicalities of building
visualizations from scratch.
Tersect Browser: characterising introgressions through interactive visualisation of large
numbers of resequenced genomes
Tomasz Kurowski, Cranfield University, United Kingdom
Fady Mohareb, Cranfield University, United Kingdom
Introgressive hybridisation has long been a major source of genetic variation in plant genomes, and
the ability to precisely identify and delimit intervals of DNA originating from wild species or
cultivars of interest is of great importance to both researchers seeking insights into the evolution
and breeding of crops, and to plant breeders seeking to protect their intellectual property. The low
cost of genome resequencing and the public availability of large sets of resequenced genomes for
many species of commercial importance, as well as for their wild relatives, have made it possible to
reliably characterise the origins of specific genomic intervals. However, such analyses are often
hampered by the same large volume of data that enables them. They generally take a long time to
execute, and their results are difficult to visualise in an easily explorable manner.
We present Tersect Browser, a Web-based tool that leverages a novel, multi-tier indexing and
pre-calculation scheme to allow biologists to explore the relationships between large sets of
resequenced genomes in a fully interactive fashion. Users have the option to freely adjust interval
size and resolution while navigating through detailed genetic distance heatmaps and phylogenies for
genomes and regions of interest, smoothly zooming in and out depending on the needs of their
exploratory data analysis, aided by extendable plugins and annotations. Results and visualisations
can also be shared with others and downloaded as high-resolution figures for use outside the
application, placing the researcher best prepared to interpret the results in full control.
14:40-15:40
Invited Presentation: Visual Data Analysis Research in Biomedical Applications: Navigating the
Line Between Scientific Novelty and Practical Impact