The Visual Genome: An attempt to classify multi-omics visualization
Abstract:
Over the past decades, advances in biology and medicine—driven by
high-throughput and high-resolution experimental methods—have underscored the
critical role of visualization in interpreting and communicating complex
biological data. The interplay between life sciences and the visualization
domain has revealed a deep and natural synergy, where visual analytics has
become indispensable for discovery and insight.
In this talk, I will reflect on nearly 30 years of experience in developing
visual analytics solutions for large-scale biological data, with a particular
focus on multi-omics visualization. I will present a conceptual framework for
classifying multi-omics visualizations and illustrate it through selected
examples from tools developed by my research group. These range from
genome-level visualizations to tools for exploring quantitative omics and
epiproteomics data.
I will also introduce TueVis (https://tuevis.cs.uni-tuebingen.de), a web-based
resource developed and maintained by my group, offering interactive,
user-friendly visualization tools spanning multiple omics layers. Designed for
researchers in bioinformatics and the life sciences, TueVis aims to lower the
barrier to high-quality data exploration and interpretation. The talk will
conclude with a perspective on emerging challenges and opportunities in the
evolving field of multi-omics visualization.
Speaker Bio:
Kay Nieselt is a Professor of Bioinformatics at the University of Tübingen,
where she leads the research group Integrative Transcriptomics. She earned her
Ph.D. in Mathematics from the University of Bielefeld, Germany. During her
doctoral work on modeling virus evolution, she began developing visual analytics
methods for large-scale biological data—an area that would become a central
theme of her research.
Her work spans a broad range of bioinformatics domains, including integrative
analysis of genomics (with a focus on paleogenomics), transcriptomics, and other
omics data types. She is particularly recognized for her contributions to the
visualization of large-scale biological datasets and the application and
development of machine learning methods for omics data interpretation. In 2012,
her group was awarded the Illumina iDEA Challenge Award for the most creative
algorithm handling large-scale next-generation sequencing data. Over the years,
her team has developed numerous visual analytics tools tailored to multi-omics
analysis, with a consistent emphasis on creating innovative yet user-friendly
visualizations. These tools support diverse applications such as large-scale
gene expression profiling, multiple genome alignments, pan-genome exploration,
and integrative multi-omics data analysis.
Kay Nieselt has been actively involved in the BioVis community since its
inception in 2011, serving on both the program and steering committees. She
chaired the BioVis Special Interest Group (SIG) at ISMB in 2014 and 2015 and
subsequently served as the spokesperson for the BioVis COSI.
Visual Data Analysis Research in Biomedical Applications: Navigating the Line Between Scientific Novelty and Practical Impact
Abstract: Visualization has a long-standing tradition in biomedical
research, yet its potential as a tool for data exploration and analytical
reasoning remains underused. In this talk, I will share results and experiences
from recent interdisciplinary collaborations in this area, including projects
on molecular dynamics, electronic structure modeling, and hypothesis generation
in medicine. In addition to presenting results, I will reflect on the
challenges of working across domains, the sometimes slow but often rewarding
process of building trust, and the tension between scientific innovation in
both fields and real-world applicability. These reflections also raise broader
questions about research sustainability: When is a project complete, and when
is it time to move on?
Speaker bio: Ingrid Hotz is a professor of scientific visualization at Linköping
University in Sweden. She received her M.S. degree in theoretical physics from
Ludwig Maximilian University in Munich, Germany, and her Ph.D. in computer
science from TU Kaiserslautern, Germany. After a postdoctoral position at the
Institute for Data Analysis and Visualization (IDAV) at the University of
California, she started an Emmy Noether research group at the Zuse Institute in
Berlin. She then served for several years as the head of the scientific
visualization group at the German Aerospace Center (DLR). The main focus of her
research lies in the area of data analysis and scientific visualization,
encompassing both fundamental research questions and practical solutions to
visualization challenges in applications including physics, chemistry and
medical imaging, and mechanical engineering—from small- to large-scale
simulations. Her work draws on ideas and methods from various fields within
computer science and mathematics, including computer graphics, computer vision,
dynamical systems, computational geometry, and combinatorial topology.
Program
All times listed are in BST.
8:40-8:45
Opening
Qianwen Wang, Zeynep Gumus
8:45-9:40
Invited Presentation: The Visual Genome: An attempt to classify multi-omics
visualization
9:40-10:00
GENET: AI-Powered Interactive Visualization Workflows to Explore Biomedical Entity
Networks
Formulating experimental hypotheses that test the association between SNPs and diseases involves
logical reasoning derived from prior observations, followed by the labor-intensive process of
collecting and analyzing relevant literature to assess their scientific plausibility and viability. AI
models trained with previous association data (e.g., GWAS Catalog) can help infer potential
associations between SNPs and diseases, but scientists still need to manually collect and inspect
the evidence for such predictions from prior literature. To alleviate this burden, we introduce an
AI-enhanced, end-to-end visual analytics workflow called GENET, which aims to help scientists
discover the SNP-Target associations, collect evidence from scientific literature, extract knowledge
as biomedical entity networks, and interactively explore them using visualizations. The workflow
consists of the following four steps, where each step’s output serves as the input for the next
step: 1) biomedical network analysis: identify interesting genes/SNPs that are associated with a
target disease through indirectly connected genes/SNPs using a neural network; 2) literature
evidence mining pipeline: collect relevant literature on the target diseases or the inferred
genes/SNPs, and extract biomedical entities and their relations from the collection using large
language models; 3) clustering: cluster the extracted entities and relations by generating the
embeddings using pre-trained biomedical language models (e.g., BioBERT, BioLinkBERT); 4) interactive
visualizations: visualize the clusters of biomedical entities and their networks and provide
interactive handles for exploration. The workflow enables users to iteratively formulate and test
hypotheses involving SNPs/genes and diseases against evidence from scientific literature and
databases and gain novel insights.
11:20-11:40
Prostruc: an open-source tool for 3D structure prediction using homology modeling
Shivani Pawar, Department of Biotechnology and Bioinformatics, Deogiri College, Aurangabad,
Maharashtra, India
Wilson Sena Kwaku Banini, Department of Theoretical and Applied Biology, Kwame Nkrumah
University of Science and Technology, Ghana
Musa Muhammad Shamsuddeen, Department of Public Health, Faculty of Health Sciences, National
Open University of Nigeria, Abuja, Nigeria
Toheeb A Jumah, School of Collective Intelligence, University Mohammed VI Polytechnic, Rabat,
Morocco
Nigel N O Dolling, Department of Parasitology, Noguchi Memorial Institute for Medical Research,
University of Ghana, Accra, Ghana
Abdulwasiu Tiamiyu, School of Collective Intelligence, University Mohammed VI Polytechnic,
Rabat, Morocco
Olaitan I. Awe, African Society for Bioinformatics and
Computational Biology, Cape Town, South Africa
Homology modeling is a widely used computational technique for predicting the three-dimensional (3D)
structures of proteins from known templates and evolutionary relationships, providing structural
insights critical for understanding protein function, interactions, and potential therapeutic
targets. However, existing tools often require significant expertise and computational resources,
presenting a barrier for many researchers.
Prostruc is a Python-based homology modeling tool designed to simplify protein structure prediction
through an intuitive, automated pipeline. Integrating Biopython for sequence alignment, BLAST for
template identification, and ProMod3 for structure generation, Prostruc streamlines complex
workflows into a user-friendly interface. The tool enables researchers to input protein sequences,
identify homologous templates from databases such as the Protein Data Bank (PDB), and generate
high-quality 3D structures with minimal computational expertise. Prostruc implements a two-stage
validation process: first, it uses TM-align for structural comparison, assessing Root Mean Square
Deviation (RMSD) and TM-scores against reference models. Second, it evaluates model quality via
QMEANDisCo to ensure high accuracy.
The top five models are selected based on these metrics and provided to the user. Prostruc stands
out by offering scalability, flexibility, and ease of use. It is accessible via a cloud-based web
interface or as a Python package for local use, ensuring adaptability across research environments.
Benchmarking against existing tools like SWISS-MODEL, I-TASSER, and Phyre2 demonstrates Prostruc's
competitive performance in terms of structural accuracy and job runtime, while its open-source
nature encourages community-driven innovation.
Prostruc is positioned as a significant advancement in homology modeling, making high-quality
protein structure prediction more accessible to the scientific community.
11:40-12:00
Automatic Generation of Natural Language Descriptions of Genomics Data Visualizations for
Accessibility and Machine Learning
Thomas C. Smits, Harvard Medical School, United States
Sehi L'Yi, Harvard Medical School, United States
Andrew P. Mar, University of California, Berkeley, United States
Nils Gehlenborg, Harvard Medical School, United States
Availability of multimodal representations, i.e., visual and textual, is crucial for both
information accessibility and construction of retrieval systems and machine learning (ML) models.
Interactive data visualizations, omnipresent in data analysis tools and data portals, are key to
accessing biomedical knowledge and detecting patterns in large datasets. However, large-scale ML
models for generating descriptions of visualizations are limited and cannot handle the complexity of
data and visualizations in fields like genomics. Generating accurate descriptions of complex
interactive genomics visualizations remains an open challenge. This limits both access for blind and
visually impaired users, and the development of multimodal datasets for ML applications.
Grammar-based visualizations offer a unique opportunity. Since specifications of visualization
grammars contain structured information about visualizations, they can be used to generate text
directly, rather than interpreting the rendered visualization, potentially resulting in more precise
descriptions.
We present AltGosling, an automated description generation tool focused on interactive
visualizations of genome-mapped data, created with grammar-based toolkit Gosling. AltGosling uses a
logic-based algorithm to create descriptions in various forms, including a tree-structured navigable
panel for keyboard accessibility, and visualization-text pairs for ML training. We show that
AltGosling outperforms state-of-the-art large language models and image-based neural networks for
text generation of genomics data visualizations. AltGosling was adopted in our follow-up study to
construct a retrieval system for genomics visualizations combining different modalities
(specification, image, and text). As a first in genomics research, we lay the groundwork for
building multimodal resources, improving accessibility, and enabling integration of biomedical
visualizations and ML.
12:00-12:20
Can LLMs Bridge Domain and Visualization? A Case Study on High-Dimension Data Visualization in
Single-Cell Transcriptomics
Qianwen Wang, University of Minnesota, United States
Xinyi Liu, Northeastern University, United States
Nils Gehlenborg, Harvard Medical School, United States
While many visualizations are built for domain users (biologists), understanding how visualizations
are used in the domain has long been a challenging task. Previous research has relied on either
interviewing a limited number of domain users or reviewing relevant application papers in the
visualization community, neither of which provides comprehensive insight into visualizations in the
wild of a specific domain. This paper aims to fill this gap by examining the potential of using
Large Language Models (LLMs) to analyze visualization usage in domain literature. We use
high-dimension (HD) data visualization in single-cell transcriptomics as a test case, analyzing 1,203
papers that describe 2,056 HD visualizations with highly specialized domain terminologies (e.g.,
biomarkers, cell lineage). To facilitate this analysis, we introduce a human-in-the-loop LLM
workflow that can effectively analyze a large collection of papers and translate domain-specific
terminology into standardized data and task abstractions. Instead of relying solely on LLMs for
end-to-end analysis, our workflow enhances analytical quality through 1) integrating image
processing and traditional NLP methods to prepare well-structured inputs for three targeted LLM
subtasks (i.e., translating domain terminology, summarizing analysis tasks, and performing
categorization), and 2) establishing checkpoints for human involvement and validation throughout the
process.
The analysis results, validated with expert interviews and a test set, revealed three often
overlooked aspects in HD visualization: trajectories in HD spaces, inter-cluster relationships, and
dimension clustering.
This research provides a stepping stone for future studies seeking to use LLMs to bridge the gap
between visualization design and domain-specific usage.
12:20-12:40
ClusterChirp: A GPU-Accelerated Web Platform for AI-Supported Interactive Exploration of
High-Dimensional Omics Data
Osho Rawal, Icahn School Of Medicine At Mount Sinai, United States
Edgar Gonzalez-Kozlova, Icahn School Of Medicine At Mount Sinai, United States
Sacha Gnjatic, Icahn School Of Medicine At Mount Sinai, United States
Zeynep H. Gümüş, Icahn School Of Medicine At Mount Sinai, United
States
Modern omics technologies generate high-dimensional datasets that overwhelm traditional
visualization tools, requiring computational tradeoffs that risk losing important patterns.
Researchers without computational expertise face additional barriers when tools demand specialized
syntax or command-line proficiency, while connecting visual patterns to biological meaning typically
requires manual navigation across platforms. To address these challenges, we developed ClusterChirp,
a GPU-accelerated web platform for real-time exploration of data matrices containing up to 10
million values. The platform leverages deck.gl for hardware-accelerated rendering and optimized
multi-threaded clustering algorithms that significantly outperform conventional methods. Its
intuitive interface features interactive heatmaps and correlation networks that visualize
relationships between biomarkers, with capabilities to dynamically cluster or sort data by various
metrics, search for specific biomarkers, and adjust visualization parameters. Uniquely, ClusterChirp
includes a natural language interface powered by an Artificial Intelligence (AI)-supported Large
Language Model (LLM), enabling interactions through conversational commands. The platform connects
with biological knowledge-bases for pathway and ontology enrichment analyses. ClusterChirp is being
developed through iterative feedback from domain experts while adhering to FAIR principles, and will
be freely available upon publication. By uniting performance, usability, and biological context,
ClusterChirp empowers researchers to extract meaningful insights from complex omics data with
unprecedented ease.
Phylogenetic trees and networks play a central role in biology, bioinformatics, and mathematical
biology, and producing clear, informative visualizations of them is an important task. We present
new algorithms for visualizing rooted phylogenetic networks as either combining or transfer
networks, in both cladogram and phylogram style. In addition, we introduce a layout algorithm that
aims to improve clarity by minimizing the total stretch of reticulate edges. To address the common
issue that biological publications often omit machine-readable representations of depicted trees and
networks, we also provide an image-based algorithm for extracting their topology from figures. All
algorithms are implemented in our new PhyloSketch app, which is open source and freely available
at: https://github.com/husonlab/phylosketch2.
PhageExpressionAtlas - a comprehensive transcriptional atlas of phage infections of
bacteria
Maik Wolfram-Schauerte, University of Tübingen, Institute for
Bioinformatics and Medical Informatics, Germany
Caroline Trust, University of Tübingen, Institute for Bioinformatics and Medical Informatics,
Germany
Nils Waffenschmidt, University of Tübingen, Institute for Bioinformatics and Medical
Informatics, Germany
Kay Nieselt, University of Tübingen, Institute for Bioinformatics and Medical Informatics,
Germany
Bacteriophages (phages) are bacterial viruses that infect and lyse their hosts. Phages shape
microbial ecosystems and have contributed essential tools for biotechnology and applications in
medical research. Their enzymes, takeover mechanisms, and interactions with their bacterial hosts
are increasingly relevant, especially as phage therapy emerges to combat antibiotic resistance.
Therefore, a thorough understanding of phage-host interactions, especially on the transcriptional
level, is key to unlocking their full potential. Dual RNA sequencing (RNA-seq) enables such insight
by capturing gene expression in both phages and hosts across infection stages. While individual
studies have revealed host responses and phage takeover strategies, comprehensive and systematic
analyses remain scarce.
To fill this gap, we present the PhageExpressionAtlas, the first interactive resource for exploring
phage-host interactions at the transcriptome level. We developed a unified analysis pipeline to
process over 20 public dual RNA-seq datasets, covering diverse phage-host systems, including
therapeutic and model phages infecting ESKAPE pathogens like Staphylococcus aureus and Pseudomonas
aeruginosa. Users can visualize gene expression across infection phases, download datasets, and
classify phage genes as early, middle, or late expressed using customizable criteria. Expression
data can be explored via heat maps, profile plots, and in genome context, aiding functional gene
characterization and phage genome analysis.
The PhageExpressionAtlas will continue to grow, integrating new datasets and features, including
cross-phage/host comparisons and host transcriptome architecture analysis. We envision the
PhageExpressionAtlas to become a central resource for the phage research community, fostering
data-driven insights and interdisciplinary collaboration. The resource is available at
phageexpressionatlas.cs.uni-tuebingen.de.
14:00-14:40
SEAL: Spatially-resolved Embedding Analysis with Linked Imaging Data
Simon Warchol, Harvard School of Engineering and Applied Sciences,
Laboratory of Systems Pharmacology, Harvard Medical School, United States
Grace Guo, Harvard School of Engineering and Applied Sciences, Laboratory of Systems
Pharmacology, Harvard Medical School, United States
Johannes Knittel, Harvard John A. Paulson School of Engineering and Applied Sciences, United
States
Dan Freeman, Laboratory of Systems Pharmacology, Harvard Medical School, United States
Usha Bhalla, Harvard John A. Paulson School of Engineering and Applied Sciences, United States
Jeremy Muhlich, Laboratory of Systems Pharmacology, Harvard Medical School, United States
Peter K Sorger, Laboratory of Systems Pharmacology, Harvard Medical School, United States
Hanspeter Pfister, Harvard John A. Paulson School of Engineering and Applied Sciences, United
States
Dimensionality reduction techniques help analysts interpret complex, high-dimensional spatial
datasets by projecting data attributes into two-dimensional space. For instance, when investigating
multiplexed tissue imaging, these techniques help researchers identify and differentiate cell types
and states. However, they abstract away crucial spatial, positional, and morphological contexts,
complicating interpretation and limiting deeper biological insights. To address these limitations,
we present SEAL, an interactive visual analytics system designed to bridge the gap between abstract
2D embeddings and their rich spatial imaging context. SEAL introduces a novel hybrid-embedding
visualization that preserves morphological and positional information while integrating critical
high-dimensional feature data. By adapting set visualization methods, SEAL allows analysts to
identify, visualize, and compare selections—defined manually or algorithmically—in both the
embedding and original spatial views, enabling richer interpretation of the spatial arrangement and
morphological characteristics of entities of interest. To elucidate differences between selected
sets, SEAL employs a scalable surrogate model to calculate feature importance scores, identifying
the most influential features governing the position of objects within embeddings. These importance
scores are visually summarized across selections, with mathematical set operations enabling detailed
comparative analyses. We demonstrate SEAL’s effectiveness through two case studies with cancer
researchers: colorectal cancer analysis with a pharmacologist and melanoma investigation with a cell
biologist. We then illustrate broader cross-domain applicability by exploring multispectral
astronomical imaging data with an astronomer. Implemented as a standalone tool or integrated
seamlessly with computational notebooks, SEAL provides an interactive platform for spatially
informed exploration of high-dimensional datasets, significantly enhancing interpretability and
insight generation.
Nightingale - A collection of web components for visualizing protein-related data
Swaathi Kandasaamy, UniProt - EMBL-EBI, United Kingdom
Daniel Rice, UniProt - EMBL-EBI, United Kingdom
Aurélien Luciani, UniProt - EMBL-EBI, United Kingdom
Nightingale is an open-source web visualization library for rendering protein-related data including
domains, sites, variants, structures, and interactions using reusable web components. It employs a
track-based approach, where sequences are represented horizontally, and multiple tracks can be
stacked vertically to visualize different annotations at the same position, aiding in the discovery
of relationships across annotations. This intuitive approach enhances the exploration and
interpretation of complex biological data. It leverages the HTML5 Canvas API for improved
performance, handling large datasets efficiently in the most frequently used tracks, while layering
SVG on top of the canvas for interactivity where performance is not critical.
It is a collaborative effort by UniProt, InterPro, and PDBe to provide a unified set of components
for their websites, including UniProt’s ProtVista, while allowing flexibility for specific needs. As
a collection of standard web components, Nightingale integrates seamlessly into any web application,
ensuring compatibility with various frameworks and libraries. It utilizes standard DOM event
propagation and attribute-based communication to facilitate interoperability between Nightingale
components and other web components, irrespective of their internal implementation details. As an
evolving platform, we aim to engage with parallel visualization projects to identify and promote
best practices in the application of web standards, with a focus on advancing the adoption and
integration of web components within the domain of biological data visualization.
A Multimodal Search and Authoring System for Genomics Data Visualizations
Huyen N. Nguyen, Harvard Medical School, United States
Sehi L'Yi, Harvard Medical School, United States
Thomas C. Smits, Harvard Medical School, United States
Shanghua Gao, Harvard Medical School, United States
Marinka Zitnik, Harvard Medical School, United States
Nils Gehlenborg, Harvard Medical School, United States
We present a database system for retrieving interactive genomics visualizations through multimodal
search capabilities. Our system offers users flexibility through three query methods: example
images, natural language, or grammar-based queries, via a user interface. For each visualization in
our database, we generate three complementary representations: a declarative specification using the
Gosling visualization grammar, a pixel-based image, and a natural language description. To support
investigation of multiple embeddings and retrieval strategies, we implement three embedding methods
that capture different aspects of these visualizations: (1) Context-free grammar embeddings
specifically designed for genomics visualizations, addressing specialized features like genomic
tracks, views, and interactivity, (2) Multimodal embeddings derived from a state-of-the-art
biomedical vision-language foundation model, and (3) Textual embeddings generated by our fine-tuned
specification-to-text large language model. We evaluated the proposed embedding strategies across
different modality variations using top-k retrieval accuracy. Notably, our findings demonstrate that
context-free grammar embedding approaches achieve comparable retrieval results with lower
computational demands. Our current collection contains over three thousand visualization examples
spanning approximately 50 categories, from basic to scalable encodings, from single- to coordinated
multi-view visualizations, supporting diverse genomics applications including gene annotations and
single-cell epigenomics analysis. Retrieved visualizations serve as ready-to-use scaffolds for
authoring: they are templates that users can modify with their data and customize to their visual
preferences. This approach provides researchers with reusable examples, allowing them to concentrate
on meaningful data analysis and interpretation instead of the technicalities of building
visualizations from scratch.
Tersect Browser: characterising introgressions through interactive visualisation of large
numbers of resequenced genomes
Tomasz Kurowski, Cranfield University, United Kingdom
Fady Mohareb, Cranfield University, United Kingdom
Introgressive hybridisation has long been a major source of genetic variation in plant genomes, and
the ability to precisely identify and delimit intervals of DNA originating from wild species or
cultivars of interest is of great importance to both researchers seeking insights into the evolution
and breeding of crops, and to plant breeders seeking to protect their intellectual property. The low
cost of genome resequencing and the public availability of large sets of resequenced genomes for
many species of commercial importance, as well as for their wild relatives, have made it possible to
reliably characterise the origins of specific genomic intervals. However, such analyses are often
hampered by the same large volume of data that enables them. They generally take a long time to
execute, and their results are difficult to visualise in an easily explorable manner.
We present Tersect Browser, a Web-based tool that leverages a novel, multi-tier indexing and
pre-calculation scheme to allow biologists to explore the relationships between large sets of
resequenced genomes in a fully interactive fashion. Users have the option to freely adjust interval
size and resolution while navigating through detailed genetic distance heatmaps and phylogenies for
genomes and regions of interest, smoothly zooming in and out depending on the needs of their
exploratory data analysis, aided by extendable plugins and annotations. Results and visualisations
can also be shared with others and downloaded as high-resolution figures for use outside the
application, placing the researcher best prepared to interpret the results in full control.
14:40-15:40
Invited Presentation: Visual Data Analysis Research in Biomedical Applications: Navigating the
Line Between Scientific Novelty and Practical Impact