Marc Baaden, Institut de Biologie Physico-Chimique, Paris
Abstract: To what extent do we still need molecular graphics in today’s
scientific landscape? Despite its long history and high level of maturity,
molecular graphics remains an indispensable tool for scientific understanding
and discovery, constantly facing new challenges. These currently include the
ability to visualize complex molecular relationships and interactions, with the
goal of scaling the size and scope of systems currently under investigation. In
addition, molecular graphics programs must adapt to experimental (r)evolutions
to remain relevant in the field.
However, it is not enough to simply produce high-quality visualizations.
Scientists must be able to share these complex visualizations with their
colleagues and with experts in other fields. The latest tools in molecular
graphics, such as AI- and data-driven approaches, interactive simulations,
augmented, mixed, and virtual reality, offer new ways to visualize and interact
with molecular data and models. These advances enable researchers to explore
scientific questions in new and innovative ways, but need to be more widely
adopted to become routine in scientific investigations.
Speaker Biography:
Marc Baaden, research director at the CNRS in Paris, is a computational chemist
working in the field of structural bioinformatics. His research focuses on
interactive molecular modeling approaches for biological systems and has
included virtual reality approaches since 2007, then Citizen Science, and more
recently the Internet of Things. He develops scientific visualization approaches
as well as original tools related to Big Data and immersive analytics using
virtual reality equipment. Using the Unity game engine, he has designed the
UnityMol platform as a development framework for academic contexts and for
collaboration with industry partners and the public. His research combines
simulations of biological macromolecules and bioinformatics with
high-performance computing, virtual reality, visualization, and dissemination
activities.
A view on Visual Analytics for Biomedical Applications
Anna Vilanova, Eindhoven University of Technology, Eindhoven
Abstract: Visual analytics is a branch of visualization that focuses on
analytical reasoning facilitated by interaction and visual representations.
It extends AI methods and complements existing visualization techniques by
introducing the concepts of reasoning and AI. Interaction and enhancement of
human reasoning and
decision making are central. The research in my group has focused on visual
analytics for data exploration, hypothesis generation and understanding for
biomedical applications such as virtual colonoscopy, diffusion weighted
imaging for brain white-matter and muscle, 4D blood flow analysis,
radiotherapy, single cell analysis and pangenomes. For these purposes, we
developed interactive visual analysis strategies, including uncertainties, and
facilitating the analysis of cohort data. We incorporated concepts of
progressive visual analytics and the use of dimensionality reduction as an
effective VA component for large data visual analysis for these applications.
In my talk, I will present multiple examples of our developments on biomedical
applications, the common visual analytics concepts and design strategies we
took for the design of those applications, and a glance at lessons learned and
open challenges.
Speaker Biography:
Prof. Dr. Anna Vilanova has been a full professor in visual analytics
(vis.win.tue.nl) since October 2019 at the Department of Mathematics and
Computer Science of the Eindhoven University of Technology (TU/e). She is also
associated with the Electrical Engineering department within the Signal
Processing Systems at TU/e. Previously she was an associate professor for 6
years at the Computer Graphics & Visualization Group at EEMCS at the University
of Delft, the Netherlands. From 2002 to August 2013, she was an Assistant
Professor at the Biomedical Image Analysis group of the Biomedical Engineering
Department at TU/e. She leads a research group on visual analytics and
multivalued image
analysis and visualization, focusing on Visual Analytics for high dimensional
data. She focuses on Biomedical applications, Diffusion Weighted Imaging and 4D
Flow. Her research interests include visual analytics, medical visualization,
volume visualization, multivalued visualization, and medical image analysis. In
2005, she was awarded an NWO-Veni personal grant with the title “Visualization of
global tensor information for diffusion tensor imaging”. In 2013 she received an
NWO-Aspasia grant. She is a member of the international program committee of
several conferences (e.g., IEEE Visualization and EG-IEEE VGTC EuroVis). She has
been chair and editor of relevant conferences and journals in her field of research
(e.g., EuroVis 2008, Computers & Graphics, Computer Graphics Forum, IEEE Vis).
She was a member of the steering committee of IEEE VGTC EuroVis (2014-2018) and
of VCBM since 2018. She has been an elected member of the EUROGRAPHICS executive
committee since 2015 and vice president of EUROGRAPHICS since 2019. She also
became a EUROGRAPHICS fellow in 2019. She has been an elected member of the IEEE
VIS Steering Committee (VSC) since 2021.
Program
Schedule subject to change. All times listed are in CEST
Tuesday, July 25th
10:30-10:35
Opening
Format: Live from venue
Speakers: Jan Byška and Michael Krone
10:35-11:30
Invited Presentation: Keynote Presentation
Format: Live from venue
Moderator(s): Michael Krone
Anna Vilanova
11:30-11:50
The Molecular Control Toolkit: Controlling 3D molecular graphics via gesture and voice (Test of Time Award)
Format: Live from venue
Moderator(s): Michael Krone
Kenneth Sabir
Christian Stolte
Bruce Tabor
Seán I. O'Donoghue
11:50-12:10
Effective Comparison of Single-Cell Embedding
Visualizations
Format: Live from venue
Moderator(s): Michael Krone
Trevor Manz, Harvard Medical School, United States
Fritz Lekschas, Ozette Technologies, United States
Evan Greene, Ozette Technologies, United States
Greg Finak, Ozette Technologies, United States
Nils Gehlenborg, Harvard Medical School, United States
Visualizing high-dimensional single-cell data with low-dimensional embedding spaces uncovers complex
cell phenotype relationships and provides dataset overviews. However, current pairwise comparison
methods for embeddings require shared point correspondences and struggle to reveal meaningful
similarities and differences between embeddings.
We introduce a novel hierarchical framework that enables comparison of distinct single-cell datasets
without point correspondence and alternative embeddings of the same data. By leveraging shared label
hierarchies, our approach allows for global and local difference comparison between cell types in
different embeddings.
Our framework utilizes a three-step process to analyze label confusion, neighborhood stability, and
relative cell type abundance. These properties help scientists effectively discover differences
between embeddings. For instance, label confusion highlights intermixed cell types, neighborhood
stability characterizes neighboring cell type composition, and cell type abundance surfaces
differentially-abundant cell types. We derive these properties using set-based similarity metrics
and implement them via a Delaunay graph traversal.
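As a rough illustration of the set-based idea, the sketch below compares, for each cell type, the set of neighboring cell types in two embeddings using Jaccard similarity. It is not the authors' implementation: it substitutes brute-force k-nearest neighbors for the Delaunay graph named in the abstract, and all function names and the parameter `k` are hypothetical.

```python
from collections import defaultdict
from math import dist


def neighbor_labels(points, labels, k=2):
    """Map each label to the set of other labels seen among the k nearest neighbors of its points."""
    result = defaultdict(set)
    for i, p in enumerate(points):
        # Brute-force kNN; an assumed stand-in for the Delaunay graph in the abstract.
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: dist(p, points[j]))[:k]
        for j in nearest:
            if labels[j] != labels[i]:
                result[labels[i]].add(labels[j])
    return result


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def neighborhood_stability(emb_a, labels_a, emb_b, labels_b, k=2):
    """Per-label Jaccard similarity of neighboring label sets across two embeddings."""
    na = neighbor_labels(emb_a, labels_a, k)
    nb = neighbor_labels(emb_b, labels_b, k)
    return {lab: jaccard(na[lab], nb[lab]) for lab in set(labels_a) | set(labels_b)}
```

Because the comparison is keyed on labels rather than on individual points, the two embeddings may contain different cells and different numbers of cells, which mirrors the "no point correspondence" setting described above.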
We developed a Python-based prototype for Jupyter Notebook-like coding environments to demonstrate
our framework's usefulness. In our talk, we will showcase single-cell surface proteomics embedding
use cases, comparing different embedding methods of the same experiment and highlighting cell type
abundance changes.
12:10-12:20
scDEED: a statistical method for detecting dubious 2D
single-cell embeddings
Format: Live from venue
Moderator(s): Michael Krone
Lucy Xia, Hong Kong University of Science and Technology, Hong Kong
Christy Lee, University of California, Los Angeles, United States
Jingyi Jessica Li, University of California, Los Angeles, United States
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular
methods such as t-SNE and UMAP are commonly used for visualizing cell clusters; however, it is well
known that t-SNE and UMAP’s 2D embedding might not reliably inform the similarities among cell
clusters. Motivated by this challenge, we developed a statistical method, scDEED, for detecting
dubious cell embeddings output by any 2D embedding method. By calculating a reliability score for
every cell embedding, scDEED identifies the cell embeddings with low reliability scores as dubious
and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious
cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an
embedding method. Applied to multiple scRNA-seq datasets, scDEED demonstrates its effectiveness for
detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.
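To make the idea of a per-cell reliability score concrete, here is a minimal sketch that uses a simple proxy: the fraction of a cell's nearest neighbors in the original space that remain nearest neighbors in the 2D embedding. This proxy, the function names, and the cutoff are assumptions for illustration only; the abstract does not define scDEED's actual score or its statistical test.

```python
from math import dist


def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    return set(sorted(others, key=lambda j: dist(points[i], points[j]))[:k])


def neighbor_overlap_scores(high_dim, embedding, k=3):
    """Fraction of each cell's high-dimensional neighbors that are kept in the embedding."""
    return [len(knn(high_dim, i, k) & knn(embedding, i, k)) / k
            for i in range(len(high_dim))]


def flag_dubious(scores, threshold=0.5):
    """Indices of cells whose score falls below a hypothetical cutoff."""
    return [i for i, s in enumerate(scores) if s < threshold]
```

Under this proxy, a hyperparameter setting could be compared with another by counting how many cells `flag_dubious` returns for each resulting embedding, in the spirit of the tuning guidance described above.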
12:20-12:30
GAZE-Shiny: comprehensive and interactive visualization of
transcriptional regulation in single-cell resolution
Format: Live from venue
Moderator(s): Michael Krone
Shamim Ashrafiyan, Goethe University Frankfurt, Germany
Fatemeh Behjati Ardakani, Goethe University Frankfurt, Germany
Dennis Hecker, Goethe University Frankfurt, Germany
Marcel Schulz, Goethe University Frankfurt, Germany
Data visualization and exploration are crucial for the interpretation of vast biological datasets
obtained through high-throughput assays, such as single-cell sequencing. Single-cell data (sc-data)
provides valuable insights into cellular function, phenotypic heterogeneity, and tissue development.
Therefore, there is a need for an interactive and user-friendly software tool that enables
biologists and scientists to work with such data efficiently.
Hence, we have developed a user-friendly web application named GAZE-Shiny that enables easy
visualization and exploration of sc-data. GAZE-Shiny is based on the GAZE statistical framework
that aggregates single cells into meta-cells and uses Machine Learning to infer transcription factor
(TF) regulation from transcriptome and epigenome sc-data.
GAZE-Shiny offers the user basic data representation elements for a specific gene or all clusters at
the single-cell level or meta-cell level.
In addition, it provides more intricate visualizations of the TF-gene-cell associations calculated
within the GAZE pipeline. For instance, it enables easy visualization and retrieval of important TFs or
genes involved in transcriptional regulation at the single-cell level. Further, it can list the
candidate regulators defined based on our specialized statistical tests and allows users to explore
the single-cell TF regulation using interactive panels that focus on either genes or TFs.
13:50-14:10
Poincaré maps for visualization of large protein
families
Format: Live from venue
Moderator(s): Aditeya Pandey
Anna Susmelj, Biognosys AG, Wagistrasse 21, 8952 Schlieren, Switzerland
Yani Ren, Université Paris Cité and Université des Antilles and Université de la Réunion,
INSERM, BIGR, F-75014 Paris, France
Yann Vander Meersche, Université Paris Cité and Université des Antilles and Université de la
Réunion, INSERM, BIGR, F-75014 Paris, France
Jean-Christophe Gelly, Université Paris Cité and Université des Antilles and Université de la
Réunion, INSERM, BIGR, F-75014 Paris, France
Tatiana Galochkina, Université Paris Cité and Université des
Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
Due to the constantly increasing amount of available protein data, comprehensive visualization of
large protein families has become crucial for the analysis of protein evolution and function, as well
as for the characterization of poorly described proteins. While the classical representation of a
protein family by a multiple sequence alignment (MSA) contains a great amount of information, visual
analysis of an MSA becomes quite challenging once the number of proteins considered becomes large. In
the current study, we developed a new approach for protein family visualization named PoincaréMSA
based on Poincaré maps. Poincaré maps projection combines hyperbolic embedding with geodesic
distances calculated on the k-nearest neighbors graph, thus successfully reproducing complex
hierarchies contained in the protein data. We demonstrate that PoincaréMSA preserves the structure
of protein sequence space better than classical projection methods such as t-SNE and UMAP. As we show
on several examples of different protein families, PoincaréMSA is very efficient for visualization
of complex protein family topologies as well as for evolutionary and functional annotation of
uncharacterized sequences. PoincaréMSA is implemented as an open source Python code with available
interactive Google Colab notebooks as described at https://www.dsimb.inserm.fr/POINCARE_MSA.
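For readers unfamiliar with hyperbolic embeddings, the following minimal sketch computes the standard distance between two points of the Poincaré disk, d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))). It illustrates only the geometry of the target space; the function name is hypothetical and this is not the PoincaréMSA code.

```python
from math import acosh


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return acosh(1 + 2 * sq_diff / ((1 - sq_norm_u) * (1 - sq_norm_v)))


# Two pairs with a similar angular layout: one near the center, one near the boundary.
near_center = poincare_distance((0.1, 0.0), (0.0, 0.1))
near_border = poincare_distance((0.9, 0.0), (0.0, 0.9))
```

Distances grow rapidly toward the boundary of the disk, so there is room near the border for many well-separated leaves while the root of a hierarchy sits near the center. This is why hyperbolic space is often chosen for tree-like data.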
14:10-14:20
Seeing Beyond the Surface: The Continuous Development of Protein
Design with Dalton
Proteins are life's building blocks with structure and function linked intricately. New computational
tools are creating novel proteins, leading to an explosive growth in protein design. Dalton is a
work-in-progress, cross-platform desktop application for protein design and visualization with powerful
features to explore the boundless possibilities of the protein universe. This presentation showcases
Dalton's capabilities for designing and visualizing custom proteins, highlighting its user-friendly
interface and tools for designing and editing protein structures. It demonstrates how Dalton works
with ColabFold to design proteins with specific features, analyze predicted structures, and refine
designs. Dalton also integrates with the KEGG API and KGML, visualizing metabolic pathways and
protein-protein interaction networks. It can design proteins to interact with enzymes or pathways,
offering potential for designing new enzymes with specific metabolic functions. Overall, Dalton is
being developed to be a valuable asset for researchers and designers, providing powerful tools, a
user-friendly interface, and flexible design features for exploring the complex world of protein
design.
14:20-14:30
GVViZ: A physician-friendly bioinformatics application enabling
interactive gene-disease data annotation, expression analysis, and visualization for translational
research
Format: Live from venue
Moderator(s): Aditeya Pandey
Zeeshan Ahmed, Institute for Health, Health Care Policy and Aging
Research, Rutgers, The State University of New Jersey, United States
We emphasize that automated and interactive visualization should be an indispensable component of
modern RNA-seq analysis, which is currently not the case. We introduce GVViZ, a new, robust, and
user-friendly platform for RNA-seq-driven gene-disease data annotation, and expression analysis with
dynamic heat map visualization. With successful deployment in clinical settings, GVViZ will enable
high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.
Experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the
transcriptomics data making it a powerful tool to study the dynamics of gene expression and
regulation. GVViZ can assess genotype-phenotype associations among multiple complex diseases to find
novel highly expressed genes. We have evaluated its clinical impact for different chronic diseases
including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension,
obesity, osteoporosis, and multiple cancer disorders.
14:30-14:50
Visualizing (differential) expression patterns with fuzzy
concepts as FlowSets
Format: Live from venue
Moderator(s): Aditeya Pandey
Felix Offensperger, Ludwig Maximilian University of Munich, Germany
Markus Joppich, Ludwig Maximilian University of Munich, Germany
Ralf Zimmer, Ludwig Maximilian University of Munich, Germany
High-throughput (sequencing) data sets are becoming increasingly popular, and there is a need for more
advanced tools to analyze large amounts of data with complex dimensions, e.g., from scRNA-seq or
bulk RNA-seq. Here, we present the FlowSets framework as a new method for analyzing expression data
from ordered and unordered measurements of (possibly) multiple modalities, using fuzzy concepts to
encode signal as linguistic variables. FlowSets provides a time- and order-independent analysis of
gene expression data, focusing on specific genes or expression patterns of interest. Using fuzzy
concepts allows for easier interpretation of gene expression values, avoiding the use of
thresholding, and making gene set over-representation analysis more efficient. We compared FlowSets
to a WGCNA-based analysis on a simulated dataset and found that all fuzzy FlowSets methods could
identify genes belonging to all regulated patterns with high precision and recall. We also applied
the method to a public scRNA-seq dataset of monocytes from (non-)pneumonic COVID patients and
successfully recapitulated previous findings. The FlowSets framework is a promising tool for
analyzing complex gene expression data and may prove useful in large-scale studies with many
replicates.
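The notion of encoding a signal as linguistic variables can be sketched with simple fuzzy membership functions. The example below is an assumption-laden illustration, not the FlowSets implementation: the term names, the breakpoints in `TERMS`, and the product rule in `flow_membership` are all hypothetical choices.

```python
from math import inf


def trapezoid(x, a, b, c, d):
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical linguistic terms for an expression value; breakpoints are made up.
TERMS = {
    "low": (-inf, -inf, 1.0, 3.0),
    "medium": (1.0, 3.0, 4.0, 6.0),
    "high": (4.0, 6.0, inf, inf),
}


def fuzzify(x):
    """Degree to which x belongs to each linguistic term."""
    return {name: trapezoid(x, *params) for name, params in TERMS.items()}


def flow_membership(values, pattern):
    """Membership of a gene in a pattern such as ["low", "high"] across conditions (product rule)."""
    result = 1.0
    for value, term in zip(values, pattern):
        result *= fuzzify(value)[term]
    return result
```

A value of 2.0 belongs partly to "low" and partly to "medium" here, so no hard threshold decides which group a gene falls into. That is the property the abstract refers to when it mentions avoiding thresholding.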
14:50-15:00
RIVET: A visual interactive browser for tracking and curating
SARS-CoV-2 recombinants
Format: Live from venue
Moderator(s): Aditeya Pandey
Kyle Smith, University of California, San Diego, United States
Cheng Ye, University of California, San Diego, United States
Yatish Turakhia, University of California, San Diego, United States
Recombination has been shown to be a significant contributor to the genetic diversity in SARS-CoV-2;
however, the task of manually curating putative recombinants from the thousands of new sequences
uploaded online daily suffers from weeks of delay and poses a major bottleneck to real-time
surveillance efforts. RIVET is a software pipeline and visualization platform that builds on recent
algorithmic advances in recombination inference to comprehensively and sensitively search for
potential SARS-CoV-2 recombinants. RIVET's public web interface provides a suite of interactive
visualization and analysis tools that allows expert curators to visually scan through thousands of
newly detected putative recombinants and quickly prioritize high confidence recombinants of interest
to track or designate. RIVET provides integration with several other established tools, such as
UShER and Taxonium, and in the future will be combined with Autolin to completely automate the
process of lineage designation, including recombinant lineages.
15:00-15:10
PhosNetVis: A Web-Based Platform for Kinase Enrichment Analysis
and Visualizing Phosphoproteomics Networks
Format: Live from venue
Moderator(s): Aditeya Pandey
Berk Turhan, Icahn School of Medicine at Mount Sinai, United States
Irene Font Peradejordi, Jacobs Technion-Cornell Institute at Cornell Tech and Department of
Information Science at Cornell University, United States
Shreya Chandrasekar, Jacobs Technion-Cornell Institute at Cornell Tech and Department of
Information Science at Cornell University, United States
Selim Kalayci, Icahn School of Medicine at Mount Sinai, United States
Jeffrey Johnson, Icahn School of Medicine at Mount Sinai, United States
Mehdi Bouhaddou, Institute for Quantitative and Computational Biosciences, University of
California, Los Angeles, United States
Zeynep Gümüş, Mount Sinai School of Medicine, United States
Protein phosphorylation is a crucial cellular signaling process, where a kinase modifies a protein
residue. Multiple kinases can alter various sites on a substrate protein. To better understand human
cellular systems in health and disease, researchers are gathering extensive data on the abundance
and phosphorylation sites and states of thousands of proteins, as we analyze within the Human
Immunology Project Consortium (HIPC) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
Although there are tools available to infer kinase-substrate interactions (KSIs) from proteomics
datasets, there is a need for interactive exploration of the resulting KSI networks, simultaneously
with the phosphorylation sites and states of each substrate protein across multiple experiments and
time points. To address this need, we present PhosNetVis, a web-based tool that streamlines multiple
phosphoproteomics data analysis steps within a single platform to enable easy inference, generation,
and interactive exploration of KSI networks. With PhosNetVis, users can run Kinase Enrichment
Analysis (KEA) to detect significantly enriched kinases in their datasets and visually explore their
resulting networks. This helps in identifying key kinase-substrate interactions (KSIs) and lowers the
difficulty of interpreting phosphoproteomics data, leading to faster and better biological
insights. PhosNetVis is open-sourced on GitHub and available at phosnetvis.app.
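Enrichment of a kinase among regulated phosphosites is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation under the assumption that such a test is appropriate; the abstract does not state which statistic PhosNetVis or KEA uses, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(overlap, kinase_targets, hits, background):
    """P(X >= overlap) when `hits` sites are drawn from `background` sites,
    of which `kinase_targets` are known substrates of the kinase."""
    total = comb(background, hits)
    upper = min(kinase_targets, hits)
    favorable = sum(
        comb(kinase_targets, k) * comb(background - kinase_targets, hits - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total
```

A small p-value means that the regulated sites contain more known substrates of the kinase than expected by chance. In practice, the p-values for all tested kinases would also need a multiple-testing correction.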
15:10-15:30
Visualizing temporal and multi-regional evolution of tumor
subclones with Jellyfish plots
Format: Live from venue
Moderator(s): Aditeya Pandey
Kari Lavikka, University of Helsinki, Finland
Ilari Maarala, University of Helsinki, Finland
Jaana Oikkonen, University of Helsinki, Finland
Yilin Li, University of Helsinki, Finland
Alexandra Lahtinen, University of Helsinki, Finland
Sampsa Hautaniemi, University of Helsinki, Finland
As a tumor in cancer grows and spreads, it undergoes a process of clonal evolution, resulting in
multiple subclones with different genetic and phenotypic characteristics. Here, we present
Jellyfish, a visualization design and a software package that automates the visualization of the
evolutionary relationship of these subclones and their contribution to the clonal compositions of
tumor samples. Unlike other visualization designs, such as fishplot, Jellyfish incorporates both the
temporal and spatial dimensions, allowing for the comparison of multiple tissue samples taken
multi-regionally from different parts of the patient's body at the same or different time points.
Such visualization allows for gaining an overview of the evolution, dispersal, and coexistence of
the subclones in metastasized tumors. To render the plots, Jellyfish uses a graph-theory-based
method to process phylogenetic and clonal composition data, which can be generated using external
tools like ClonEvol. We provide an overview of the design elements and the software architecture of
Jellyfish and present examples of how it has been used to visualize subclonal evolution in
high-grade serous carcinoma patients belonging to the DECIDER clinical trial
(https://www.deciderproject.eu/).
16:00-16:10
Best Practices for the Design of Health Dashboards
Since 2020, health informaticians have developed and enhanced public-facing health dashboards
worldwide. The improvement of dashboards implemented by health informaticians will ultimately
benefit the public in making better healthcare decisions and improve population-level healthcare
outcomes. The authors evaluated 100 US city, county, and state government health dashboards and
identified the top 10 best practices to be considered when creating a public health dashboard. These
features include 1) easy navigation, 2) high usability, 3) use of adjustable thresholds, 4) use of
diverse chart selection, 5) compliance with the Americans with Disabilities Act, 6) use of charts
with tabulated data, 7) incorporated user feedback, 8) simplicity of design, 9) adding clear
descriptions for charts, and 10) comparison data with other entities. The authors also conducted a
survey of 118 randomly selected individuals in six states and the District of Columbia, the results
of which support these top 10 best practices for the design of health dashboards.
16:10-16:20
Interactive and effective visualization framework for
interpreting and exploring cellular communication data
Recent advances in single-cell transcriptomics have enabled the study of cellular communication, and
several computational tools have been developed for inferring cell-cell interactions from
single-cell RNA sequencing data. On the other hand, visualization and interpretation of results is
still an open issue in this research area, due to the increasing complexity of datasets, comprising
different conditions, different subjects and time series studies, and the inherently
multi-dimensional characteristics of cellular communication data including both intercellular and
intracellular signalling.
Here we present CClens, an interactive R Shiny app that supports scientists in analysing and
exploring cell-cell communication results. The app can handle data from all the main bioinformatics
tools for inferring and scoring cell-cell communication (e.g. scSeqComm, SingleCellSignalR,
CellChat). Moreover, it includes i) multiple filtering options to dynamically and interactively
inspect data, ii) a powerful and effective visualization framework for summarizing cellular
communication data, and iii) advanced visualization tools to analyse even complex datasets (e.g.
multi-condition) on all their dimensions (e.g. ligand-receptor binding, patient-specific
interactions).
This interactive and visual way to inspect data provides a user-friendly, accessible (no-code),
flexible and powerful tool for exploring the richness of current cell-cell communication data and
for easily extracting the biological information contained in such data.
16:20-16:22
Automated diagnosis of ear disease using ensemble deep learning
with a big otoendoscopy image database
Format: Live from venue
Moderator(s): Jan Byška
Shin Mi Hwa, Department of Otorhinolaryngology, Yonsei University
College of Medicine, South Korea
Choi Jae Young, Department of Otorhinolaryngology, Yonsei University College of Medicine, South
Korea
Park Haeng Ran, Department of Otorhinolaryngology, Yonsei University College of Medicine, South
Korea
Background: Ear and mastoid disease can easily be treated by early detection and appropriate medical
care. However, a shortage of specialists and relatively low diagnostic accuracy call for a new
diagnostic strategy, in which deep learning may play a significant role. The current study presents
a machine learning model to automatically diagnose ear disease using a large database of
otoendoscopic images acquired in the clinical environment.
Methods: A total of 10,544 otoendoscopic images were used to train nine public convolution-based deep
neural networks to classify eardrum and external auditory canal features into six categories of ear
diseases, covering most ear diseases. After evaluating several optimization schemes, the two
best-performing models were selected to compose an ensemble classifier that combines the
classification scores of each classifier.
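One simple way to combine the classification scores of two models is to average them per class, as in the sketch below. The abstract does not say how the scores were combined, so the averaging rule is an assumption, and the function name and class names are placeholders.

```python
def ensemble_predict(score_lists, class_names):
    """Average per-class scores from several classifiers and return the top class.

    score_lists: one list of per-class scores per model, all in the same class order.
    """
    n_models = len(score_lists)
    averaged = [
        sum(scores[c] for scores in score_lists) / n_models
        for c in range(len(class_names))
    ]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return class_names[best], averaged


# Hypothetical scores from two models over three placeholder classes.
label, averaged = ensemble_predict(
    [[0.5, 0.4, 0.1], [0.2, 0.7, 0.1]],
    ["class_0", "class_1", "class_2"],
)
```

In this toy input the first model alone would pick `class_0`, while the averaged scores favor `class_1`, which shows how an ensemble can overrule a single model.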
Findings: Based on accuracy and training time, transfer learning models based on Inception-V3
and ResNet101 were chosen, and the ensemble classifier using the two models yielded a significant
improvement over each individual model, with an average accuracy of 93.67% under 5-fold
cross-validation.
Interpretation: The current study is unprecedented in terms of both disease diversity and diagnostic
accuracy, which is comparable to or even better than that of an average otolaryngologist.
16:23-16:25
Topological Data Analysis and Persistence Theory Applications to
Heart Arrhythmia
Format: Live from venue
Moderator(s): Jan Byška
Justin Zhang, Bergen County Academies, United States
William Song, Bergen County Academies, United States
Giacomo Pugliese, Bergen County Academies, United States
Our research project utilizes rigorous techniques from topological data analysis, a type of data
analysis based on the mathematical field of algebraic topology, together with machine learning, to
computationally visualize and analyze electrocardiogram (ECG) data of patients with various heart
conditions, including Ventricular Tachycardia, Ventricular Flutter, and Ventricular Fibrillation. By
leveraging sophisticated analysis tools such as persistent homology and simplicial complexes,
including the Vietoris-Rips complex, we obtain a highly precise modeling of the ECG data, enabling
us to distinguish the ECG data of patients suffering from these debilitating illnesses from that of
healthy individuals. Our novel approach involves extracting critical geometric features from the
persistence diagrams and images obtained from our persistent homologies of the patient ECG data.
These features are then input into an algorithm based on our topological data analysis, enabling us
to classify, with virtually complete accuracy, which of these three conditions a patient is
suffering from. Our research project represents a key step forward in the field of heart disease
diagnosis, potentially offering a non-invasive, highly accurate method for diagnosing the hundreds
of thousands of patients suffering from these conditions.
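As a small illustration of persistent homology on a Vietoris-Rips filtration, the sketch below computes only the zero-dimensional part: the scales at which connected components of a point cloud merge. It assumes the filtration value of an edge is the distance between its endpoints. It does not compute loops (dimension 1) or the persistence images mentioned above, and it is not the authors' pipeline; the function name is hypothetical.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0; the one component that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components, so one of them dies
            parent[root_i] = root_j
            deaths.append(length)
    return deaths
```

For a signal such as an ECG trace, the point cloud would typically come from a sliding-window (time-delay) embedding of the samples. That preprocessing step is an assumption here and is not described in the abstract.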
16:26-16:28
Single Cell Data Analysis Made Easy: scDisco an App for
Non-Experts
Background: As single cell data analysis becomes increasingly prevalent, the challenge of exploring,
summarizing, and communicating results grows with the sheer volume of information. While existing
software tends to focus on analysis around 2D scatterplots based on dimensionality reduction
algorithms, custom-made comparative analyses are often required for more complex analyses.
Results: To address the challenges in the currently available software for analyzing single cell RNA
sequencing data, we developed an app that is easy to install and use, allowing non-experts to
explore and consolidate results. The app generates visualizations of various types, like dot plots,
to compare the expression of multiple genes across conditions or quantify the differences in cell
proportions between samples, among multiple other functionalities. The software, named scDisco for
single cell discovery app, has been tested for a year and is frequently used in our organization,
making it a mature product that we believe can benefit the single cell community.
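A dot plot conventionally encodes two numbers per gene and group: the fraction of cells expressing the gene (dot size) and the mean expression (dot color). The sketch below computes those two summaries for one gene. It illustrates the general convention only and is not taken from scDisco; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def dot_plot_stats(expression, groups):
    """Per group: (fraction of cells with nonzero expression, mean expression)."""
    by_group = defaultdict(list)
    for value, group in zip(expression, groups):
        by_group[group].append(value)
    return {
        group: (sum(v > 0 for v in values) / len(values), sum(values) / len(values))
        for group, values in by_group.items()
    }


# Hypothetical counts of one gene in six cells from two conditions.
stats = dot_plot_stats(
    [0, 2, 4, 0, 0, 6],
    ["ctrl", "ctrl", "ctrl", "treated", "treated", "treated"],
)
```

In this toy input both conditions have the same mean expression but different fractions of expressing cells, a difference that a dot plot shows and a plain heat map of means would hide.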
Conclusion: Our app provides a user-friendly solution for exploring and comparing single cell data,
making it accessible to non-experts. We hope that this tool will facilitate the analysis and
interpretation of single cell data, ultimately leading to new insights and discoveries.
16:29-16:31
Automated Acute Lymphoblastic Leukemia Detection and
classification using Saliency Map
We propose an automated method of segmentation and classification of white blood cells for the
diagnosis of Acute Lymphoblastic Leukemia (ALL).
The first step, pre-processing, removes noise from the acquired image using a median filter. The
second step extracts the region of interest of the cell from the filtered image. After the image is
segmented using the saliency map method, post-processing is needed to make the salient region
clearer; this step is based on morphological opening and thresholding. Then, a feature extraction
step computes feature vectors for each salient region. Finally, these feature vectors are used to
determine the presence or absence of ALL using an SVM.
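The denoising and thresholding steps of such a pipeline can be sketched in a few lines. The example below implements a plain median filter and a binary threshold on a grayscale image stored as a list of rows. The window size and cutoff are hypothetical, and the saliency map, morphological opening, feature extraction, and SVM stages are not shown.

```python
from statistics import median


def median_filter(img, radius=1):
    """Replace each pixel by the median of its neighborhood, clipped at the image borders."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median(window))
        out.append(row)
    return out


def threshold(img, cutoff):
    """Binary mask: 1 where the pixel value exceeds the cutoff, else 0."""
    return [[1 if value > cutoff else 0 for value in row] for row in img]
```

A median filter is a common choice for removing isolated outlier pixels (salt-and-pepper noise) because, unlike a mean filter, a single extreme value in the window does not shift the result.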
The tests performed on the ALL-IDB1 and ALL-IDB2 databases demonstrated the performance of the
proposed method for the recognition and classification of different white blood cells. The accuracy
reaches an average rate of 97% for ALL-IDB1 and 100% for ALL-IDB2. Moreover, for the same
databases, we recorded an area under the curve (AUC) of 0.951 and 0.984, respectively. These values
are also higher than those obtained by the methods reported in the
literature.
16:32-16:34
cfDNAPro: An R/Bioconductor package for robust and reproducible
data analysis of cell-free DNA fragmentomic features
Format: Live from venue
Moderator(s): Jan Byška
Haichao Wang, Cancer Research UK Cambridge Institute, University of
Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
Paulius Mennea, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer
Research UK Cambridge Centre, Cambridge, United Kingdom
Elkie Chan, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Hong
Kong
Wendy N Cooper, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer
Research UK Cambridge Centre, Cambridge, United Kingdom
Nitzan Rosenfeld, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer
Research UK Cambridge Centre, Cambridge, United Kingdom
Hui Zhao, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK
Cambridge Centre, Cambridge, United Kingdom
Background
Cell-free DNA (cfDNA) in human body fluids exhibits characteristic fragmentation patterns, which can
be exploited to support sensitive cancer detection and monitoring. However, fragmentomic analysis is
easily biased by various biological, experimental, and analytical factors, such as the choice of library
preparation kit and data processing software configuration. The field lacks specialized tools that
attenuate biases and standardise the cfDNA fragmentomic analysis. Here, we present an open-source
Bioconductor/R package, cfDNAPro, which provides feature curation and visualization utilities for
fragmentomic analysis of paired-end sequencing data from cfDNA.
Results
The cfDNAPro R package allows users to produce visualisations which assist analysis whilst
controlling for input biases. It implements parameterised quality control and bias curation steps,
to ensure reproducibility and comparability of results. Starting from a BAM file, cfDNAPro annotates
each fragment with meta information, which not only establishes an essential foundation for fragment
length, fragment motif and copy number analysis, but also supports a framework for cfDNA
fragmentomics studies within the Bioconductor ecosystem.
Conclusion
cfDNAPro clarifies analytical challenges in the liquid biopsy field and proposes a standard for bias
correction of cfDNA data within the Bioconductor/R ecosystem. By offering fundamental utilities, it
empowers further advanced methodological development in the broader study area of liquid biopsy.
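As a rough illustration of what a fragment length feature looks like, the following Python sketch
computes a normalised fragment length profile from a list of fragments. It does not use or mirror
the cfDNAPro API: the function name, the tuple layout (chromosome, start, end, with 0-based
half-open coordinates), and the 30 to 500 bp length window are all assumptions made for this
example.

```python
from collections import Counter


def fragment_length_profile(fragments, min_len=30, max_len=500):
    """Return {length: proportion} for fragments within [min_len, max_len].

    `fragments` is an iterable of (chrom, start, end) tuples; the fragment
    length is taken as end - start (0-based, half-open coordinates assumed).
    """
    lengths = [end - start for _, start, end in fragments]
    kept = [n for n in lengths if min_len <= n <= max_len]
    if not kept:
        return {}
    counts = Counter(kept)
    return {n: counts[n] / len(kept) for n in sorted(counts)}


# Toy input: three plausible cfDNA fragments and one outlier that is filtered out.
fragments = [
    ("chr1", 1000, 1167),
    ("chr1", 5000, 5167),
    ("chr2", 300, 445),
    ("chr2", 900, 1900),
]
profile = fragment_length_profile(fragments)
```

Filtering by an explicit length window before normalising is one simple way to make profiles from
different samples comparable; the window itself is a parameter to be chosen per study.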
16:35-16:37
PICKLUSTER: A protein-interface clustering and analysis plug-in
for UCSF ChimeraX
Format: Live from venue
Moderator(s): Jan Byška
Luca Genz, Leibniz Institut für Virologie, Universität Hamburg,
Centre for Structural Systems Biology, Germany
Thomas Mulvaney, Leibniz Institut für Virologie, Universitätsklinikum Hamburg Eppendorf, Centre
for Structural Systems Biology, Germany
Maya Topf, Leibniz Institut für Virologie, Universitätsklinikum Hamburg Eppendorf, Centre for
Structural Systems Biology, Germany
Protein complexes are key components in the majority of biological processes within cells. The
identification and characterization of the protein interfaces in these complexes is crucial for
understanding the mechanisms of molecular recognition. Furthermore, inhibiting the formation of
protein complexes by targeting the interface has the potential to become important in drug
development.
However, large protein interfaces can consist of multiple interacting domains that are geometrically
separated, posing a challenge for targeting the entire interface using drugs. In addition, previous
research has shown the importance of small binding pockets in the protein interface to increase the
selectivity of protein interface binders.
Therefore, a division of the interface into smaller sub-interfaces based on their spatial properties
could facilitate targeting the protein interface.
Here, we developed PICKLUSTER, a plug-in for the molecular visualization program UCSF ChimeraX 1.4
that clusters protein interfaces based on distance and provides various scoring metrics for the
analysis of the interface, not only of structures of protein complexes but also of models generated
by AlphaFold. By fragmentation of the protein interface, PICKLUSTER offers a more focused and
potentially more useful approach for targeting protein-protein interfaces.
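The idea of splitting an interface into spatially separated sub-interfaces can be illustrated with
a small Python sketch. The abstract states only that clustering is based on distance, so the
single-linkage rule, the function name, and the 5 Å cutoff below are assumptions for illustration
and should not be read as the plug-in's actual algorithm.

```python
from math import dist


def cluster_by_distance(points, cutoff):
    """Group points so that any two points closer than `cutoff` share a cluster.

    Single-linkage clustering implemented with a union-find structure.
    Returns a sorted list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy interface: residue centroids (in Å) forming two well-separated patches.
residue_centroids = [
    (0.0, 0.0, 0.0),
    (3.0, 0.0, 0.0),
    (6.0, 0.0, 0.0),
    (30.0, 0.0, 0.0),
    (32.0, 0.0, 0.0),
]
sub_interfaces = cluster_by_distance(residue_centroids, cutoff=5.0)
```

Here the first three residues are chained together through pairwise distances of 3 Å, while the
last two form a second patch, giving two sub-interfaces.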
16:38-16:40
Latent State Estimation of Cancer Patients Treated with
Nivolumab Using Deep State Space Model
Format: Live from venue
Moderator(s): Jan Byška
Aya Nakamura, Graduate School of Medicine, Kyoto University, Japan
Ryosuke Kojima, Graduate School of Medicine, Kyoto University, Japan
Yuji Okamoto, Graduate School of Medicine, Kyoto University, Japan
Yohei Mineharu, Graduate School of Medicine, Kyoto University, Japan
Yohei Harada, Graduate School of Medicine, Kyoto University, Japan
Mayumi Kamada, Graduate School of Medicine, Kyoto University, Japan
Yasushi Okuno, Graduate School of Medicine, Kyoto University, Japan
Analysis of electronic health records (EHRs) that accounts for temporal changes in patient status
has long been pursued to improve treatment strategies. Our goal is to estimate informative
latent states from EHR laboratory data and analyze typical time-series patterns of patient status
changes, focusing on cancer patients treated with nivolumab, an effective anticancer drug. We
propose a framework that includes a method for visualizing and interpreting the latent space using a
deep state space model. Our framework consists of state-space model training using deep Kalman
filter (DKF), latent state estimation, clustering, and visualization. Such clustering and
visualization help users interpret the latent space.
We applied our framework to time-series data of cancer patients who received the nivolumab
anticancer drug at Kyoto University Hospital. By comparing the results retroactively from death, we
succeeded in capturing special situations such as relatively safe situations and emergencies. Also,
compared with other methods such as the variational autoencoder (VAE) and principal component
analysis (PCA), our framework achieved clearer latent states. In addition to these results, we
successfully extracted specific laboratory items characterizing the change of latent states such as
lymphocytes, neutrophils, and items related to anemia.
16:41-16:43
Interactive Visualization of Gene Sets in Pangenomes
Format: Live from venue
Moderator(s): Jan Byška
Astrid van den Brandt, Eindhoven University of Technology,
Netherlands
Sandra Smit, Wageningen University & Research, Netherlands
Eef M. Jonkheer, Wageningen University & Research, Netherlands
Huub van de Wetering, Eindhoven University of Technology, Netherlands
Anna Vilanova, Eindhoven University of Technology, Netherlands
Comparing the way genes are organized on genomes is common in comparative genomics to understand
their evolutionary history and potential functional variations. Pangenomes are beneficial for such
comparisons but require visual analytics to support exploration of gene organization in a genomic
context.
Genomics researchers often characterize gene organization by conservation of gene order across
genomes (i.e., synteny) and sequence similarity. Other features of the genes are also considered,
such as their orientation, presence-absence variations, and the sequence context of neighboring
genes. Their combination yields valuable insights into gene organization patterns. Deviations from
the majority can reveal important biological variations or indicate annotation errors, both valuable
to discover.
Typical synteny tools use gene order and sequence similarity to compute and visualize conserved gene
order as blocks, enabling global pattern inspection. However, scientists need tools that allow for
interactive exploration of gene sets based on various gene and sequence features to understand
different arrangements and conservation relations.
We present GeneSets, an interactive visual interface for exploring gene organizations in genomic
neighborhoods. With various feature parameter settings matched to different visual representations,
users can compare the sets' arrangement from multiple perspectives. GeneSets is demonstrated using
an important gene family in a potato pangenome.
Phylogenetic analysis often results in numerous phylogenetic trees, generated, for example, by using
multiple genes or methods, or by bootstrapping or Bayesian analysis. A consensus tree is used
to summarize a set of trees. Consensus networks have also been proposed to summarize a collection of
phylogenetic trees. Such networks have the potential to display incompatibilities among the input
trees. However, interpreting those networks can be challenging due to the significant number of
nodes and edges and their non-planar structure. Here, we present the new concept of a phylogenetic
consensus outline which is a new type of network but is significantly less complex than previous
ones. We introduce an efficient algorithm for its computation. The main idea is to use a PQ-tree
data structure to decide which splits to keep in the consensus. This consensus uses a phylogenetic
outline for visualization. A phylogenetic outline is a network that represents a set of circular
splits as an outer-labeled planar graph. We illustrate phylogenetic outline and consensus outline
and their usage and explore how they compare to other methods on draft genomes of different assembly
qualities, and on multiple gene trees from a published study on water lilies, respectively.
16:47-16:49
CCPlotR: An R package for the visualisation of cell-cell
interactions
The increasing availability of single-cell RNA-seq data in recent years has led to the development of
multiple tools for predicting cellular crosstalk. These tools typically work by analysing the
expression of genes that code for ligands and receptors from pairs of cell types that are known to
interact with each other. Most tools will return a table of predicted interactions depicting the
ligand, receptor, sending and receiving cell types for each interaction, as well as a score to rank
important interactions. Some tools also generate plots to visualise the predicted interactions but
these are not consistent across tools and since most datasets include several cell populations,
visualisation can be challenging. Here we present CCPlotR, an R package that contains functions to
generate cell-cell interaction plots. CCPlotR can generate several types of plots such as heatmaps,
dotplots, circos plots and network diagrams, and works with the output of any cell-cell interaction
prediction tool, requiring only a table of predicted interactions as input. The package is available
on GitHub: https://github.com/Sarah145/CCPlotR and comes with a toy dataset to demonstrate the
different functions. We anticipate that this will be a useful resource for single-cell researchers
working on cell-cell interactions.
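To make the input format concrete, the Python sketch below turns a table of predicted interactions
into the sender-by-receiver count matrix that underlies a heatmap. It is not CCPlotR code: the
function name and the column names used here ("source", "target", "ligand", "receptor", "score")
are assumptions for this example, so check the package documentation for the columns it expects.

```python
from collections import defaultdict


def interaction_matrix(rows):
    """Count predicted interactions for every (sending, receiving) cell type pair.

    `rows` is a list of dicts with at least the keys "source" and "target".
    Returns (cell_types, matrix), where matrix[i][j] is the number of
    interactions sent by cell_types[i] and received by cell_types[j].
    """
    cell_types = sorted({r["source"] for r in rows} | {r["target"] for r in rows})
    counts = defaultdict(int)
    for r in rows:
        counts[(r["source"], r["target"])] += 1
    matrix = [[counts[(s, t)] for t in cell_types] for s in cell_types]
    return cell_types, matrix


# Toy table of predicted interactions (hypothetical ligands, receptors, and scores).
predicted = [
    {"source": "B", "target": "T", "ligand": "L1", "receptor": "R1", "score": 0.9},
    {"source": "B", "target": "T", "ligand": "L2", "receptor": "R2", "score": 0.4},
    {"source": "T", "target": "B", "ligand": "L3", "receptor": "R3", "score": 0.7},
]
cell_types, matrix = interaction_matrix(predicted)
```

Because only the table is required, the same summary works regardless of which prediction tool
produced the interactions.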
16:50-16:52
VIBE: An R package for the Visualization and Exploration of Bulk
mRNA Expression data to prioritize cancer types for drug discovery
The public databases for bulk mRNA sequencing data (e.g., TCGA and GTEx) offer a basic visualization
to rank tumor types based on the expression of single genes. However, no statistical or visual
understanding of content beyond a single gene is provided, and neither are biological pathway- or
cell-specific gene signatures, which are often of great relevance to characterize the tumor
microenvironment relative to the actual target expression. Here, we demonstrate VIBE: an R package
that offers a wide range of functions to allow researchers to visualize, explore, and interpret bulk
mRNA sequencing data sets (Figure 1). Furthermore, this package presents users with in-depth
visualizations of individual and cohort-level summaries, such as concordant or discordant over- or
under-expression of two genes or pathways using FACS-like scatterplots, the prevalence of subjects
within and across tumor types exhibiting over- or under-expression, and correlation or co-expression
between genes and gene signatures. In contrast to manual analysis, VIBE provides a comprehensive
view of targeted pathways that can improve the understanding of patient subsets and reduce
the time and effort spent on assessing expression patterns of one or two genes or pathways in
multiple tumor types.
16:53-16:55
Interactive visualisation for chromatin interaction
networks
Format: Live from venue
Moderator(s): Jan Byška
Sandra Siliņa, Institute of Mathematics and Computer Science, University of Latvia, Latvia
Andrejs Sizovs, Institute of Mathematics and Computer Science, University of Latvia, Latvia
Gatis Melkus, Institute of Mathematics and Computer Science,
University of Latvia, Latvia
Peteris Rucevskis, Institute of Mathematics and Computer Science, University of Latvia, Latvia
Juris Viksna, Institute of Mathematics and Computer Science, University of Latvia, Latvia
In recent years a lot of research has focused on chromatin interactions, and Hi-C data analysis plays
an important role in understanding gene regulation. Various software tools have been developed to
analyse chromatin interaction data, including visualisations that allow a more rapid understanding
of the overall information on these interactions.
We present a software tool designed to aid analysis of chromatin interaction data represented as
graphs. It focuses on identification and exploration of connected components found in some tissues
but not others. The visualisation consists of two main parts – a crossfilter and a graph section.
The crossfilter displays general information about the connected components and enables users to
quickly assess the overall characteristics of the identified components. Users can also filter data
based on multiple parameters to see information about a subset of the components. The graph section
reveals more detailed information – interactions present in the component, genes and proteins
associated to chromatin segments, and more.
Currently, preprocessed data for two datasets of 10 human tissue types are available. The
preprocessed datasets have an average of about 100,000 links per tissue type, depending on the
chosen threshold. A more universal pipeline for other datasets is in active development.
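One possible reading of "connected components found in some tissues but not others" is sketched
below in Python: keep only the links seen in a chosen tissue and in no other tissue, then compute
the connected components of the remaining graph. The function names, the data layout, and this
particular definition of tissue specificity are assumptions for illustration, not a description of
the presented tool.

```python
def connected_components(edges):
    """Return the connected components of an undirected graph as sorted node lists."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


def tissue_specific_components(links_by_tissue, tissue):
    """Components built from links present in `tissue` and absent from all others."""

    def normalise(edge):
        # Treat links as undirected: (a, b) and (b, a) are the same link.
        return tuple(sorted(edge))

    own = {normalise(e) for e in links_by_tissue[tissue]}
    others = {
        normalise(e)
        for name, edges in links_by_tissue.items()
        if name != tissue
        for e in edges
    }
    return connected_components(sorted(own - others))


# Toy data: chromatin segments s1..s8 with links in two hypothetical tissues.
links = {
    "liver": [("s1", "s2"), ("s2", "s3"), ("s7", "s8")],
    "lung": [("s8", "s7"), ("s4", "s5")],
}
liver_only = tissue_specific_components(links, "liver")
lung_only = tissue_specific_components(links, "lung")
```

The link between s7 and s8 occurs in both tissues, so it is excluded from both results.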
16:56-16:58
3D modeling of Hi-C contacts: seeing the spatial organization of
fungal chromosomes
Format: Live from venue
Moderator(s): Jan Byška
Thibault Poinsignon, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université
Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
Mélina Gallopin, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université
Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
Pierre Grognet, Institute for Integrative Biology of the Cell
(I2BC), CEA, CNRS, Université Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
Fabienne Malagnac, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université
Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
Gaëlle Lelandais, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université
Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
Pierre Poulain, Université Paris Cité, CNRS, Institut Jacques Monod, F-75013 Paris, France
The spatial architecture of genomes in the nucleus is essential to the study of their function. The
number of studies using Hi-C approaches to probe interactions within chromatin is growing and, at
the same time, the field of computational 3D modeling offers numerous solutions. However, this
opportunity for original and insightful modeling is underexploited, and most fungal Hi-C studies do
not propose 3D models of chromatin contact networks. Our aim was to promote the use of 3D chromatin
modeling alongside contact maps in fungal Hi-C studies. To do so, we present specific examples of
3D modeling of entire genomes and software to automate this approach. Our analysis process was
assembled into a modular workflow, based on state-of-the-art analysis and modeling software, that
goes from raw Hi-C data to fully annotated 3D models. With this workflow, we re-analysed public
Hi-C datasets available for three model species to visualize them in a 3D context. New models of
the organization of those three emblematic fungal genomes were generated, summarizing the known
features of fungal genomes in rich and integrated illustrations while also offering new insight
into the organization of cohesin anchor regions in Saccharomyces cerevisiae.
16:58-17:00
Understanding the contribution of immature myeloid cells to
early melanoma establishment
Format: Live from venue
Moderator(s): Jan Byška
Xiaoyu Hou, University of Queensland Diamantina Institute (UQDI)
and University of Queensland Faculty of Medicine, Australia
James Wells, University of Queensland Diamantina Institute (UQDI) and University of Queensland
Faculty of Medicine, Australia
In cancer, immature myeloid cells are known to be recruited to the tumour microenvironment and to
differentiate into myeloid-derived suppressor cells with a potent ability to suppress various types
of immune responses. The precise mechanisms that drive this differentiation remain unclear. In the
laboratory, myeloid-derived suppressor cell differentiation is known to be induced through the
interaction of peripheral blood mononuclear cells with melanoma cell lines. However, the absolute
requirement for these cells in supporting the establishment of melanoma is not well understood.
In my project, I am investigating whether immature myeloid cells play a fundamental role in tumour
development by exploring the direct and indirect capacity of immature myeloid cells to impact tumour
establishment and early growth. In this project, immature myeloid cells were depleted before the
tumour challenge to determine the effects of depletion on tumour establishment. Nanostring GeoMx
digital spatial profiling is used to define where immature myeloid cells are physically located
within emerging tumours, which cells they interact with, and which growth and angiogenic factors
they produce. These observations could lead to novel approaches to stop melanoma at an early
stage.
17:00-17:50
Invited Presentation: Keynote Presentation: Do we still need
molecular graphics?
To what extent do we still need molecular graphics in today’s scientific landscape? Despite its long
history and high level of maturity, molecular graphics remains an indispensable tool for scientific
understanding and discovery, constantly facing new challenges. These currently include the ability
to visualize complex molecular relationships and interactions, with the goal of scaling the size and
scope of systems currently under investigation. In addition, molecular graphics programs must adapt
to experimental (r)evolutions to remain relevant in the field.
However, it is not enough to simply produce high quality visualizations. Scientists must be able to
share these complex visualizations with their colleagues and with experts in other fields. The
latest tools in molecular graphics, such as AI- and data-driven approaches, interactive simulations,
augmented, mixed, and virtual reality, offer new ways to visualize and interact with molecular data
and models. These advances enable researchers to explore scientific questions in new and innovative
ways, but need to be more widely adopted to become routine in scientific investigations.