BioVis@ISMB/ECCB 2023 Program

July 25, 2023

Invited Speakers

Do we still need molecular graphics?

Marc Baaden

Marc Baaden, Institut de Biologie Physico-Chimique, Paris

Abstract: To what extent do we still need molecular graphics in today’s scientific landscape? Despite its long history and high level of maturity, molecular graphics remains an indispensable tool for scientific understanding and discovery, constantly facing new challenges. These currently include the ability to visualize complex molecular relationships and interactions, with the goal of scaling the size and scope of systems currently under investigation. In addition, molecular graphics programs must adapt to experimental (r)evolutions to remain relevant in the field.

However, it is not enough to simply produce high quality visualizations. Scientists must be able to share these complex visualizations with their colleagues and with experts in other fields. The latest tools in molecular graphics, such as AI- and data-driven approaches, interactive simulations, augmented, mixed, and virtual reality, offer new ways to visualize and interact with molecular data and models. These advances enable researchers to explore scientific questions in new and innovative ways, but need to be more widely adopted to become routine in scientific investigations.

Speaker Biography: Marc Baaden, research director at the CNRS in Paris, is a computational chemist working in the field of structural bioinformatics. His research focuses on interactive molecular modeling approaches for biological systems and has included virtual reality approaches since 2007, then Citizen Science, and more recently the Internet of Things. He develops scientific visualization approaches as well as original tools related to Big Data and immersive analytics using virtual reality equipment. Using the Unity game engine, he has designed the UnityMol platform as a development framework for academic contexts and for collaboration with industry partners and the public. His research combines simulations of biological macromolecules and bioinformatics with high-performance computing, virtual reality, visualization, and dissemination activities.

A view on Visual Analytics for Biomedical Applications

Prof. Anna Vilanova

Anna Vilanova, Eindhoven University of Technology, Eindhoven

Abstract: Visual analytics is a branch of visualization that focuses on analytical reasoning facilitated by interaction and visual representations. Visual analytics is an extension to AI methods. It also complements existing visualization techniques by introducing the concepts of reasoning and AI. Interaction and the enhancement of human reasoning and decision making are central. The research in my group has focused on visual analytics for data exploration, hypothesis generation, and understanding in biomedical applications such as virtual colonoscopy, diffusion-weighted imaging of brain white matter and muscle, 4D blood flow analysis, radiotherapy, single-cell analysis, and pangenomes. For these purposes, we developed interactive visual analysis strategies that include uncertainties and facilitate the analysis of cohort data. We incorporated concepts of progressive visual analytics and the use of dimensionality reduction as an effective visual analytics (VA) component for the visual analysis of large data in these applications. In my talk, I will present multiple examples of our developments in biomedical applications, the common visual analytics concepts and design strategies behind those applications, and a glance at lessons learned and open challenges.

Speaker Biography: Prof. Dr. Anna Vilanova has been full professor in visual analytics (vis.win.tue.nl) since October 2019 at the Department of Mathematics and Computer Science of the Eindhoven University of Technology (TU/e). She is also affiliated with the Electrical Engineering department, within Signal Processing Systems at TU/e. Previously, she was associate professor for 6 years in the Computer Graphics & Visualization Group at EEMCS at the Delft University of Technology, the Netherlands. From 2002 to August 2013, she was assistant professor in the Biomedical Image Analysis group of the Biomedical Engineering Department at TU/e. She leads a research group on visual analytics and multivalued image analysis and visualization, focusing on visual analytics for high-dimensional data. She focuses on biomedical applications, diffusion-weighted imaging, and 4D flow. Her research interests include visual analytics, medical visualization, volume visualization, multivalued visualization, and medical image analysis. In 2005, she was awarded an NWO-Veni personal grant titled “Visualization of global tensor information for diffusion tensor imaging”. In 2013, she received an NWO-Aspasia grant. She is a member of the international program committee of several conferences (e.g., IEEE Visualization and EG-IEEE VGTC EuroVis). She has been chair and editor of relevant conferences and journals in her field of research (e.g., EuroVis 2008, Computers & Graphics, Computer Graphics Forum, IEEE VIS). She was a member of the steering committee of IEEE VGTC EuroVis (2014-2018) and has been a member of the VCBM steering committee since 2018. She has been an elected member of the EUROGRAPHICS executive committee since 2015 and vice president of EUROGRAPHICS since 2019. She also became a EUROGRAPHICS fellow in 2019. She has been an elected member of the IEEE VIS Steering Committee (VSC) since 2021.

Program

Schedule subject to change. All times listed are in CEST

Tuesday, July 25th

10:30-10:35
Opening
Format: Live from venue
Speakers: Jan Byška and Michael Krone
10:35-11:30
Invited Presentation: Keynote Presentation
Format: Live from venue
Moderator(s): Michael Krone
  • Anna Vilanova
11:30-11:50
The Molecular Control Toolkit: Controlling 3D molecular graphics via gesture and voice (Test of Time Award)
Format: Live from venue
Moderator(s): Michael Krone
  • Kenneth Sabir
  • Christian Stolte
  • Bruce Tabor
  • Seán I. O'Donoghue
11:50-12:10
Effective Comparison of Single-Cell Embedding Visualizations
Format: Live from venue
Moderator(s): Michael Krone
  • Trevor Manz, Harvard Medical School, United States
  • Fritz Lekschas, Ozette Technologies, United States
  • Evan Greene, Ozette Technologies, United States
  • Greg Finak, Ozette Technologies, United States
  • Nils Gehlenborg, Harvard Medical School, United States

Presentation Overview:

Visualizing high-dimensional single-cell data with low-dimensional embedding spaces uncovers complex cell phenotype relationships and provides dataset overviews. However, current pairwise comparison methods for embeddings require shared point correspondences and struggle to reveal meaningful similarities and differences between embeddings. We introduce a novel hierarchical framework that enables comparison of both distinct single-cell datasets without point correspondence and alternative embeddings of the same data. By leveraging shared label hierarchies, our approach allows for global and local difference comparison between cell types in different embeddings. Our framework utilizes a three-step process to analyze label confusion, neighborhood stability, and relative cell type abundance. These properties help scientists effectively discover differences between embeddings. For instance, label confusion highlights intermixed cell types, neighborhood stability characterizes neighboring cell type composition, and cell type abundance surfaces differentially abundant cell types. We derive these properties using set-based similarity metrics and implement them via a Delaunay graph traversal. We developed a Python-based prototype for Jupyter Notebook-like coding environments to demonstrate our framework's usefulness. In our talk, we will showcase single-cell surface proteomics embedding use cases, comparing different embedding methods of the same experiment and highlighting cell type abundance changes.
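
As a rough illustration of comparing embeddings through shared labels rather than point correspondences, the sketch below collects, for each cell type, the set of other cell types found among the nearest neighbors of its points, and then compares those sets across two embeddings with the Jaccard index. This is not the authors' implementation: the abstract describes a Delaunay graph traversal, whereas this sketch swaps in a brute-force k-nearest-neighbor search, and the function names, the parameter k, and the toy coordinates are all hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def label_neighborhood(points, labels, k):
    """Map each label to the set of other labels seen among the k nearest
    neighbors of its points (brute-force search, an assumption of this sketch)."""
    result = {}
    for i, lab in enumerate(labels):
        px, py = points[i]
        dists = sorted(
            (math.hypot(x - px, y - py), j)
            for j, (x, y) in enumerate(points)
            if j != i
        )
        neighbors = {labels[j] for _, j in dists[:k]}
        result.setdefault(lab, set()).update(neighbors - {lab})
    return result


def neighborhood_similarity(points_a, labels_a, points_b, labels_b, k=2):
    """Per-label Jaccard similarity of neighboring label sets in two embeddings.
    Only labels are shared; the two point sets need not correspond."""
    na = label_neighborhood(points_a, labels_a, k)
    nb = label_neighborhood(points_b, labels_b, k)
    return {lab: jaccard(na[lab], nb[lab]) for lab in sorted(set(na) & set(nb))}


# Hypothetical toy embeddings with three cell-type labels (T, B, M).
emb_a = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
emb_b = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (1.5, 0.0), (1.6, 0.0)]
cell_types = ["T", "T", "B", "B", "M", "M"]

result = neighborhood_similarity(emb_a, cell_types, emb_b, cell_types, k=2)
```

A score near 1 for a label suggests that its neighboring cell types are the same in both embeddings; a score near 0 flags a label whose surroundings changed.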

12:10-12:20
scDEED: a statistical method for detecting dubious 2D single-cell embeddings
Format: Live from venue
Moderator(s): Michael Krone
  • Lucy Xia, Hong Kong University of Science and Technology, Hong Kong
  • Christy Lee, University of California, Los Angeles, United States
  • Jingyi Jessica Li, University of California, Los Angeles, United States

Presentation Overview:

Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-SNE and UMAP are commonly used for visualizing cell clusters; however, it is well known that t-SNE and UMAP’s 2D embedding might not reliably inform the similarities among cell clusters. Motivated by this challenge, we developed a statistical method, scDEED, for detecting dubious cell embeddings output by any 2D embedding method. By calculating a reliability score for every cell embedding, scDEED identifies the cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. Applied to multiple scRNA-seq datasets, scDEED demonstrates its effectiveness for detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.

12:20-12:30
GAZE-Shiny: comprehensive and interactive visualization of transcriptional regulation in single-cell resolution
Format: Live from venue
Moderator(s): Michael Krone
  • Shamim Ashrafiyan, Goethe University Frankfurt, Germany
  • Fatemeh Behjati Ardakani, Goethe University Frankfurt, Germany
  • Dennis Hecker, Goethe University Frankfurt, Germany
  • Marcel Schulz, Goethe University Frankfurt, Germany

Presentation Overview:

Data visualization and exploration are crucial for the interpretation of vast biological datasets obtained through high-throughput assays, such as single-cell sequencing. Single-cell data (sc-data) provides valuable insights into cellular function, phenotypic heterogeneity, and tissue development. Therefore, there is a need for an interactive and user-friendly software tool that enables biologists and scientists to work with such data efficiently. Hence, we have developed a user-friendly web application named GAZE-Shiny that enables easy visualization and exploration of sc-data. GAZE-Shiny is based on the GAZE statistical framework, which aggregates single cells into meta-cells and uses machine learning to infer transcription factor (TF) regulation from transcriptome and epigenome sc-data. GAZE-Shiny offers the user basic data representation elements for a specific gene or all clusters at the single-cell level or meta-cell level. In addition, it provides more intricate visualization of the TF-gene-cell associations calculated within the GAZE pipeline. For instance, it enables easy visualization and retrieval of important TFs or genes involved in transcriptional regulation at the single-cell level. Further, it can list the candidate regulators defined based on our specialized statistical tests and allows users to explore the single-cell TF regulation using interactive panels that focus on either genes or TFs.

13:50-14:10
Poincaré maps for visualization of large protein families
Format: Live from venue
Moderator(s): Aditeya Pandey
  • Anna Susmelj, Biognosys AG, Wagistrasse 21, 8952 Schlieren, Switzerland
  • Yani Ren, Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
  • Yann Vander Meersche, Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
  • Jean-Christophe Gelly, Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
  • Tatiana Galochkina, Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France

Presentation Overview:

Due to the constantly increasing amount of available protein data, comprehensive visualization of large protein families has become crucial for the analysis of protein evolution and function, as well as for the characterization of poorly described proteins. While the classical representation of a protein family by a multiple sequence alignment (MSA) contains a great amount of information, visual analysis of an MSA becomes quite challenging once the number of proteins considered becomes large. In the current study, we developed a new approach for protein family visualization named PoincaréMSA, based on Poincaré maps. The Poincaré map projection combines hyperbolic embedding with geodesic distances calculated on the k-nearest neighbors graph, thus successfully reproducing complex hierarchies contained in the protein data. We demonstrate that PoincaréMSA preserves the structure of protein sequence space better than classical projection methods such as t-SNE and UMAP. As we show on several examples of different protein families, PoincaréMSA is very efficient for visualization of complex protein family topologies as well as for evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented as open-source Python code with interactive Google Colab notebooks available, as described at https://www.dsimb.inserm.fr/POINCARE_MSA.
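
For readers unfamiliar with hyperbolic embeddings, the following minimal sketch computes the standard distance between two points in the Poincaré ball model, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It only illustrates the geometry in which such maps are drawn; it is not taken from PoincaréMSA, and the function name is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    norm_u = sum(x * x for x in u)  # squared Euclidean norm of u
    norm_v = sum(x * x for x in v)  # squared Euclidean norm of v
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - norm_u) * (1.0 - norm_v)))


# Equal Euclidean steps cover more hyperbolic distance near the boundary.
near_center = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_border = poincare_distance((0.8, 0.0), (0.9, 0.0))
```

Because distances grow rapidly toward the boundary of the disk, there is room to place many leaves of a hierarchy far apart from one another, which is one reason hyperbolic space suits tree-like data.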

14:10-14:20
Seeing Beyond the Surface: The Continuous Development of Protein Design with Dalton
Format: Live from venue
Moderator(s): Aditeya Pandey
  • Mo Rahman, GeneDrop Inc., United States
  • Pascale Marill, GeneDrop Inc., United States

Presentation Overview:

Proteins are life's building blocks with structure and function linked intricately. New computational tools are creating novel proteins, leading to an explosive growth in protein design. Dalton is a work-in-progress, cross-platform desktop application for protein design and visualization with powerful features to explore the boundless possibilities of the protein universe. This presentation showcases Dalton's capabilities for designing and visualizing custom proteins, highlighting its user-friendly interface and tools for designing and editing protein structures. It demonstrates how Dalton works with ColabFold to design proteins with specific features, analyze predicted structures, and refine designs. Dalton also integrates with the KEGG API and KGML, visualizing metabolic pathways and protein-protein interaction networks. It can design proteins to interact with enzymes or pathways, offering potential for designing new enzymes with specific metabolic functions. Overall, Dalton is being developed to be a valuable asset for researchers and designers, providing powerful tools, a user-friendly interface, and flexible design features for exploring the complex world of protein design.

14:20-14:30
GVViZ: A physician-friendly bioinformatics application enabling interactive gene-disease data annotation, expression analysis, and visualization for translational research
Format: Live from venue
Moderator(s): Aditeya Pandey
  • Zeeshan Ahmed, Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, United States

Presentation Overview:

We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. We introduce GVViZ, a new, robust, and user-friendly platform for RNA-seq-driven gene-disease data annotation and expression analysis with dynamic heat map visualization. With successful deployment in clinical settings, GVViZ will enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data. Experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. GVViZ can assess genotype-phenotype associations among multiple complex diseases to find novel highly expressed genes. We have evaluated its clinical impact for different chronic diseases including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders.

14:30-14:50
Visualizing (differential) expression patterns with fuzzy concepts as FlowSets
Format: Live from venue
Moderator(s): Aditeya Pandey
  • Felix Offensperger, Ludwig Maximilian University of Munich, Germany
  • Markus Joppich, Ludwig Maximilian University of Munich, Germany
  • Ralf Zimmer, Ludwig Maximilian University of Munich, Germany

Presentation Overview:

High-throughput (sequencing) data sets are becoming increasingly popular, and there is a need for more advanced tools to analyze large amounts of data with complex dimensions, e.g., from scRNA-seq or bulk RNA-seq. Here, we present the FlowSets framework as a new method for analyzing expression data from ordered and unordered measurements of (possibly) multiple modalities, using fuzzy concepts to encode signal as linguistic variables. FlowSets provides a time- and order-independent analysis of gene expression data, focusing on specific genes or expression patterns of interest. Using fuzzy concepts allows for easier interpretation of gene expression values, avoiding the use of thresholding, and making gene set over-representation analysis more efficient. We compared FlowSets to a WGCNA-based analysis on a simulated dataset and found that all fuzzy FlowSets methods could identify genes belonging to all regulated patterns with high precision and recall. We also applied the method to a public scRNA-seq dataset of monocytes from (non-)pneumonic COVID patients and successfully recapitulated previous findings. The FlowSets framework is a promising tool for analyzing complex gene expression data and may prove useful in large-scale studies with many replicates.

14:50-15:00
RIVET: A visual interactive browser for tracking and curating SARS-CoV-2 recombinants
Format: Live from venue
Moderator(s): Aditeya Pandey
  • Kyle Smith, University of California, San Diego, United States
  • Cheng Ye, University of California, San Diego, United States
  • Yatish Turakhia, University of California, San Diego, United States

Presentation Overview:

Recombination has been shown to be a significant contributor to the genetic diversity in SARS-CoV-2; however, the task of manually curating putative recombinants from thousands of new sequences being uploaded online daily suffers from weeks of delay and poses a major bottleneck to real-time surveillance efforts. RIVET is a software pipeline and visualization platform that builds on recent algorithmic advances in recombination inference to comprehensively and sensitively search for potential SARS-CoV-2 recombinants. RIVET's public web interface provides a suite of interactive visualization and analysis tools that allow expert curators to visually scan through thousands of newly detected putative recombinants and quickly prioritize high-confidence recombinants of interest to track or designate. RIVET provides integration with several other established tools, such as UShER and Taxonium, and in the future will be combined with Autolin to completely automate the process of lineage designation, including recombinant lineages.

15:00-15:10
PhosNetVis: A Web-Based Platform for Kinase Enrichment Analysis and Visualizing Phosphoproteomics Networks
Format: Live from venue
Moderator(s): Aditeya Pandey
  • Berk Turhan, Icahn School of Medicine at Mount Sinai, United States
  • Irene Font Peradejordi, Jacobs Technion-Cornell Institute at Cornell Tech and Department of Information Science at Cornell University, United States
  • Shreya Chandrasekar, Jacobs Technion-Cornell Institute at Cornell Tech and Department of Information Science at Cornell University, United States
  • Selim Kalayci, Icahn School of Medicine at Mount Sinai, United States
  • Jeffrey Johnson, Icahn School of Medicine at Mount Sinai, United States
  • Mehdi Bouhaddou, Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, United States
  • Zeynep Gümüş, Mount Sinai School of Medicine, United States

Presentation Overview:

Protein phosphorylation is a crucial cellular signaling process, where a kinase modifies a protein residue. Multiple kinases can alter various sites on a substrate protein. To better understand human cellular systems in health and disease, researchers are gathering extensive data on the abundance and phosphorylation sites and states of thousands of proteins, such as the data we analyze within the Human Immunology Project Consortium (HIPC) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Although there are tools available to infer kinase-substrate interactions (KSIs) from proteomics datasets, there is a need for interactive exploration of the resulting KSI networks, simultaneously with the phosphorylation sites and states of each substrate protein across multiple experiments and time points. To address this need, we present PhosNetVis, a web-based tool that streamlines multiple phosphoproteomics data analysis steps within a single platform to enable easy inference, generation, and interactive exploration of KSI networks. With PhosNetVis, users can run Kinase Enrichment Analysis (KEA) to detect significantly enriched kinases in their datasets and visually explore their resulting networks. This helps in identifying key KSIs and lowers the difficulty of interpreting phosphoproteomics data, leading to faster and better biological insights. PhosNetVis is open-sourced on GitHub and available at phosnetvis.app.

15:10-15:30
Visualizing temporal and multi-regional evolution of tumor subclones with Jellyfish plots
Format: Live from venue
Moderator(s): Aditeya Pandey
  • Kari Lavikka, University of Helsinki, Finland
  • Ilari Maarala, University of Helsinki, Finland
  • Jaana Oikkonen, University of Helsinki, Finland
  • Yilin Li, University of Helsinki, Finland
  • Alexandra Lahtinen, University of Helsinki, Finland
  • Sampsa Hautaniemi, University of Helsinki, Finland

Presentation Overview:

As a tumor in cancer grows and spreads, it undergoes a process of clonal evolution, resulting in multiple subclones with different genetic and phenotypic characteristics. Here, we present Jellyfish, a visualization design and a software package that automates the visualization of the evolutionary relationship of these subclones and their contribution to the clonal compositions of tumor samples. Unlike other visualization designs, such as fishplot, Jellyfish incorporates both the temporal and spatial dimensions, allowing for the comparison of multiple tissue samples taken multi-regionally from different parts of the patient's body at the same or different time points. Such visualization allows for gaining an overview of the evolution, dispersal, and coexistence of the subclones in metastasized tumors. To render the plots, Jellyfish uses a graph-theory-based method to process phylogenetic and clonal composition data, which can be generated using external tools like ClonEvol. We provide an overview of the design elements and the software architecture of Jellyfish and present examples of how it has been used to visualize subclonal evolution in high-grade serous carcinoma patients belonging to the DECIDER clinical trial (https://www.deciderproject.eu/).

16:00-16:10
Best Practices for the Design of Health Dashboards
Format: Live from venue
Moderator(s): Jan Byška
  • Melina Malkani, Bullis School, United States
  • Dillon Malkani, Bullis School, United States

Presentation Overview:

Since 2020, health informaticians have developed and enhanced public-facing health dashboards worldwide. The improvement of dashboards implemented by health informaticians will ultimately benefit the public in making better healthcare decisions and improve population-level healthcare outcomes. The authors evaluated 100 US city, county, and state government health dashboards and identified the top 10 best practices to be considered when creating a public health dashboard. These practices include 1) easy navigation, 2) high usability, 3) use of adjustable thresholds, 4) use of diverse chart selection, 5) compliance with the Americans with Disabilities Act, 6) use of charts with tabulated data, 7) incorporated user feedback, 8) simplicity of design, 9) adding clear descriptions for charts, and 10) comparison data with other entities. The authors also surveyed 118 randomly selected individuals in six states and the District of Columbia; the survey results support these top 10 best practices for the design of health dashboards.

16:10-16:20
Interactive and effective visualization framework for interpreting and exploring cellular communication data
Format: Live from venue
Moderator(s): Jan Byška
  • Giulia Cesaro, University of Padua, Italy
  • Giacomo Baruzzo, University of Padova, Italy
  • Barbara Di Camillo, University of Padova, Italy

Presentation Overview:

Recent advances in single-cell transcriptomics have enabled the study of cellular communication, and several computational tools have been developed for inferring cell-cell interactions from single-cell RNA sequencing data. However, visualization and interpretation of results is still an open issue in this research area, due to the increasing complexity of datasets, comprising different conditions, different subjects and time series studies, and the inherently multi-dimensional characteristics of cellular communication data, including both intercellular and intracellular signalling. Here we present CClens, an interactive R Shiny app that supports scientists in analysing and exploring cell-cell communication results. The app can handle data from all the main bioinformatics tools for inferring and scoring cell-cell communication (e.g. scSeqComm, SingleCellSignalR, CellChat). Moreover, it includes i) multiple filtering options to dynamically and interactively inspect data, ii) a powerful and effective visualization framework for summarizing cellular communication data, and iii) advanced visualization tools to analyse even complex datasets (e.g. multi-condition) on all their dimensions (e.g. ligand-receptor binding, patient-specific interactions). This interactive and visual way to inspect data provides a user-friendly, accessible (no-code), flexible and powerful tool for exploring the richness of current cell-cell communication data and easily extracting the biological information contained in such data.

16:20-16:22
Automated diagnosis of ear disease using ensemble deep learning with a big otoendoscopy image database
Format: Live from venue
Moderator(s): Jan Byška
  • Shin Mi Hwa, Department of Otorhinolaryngology, Yonsei University College of Medicine, South Korea
  • Choi Jae Young, Department of Otorhinolaryngology, Yonsei University College of Medicine, South Korea
  • Park Haeng Ran, Department of Otorhinolaryngology, Yonsei University College of Medicine, South Korea

Presentation Overview:

Background: Ear and mastoid disease can easily be treated with early detection and appropriate medical care. However, a shortage of specialists and relatively low diagnostic accuracy call for a new diagnostic strategy, in which deep learning may play a significant role. The current study presents a machine learning model to automatically diagnose ear disease using a large database of otoendoscopic images acquired in the clinical environment. Methods: A total of 10,544 otoendoscopic images were used to train nine public convolution-based deep neural networks to classify eardrum and external auditory canal features into six categories of ear diseases, covering most ear diseases. After evaluating several optimization schemes, the two best-performing models were selected to compose an ensemble classifier by combining the classification scores of each classifier. Findings: Based on accuracy and training time, transfer learning models based on Inception-V3 and ResNet101 were chosen, and the ensemble classifier using the two models yielded a significant improvement over each individual model, with an average accuracy of 93.67% in 5-fold cross-validation. Interpretation: The current study is unprecedented in terms of both disease diversity and diagnostic accuracy, which is comparable to or even better than that of an average otolaryngologist.
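
The abstract does not say how the two classifiers' scores are combined. As one common possibility, the sketch below averages the per-class scores of two models and picks the class with the highest combined score. The function name, the equal default weighting, and the example scores are assumptions made for illustration and are not taken from the study.

```python
def ensemble_predict(scores_a, scores_b, weight_a=0.5):
    """Combine two per-class score vectors by weighted averaging.

    Returns the index of the winning class and the combined scores.
    Weighted averaging is an assumed combination rule, not the study's.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same set of classes")
    combined = [
        weight_a * a + (1.0 - weight_a) * b
        for a, b in zip(scores_a, scores_b)
    ]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


# Hypothetical softmax outputs of two models over six disease categories.
model_1 = [0.50, 0.30, 0.05, 0.05, 0.05, 0.05]
model_2 = [0.20, 0.60, 0.05, 0.05, 0.05, 0.05]
predicted_class, combined_scores = ensemble_predict(model_1, model_2)
```

Here the two models disagree on the top class, and averaging lets the more confident model decide; other rules, such as majority voting or a learned meta-classifier, would be equally valid readings of the abstract.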

16:23-16:25
Topological Data Analysis and Persistence Theory Applications to Heart Arrhythmia
Format: Live from venue
Moderator(s): Jan Byška
  • Justin Zhang, Bergen County Academies, United States
  • William Song, Bergen County Academies, United States
  • Giacomo Pugliese, Bergen County Academies, United States

Presentation Overview:

Our research project uses rigorous techniques from topological data analysis, a type of data analysis based on the mathematical field of algebraic topology, together with machine learning, to computationally visualize and analyze electrocardiogram (ECG) data of patients with various heart conditions, including Ventricular Tachycardia, Ventricular Flutter, and Ventricular Fibrillation. By leveraging sophisticated analysis tools such as persistent homology and simplicial complexes, including the Vietoris-Rips complex, we obtain a highly precise model of the ECG data, enabling us to distinguish the ECG data of patients suffering from these debilitating illnesses from those of healthy individuals. Our novel approach involves extracting critical geometric features from the persistence diagrams and images obtained from the persistent homology of the patient ECG data. These features are then input into an algorithm based on our topological data analysis, enabling us to classify, with virtually complete accuracy, which of these three conditions a patient is suffering from. Our research project represents a key step forward in the field of heart disease diagnosis, potentially offering a non-invasive, highly accurate method for diagnosing the hundreds of thousands of patients suffering from these conditions.

16:26-16:28
Single Cell Data Analysis Made Easy: scDisco, an App for Non-Experts
Format: Live from venue
Moderator(s): Jan Byška
  • Jakub Widawski, Boehringer Ingelheim Pharma GmbH & Co. KG; Ardigen, Poland
  • Christoph Ogris, Boehringer Ingelheim Pharma GmbH & Co. KG, Germany
  • Nathan Lawless, Boehringer Ingelheim Pharma GmbH & Co. KG, Germany
  • Jan Jensen, Boehringer Ingelheim Pharma GmbH & Co. KG, Germany
  • Fidel Ramirez, Boehringer Ingelheim Pharma GmbH & Co. KG, Germany

Presentation Overview:

Background: As single cell data analysis becomes increasingly prevalent, the challenge of exploring, summarizing, and communicating results grows with the sheer volume of information. While existing software tends to focus on analysis around 2D scatterplots based on dimensionality reduction algorithms, custom-made comparative analyses are often required for more complex analyses. Results: To address the challenges in the currently available software for analyzing single cell RNA sequencing data, we developed an app that is easy to install and use, allowing non-experts to explore and consolidate results. The app generates visualizations of various types, like dot plots, to compare the expression of multiple genes across conditions or quantify the differences in cell proportions between samples, among multiple other functionalities. The software, named scDisco for single cell discovery app, has been tested for a year and is frequently used in our organization, making it a mature product that we believe can benefit the single cell community. Conclusion: Our app provides a user-friendly solution for exploring and comparing single cell data, making it accessible to non-experts. We hope that this tool will facilitate the analysis and interpretation of single cell data, ultimately leading to new insights and discoveries.

16:29-16:31
Automated Acute Lymphoblastic Leukemia Detection and classification using Saliency Map
Format: Live from venue
Moderator(s): Jan Byška
  • Khaoula Elbedoui, ENICarthage- LMTIC Tunisia, Tunisia
  • Asma Bouhamed, ENICarthage Tunisia, Tunisia
  • Chokri Maktouf, Institut Pasteur de Tunis Tunisia, Tunisia

We propose an automated method for the segmentation and classification of white blood cells for the diagnosis of Acute Lymphoblastic Leukemia (ALL). The first, pre-processing step removes noise from the acquired image using a median filter. The second step extracts the region of interest of the cell from the filtered image. After the image is segmented using the saliency map method, post-processing is needed to make the salient region clearer; this step is based on morphological opening and thresholding. Then, a feature extraction step computes a feature vector for each salient region. Finally, these feature vectors are used to determine the presence or absence of ALL using a support vector machine (SVM). The tests performed on the ALL-IDB1 and ALL-IDB2 databases demonstrated the performance of the proposed method for the recognition and classification of different white blood cells. The accuracy reaches an average rate of 97% for ALL-IDB1 and 100% for ALL-IDB2. Moreover, for the same databases, ALL-IDB1 and ALL-IDB2, we recorded an area under the curve (AUC) of 0.951 and 0.984, respectively. These values are also higher than those obtained by the methods used in the literature.

16:32-16:34
cfDNAPro: An R/Bioconductor package for robust and reproducible data analysis of cell-free DNA fragmentomic features
Format: Live from venue
Moderator(s): Jan Byška
  • Haichao Wang, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Paulius Mennea, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Elkie Chan, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Hong Kong
  • Wendy N Cooper, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Nitzan Rosenfeld, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom
  • Hui Zhao, Cancer Research UK Cambridge Institute, University of Cambridge and Cancer Research UK Cambridge Centre, Cambridge, United Kingdom

Background: Cell-free DNA (cfDNA) in human body fluids exhibits characteristic fragmentation patterns, which can be exploited to support sensitive cancer detection and monitoring. However, fragmentomic analysis is easily biased by various biological, experimental, and analytical factors, such as the choice of library preparation kit and the configuration of data processing software. The field lacks specialized tools that attenuate biases and standardise cfDNA fragmentomic analysis. Here, we present an open-source Bioconductor/R package, cfDNAPro, which provides feature curation and visualization utilities for fragmentomic analysis of paired-end sequencing data from cfDNA. Results: The cfDNAPro R package allows users to produce visualisations that assist analysis whilst controlling for input biases. It implements parameterised quality control and bias curation steps to ensure reproducibility and comparability of results. Starting from a BAM file, cfDNAPro annotates each fragment with meta information, which not only establishes an essential foundation for fragment length, fragment motif, and copy number analysis, but also supports a framework for cfDNA fragmentomics studies within the Bioconductor ecosystem. Conclusion: cfDNAPro clarifies analytical challenges in the liquid biopsy field and proposes a standard for bias correction of cfDNA data within the Bioconductor/R ecosystem. By offering fundamental utilities, it empowers further advanced methodological development in the broader study area of liquid biopsy.

16:35-16:37
PICKLUSTER: A protein-interface clustering and analysis plug-in for UCSF ChimeraX
Format: Live from venue
Moderator(s): Jan Byška
  • Luca Genz, Leibniz Institut für Virologie, Universität Hamburg, Centre for Structural Systems Biology, Germany
  • Thomas Mulvaney, Leibniz Institut für Virologie, Universitätsklinikum Hamburg Eppendorf, Centre for Structural Systems Biology, Germany
  • Maya Topf, Leibniz Institut für Virologie, Universitätsklinikum Hamburg Eppendorf, Centre for Structural Systems Biology, Germany

Protein complexes are key components in the majority of biological processes within cells. The identification and characterization of the protein interfaces in these complexes is crucial for understanding the mechanisms of molecular recognition. Furthermore, inhibiting protein complex formation by targeting the interface has the potential to become important in drug development. However, large protein interfaces can consist of multiple interacting domains that are geometrically separated, posing a challenge for targeting the entire interface using drugs. In addition, previous research has shown the importance of small binding pockets in the protein interface for increasing the selectivity of protein interface binders. Therefore, dividing the interface into smaller sub-interfaces based on their spatial properties could facilitate targeting the protein interface. Here, we developed PICKLUSTER, a plug-in for the molecular visualization program UCSF ChimeraX 1.4 that clusters protein interfaces based on distance and provides various scoring metrics for the analysis of the interface, not only of structures of protein complexes but also of models generated by AlphaFold. By fragmenting the protein interface, PICKLUSTER offers a more focused and potentially more useful approach for targeting protein-protein interfaces.
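To make the idea of splitting an interface into sub-interfaces by distance concrete, here is a minimal sketch using single-linkage clustering with a distance cutoff. This is an assumption chosen for illustration, not PICKLUSTER's actual algorithm or API; the function name `cluster_by_distance`, the cutoff value, and the coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist


def cluster_by_distance(coords, cutoff):
    """Group points so that any two points within `cutoff` share a cluster.

    Single-linkage clustering via union-find; `coords` is a list of (x, y, z).
    Returns a list of clusters, each a sorted list of point indices.
    """
    parent = list(range(len(coords)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(coords)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical interface residue centroids (in Å): two separated patches.
interface = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0),
             (30.0, 0.0, 0.0), (33.0, 0.0, 0.0)]
sub_interfaces = cluster_by_distance(interface, cutoff=4.0)
print(sub_interfaces)  # [[0, 1, 2], [3, 4]]
```

Single linkage is used here because it needs no preset number of clusters; a real tool may use a different clustering criterion and would work on residues selected from the structure rather than hand-written coordinates.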

16:38-16:40
Latent State Estimation of Cancer Patients Treated with Nivolumab Using Deep State Space Model
Format: Live from venue
Moderator(s): Jan Byška
  • Aya Nakamura, Graduate School of Medicine, Kyoto University, Japan
  • Ryosuke Kojima, Graduate School of Medicine, Kyoto University, Japan
  • Yuji Okamoto, Graduate School of Medicine, Kyoto University, Japan
  • Yohei Mineharu, Graduate School of Medicine, Kyoto University, Japan
  • Yohei Harada, Graduate School of Medicine, Kyoto University, Japan
  • Mayumi Kamada, Graduate School of Medicine, Kyoto University, Japan
  • Yasushi Okuno, Graduate School of Medicine, Kyoto University, Japan

Analyzing electronic health records (EHRs) while accounting for temporal changes in patient status has long been studied as a way to improve patient treatment strategies. Our goal is to estimate informative latent states from EHR laboratory data and analyze typical time-series patterns of patient status changes, focusing on cancer patients treated with nivolumab, an effective anticancer drug. We propose a framework that includes a method for visualizing and interpreting the latent space using a deep state space model. Our framework consists of state-space model training using a deep Kalman filter (DKF), latent state estimation, clustering, and visualization. Such clustering and visualization help users interpret the latent space. We applied our framework to time-series data of cancer patients who received nivolumab at Kyoto University Hospital. By examining the results retrospectively from the time of death, we succeeded in capturing special situations such as relatively safe situations and emergencies. Also, compared with other methods such as the variational autoencoder (VAE) and principal component analysis (PCA), our framework yielded clearer latent states. In addition to these results, we successfully extracted specific laboratory items characterizing the change of latent states, such as lymphocytes, neutrophils, and items related to anemia.

16:41-16:43
Interactive Visualization of Gene Sets in Pangenomes
Format: Live from venue
Moderator(s): Jan Byška
  • Astrid van den Brandt, Eindhoven University of Technology, Netherlands
  • Sandra Smit, Wageningen University & Research, Netherlands
  • Eef M. Jonkheer, Wageningen University & Research, Netherlands
  • Huub van de Wetering, Eindhoven University of Technology, Netherlands
  • Anna Vilanova, Eindhoven University of Technology, Netherlands

Comparing the way genes are organized on genomes is common in comparative genomics to understand their evolutionary history and potential functional variations. Pangenomes are beneficial for such comparisons but require visual analytics to support exploration of gene organization in a genomic context. Genomics researchers often characterize gene organization by conservation of gene order across genomes (i.e., synteny) and sequence similarity. Other features of the genes are also considered, such as their orientation, presence-absence variations, and the sequence context of neighboring genes. Their combination yields valuable insights into gene organization patterns. Deviations from the majority can reveal important biological variations or indicate annotation errors, both valuable to discover. Typical synteny tools use gene order and sequence similarity to compute and visualize conserved gene order as blocks, enabling global pattern inspection. However, scientists need tools that allow for interactive exploration of gene sets based on various gene and sequence features to understand different arrangements and conservation relations. We present GeneSets, an interactive visual interface for exploring gene organizations in genomic neighborhoods. With various feature parameter settings matched to different visual representations, users can compare the sets' arrangement from multiple perspectives. GeneSets is demonstrated using an important gene family in a potato pangenome.

16:44-16:46
Phylogenetic Context Using Phylogenetic Outlines
Format: Live from venue
Moderator(s): Jan Byška
  • Banu Cetinkaya, University of Tübingen, Germany
  • Daniel Huson, University of Tübingen, Germany

Phylogenetic analysis often results in numerous phylogenetic trees, generated, for example, by using multiple genes or methods or by conducting bootstrapping or Bayesian analysis. A consensus tree is used to summarize a set of trees. Consensus networks have also been proposed to summarize a collection of phylogenetic trees. Such networks have the potential to display incompatibilities among the input trees. However, interpreting those networks can be challenging due to the significant number of nodes and edges and their non-planar structure. Here, we present the new concept of a phylogenetic consensus outline, which is a new type of network that is significantly less complex than previous ones. We introduce an efficient algorithm for its computation. The main idea is to use a PQ-tree data structure to decide which splits to keep in the consensus. This consensus uses a phylogenetic outline for visualization. A phylogenetic outline is a network that represents a set of circular splits as an outer-labeled planar graph. We illustrate phylogenetic outlines and consensus outlines and their usage, and explore how they compare to other methods on draft genomes of different assembly qualities and on multiple gene trees from a published study on water lilies, respectively.

16:47-16:49
CCPlotR: An R package for the visualisation of cell-cell interactions
Format: Live from venue
Moderator(s): Jan Byška
  • Sarah Ennis, University of Galway, Ireland
  • Pilib Ó Broin, University of Galway, Ireland
  • Eva Szegezdi, University of Galway, Ireland

The increasing availability of single-cell RNA-seq data in recent years has led to the development of multiple tools for predicting cellular crosstalk. These tools typically work by analysing the expression of genes that code for ligands and receptors from pairs of cell types that are known to interact with each other. Most tools will return a table of predicted interactions depicting the ligand, receptor, sending and receiving cell types for each interaction, as well as a score to rank important interactions. Some tools also generate plots to visualise the predicted interactions but these are not consistent across tools and since most datasets include several cell populations, visualisation can be challenging. Here we present CCPlotR - an R package that contains functions to generate cell-cell interaction plots. CCPlotR can generate several types of plots such as heatmaps, dotplots, circos plots and network diagrams, and works with the output of any cell-cell interaction prediction tool, requiring only a table of predicted interactions as input. The package is available on GitHub: https://github.com/Sarah145/CCPlotR and comes with a toy dataset to demonstrate the different functions. We anticipate that this will be a useful resource for single-cell researchers working on cell-cell interactions.

16:50-16:52
VIBE: An R package for the Visualization and Exploration of Bulk mRNA Expression data to prioritize cancer types for drug discovery
Format: Live from venue
Moderator(s): Jan Byška
  • Indu Khatri, Genmab B.V., Netherlands
  • Saskia van Asten, Genmab B.V., Netherlands
  • Francis Blokzijl, Genmab B.V., Netherlands
  • Iris Kolder, Genmab B.V., Netherlands

The public databases for bulk mRNA sequencing data (e.g., TCGA and GTEx) offer a basic visualization to rank tumor types based on the expression of single genes. However, no statistical or visual understanding of content beyond a single gene is provided, and neither are biological pathway- or cell-specific gene signatures, which are often of great relevance to characterize the tumor microenvironment relative to the actual target expression. Here, we demonstrate VIBE: an R package that offers a wide range of functions to allow researchers to visualize, explore, and interpret bulk mRNA sequencing data sets (Figure 1). Furthermore, this package presents users with in-depth visualizations of individual and cohort-level summaries, such as concordant or discordant over- or under-expression of two genes or pathways using FACS-like scatterplots, the prevalence of subjects within and across tumor types exhibiting over- or under-expression, and correlation or co-expression between genes and gene signatures. In contrast to manual analysis, VIBE provides a comprehensive view of targeted pathways that can improve the understanding of patient subsets and reduce the time and effort spent on assessing expression patterns of one or two genes or pathways in multiple tumor types.

16:53-16:55
Interactive visualisation for chromatin interaction networks
Format: Live from venue
Moderator(s): Jan Byška
  • Sandra Siliņa, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Andrejs Sizovs, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Gatis Melkus, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Peteris Rucevskis, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Juris Viksna, Institute of Mathematics and Computer Science, University of Latvia, Latvia

In recent years, much research has focused on chromatin interactions, and Hi-C data analysis plays an important role in understanding gene regulation. Various software tools have been developed to analyse chromatin interaction data, including visualisations that allow a more rapid understanding of the overall information on these interactions. We present a software tool designed to aid analysis of chromatin interaction data represented as graphs. It focuses on the identification and exploration of connected components found in some tissues but not others. The visualisation consists of two main parts: a crossfilter and a graph section. The crossfilter displays general information about the connected components and enables users to quickly assess the overall characteristics of the identified components. Users can also filter data based on multiple parameters to see information about a subset of the components. The graph section reveals more detailed information: interactions present in the component, genes and proteins associated with chromatin segments, and more. Currently, preprocessed data for two datasets of 10 human tissue types are available. The preprocessed data sets have an average of about 100,000 links per tissue type, depending on the chosen threshold. A more universal pipeline for other datasets is in active development.

16:56-16:58
3D modeling of Hi-C contacts: seeing the spatial organization of fungal chromosomes
Format: Live from venue
Moderator(s): Jan Byška
  • Thibault Poinsignon, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
  • Mélina Gallopin, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
  • Pierre Grognet, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
  • Fabienne Malagnac, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
  • Gaëlle Lelandais, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris‐Saclay, 91198 Gif‐sur‐Yvette, France
  • Pierre Poulain, Université Paris Cité, CNRS, Institut Jacques Monod, F-75013 Paris, France

Understanding the spatial architecture of genomes in nuclei is essential to studying their function. The number of studies using Hi-C approaches to probe interactions inside the chromatin is growing and, at the same time, the field of 3D modeling offers numerous computational solutions. However, this opportunity for original and insightful modeling is underexploited, and most fungal Hi-C studies do not propose 3D models of chromatin contact networks. Our aim was to further promote the use of 3D chromatin modeling alongside contact maps in fungal Hi-C studies. To do so, we present here precise examples of 3D modeling of entire genomes and software to automate this approach. Our analysis process was assembled into a modular workflow, based on state-of-the-art analysis and modeling software, that goes from raw Hi-C data to fully annotated 3D models. With this workflow, we re-analysed public Hi-C datasets available for three model species to visualize them in a 3D context. New models of the organization of those three emblematic fungal genomes were generated, summarizing the known features of fungal genomes in rich and integrated illustrations while also offering new insight into the organization of Saccharomyces cerevisiae cohesin anchor regions.

16:58-17:00
Understanding the contribution of immature myeloid cells to early melanoma establishment
Format: Live from venue
Moderator(s): Jan Byška
  • Xiaoyu Hou, University of Queensland Diamantina Institute (UQDI) and University of Queensland Faculty of Medicine, Australia
  • James Wells, University of Queensland Diamantina Institute (UQDI) and University of Queensland Faculty of Medicine, Australia

In cancer, immature myeloid cells are known to be recruited to the tumour microenvironment and to differentiate into myeloid-derived suppressor cells with a potent ability to suppress various types of immune responses. The precise mechanisms that drive this differentiation remain unclear. In the laboratory, myeloid-derived suppressor cell differentiation is known to be induced through the interaction of peripheral blood mononuclear cells with melanoma cell lines. However, the absolute requirement for these cells in supporting the establishment of melanoma is not well understood. In my project, I am investigating whether immature myeloid cells play a fundamental role in tumour development by exploring the direct and indirect capacity of immature myeloid cells to impact tumour establishment and early growth. In this project, immature myeloid cells are depleted before the tumour challenge to determine the effects of depletion on tumour establishment. Nanostring GeoMx digital spatial profiling is used to define where immature myeloid cells are physically located within emerging tumours, which cells they interact with, and which growth and angiogenic factors they produce. These observations could lead to novel approaches to stopping melanoma at an early stage.

17:00-17:50
Invited Presentation: Keynote Presentation: Do we still need molecular graphics?
Format: Live from venue
Moderator(s): Jan Byška
  • Marc Baaden

To what extent do we still need molecular graphics in today’s scientific landscape? Despite its long history and high level of maturity, molecular graphics remains an indispensable tool for scientific understanding and discovery, constantly facing new challenges. These currently include the ability to visualize complex molecular relationships and interactions, with the goal of scaling the size and scope of systems currently under investigation. In addition, molecular graphics programs must adapt to experimental (r)evolutions to remain relevant in the field. However, it is not enough to simply produce high quality visualizations. Scientists must be able to share these complex visualizations with their colleagues and with experts in other fields. The latest tools in molecular graphics, such as AI- and data-driven approaches, interactive simulations, augmented, mixed, and virtual reality, offer new ways to visualize and interact with molecular data and models. These advances enable researchers to explore scientific questions in new and innovative ways, but need to be more widely adopted to become routine in scientific investigations.

17:50-18:00
Award Ceremony and Closing
Format: Live from venue
Speakers: Jan Byška and Michael Krone

18:00-19:00
Poster Session