BioVis@ISMB 2022 Program

July 13, 2022

Invited Speakers

Once upon a time in Bio-Medical Data Visualization: Reflections on Research Before and During Pandemic

Tatiana von Landesberger, University of Cologne

Abstract: Research in visualization is often motivated by the endeavor to improve on the illustration of data: in order to better communicate data to others and to gain deeper insights into complex datasets, possibly from a variety of data sources. In the medical domain, the data can include, for example, patient data and health records as well as biological data such as genomes. Insights to be obtained from data may relate, inter alia, to the spreading of diseases, evolutionary analysis and virus mutations. The tasks include both retrospective analysis for finding the patient zero and modelling the spreading of a disease, as well as predictive modelling of virus mutations and future disease spreading. The COVID-19 pandemic has confronted this general motivation for our research with a need for practical solutions. Infection control experts needed to quickly gain insights into novel datasets and to communicate the insights to colleagues and to a broader public, requiring quick and efficient visualization solutions.

New methods, tools, and methodologies have emerged from both basic and applied research. New data was collected and model results were produced, all requiring rapid analysis. Multi-disciplinary teams worked and applied solutions to the new challenges resulting from the pandemic. The rapid response was only possible by leveraging experience and past research. Thus, the talk will take a larger historical perspective and present specific solutions from our own experience, including reflections on the data, task and user triangle as well as the challenges of multidisciplinary working styles.

Speaker Biography: Tatiana von Landesberger is a full professor of Computer Science – Visualization at the University of Cologne, Germany. She received a PhD in 2010 and finished her habilitation in 2017 at TU Darmstadt, Germany. Her research focuses on information visualization and visual analytics of spatio-temporal and network data from various domains such as biology, medicine, finance, transportation, journalism or meteorology. Her recent research addresses the challenges of perception and cognition of visualization as well as visual analysis of disease spreading. She regularly publishes at top international conferences and has received multiple awards. Tatiana has served on program committees and organization committees for IEEE VIS, EuroVis, VMV and other venues. Recently, she was full paper chair at EuroVis and is now a member of the EuroVis steering committee and an associate editor of Computer Graphics Forum.

Machine learning provides a new perspective on protein modification

Lennart Martens, Ghent University

Abstract: Over the last two decades, mass spectrometry based proteomics has evolved quite dramatically, leveraging advances in instrumentation, methods, and software tools to dig ever deeper into the protein content of cells. Yet, as the key effectors of cellular function, there is more to proteins than just their sequence. Indeed, many proteins also are tightly regulated in terms of their activity, their localisation, and their interactions. And while this is a well-known fact, very little is actually understood about the specific mechanisms by which this regulation is carried out. While some mechanisms such as protein phosphorylation have been studied in quite some detail already, other protein modifications have received much less attention, and therefore remain largely unexplored. As a result, interesting downstream phenomena, such as competition between, or coordination of, modifications on the same site or the same protein remain underexplored.

Interestingly, a wholly new generation of machine learning-based identification tools is now radically changing our ability to discover the proteome-wide modification landscape, with ionbot the leading exponent of these revolutionary new tools. Itself based on the MS²PIP and DeepLC models to predict analyte behaviour, ionbot is fast and reliable, and allows unbiased identification of protein modifications at unprecedented scale. We are now working on providing this open modification capability to (immuno-)peptidomics as well, where we have already been able to show that our predictors allow substantial advances in identification sensitivity and specificity.

To show the capabilities of ionbot, and to cast a first glance at the true complexity of a proteome as our experiments and instruments have seen it, we have applied ionbot to over 1 billion human spectra, and over 600 million mouse spectra from the PRIDE archive, uncovering a plethora of modifications of various origin and level of interest. This wholly new view of the modified proteome dramatically changes our view on proteins, and shows the overwhelming abundance of modifications, chemical, biological, or artefactual, that affect the protein machinery of life.

In the past, we already leveraged such proteome-wide data to infer protein associations, but recent advances in machine learning have allowed us to dramatically expand our coverage, in turn yielding a massive increase in the (less) biased detection of protein association or interactions from high-throughput omics data.

We can also use these proteome-wide results to analyse tissues and even cell types for their protein composition, yielding detailed information on tissue-specific proteomes with potentially far-reaching applications.

Speaker Biography: Lennart Martens is Full Professor of Systems Biology at Ghent University, Group Leader of the Computational Omics and Systems Biology (CompOmics) group at VIB, and Associate Director of the VIB-UGent Center for Medical Biotechnology, all in Ghent, Belgium. He has been working in proteomics bioinformatics since his Master’s degree, which focused on the computational interpretation of peptide mass spectra, and the sequence-dependent fragmentation of peptides. He then worked as a software developer and framework architect for a software company for a few years, before returning to Ghent University to pursue a Ph.D. in proteomics and proteomics informatics. During this time, he worked on the development of high-throughput peptide centric proteomics techniques and on bioinformatics tools to support these new approaches. In 2003 he designed and built the PRIDE repository for the global dissemination of proteomics data at EMBL-EBI as a Marie Curie fellow of the European Commission. After obtaining his Ph.D., he rejoined EMBL-EBI to coordinate the newly created PRIDE group for the next three years, firmly establishing the system as the world’s foremost public proteomics data repository. He then moved back to Ghent University and VIB to take up his current positions, in which he focuses on novel machine learning algorithms for mass spectrometry data analysis, and their application to the large-scale orthogonal reprocessing of public proteomics data. Prof. Martens was elected to the Young Academy of the Royal Belgian Academy of Sciences in 2013 and to the Human Proteome Organisation (HUPO) Council in 2016, was admitted as Fellow of the Royal Society of Chemistry in 2018, and was elected President of the European Proteomics Association (EuPA) in 2020. He also served on the HUPO Executive Board from 2017 to 2019, and was President of the ABRF Proteomics Informatics Research Group (iPRG) in 2011. Dr. Martens received the 2014 Prometheus Award for Research Excellence from Ghent University, and the 2015 ‘Juan Pablo Albar’ Proteomics Pioneer Award from the European Proteomics Association (EuPA). An author on 265 peer-reviewed papers, he has also co-written two Wiley textbooks on computational mass spectrometry.

Program

Schedule subject to change
All times listed are in CDT

10:30-11:30
Keynote Presentation: Once upon a time in Bio-Medical Data Visualization: Reflections on Research Before and During Pandem...
Format: Live-stream
Moderator(s): Michael Krone
  • Tatiana von Landesberger, University of Cologne, Germany

11:30-11:50
Polyphony: an Interactive Transfer Learning Framework for Single-Cell Data Analysis
Format: Live from venue
Moderator(s): Michael Krone
  • Qianwen Wang, Harvard Medical School, United States
  • Furui Cheng, The Hong Kong University of Science and Technology, Hong Kong
  • Mark Keller, Harvard Medical School, United States
  • Huamin Qu, The Hong Kong University of Science and Technology, Hong Kong
  • Nils Gehlenborg, Harvard Medical School, United States

Reference-based cell-type annotation can significantly reduce time and effort in single-cell analysis by transferring labels from a previously-annotated dataset to a new dataset. However, label transfer is challenging. End-to-end computational methods can fail due to mixing technical variants (e.g., different sequencing batches or techniques) that must be removed and biological variants (e.g., different cells) that must be conserved among datasets. To address this issue, we propose Polyphony, an interactive transfer learning (ITL) framework, to complement biologists' knowledge with advanced computational methods. Polyphony is motivated and guided by domain experts' needs for a controllable, interactive, and algorithm-assisted annotation process, identified through our multi-round expert interviews with six biologists. We introduce anchors, i.e., analogous cell populations across datasets, as a paradigm to explain the computational process and collect users' feedback for model improvement. A set of visualizations and interactions is provided to empower users to add, delete, or modify anchors, resulting in refined cell type annotations. We demonstrate the effectiveness of this approach through two usage scenarios and interviews with two biologists. The results show that our anchor-based ITL method takes advantage of both human and machine intelligence in annotating massive single-cell datasets.

11:50-12:00
Data Transformations for Effective Visualization of Single-Cell Embeddings
Format: Live from venue
Moderator(s): Michael Krone
  • Evan Greene, Ozette Technologies, United States
  • Greg Finak, Ozette Technologies, United States
  • Fritz Lekschas, Ozette Technologies, United States
  • Malisa Smith, Ozette Technologies, United States
  • Leonard A. D'Amico, Fred Hutchinson Cancer Research Center, United States
  • Nina Bhardwaj, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, United States
  • Candice D. Church, Division of Dermatology, Department of Medicine University of Washington, United States
  • Chihiro Morishima, Division of Dermatology, Department of Medicine University of Washington, United States
  • Nirasha Ramchurren, Fred Hutchinson Cancer Research Center, United States
  • Janis M. Taube, Johns Hopkins University School of Medicine, United States
  • Paul T. Nghiem, Division of Dermatology, Department of Medicine University of Washington, United States
  • Martin A. Cheever, Fred Hutchinson Cancer Research Center, United States
  • Steven P. Fling, Fred Hutchinson Cancer Research Center, United States
  • Raphael Gottardo, University of Lausanne and Lausanne University Hospital, Swiss Institute of Bioinformatics, Switzerland

Nonlinear dimensionality reduction (DR) methods are commonly used to create two-dimensional embeddings of high-dimensional data for visualization. Since the effectiveness of learned embeddings can depend markedly on the choice of the DR method’s hyperparameters, prior work has focused on evaluating hyperparameter settings. However, data transformations can be equally important for creating effective embeddings. Yet, they have received less attention. In this talk, we’re going to present data transformation approaches for the embedding of single-cell data, specifically surface proteomics. Using computationally-derived labels for expression groups (e.g., low, medium, high) we can spread out and normalize the expression range of different cell phenotypes. Visually this allows for the identification of rare and complex cell types that would otherwise be indistinguishable from broad cell phenotypes. Moreover, such an approach effectively eliminates batch effects that are otherwise the cause for great differences in the lower-dimensional embedding and make sample-by-sample comparisons ineffective. Finally, we’re going to show a data transformation approach using simulated data to create a generic embedding with concrete data being mapped into it. Such an approach enables relative comparison of cluster expression profiles while still providing a global map for broad cluster similarities.

12:00-12:10
Visualizing Cluster-specific Genes from Single-cell Transcriptomics Data Using Association Plots
Format: Live from venue
Moderator(s): Michael Krone
  • Elzbieta Gralinska, Max Planck Institute for Molecular Genetics, Germany
  • Clemens Kohl, Max Planck Institute for Molecular Genetics, Germany
  • Bita Sokhandan Fadakar, Max Planck Institute for Molecular Genetics, Germany
  • Martin Vingron, Max Planck Institute for Molecular Genetics, Germany

Visualizing single-cell transcriptomics data in an informative way is a major challenge in biological data analysis. Clustering of cells is a prominent analysis step and the results are usually visualized in a planar embedding of the cells using methods like PCA, t-SNE, or UMAP. Given a cluster of cells, one frequently searches for the genes highly expressed specifically in that cluster. At this point, visualization is usually replaced by studying a list of differentially expressed genes. We address this bottleneck by presenting Association Plots (APs) adapted to single-cell data. APs are derived from correspondence analysis, a projection method which embeds both genes and cells in high-dimensional space, where genes associated to a cell cluster lie in a particular direction. By employing this feature, APs constitute a dimension-independent visualization of cluster-specific genes from single-cell datasets. Our method is now available as a free Bioconductor package APL. We demonstrate the application of APs to single-cell RNA-seq data through several examples. First, we show the identification of marker genes using APs. Second, we present how APs aid in cell cluster annotation using a predefined list of marker genes. Finally, we compare results from APs to results from existing differential expression testing tools.

12:10-12:20
Kana: Interactive Single-Cell Analysis in the Browser
Format: Live from venue
Moderator(s): Michael Krone
  • Jayaram Kancherla, Genentech, Inc., United States
  • Hector Corrada Bravo, Genentech, Inc., United States
  • Aaron Lun, Genentech, Inc., United States

We present kana, a web application for interactive scRNA-seq data analysis that combines execution of both visualization and computational analysis in the web browser. Kana leverages web technologies such as WebAssembly to efficiently perform the relevant computations on the user’s machine leveraging C++ libraries implementing analysis steps that are re-usable in non-visualization, or client/server approaches. As an added benefit of this client side approach, user data is never transferred or uploaded to a server, avoiding problems with data privacy. Since computations run in the browser, this also removes network latency hence providing a smooth interactive experience. Kana provides a streamlined one-click workflow for all steps in a typical scRNA-seq analysis, starting from a count matrix and finishing with marker detection and cell type annotation. Results are progressively rendered immediately as the underlying analysis step is complete and are presented in an intuitive web interface for further exploration and iterative analysis. Testing on public datasets shows that kana can analyze over 100,000 cells within 5 minutes on a typical laptop. The application is hosted on GitHub: http://github.com/jkanche/kana. The preprint is available at https://www.biorxiv.org/content/10.1101/2022.03.02.482701v1

12:20-12:30
Supervised capacity preserving mapping: a clustering guided visualization method for scRNA-seq data
Format: Live from venue
Moderator(s): Michael Krone
  • Zhiqian Zhai, Department of Statistics, University of California Los Angeles, United States
  • Yu L. Lei, Department of Periodontics and Oral Medicine, University of Michigan; University of Michigan Rogel Cancer Center, United States
  • Rongrong Wang, Department of Computational Mathematics, Science and Engineering and Department of Mathematics, MSU, United States
  • Yuying Xie, Department of Computational Mathematics, Science and Engineering and Department of Statistics and Probability, MSU, United States

Recently, various visualization methods have been developed to analyze the scRNA-seq data. However, current visualization methods, including UMAP and t-SNE, are challenged by the limited accuracy of rendering the geometric relationship of populations with distinct functional states. Most visualization methods are unsupervised, leaving out information from the clustering results or given labels. This leads to the inaccurate depiction of the distances between the bona fide functional states. In particular, UMAP and t-SNE are not optimal to preserve the global geometric structure. They may result in a contradiction that clusters with near distance in the embedded dimensions are in fact further away in the original dimensions. Besides, UMAP and t-SNE cannot track cluster variance. The embedded cluster variance is not only associated with the true variance but also proportional to the sample size. We present supCPM, a robust supervised visualization method utilizing clustering results, which separates different clusters, preserves the global structure and tracks the cluster variance. Compared with other existing methods using synthetic and real datasets, supCPM shows improved performance in preserving the global geometric structure and data variance. Overall, supCPM provides an enhanced visualization pipeline to assist the interpretation of functional transition and accurately depict population segregation.

14:30-14:50
Microbiome Maps: Hilbert Curve Visualizations of Metagenomic Profiles
Format: Live from venue
Moderator(s): Helena Jambor
  • Camilo Valdes, Lawrence Livermore National Laboratory, United States
  • Vitalii Stebliankin, Bioinformatics Research Group (BioRG), Florida International University, United States
  • Daniel Ruiz-Perez, Bioinformatics Research Group (BioRG), Florida International University, United States
  • Ji In Park, Department of Medicine, Kangwon National University School of Medicine, South Korea
  • Hajeong Lee, Department of Internal Medicine, Seoul National University College of Medicine, South Korea
  • Giri Narasimhan, Bioinformatics Research Group (BioRG), Florida International University, United States

Abundance profiles from metagenomic sequencing data synthesize information from billions of sequenced reads coming from thousands of microbial genomes. Analyzing and understanding these profiles can be a challenge since the data they represent are complex. Particularly challenging is their visualization, and here we present a technique called a "Microbiome Map" which visualizes a microbiome profile using a Hilbert curve. The maps are created using the Jasper software, which generates colorful 2D images that succinctly visualize a microbiome sequencing profile. Color and location in a microbiome map play a vital role: locations represent a genome from a reference collection (whole-genome sequencing), or a set of OTUs (16S sequencing); and color can represent their relative abundance. Maps can also be interactively explored using Jasper, which integrates with online resources such as Ensembl, GenBank, and UniProt. We discuss how microbiome maps can be a powerful asset for classification and prediction models by visualizing the strain-level abundances of 44K genomes in 328 samples from the Human Microbiome Project, as well as 5K species in 200 fecal samples from a collaboration with Kangwon National University and Seoul National University in South Korea. More information can be found at www.microbiomemaps.org.
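
As a rough illustration of the Hilbert curve layout described in this abstract, the following Python sketch places a one-dimensional abundance profile onto a square grid so that entries adjacent in the profile remain adjacent in the image. It is not the Jasper implementation: the function names `hilbert_d2xy` and `abundance_map`, the grid orientation, and the sample profile are all assumptions made for this example, and the index-to-coordinate conversion is the standard textbook algorithm.

```python
# Illustrative sketch only; this is NOT how Jasper is implemented.
# All names and the sample data below are hypothetical.

def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def abundance_map(abundances, order):
    """Lay a 1D abundance profile onto a square grid along the Hilbert curve."""
    side = 1 << order
    if len(abundances) > side * side:
        raise ValueError("profile does not fit on the grid")
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value  # row index is y, column index is x
    return grid


if __name__ == "__main__":
    profile = [0.40, 0.25, 0.20, 0.15]  # hypothetical relative abundances
    for row in abundance_map(profile, order=1):
        print(row)
```

The resulting grid could then be passed to any heatmap renderer, with color encoding the relative abundance at each location.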

14:50-15:10
Coral: a web-based visual analysis tool for creating and characterizing cohorts
Format: Live-stream
Moderator(s): Qianwen Wang
  • Patrick Adelberger, Institute of Computer Graphics, Johannes Kepler University Linz, Linz, A-4040, Austria
  • Klaus Eckelt, Institute of Computer Graphics, Johannes Kepler University Linz, Linz, A-4040, Austria
  • Markus Johann Bauer, Global Computational Biology and Digital Sciences, Boehringer Ingelheim RCV GmbH & Co KG, Vienna, A-1121, Austria
  • Marc Streit, Institute of Computer Graphics, Johannes Kepler University Linz, Linz, A-4040, Austria
  • Christian Haslinger, Global Computational Biology and Digital Sciences, Boehringer Ingelheim RCV GmbH & Co KG, Vienna, A-1121, Austria
  • Thomas Zichner, Global Computational Biology and Digital Sciences, Boehringer Ingelheim RCV GmbH & Co KG, Vienna, A-1121, Austria

A main task in computational cancer analysis is the identification of patient subgroups (i.e. cohorts) based on a rich collection of metadata attributes (patient stratification) or genomic markers of response (biomarkers). Coral is a web-based cohort analysis tool that is designed to support this task: Users can interactively create and refine multiple cohorts, based on quantitative or categorical attributes, which can then be compared, characterized, and inspected down to the level of single items. The characterization includes the possibility for statistical testing between cohorts and provides intuitive access to prevalence information. Coral visualizes the evolution of cohorts as well as their relationships as a graph. Furthermore, findings can be stored, shared, and reproduced via the integrated session management. Coral is pre-loaded with data from over 128 000 samples from the AACR Project GENIE, The Cancer Genome Atlas, the Cell Line Encyclopedia, and two depletion screen datasets. To demonstrate the usefulness of Coral, we reproduce findings from a published article about KRAS G12C somatic mutations in the AACR Project GENIE patients. We analyze the KRAS G12C mutation frequencies for Non-Small Cell Lung Cancer (NSCLC) and colorectal cancer patient cohorts with regard to their differences in race and gender.

15:10-15:20
TCGAnalyzeR: A Web Portal for Visualization of Pan-Cancer Molecular Patient Data
Format: Live from venue
Moderator(s): Qianwen Wang
  • Tuğba Önal-Süzek, Muğla Sıtkı Koçman University, Turkey
  • Başak Abak Masud, Istanbul Medipol University, Turkey
  • Talip Zengin, Muğla Sıtkı Koçman University, Turkey

The Cancer Genome Atlas (TCGA) contains multidimensional molecular data of 11,000 cancer patients of 33 cancer types. In our work, we aimed to present a pipeline integrating our recently published gene-signature based low-risk/high-risk TCGA patient cohorts (Zengin T and Önal-Süzek T., 2020; Zengin T and Önal-Süzek T., 2021) with all the single-nucleotide variations (SNVs), the copy number variations (CNVs), RNA-seq and clinical data of patients with 33 different cancer types from the TCGA and PubChem BioAssay databases. The platform is available at http://tcganalyzer.mu.edu.tr/. Our interactive web platform TCGAnalyzeR enables statistical analysis of big data in 4 main categories, allowing users to interactively select the cancer type, data category (SNV/CNV/DEA/Clinical), mutation type (somatic or all), risk group (low-risk/high-risk) and cohort type (paired/all). Downloadable plots and data tables are provided to interactively visualize data specific to each category. Each plot has its own filtering options. The gene and patient (sample) names given in the tables and plots are selectable, which enables the user to add a gene or patient to the “My genes” or “My patients” panel respectively for filtering other plots and copying the selected ones to the clipboard. For 3 cancer types, LUAD, LUSC, and COAD, we provide pre-clustered low-risk or high-risk cohorts using our gene signature method for additional filtering.

15:20-15:30
PhyloDiver: A Visual Analytics Tool for Tumor Phylogenies
Format: Live from venue
Moderator(s): Qianwen Wang
  • Charles Blatti, University of Illinois at Urbana-Champaign, United States
  • Matthew Berry, University of Illinois at Urbana-Champaign, United States
  • Chad Olson, University of Illinois at Urbana-Champaign, United States
  • Lisa Gatzke, University of Illinois at Urbana-Champaign, United States
  • Chuanyi Zhang, University of Illinois at Urbana-Champaign, United States
  • Peter Groves, University of Illinois at Urbana-Champaign, United States
  • Colleen Bushell, University of Illinois at Urbana-Champaign, United States
  • Nicholas Chia, Mayo Clinic, United States
  • Zeynep Madak-Erdogan, University of Illinois at Urbana-Champaign, United States
  • Mohammed El-Kebir, University of Illinois at Urbana-Champaign, United States

Cancer is the result of an evolutionary process, where somatic mutations accumulate over time in a population of cells. As such, a tumor is composed of multiple subpopulations of cells, or clones, with distinct complements of mutations. This intra-tumor heterogeneity is a major driver for resistance to therapy. Researchers use evolutionary trees, or phylogenies, to study intra-tumor heterogeneity and reason about cancer evolution. While many methods have been developed to visualize and interpret tumor phylogenies, these methods often provide either 1) a static image of clonal evolution that does not accommodate user interaction or 2) tree layout interfaces that do not incorporate clonal proportions and mutation details. Here, we introduce PhyloDiver, a novel visual analytics tool that enables end-users to study clonal evolution in an interactive fashion while remaining connected to the underlying annotated mutations.

16:00-16:20
Visual Exploration of Relationships and Structure in Low-Dimensional Embeddings
Format: Live-stream
Moderator(s): Qianwen Wang
  • Patrick Adelberger, Johannes Kepler University Linz, Austria
  • Klaus Eckelt, Johannes Kepler University Linz, Austria
  • Marc Streit, Johannes Kepler University Linz, Austria
  • Andreas Hinterreiter, Johannes Kepler University Linz, Austria
  • Conny Walchshofer, Johannes Kepler University Linz, Austria
  • Vaishali Dhanoa, Johannes Kepler University Linz and Pro2Future GmbH, Austria
  • Christina Humer, Johannes Kepler University Linz, Austria
  • Moritz Heckmann, Johannes Kepler University Linz, Austria
  • Christian Steinparz, Johannes Kepler University Linz, Austria

We present an interactive visual approach for the exploration and formation of structural relationships in embeddings of high-dimensional data. These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, define properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all. In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups, for example by dividing a heterogeneous group of patients into several subgroups. The original high-dimensional data for single items, groups of items, or differences between items and groups are accessible through additional summary visualizations and difference visualizations that complement the embedding with a detailed look at the high-dimensional attributes. We carefully tailored these summary and difference visualizations to various data types and semantic contexts. We implemented the approach as a web application, which is open-source and publicly available at https://jku-vds-lab.at/apps/embedding-structure-explorer.

16:20-16:24
Plaice plots - an allele-aware visualization of clonal evolution
Format: Live-stream
Moderator(s): Helena Jambor
  • Sarah Sandmann, Institute of Medical Informatics, University of Münster, Germany
  • Yvonne Lisa Behrens, Department of Human Genetics, Hannover Medical School, Germany
  • Gudrun Göhring, Department of Human Genetics, Hannover Medical School, Germany
  • Julian Varghese, Institute of Medical Informatics, University of Münster, Germany

Reconstruction of clonal evolution involves complex integrated analyses. The results are, in addition to classical representation by phylogenetic or clonal evolution trees, commonly visualized using fish plots. In these plots, the development of every individual clone is displayed, considering time on the x-axis, and cancer cell fraction on the y-axis. Thereby, fish-shaped objects are generated. Despite providing a comprehensive visualization of clonal evolution, fish plots display information only on clone-, not on allele-level. Biallelic mutations cannot be identified at first sight. However, with respect to disease progression, these mutations play an essential role. To fill this gap, we introduce plaice plots as a derivative of fish plots. The actual 'fish' become flatfish, i.e. plaice, and are mirrored - above and below the y-axis. The upper plot visualizes common clonal development, while the lower plot shows the fraction of remaining healthy alleles. For example, in case of mutated TP53 and additional del17p affecting the remaining healthy allele, the fraction of cells with deficient TP53 is marked in the lower plot. Similarly, X-chromosomal mutations in male samples, leading to a loss of the only available healthy allele, are visualized. Thereby, plaice plots allow for immediate identification of double-hit events.

16:24-16:27
An R Shiny app for systematically integrating genetic and pharmacologic cancer dependency maps
Format: Live from venue
Moderator(s): Helena Jambor
  • Yu-Chiao Chiu, UPMC Hillman Cancer Center, University of Pittsburgh, United States
  • Yidong Chen, Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, United States
  • Tapsya Nayak, Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, United States
  • Li-Ju Wang, Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, United States
  • Michael Ning, Department of Computer Science, University of Texas at Austin, United States

The rapidly growing cancer dependency maps pave the way to precision oncology by identifying and targeting the “Achilles’ heel” of cancer. There is a pressing need for software that systematically links such genetic (gene knockouts) and pharmacologic dependencies (small compounds). Here we present a web-based R Shiny app that incorporates heterogeneous data from large-scale high-throughput CRISPR screens, pharmacologic screens, and a molecular signatures library, jointly covering 17k genes, 20k drugs, and 1k cell lines. The major goal is to match gene knockouts and drug treatments that induce similar effects in cell viability and/or gene expression perturbation in order to address two fundamental questions: 1) which drugs can be potential surrogates to the knockout of a gene, and 2) which genes are potential targets or mechanisms of action of a drug. The app has four complementary and interconnected modules that address various query scenarios to identify potential druggable genetic vulnerabilities and understand the mechanisms of action of a known or new drug. The results are represented by interactive figures and networks, as well as annotated data tables. In summary, our Shiny app enables easy and systematic navigation, visualization, and integration of the rapidly evolving genetic and pharmacologic dependency maps of cancer.

16:31-16:34
ECellDive: Exploring Biological Systems in Virtual Reality
Format: Live-stream
Moderator(s): Helena Jambor
  • Eliott Jacopin, RIKEN, Center for Biosystems Dynamics Research, Japan
  • Kozo Nishida, Genome Analytics Japan Inc., Tokyo, Japan
  • Kazunari Kaizu, RIKEN, Center for Biosystems Dynamics Research, Japan
  • Koichi Takahashi, RIKEN, Center for Biosystems Dynamics Research, Japan

Presentation Overview:

ECellDive is a virtual environment where users can model, simulate and visualize biological systems in collaboration with their colleagues. In ECellDive, everything is a module representing either data (e.g. a metabolic pathway) or a transform on this data (e.g. a Flux Balance Analysis). For demonstration purposes, we import the Escher-FBA model into our virtual scene (Zachary A. King et al. 2017, doi:10.1371/journal.pcbi.1004321) and dive into it. Diving transfers us to a new scene containing the metabolic pathway encoded in Escher-FBA. From there on, we explore the pathway by strolling around. This is a major improvement over the original web app, where we have to zoom in/out or pan to explore the model. Then, we highlight the structure of the network by grouping modules together automatically or manually. This is particularly helpful for contextualizing the model by, for example, visualizing cellular compartments and metabolic subsystems. Next, we perform a Flux Balance Analysis (FBA) of the pathway and update the simulation results by knocking out or activating reactions of interest. Finally, ECellDive is about collaboration: any changes can be exported and shared, and we can also join a session hosted by someone else in real time to modify the same file.

16:34-16:38
VenOmics and Cell Signaling Environment for Studies and BioDiscoveries
Format: Live-stream
Moderator(s): Helena Jambor
  • Marcela Ishihara, Programa de Pós Graduação em Toxinologia, Laboratório de Toxinologia Aplicada, Instituto Butantan, Brazil
  • Bruno Ferreira de Souza, Laboratório de Toxinologia Aplicada, Instituto Butantan, Brazil
  • Henrique Cursino Vieira, Laboratório de Toxinologia Aplicada, Instituto Butantan, Brazil
  • Hugo Aguirre Armelin, Laboratório de Ciclo Celular, Instituto Butantan, Brazil
  • Marcelo Silva Reis, Departamento de Ciência da Computação, UNICAMP, Brazil
  • Milton Yutaka Nishiyama-Jr, Laboratório de Toxinologia Aplicada, Instituto Butantan, Brazil

Presentation Overview:

Animal venoms have fascinated humanity for a long time, mainly due to their complex actions and effects. Nowadays, these substances still intrigue humans and represent one of the main drivers for the discovery of novel natural drugs with potential therapeutic, medicinal and agricultural properties. Venoms vary widely, and their biotechnological relevance is mostly attributed to their complex composition, a plethora of peptides, enzymes and other molecular compounds. Due to the importance of venoms, a new field has emerged: Venomics, which combines high-throughput data from different biological levels with molecular and computational techniques. A deeper understanding of these substances can aid the generation of more effective antivenoms and the discovery of new biomolecules. Here, we present the VenOmics and Cell Signaling Environment for BioDiscoveries (VEnOmiCS4BD), a novel web-based public database, in development, for the storage and integration of multi-level venom -omics data, such as transcriptomics and proteomics, derived from venomous and envenomated organisms, as well as a platform for integrative analysis that allows exploration of gene expression profiles, comparison across experiments, analysis of signaling pathways and knowledge discovery. With VEnOmiCS4BD, we hope to facilitate Venomics research by serving as a common place for deposition and downstream analysis of heterogeneous biological data.

16:38-16:41
SciViewer: An interactive browser for visualizing large single cell datasets
Format: Live from venue
Moderator(s): Helena Jambor
  • Dhawal Jain, Pulmonary Drug Discovery Laboratory, Bayer US LLC. Pharmaceuticals, Research & Development, Boston, MA, United States
  • Sikander Hayat, Institute of Experimental Medicine and Systems Biology, RWTH Aachen, Germany
  • Michael Cho, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
  • Edwin Silverman, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
  • Rafael Kramann, Institute of Experimental Medicine and Systems Biology, RWTH Aachen, Germany
  • Alexis Laux-Biehlmann, Pulmonary Drug Discovery Laboratory, Bayer US LLC. Pharmaceuticals, Research & Development, Boston, MA, United States
  • Joydeep Chakraborty, Product Platform Research, Enterprise Platforms and Infrastructure, Bayer US LLC., Morristown, NJ, United States
  • Xinkai Li, Data Integration and Historians Services, Enterprise Platforms and Infrastructure, Bayer US LLC., Morristown, NJ, United States
  • Hobart Moore, Infrastructure Engineering Services, Enterprise Platforms and Infrastructure, Bayer US LLC., Morristown, NJ, United States
  • Pooja Srinivasa, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States

Presentation Overview:

Single-cell sequencing improves our ability to understand biological systems at single-cell resolution and can be used to identify novel drug targets and optimal cell-types for target validation. However, tools that can interactively visualize and provide target-centric views of these large datasets are limited. We present SciViewer (Single-cell Interactive Viewer), a novel tool to interactively visualize, annotate and share single-cell datasets. SciViewer allows visualization of cluster, gene and pathway level information such as clustering annotation, differential expression, pathway enrichment, cell-type specificity, cellular composition, normalized gene expression and comparison across datasets. Further, we provide APIs for SciViewer to interact with publicly available pharmacogenomics databases for systematic evaluation of potential novel drug targets. We provide a module for non-programmatic upload of single-cell datasets. SciViewer will be a useful tool for data exploration and target discovery from single-cell datasets. It is available on GitHub (https://github.com/Dhawal-Jain/SciViewer).

16:41-16:45
Gos: a declarative (epi)genomics visualization library for Python
Format: Live-stream
Moderator(s): Helena Jambor
  • Nils Gehlenborg, Harvard Medical School, United States
  • Trevor Manz, Harvard Medical School, United States
  • Sehi L'Yi, Harvard Medical School, United States

Presentation Overview:

Existing genomic visualization tools are tailored towards specific tasks and as such are limited in expressiveness. The Gosling visualization grammar defines a set of primitives that specify how genomic datasets can be transformed and mapped to visual properties, providing building-blocks to compose unique scalable and interactive genomic data visualizations. Gosling visualizations are defined via JSON, however, which can be tedious and error-prone to edit manually – especially for complex specifications containing many layered and repeated elements. Additionally, genomic datasets defined by the Gosling grammar are expected to be accessible via HTTP, which poses challenges for users since a simple web-server and/or HiGlass server must be configured separately to view local data. Here we present Gos – a Python library which includes an API designed for computational biologists to quickly compose Gosling visualizations. Gos allows the use of familiar language features (variables, functions, for-loops, etc.) to author validated Gosling specifications (JSON) and additionally implements data-loading utilities to transparently load local data into visualizations, abstracting away the complexity of configuring custom web-servers. Gos is designed for interactive analysis within a computational notebook environment and integrates into Jupyter Notebook, JupyterLab, and Google Colab.

16:45-16:48
CoSIA: An R Package that Measures and Visualizes Transcriptome Diversity across Model Organisms and Their Tissues
Format: Live from venue
Moderator(s): Helena Jambor
  • Vishal H. Oza, Heersink School of Medicine, The University of Alabama at Birmingham, United States
  • Brittany N. Lasseigne, Heersink School of Medicine, The University of Alabama at Birmingham, United States
  • Anisha Haldar, Heersink School of Medicine, The University of Alabama at Birmingham, United States
  • Elizabeth J. Ramsey, Heersink School of Medicine, The University of Alabama at Birmingham, United States

Presentation Overview:

Studying patient variants in model organisms is an active area of research. The key challenge is determining an ideal model organism for modeling and studying the patient variant phenotype. This task requires collaboration between a diverse group of experts and involves complex evaluations across multiple metrics like sequence alignment, human protein, and gene expression. Though there are many challenges in comparing the expression variation of a gene-associated variant, the advent of new databases with preprocessed expression data across species and tissues has prompted the exploration of transcriptome diversity, aiding scientists in selecting a suitable model organism for phenotypic studies. We are developing CoSIA (Cross-Species Investigation and Analysis), an R package that helps researchers choose the most suitable model organism for study by measuring and visualizing a diverse group of gene expression-based metrics. CoSIA uses curated non-diseased wild-type RNA-sequencing expression data from Bgee to visualize a gene’s expression across tissues and model organisms. Additionally, CoSIA provides functions to measure and visualize transcriptome diversity for a gene using median-based coefficients of variation and Shannon entropy calculations. Thus, CoSIA provides researchers with tools to visualize the variation in a gene’s expression profile to determine a suitable model organism.
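The two diversity measures named in this abstract can be illustrated with a small Python sketch. This is not CoSIA code (CoSIA is an R package), and the abstract does not give the exact formulas, so both definitions below are assumptions: the median-based coefficient of variation is taken as the scaled median absolute deviation divided by the median, and Shannon entropy is computed over a gene's expression values normalized to sum to one across tissues. The function names and the example values are hypothetical.

```python
import math
from statistics import median


def median_cv(values):
    """Median-based coefficient of variation (assumed form: 1.4826 * MAD / median)."""
    med = median(values)
    if med == 0:
        raise ValueError("median is zero; the coefficient of variation is undefined")
    # Median absolute deviation around the median.
    mad = median(abs(v - med) for v in values)
    # 1.4826 rescales the MAD to be comparable to a standard deviation
    # for normally distributed data.
    return 1.4826 * mad / med


def shannon_entropy(values):
    """Shannon entropy (in bits) of expression normalized across tissues."""
    total = sum(values)
    if total <= 0:
        raise ValueError("expression values must have a positive sum")
    # Zero-expression tissues contribute nothing to the entropy.
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical expression of one gene across four tissues.
broad = [5.0, 5.0, 5.0, 5.0]      # evenly expressed everywhere
specific = [0.0, 0.0, 8.0, 0.0]   # expressed in a single tissue
print(shannon_entropy(broad), shannon_entropy(specific))
```

Under these assumed definitions, a gene expressed evenly across tissues gets the maximum entropy (log2 of the number of tissues), while a gene confined to one tissue gets an entropy of zero.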

16:48-16:52
Interactive Exploration of Tissues and Cells Guided by Visual Pattern Mining
Format: Live from venue
Moderator(s): Helena Jambor
  • Qianwen Wang, Harvard University, United States
  • Nils Gehlenborg, Harvard University, United States

Presentation Overview:

Visual patterns of tissues and cells in microscopy images can reveal valuable insights for understanding the human body and treating diseases (e.g., histopathology). Recent advances in spatial omics enable the analysis of tissues at the cellular level and have led to an explosion of research interest. However, current studies rarely discuss visual patterns, partly because it is difficult for humans to interpret the generated multiplexed images, which can have more than 40 channels. To address this research gap, this study proposes a visual analytics approach to facilitate the visual exploration of tissues and cells through visual pattern mining. Specifically, the proposed method consists of a backend data module and a frontend visualization module. The backend module employs a beta-VAE model and extracts visual patterns by simultaneously considering all channels of the multiplexed images. The frontend module supports users in arranging and grouping items (e.g., cell thumbnails, tissue patches) based on the identified visual patterns. Users can examine the distribution of certain visual patterns and associate the visual patterns of items with their spatial contexts and other types of biological information. A preliminary case study on breast cancer demonstrates the effectiveness of our proposed approach.

16:52-16:55
Effective visualisation of the tumour microenvironment using glyph-based approaches
Format: Live-stream
Moderator(s): Helena Jambor
  • Heba Sailem, University of Oxford, United Kingdom

Presentation Overview:

Visualisation of cancer tissues is important for diagnosis and for identifying driving pathological processes and potential biomarkers. Existing visualisation methods do not represent different tissue components and the tumour microenvironment intuitively and are therefore difficult for pathologists to interpret. Previously, we developed ShapoGraphy (www.shapography.com), a user-friendly web app for interactive creation of new glyph-based representations. Here we use ShapoGraphy to develop semantically relevant representations of multiplexed tissue image data that facilitate the pathological assessment and pattern discovery of tumour microenvironment phenotypes. We will present the development of our representation and demonstrate its utility using several datasets measuring protein activities in stromal, immune and cancer cells. We will also present the exploration of various glyph design choices that use different shapes and marks to represent different tissue compartments and tumour heterogeneity. To determine the effectiveness of our approach, we reviewed our designs with pathologists and biologists. We found that a representation that utilises compactly arranged hexagons that encode variables using colour and symbols is more favourable. Finally, we will discuss general guidelines for producing effective glyph-based representations. In summary, our approach addresses the limitations of other visualisation approaches and provides a flexible way of summarising tissue image data.

16:55-16:58
Using Mapper to Reveal Morphological Relationships in Passiflora Leaves
Format: Live from venue
Moderator(s): Helena Jambor
  • Sarah Percival, Michigan State University, United States

Presentation Overview:

As collections of data grow in size, it is increasingly important to have efficient means of analyzing large data sets. Topological data analysis (TDA) uses concepts from the mathematical field of topology not only to examine large data sets efficiently, but also to make inferences related to the "shape" of data. In this project, we use Mapper, a tool from TDA that summarizes data into a graph, to discover an underlying structure relating the shapes of more than 3,300 Passiflora leaves from 40 different species. As the Mapper graph has a structure, or "shape", of its own, we think of it as a "shape of shapes" that provides information on the interplay between the developmental processes determining leaf shape within a single plant and the evolutionary processes between species. In particular, we examine the interactions between leaf species and both leaf age and leaf area by constructing a Mapper graph for each measure. For each node in the resulting graphs, we then compute the average leaf shape to obtain a graph structure that reveals how morphometric differences between species relate to the developmental changes that must occur for those shapes to be realized.
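The general Mapper construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's pipeline: the abstract does not state the cover, the clustering method, or how leaf shapes are encoded, so the overlapping-interval cover, the distance-threshold clustering, and all names and parameters (`mapper_graph`, `n_intervals`, `overlap`, `eps`) are assumptions. Each point stands in for a shape descriptor, and the filter value stands in for a measure such as leaf age or leaf area.

```python
import math
from itertools import combinations


def cluster_by_threshold(points, members, eps):
    """Group member indices into connected components of the 'distance <= eps' graph."""
    clusters, unvisited = [], set(members)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(points[current], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices, edges link overlapping nodes."""
    lo, hi = min(filter_values), max(filter_values)
    length = (hi - lo) / n_intervals or 1.0
    nodes = []
    for i in range(n_intervals):
        # Each interval is widened on both sides so neighbouring intervals overlap.
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        members = [j for j, f in enumerate(filter_values) if start <= f <= end]
        nodes.extend(cluster_by_threshold(points, members, eps))
    # Two nodes are connected when their clusters share at least one point.
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


def node_means(points, nodes):
    """Average the points in each node (a stand-in for an average leaf shape)."""
    dims = range(len(points[0]))
    return [tuple(sum(points[j][d] for j in node) / len(node) for d in dims)
            for node in nodes]


# Hypothetical one-dimensional "shapes", using the coordinate itself as the filter.
pts = [(float(x),) for x in range(8)]
nodes, edges = mapper_graph(pts, [p[0] for p in pts], n_intervals=2)
print(len(nodes), sorted(edges), node_means(pts, nodes))
```

In practice a dedicated TDA library and a more robust clustering method would replace these helpers; the sketch only shows how a filter, an overlapping cover, and per-interval clustering combine into a graph whose nodes can then be summarized.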

17:00-18:00
Keynote Presentation: Machine learning provides a new perspective on protein modification
Format: Live from venue
Moderator(s): Helena Jambor
  • Lennart Martens

Presentation Overview:

Over the last two decades, mass spectrometry based proteomics has evolved quite dramatically, levera...