Abstract: Research in visualization is often motivated by the endeavor to
improve the illustration of data: to better communicate data to
others and to gain deeper insights into complex datasets, possibly drawn from a
variety of data sources. In the medical domain, the data can include, for
example, patient data and health records as well as biological data such as genomes.
Insights to be obtained from the data may relate, among other things, to the spread of
diseases, evolutionary analysis, and virus mutations. The tasks include both
retrospective analysis, such as finding patient zero and modelling the spread
of a disease, and predictive modelling of virus mutations and future disease
spread. The COVID-19 pandemic turned this general motivation for our
research into a need for practical solutions. Infection control experts needed to
quickly gain insights into novel datasets and to communicate these insights to
colleagues and to the broader public, which required quick and efficient visualization
solutions.
New methods, tools, and methodologies have emerged from both basic and applied
research. New data were collected, and model results were produced that required
rapid analysis. Multidisciplinary teams worked together and applied solutions to the new
challenges posed by the pandemic. This rapid response was only possible by
building on experience and past research. The talk will therefore take a
broader historical perspective and present specific solutions from our own
experience, including reflections on the data, task, and user triangle as well as on the
challenges of multidisciplinary working styles.
Speaker Biography: Tatiana von Landesberger is a full professor of Computer Science –
Visualization at University of Cologne, Germany. She received a PhD in 2010 and
finished her habilitation in 2017 at TU Darmstadt, Germany. Her research focuses
on information visualization and visual analytics of spatio-temporal and network
data from various domains such as biology, medicine, finance, transportation,
journalism or meteorology. Her recent research addresses the challenges of
perception and cognition of visualization as well as visual analysis of disease
spreading. She regularly publishes at top international conferences and has
received multiple awards. Tatiana has served on program committees and
organization committees for IEEE VIS, EuroVis, VMV, and other venues. Recently,
she was full papers chair at EuroVis; she is now a member of the EuroVis steering
committee and an associate editor of Computer Graphics Forum.
Machine learning provides a new perspective on protein modification
Abstract: Over the last two decades, mass spectrometry based proteomics has
evolved quite dramatically, leveraging advances in instrumentation, methods, and
software tools to dig ever deeper into the protein content of cells. Yet, as the
key effectors of cellular function, there is more to proteins than just their
sequence. Indeed, many proteins are also tightly regulated in terms of their
activity, their localisation, and their interactions. And while this is a
well-known fact, very little is actually understood about the specific
mechanisms by which this regulation is carried out. While some mechanisms such
as protein phosphorylation have been studied in quite some detail already, other
protein modifications have received much less attention, and therefore remain
largely unexplored. As a result, interesting downstream phenomena, such as
competition between, or coordination of, modifications on the same site or the
same protein, remain poorly understood.
Interestingly, a wholly new generation of machine learning-based identification
tools is now radically changing our ability to discover the proteome-wide
modification landscape, with ionbot as the leading exponent of these
revolutionary new tools. Built on the MS²PIP and DeepLC models, which predict
analyte behaviour (fragment ion intensities and retention times, respectively),
ionbot is fast and reliable and allows unbiased identification of
protein modifications at unprecedented scale. We are now working on extending
this open modification capability to (immuno-)peptidomics as well, where we have
already been able to show that our predictors substantially improve
identification sensitivity and specificity.
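ionbot's own search algorithm is not described here. As a much simpler illustration of the idea behind open modification searching, the sketch below computes the mass difference between an observed (neutral) precursor mass and an unmodified candidate peptide, then looks that difference up in a small, hand-picked table of modification masses. The residue and modification masses are standard monoisotopic values; the function names, the tiny modification table, and the 0.01 Da tolerance are illustrative assumptions, and this is not how ionbot itself works.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added across the peptide termini

# Small illustrative table of modification mass shifts (Da); a real open
# search would draw on a comprehensive resource instead (assumption).
MODIFICATIONS = {
    "Phospho": 79.96633,
    "Oxidation": 15.99491,
    "Acetyl": 42.01057,
    "Carbamidomethyl": 57.02146,
}


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def explain_mass_shift(sequence: str, observed_neutral_mass: float, tol: float = 0.01):
    """Return the mass difference and the modifications that could explain it."""
    delta = observed_neutral_mass - peptide_mass(sequence)
    hits = [name for name, shift in MODIFICATIONS.items() if abs(delta - shift) <= tol]
    return delta, hits
```

For example, `explain_mass_shift("PEPTIDE", observed)` returns the unexplained mass and any matching candidates; an open search differs from a closed one precisely in that the mass difference is not restricted in advance to a few expected modifications.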
To show the capabilities of ionbot, and to cast a first glance at the true
complexity of a proteome as our experiments and instruments have seen it, we
have applied ionbot to over 1 billion human spectra, and over 600 million mouse
spectra from the PRIDE archive, uncovering a
plethora of modifications of various origins and levels of interest. This wholly
new view of the modified proteome dramatically changes our view of proteins and
shows the overwhelming abundance of modifications, whether chemical, biological, or
artefactual, that affect the protein machinery of life.
In the past, we already leveraged such proteome-wide data to infer protein
associations, but recent advances in machine learning have allowed us to
dramatically expand our coverage, in turn yielding a massive increase in the
(less) biased detection of protein associations or interactions from
high-throughput omics data.
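The abstract does not state how associations are inferred from reprocessed data. One common co-occurrence idea, assumed here purely for illustration, is to score protein pairs by how often they are identified in the same datasets, for example with the Jaccard similarity of their presence/absence profiles. The function name and threshold are hypothetical.

```python
import numpy as np


def jaccard_associations(presence, names, min_jaccard=0.5):
    """Score protein pairs by how often they are identified together.

    presence: boolean array (n_proteins x n_datasets), True where a protein
    was identified in a dataset. Returns (protein_a, protein_b, jaccard)
    tuples at or above the threshold, highest score first.
    """
    x = np.asarray(presence, dtype=bool).astype(int)
    both = x @ x.T                                    # datasets containing both proteins
    counts = x.sum(axis=1)
    either = counts[:, None] + counts[None, :] - both  # datasets containing either protein
    with np.errstate(divide="ignore", invalid="ignore"):
        jac = np.where(either > 0, both / either, 0.0)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if jac[i, j] >= min_jaccard:
                pairs.append((names[i], names[j], float(jac[i, j])))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Co-occurrence alone cannot distinguish physical interaction from shared tissue or experiment type, which is one reason broader coverage across many datasets matters for such an approach.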
We can also use these proteome-wide results to analyse tissues and even cell
types for their protein composition, yielding detailed information on
tissue-specific proteomes with potentially far-reaching applications.
Speaker Biography: Lennart Martens is Full Professor of Systems Biology at Ghent
University, Group Leader of the Computational Omics and Systems Biology
(CompOmics) group at VIB, and Associate Director of the VIB-UGent Center for
Medical Biotechnology, all in Ghent, Belgium. He has been working in proteomics
bioinformatics since his Master’s degree, which focused on the computational
interpretation of peptide mass spectra, and the sequence-dependent
fragmentation of peptides. He then worked as a software developer and framework
architect for a software company for a few years, before returning to Ghent
University to pursue a Ph.D. in proteomics and proteomics informatics. During
this time, he worked on the development of high-throughput peptide-centric
proteomics techniques and on bioinformatics tools to support these new
approaches. In 2003 he designed and built the PRIDE repository for the global
dissemination of proteomics data at EMBL-EBI as a Marie Curie fellow of the
European Commission. After obtaining his Ph.D., he rejoined EMBL-EBI to
coordinate the newly created PRIDE group for the next three years, firmly
establishing the system as the world’s foremost public proteomics data
repository. He then moved back to Ghent University and VIB to take up his
current positions, in which he focuses on novel machine learning algorithms for
mass spectrometry data analysis, and their application to the large-scale
orthogonal reprocessing of public proteomics data. Prof. Martens was
elected to the Young Academy of the Royal Belgian Academy of Sciences in 2013,
to the Human Proteome Organisation (HUPO) Council in 2016, and as
President of the European Proteomics Association (EuPA) in 2020, and was
admitted as a Fellow of the Royal Society of Chemistry in 2018. He also served
on the HUPO Executive Board from 2017 to 2019, and was President of the ABRF
Proteomics Informatics Research Group (iPRG) in 2011. Dr. Martens received the
2014 Prometheus Award for Research Excellence from Ghent University, and the
2015 ‘Juan Pablo Albar’ Proteomics Pioneer Award from the European Proteomics
Association (EuPA). An author of 265 peer-reviewed papers, he has also
co-written two Wiley textbooks on computational mass spectrometry.
Program
Schedule subject to change
All times listed are in CDT
10:30-11:30
Keynote Presentation: Once upon a time in Bio-Medical Data Visualization: Reflections on Research Before and During Pandem...
Reference-based cell-type annotation can significantly reduce time and effort in single-cell analysis by
transferring labels from a previously annotated dataset to a new dataset. However, label transfer is
challenging. End-to-end computational methods can fail because they mix up technical variation (e.g., different
sequencing batches or techniques), which must be removed, and biological variation (e.g., different cell types),
which must be conserved across datasets. To address this issue, we propose Polyphony, an interactive
transfer learning (ITL) framework, to complement biologists' knowledge with advanced computational
methods. Polyphony is motivated and guided by domain experts' needs for a controllable, interactive, and
algorithm-assisted annotation process, identified through our multi-round expert interviews with six
biologists. We introduce anchors, i.e., analogous cell populations across datasets, as a paradigm to
explain the computational process and collect users' feedback for model improvement. A set of
visualizations and interactions is provided to empower users to add, delete, or modify anchors,
resulting in refined cell type annotations. We demonstrate the effectiveness of this approach through
two usage scenarios and interviews with two biologists. The results show that our anchor-based ITL
method takes advantage of both human and machine intelligence in annotating massive single-cell
datasets.
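Polyphony's underlying transfer-learning model is not described in this abstract. The sketch below only illustrates the anchor idea with a simple stand-in heuristic: in an embedding that is assumed to already be shared by both datasets, a reference cell type and a query cluster form an anchor when their centroids are mutual nearest neighbours, and anchored query clusters inherit the reference label. All function names are hypothetical.

```python
import numpy as np


def centroids(X, labels):
    """Mean embedding per label; returns (sorted labels, centroid matrix)."""
    X = np.asarray(X, dtype=float)
    uniq = sorted(set(labels))
    labels = np.asarray(labels)
    return uniq, np.vstack([X[labels == u].mean(axis=0) for u in uniq])


def find_anchors(ref_X, ref_labels, query_X, query_clusters):
    """Pair query clusters with reference cell types whose centroids are
    mutual nearest neighbours (a simple anchor heuristic, not Polyphony's)."""
    ref_names, ref_c = centroids(ref_X, ref_labels)
    q_names, q_c = centroids(query_X, query_clusters)
    d = np.linalg.norm(q_c[:, None, :] - ref_c[None, :, :], axis=2)
    best_ref = d.argmin(axis=1)    # nearest reference type for each query cluster
    best_query = d.argmin(axis=0)  # nearest query cluster for each reference type
    return [(q_names[i], ref_names[j]) for i, j in enumerate(best_ref) if best_query[j] == i]


def transfer_labels(anchors, query_clusters, unknown="unassigned"):
    """Label each query cell with the reference type of its cluster's anchor."""
    mapping = dict(anchors)
    return [mapping.get(c, unknown) for c in query_clusters]
```

Because the anchors are an explicit list of pairs, a user could add, delete, or modify entries before calling `transfer_labels`, which mirrors the kind of human-in-the-loop refinement the abstract describes.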
11:50-12:00
Data Transformations for Effective Visualization of Single-Cell Embeddings
Format: Live from venue
Moderator(s): Michael Krone
Evan Greene, Ozette Technologies, United States
Greg Finak, Ozette Technologies, United States
Fritz Lekschas, Ozette Technologies, United States
Malisa Smith, Ozette Technologies, United States
Leonard A. D'Amico, Fred Hutchinson Cancer Research Center, United States
Nina Bhardwaj, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, United States
Candice D. Church, Division of Dermatology, Department of Medicine, University of Washington, United States
Chihiro Morishima, Division of Dermatology, Department of Medicine, University of Washington, United States
Nirasha Ramchurren, Fred Hutchinson Cancer Research Center, United States
Janis M. Taube, Johns Hopkins University School of Medicine, United States
Paul T. Nghiem, Division of Dermatology, Department of Medicine, University of Washington, United States
Martin A. Cheever, Fred Hutchinson Cancer Research Center, United States
Steven P. Fling, Fred Hutchinson Cancer Research Center, United States
Raphael Gottardo, University of Lausanne and Lausanne University Hospital, Swiss Institute of Bioinformatics, Switzerland
Nonlinear dimensionality reduction (DR) methods are commonly used to create two-dimensional embeddings of
high-dimensional data for visualization. Since the effectiveness of learned embeddings can depend
markedly on the choice of the DR method’s hyperparameters, prior work has focused on evaluating
hyperparameter settings. However, data transformations can be equally important for creating effective
embeddings. Yet, they have received less attention.
In this talk, we will present data transformation approaches for the embedding of single-cell
data, specifically surface proteomics data. Using computationally derived labels for expression groups (e.g.,
low, medium, high), we can spread out and normalize the expression range of different cell phenotypes.
Visually, this allows for the identification of rare and complex cell types that would otherwise be
indistinguishable from broad cell phenotypes. Moreover, this approach effectively eliminates batch
effects that would otherwise cause large differences in the lower-dimensional embedding and make
sample-by-sample comparisons ineffective. Finally, we will show a data transformation approach
that uses simulated data to create a generic embedding into which concrete data are mapped. This
approach enables relative comparison of cluster expression profiles while still providing a global map
of broad cluster similarities.
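The exact transformation used in the talk is not specified. A minimal sketch, assuming each cell already carries a computationally derived group label (low, medium, or high) for a given marker: the values of each group are min-max scaled within the group and shifted into a dedicated band, so every group occupies the same width regardless of its raw intensity range. The band layout, label set, and function name are hypothetical; the transformed matrix could then be passed to a DR method such as UMAP or t-SNE.

```python
import numpy as np

GROUP_ORDER = {"low": 0, "medium": 1, "high": 2}  # assumed label set


def group_band_transform(values, groups, band_width=1.0):
    """Rescale one marker so that each expression group fills its own band.

    Values in group k are min-max scaled within the group and placed in
    [k * band_width, (k + 1) * band_width]. Min-max scaling is invariant to a
    constant shift or positive rescaling of the raw intensities, so such
    sample-level offsets do not change the result.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    unknown = set(groups.tolist()) - set(GROUP_ORDER)
    if unknown:
        raise ValueError(f"unknown expression groups: {unknown}")
    out = np.empty_like(values)
    for name, k in GROUP_ORDER.items():
        mask = groups == name
        if not mask.any():
            continue
        v = values[mask]
        span = v.max() - v.min()
        # A group with a single distinct value is placed in the middle of its band.
        scaled = (v - v.min()) / span if span > 0 else np.full(v.shape, 0.5)
        out[mask] = (k + scaled) * band_width
    return out
```

Applied per marker and per sample, this keeps the group boundaries aligned across samples, which is one way a transformation of this kind could suppress batch offsets while still separating low, medium, and high expressors.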
12:00-12:10
Visualizing Cluster-specific Genes from Single-cell Transcriptomics Data Using Association Plots
Format: Live from venue
Moderator(s): Michael Krone
Elzbieta Gralinska, Max Planck Institute for Molecular Genetics, Germany
Clemens Kohl, Max Planck Institute for Molecular Genetics, Germany
Bita Sokhandan Fadakar, Max Planck Institute for Molecular Genetics, Germany
Martin Vingron, Max Planck Institute for Molecular Genetics, Germany
Visualizing single-cell transcriptomics data in an informative way is a major challenge in biological
data analysis. Clustering of cells is a prominent analysis step and the results are usually visualized
in a planar embedding of the cells using methods like PCA, t-SNE, or UMAP. Given a cluster of cells, one
frequently searches for the genes highly expressed specifically in that cluster. At this point,
visualization is usually replaced by studying a list of differentially expressed genes.
We address this bottleneck by presenting Association Plots (APs) adapted to single-cell data. APs are
derived from correspondence analysis, a projection method that embeds both genes and cells in a
high-dimensional space in which genes associated with a cell cluster lie in a particular direction. By
exploiting this property, APs provide a dimension-independent visualization of cluster-specific genes
from single-cell datasets. Our method is now available as the free Bioconductor package APL.
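The sketch below follows the description above rather than the APL package, whose implementation details may differ. It runs a plain correspondence analysis (genes in principal coordinates, cells in standard coordinates, an assumed choice) and then, for one cluster, takes the direction of the cluster's centroid: each gene's x-coordinate is its projection onto that direction and its y-coordinate is its distance from the line. Genes with large x and small y are candidate cluster-specific genes. Function names are hypothetical.

```python
import numpy as np


def correspondence_analysis(counts, tol=1e-10):
    """Correspondence analysis of a genes x cells count matrix.

    Returns gene principal coordinates and cell standard coordinates,
    keeping only dimensions with non-negligible singular values.
    Assumes no all-zero rows or columns.
    """
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)                              # gene masses
    c = P.sum(axis=0)                              # cell masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)         # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    k = int((s > tol * s[0]).sum())
    genes = U[:, :k] * s[:k] / np.sqrt(r)[:, None]  # principal coordinates
    cells = Vt[:k].T / np.sqrt(c)[:, None]          # standard coordinates
    return genes, cells


def association_plot(counts, cluster_mask):
    """Association Plot coordinates (x, y) of every gene for one cell cluster."""
    genes, cells = correspondence_analysis(counts)
    centroid = cells[np.asarray(cluster_mask, dtype=bool)].mean(axis=0)
    direction = centroid / np.linalg.norm(centroid)
    x = genes @ direction                           # position along the cluster direction
    y = np.sqrt(np.maximum((genes ** 2).sum(axis=1) - x ** 2, 0.0))  # distance from that line
    return x, y
```

Because x and y are computed in the full CA space, the resulting 2D plot does not depend on choosing two embedding dimensions, which is the sense in which the abstract calls APs dimension-independent.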
We demonstrate the application of APs to single-cell RNA-seq data through several examples. First, we
show the identification of marker genes using APs. Second, we present how APs aid in cell cluster
annotation using a predefined list of marker genes. Finally, we compare results from APs to results from
existing differential expression testing tools.
12:10-12:20
Kana: Interactive Single-Cell Analysis in the Browser
Format: Live from venue
Moderator(s): Michael Krone
Jayaram Kancherla, Genentech, Inc., United States
Hector Corrada Bravo, Genentech, Inc., United States
We present kana, a web application for interactive scRNA-seq data analysis that executes both
visualization and computational analysis in the web browser. Kana uses web technologies such
as WebAssembly to perform the relevant computations efficiently on the user’s machine, relying on C++
libraries that implement the analysis steps and are reusable in non-visualization or client/server
approaches. As an added benefit of this client-side approach, user data are never transferred or uploaded
to a server, avoiding data privacy problems. Since computations run in the browser, there is also
no network latency, which provides a smooth interactive experience. Kana provides a streamlined
one-click workflow for all steps in a typical scRNA-seq analysis, starting from a count matrix and
finishing with marker detection and cell type annotation. Results are rendered progressively, as soon
as each underlying analysis step completes, and are presented in an intuitive web interface for further
exploration and iterative analysis. Testing on public datasets shows that kana can analyze over 100,000
cells within 5 minutes on a typical laptop.
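Kana itself runs C++ code compiled to WebAssembly in the browser; the Python sketch below only illustrates the progressive-rendering design, in which each analysis step hands its result to the interface as soon as it finishes. The three steps (library sizes, log-normalization, PCA) are a simplified assumption and not kana's exact methods; clustering, marker detection, and annotation would be further steps in the same pattern. The `render` callback and function names are hypothetical.

```python
import numpy as np


def analysis_steps(counts, n_pcs=2):
    """Toy scRNA-seq workflow that yields each step's result as soon as it is done.

    counts: cells x genes matrix of raw counts (assumes no all-zero cells).
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1)
    yield "quality_control", {"library_size": lib_size}

    # Library-size normalization followed by a log transform.
    size_factors = lib_size / lib_size.mean()
    logcounts = np.log1p(counts / size_factors[:, None])
    yield "normalization", {"logcounts": logcounts}

    # PCA via SVD of the column-centered log-expression matrix.
    centered = logcounts - logcounts.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    yield "pca", {"pcs": U[:, :n_pcs] * s[:n_pcs]}


def run(counts, render):
    """Call render(step_name, result) after every completed step."""
    for name, result in analysis_steps(counts):
        render(name, result)
```

Using a generator keeps the pipeline itself unaware of the interface: the caller decides what "rendering" means, which fits the goal of reusing the same analysis code outside the visualization.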
The application is hosted on GitHub: http://github.com/jkanche/kana. The preprint is available at
https://www.biorxiv.org/content/10.1101/2022.03.02.482701v1
12:20-12:30
Supervised capacity preserving mapping: a clustering guided visualization method for scRNA-seq data
Format: Live from venue
Moderator(s): Michael Krone
Zhiqian Zhai, Department of Statistics, University of California Los Angeles, United States
Yu L. Lei, Department of Periodontics and Oral Medicine, University of Michigan; University of Michigan Rogel Cancer Center, United States
Rongrong Wang, Department of Computational Mathematics, Science and Engineering and Department of Mathematics, MSU, United States
Yuying Xie, Department of Computational Mathematics, Science and Engineering and Department of Statistics and Probability, MSU, United States
Recently, various visualization methods have been developed to analyze scRNA-seq data. However,
current visualization methods, including UMAP and t-SNE, have limited accuracy in
rendering the geometric relationships between populations with distinct functional states. Most visualization
methods are unsupervised and leave out information from clustering results or given labels. This
leads to an inaccurate depiction of the distances between bona fide functional states. In
particular, UMAP and t-SNE are not well suited to preserving the global geometric structure: clusters that
appear close together in the embedding may in fact be far apart in the original space.
In addition, UMAP and t-SNE cannot track cluster variance: the embedded cluster variance is associated
not only with the true variance but also with the sample size, to which it is proportional.
We present supCPM, a robust supervised visualization method utilizing clustering results, which
separates different clusters, preserves the global structure and tracks the cluster variance. Compared
with other existing methods using synthetic and real datasets, supCPM shows improved performance in
preserving the global geometric structure and data variance. Overall, supCPM provides an enhanced
visualization pipeline to assist the interpretation of functional transition and accurately depict
population segregation.
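The supCPM algorithm itself is not reproduced here. To make "preserving the global geometric structure" concrete, the sketch below shows one possible evaluation, assumed for illustration: compute the pairwise distances between cluster centroids in the original space and in the embedding, and report their Spearman rank correlation. A value near 1 means clusters that are far apart in the data are also far apart in the plot. The function name is hypothetical, and the rank computation assumes there are no tied distances.

```python
import numpy as np


def _ranks(a):
    """0-based ranks of a 1-D array (assumes no ties)."""
    order = np.argsort(a)
    ranks = np.empty(len(a))
    ranks[order] = np.arange(len(a))
    return ranks


def global_structure_score(X_high, X_low, labels):
    """Spearman correlation between pairwise cluster-centroid distances
    in the original space and in the embedding (needs >= 3 clusters)."""
    labels = np.asarray(labels)
    uniq = np.unique(labels)

    def centroid_distances(X):
        C = np.vstack([X[labels == u].mean(axis=0) for u in uniq])
        D = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
        return D[np.triu_indices(len(uniq), k=1)]  # each cluster pair once

    d_high = _ranks(centroid_distances(np.asarray(X_high, dtype=float)))
    d_low = _ranks(centroid_distances(np.asarray(X_low, dtype=float)))
    return float(np.corrcoef(d_high, d_low)[0, 1])
```

A comparable per-cluster check for variance tracking could compare each cluster's spread in the embedding with its spread in the original space.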
14:30-14:50
Microbiome Maps: Hilbert Curve Visualizations of Metagenomic Profiles
Format: Live from venue
Moderator(s): Helena Jambor
Camilo Valdes, Lawrence Livermore National Laboratory, United States
Vitalii Stebliankin, Bioinformatics Research Group (BioRG), Florida International University, United States
Daniel Ruiz-Perez, Bioinformatics Research Group (BioRG), Florida International University, United States
Ji In Park, Department of Medicine, Kangwon National University School of Medicine, South Korea
Hajeong Lee, Department of Internal Medicine, Seoul National University College of Medicine, South Korea
Giri Narasimhan, Bioinformatics Research Group (BioRG), Florida International University, United States
Abundance profiles from metagenomic sequencing data synthesize information from billions of sequenced
reads coming from thousands of microbial genomes. Analyzing and understanding these profiles can be a
challenge since the data they represent are complex. Particularly challenging is their visualization,
and here we present a technique called a "Microbiome Map", which visualizes a microbiome profile using
a Hilbert curve.
The maps are created using the Jasper software, which generates colorful 2D images that succinctly
visualize a microbiome sequencing profile. Color and location in a microbiome map play a vital role:
locations represent a genome from a reference collection (whole-genome sequencing), or a set of OTUs
(16S sequencing); and color can represent their relative abundance. Maps can also be interactively
explored using Jasper, which integrates with online resources such as Ensembl, GenBank, and UniProt.
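How Jasper orders the reference genomes and sizes the curve is not described here. The sketch below shows the standard conversion from a position along a Hilbert curve to grid coordinates and uses it to place a 1-D vector of abundances (one value per reference genome, in some fixed order) onto a square grid. Because the curve preserves locality, genomes that are adjacent in the chosen ordering land in adjacent cells. The function names and the NaN fill for empty cells are assumptions.

```python
import numpy as np


def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def microbiome_map(abundances, order):
    """Place per-genome abundances onto a 2-D grid along a Hilbert curve;
    cells without a genome are NaN."""
    n = 1 << order
    if len(abundances) > n * n:
        raise ValueError("grid too small for the number of genomes")
    grid = np.full((n, n), np.nan)
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(order, d)
        grid[y, x] = value               # row = y, column = x
    return grid
```

The grid can then be drawn with any colormap, so that location encodes which genome (or OTU set) a cell represents and color encodes its relative abundance.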
We discuss how microbiome maps can be a powerful asset for classification and prediction models by
visualizing the strain-level abundances of 44K genomes in 328 samples from the Human Microbiome Project,
as well as 5K species in 200 fecal samples from a collaboration with Kangwon National University and
Seoul National University in South Korea.
More information can be found at www.microbiomemaps.org.
14:50-15:10
Coral: a web-based visual analysis tool for creating and characterizing cohorts
Format: Live-stream
Moderator(s): Qianwen Wang
Patrick Adelberger, Institute of Computer Graphics, Johannes Kepler University Linz, Linz, A-4040, Austria
Klaus Eckelt, Institute of Computer Graphics, Johannes Kepler University Linz, Linz, A-4040, Austria
Markus Johann Bauer, Global Computational Biology and Digital Sciences, Boehringer Ingelheim RCV GmbH & Co KG, Vienna, A-1121, Austria
Marc Streit, Institute of Computer Graphics, Johannes Kepler University Linz, Linz, A-4040, Austria
Christian Haslinger, Global Computational Biology and Digital Sciences, Boehringer Ingelheim RCV GmbH & Co KG, Vienna, A-1121, Austria
Thomas Zichner, Global Computational Biology and Digital Sciences, Boehringer Ingelheim RCV GmbH & Co KG, Vienna, A-1121, Austria
A main task in computational cancer analysis is the identification of patient subgroups (i.e., cohorts)
based on a rich collection of metadata attributes (patient stratification) or genomic markers of
response (biomarkers). Coral is a web-based cohort analysis tool that is designed to support this task:
Users can interactively create and refine multiple cohorts, based on quantitative or categorical
attributes, which can then be compared, characterized, and inspected down to the level of single items.
The characterization includes the possibility for statistical testing between cohorts and provides
intuitive access to prevalence information. Coral visualizes the evolution of cohorts as well as their
relationships as a graph. Furthermore, findings can be stored, shared, and reproduced via the integrated
session management. Coral is pre-loaded with data from over 128 000 samples from the AACR Project GENIE,
The Cancer Genome Atlas, the Cell Line Encyclopedia, and two depletion screen datasets.
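Coral's internals and the statistical tests it offers are not detailed here. The sketch below illustrates the two operations the paragraph describes under simple assumptions: a cohort is created by filtering samples on categorical sets or numerical ranges, and two cohorts are compared on the prevalence of a feature (such as a specific mutation) with a Pearson chi-squared test on the 2x2 table, without continuity correction. The sample format and function names are hypothetical.

```python
import math


def make_cohort(samples, **criteria):
    """Select samples whose attributes satisfy every criterion.

    Each criterion is either a set of allowed categorical values or a
    (low, high) tuple for an inclusive numerical range.
    """
    def matches(sample):
        for attr, rule in criteria.items():
            value = sample.get(attr)
            if isinstance(rule, tuple):
                low, high = rule
                if value is None or not (low <= value <= high):
                    return False
            elif value not in rule:
                return False
        return True
    return [s for s in samples if matches(s)]


def compare_prevalence(cohort_a, cohort_b, has_feature):
    """Pearson chi-squared test (1 df, no continuity correction) comparing the
    prevalence of a feature between two non-empty cohorts."""
    a_yes = sum(1 for s in cohort_a if has_feature(s))
    b_yes = sum(1 for s in cohort_b if has_feature(s))
    table = [[a_yes, len(cohort_a) - a_yes], [b_yes, len(cohort_b) - b_yes]]
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return {"prevalence": (a_yes / rows[0], b_yes / rows[1]), "chi2": chi2, "p": p_value}
```

For small cohorts, an exact test (such as Fisher's) would be the safer choice than the chi-squared approximation used here.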
To demonstrate the usefulness of Coral, we reproduce findings from a published article about KRAS G12C
somatic mutations in AACR Project GENIE patients. We analyze KRAS G12C mutation frequencies in
Non-Small Cell Lung Cancer (NSCLC) and colorectal cancer patient cohorts with regard to
differences in race and gender.
15:10-15:20
TCGAnalyzeR: A Web Portal for Visualization of Pan-Cancer Molecular Patient Data
Format: Live from venue
Moderator(s): Qianwen Wang
Tuğba Önal-Süzek, Muğla Sıtkı Koçman University, Turkey
Başak Abak Masud, Istanbul Medipol University, Turkey
Talip Zengin, Muğla Sıtkı Koçman University, Turkey
The Cancer Genome Atlas (TCGA) contains multidimensional molecular data from 11,000 patients across 33
cancer types. In our work, we aimed to present a pipeline integrating our recently published
gene-signature-based low-risk/high-risk TCGA patient cohorts (Zengin T and Önal-Süzek T., 2020; Zengin T
and Önal-Süzek T., 2021) with all single-nucleotide variations (SNVs), copy number variations
(CNVs), RNA-seq and clinical data for patients with 33 different cancer types from TCGA and the PubChem
BioAssay database. The portal is available at http://tcganalyzer.mu.edu.tr/.
Our interactive web platform TCGAnalyzeR enables statistical analysis of big data in 4 main categories,
allowing users to interactively select the cancer type, data category (SNV/CNV/DEA/Clinical),
mutation type (somatic or all), risk group (low-risk/high-risk) and cohort type (paired/all). Downloadable
plots and data tables are provided to interactively visualize data specific to each category, and each
plot has its own filtering options. The gene and patient (sample) names shown in the tables and plots are
selectable, which enables the user to add a gene or patient to the “My genes” or “My patients” panel,
respectively, for filtering other plots and for copying the selected items to the clipboard. For 3 cancer
types (LUAD, LUSC, COAD), we provide pre-clustered low-risk and high-risk cohorts, derived with our gene
signature method, for additional filtering.
15:20-15:30
PhyloDiver: A Visual Analytics Tool for Tumor Phylogenies
Format: Live from venue
Moderator(s): Qianwen Wang
Charles Blatti, University of Illinois at Urbana-Champaign, United States
Matthew Berry, University of Illinois at Urbana-Champaign, United States
Chad Olson, University of Illinois at Urbana-Champaign, United States
Lisa Gatzke, University of Illinois at Urbana-Champaign, United States
Chuanyi Zhang, University of Illinois at Urbana-Champaign, United States
Peter Groves, University of Illinois at Urbana-Champaign, United States
Colleen Bushell, University of Illinois at Urbana-Champaign, United States
Nicholas Chia, Mayo Clinic, United States
Zeynep Madak-Erdogan, University of Illinois at Urbana-Champaign, United States
Mohammed El-Kebir, University of Illinois at Urbana-Champaign, United States
Cancer is the result of an evolutionary process, where somatic mutations accumulate over time in a
population of cells. As such, a tumor is composed of multiple subpopulations of cells, or clones, with
distinct complements of mutations. This intra-tumor heterogeneity is a major driver of resistance to
therapy. Researchers use evolutionary trees, or phylogenies, to study intra-tumor heterogeneity and
reason about cancer evolution. While many methods have been developed to visualize and interpret tumor
phylogenies, these methods often provide either 1) a static image of clonal evolution that does not
accommodate user interaction or 2) tree layout interfaces that do not incorporate clonal proportions and
mutation details. Here, we introduce PhyloDiver, a novel visual analytics tool that enables end-users to
study clonal evolution in an interactive fashion while remaining connected to the underlying annotated
mutations.
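PhyloDiver's data model is not described in the abstract. The sketch below shows, under stated assumptions, the two quantities that connect a tumor phylogeny to clonal proportions and mutation details: the full set of mutations carried by each clone (its own plus all inherited ones, assuming mutations are never lost once acquired) and the cancer cell fraction of each mutation (the summed proportion of the clone in which it arose and all descendant clones). The input format and function name are hypothetical.

```python
def clone_summary(parent, proportions, new_mutations):
    """Derive per-clone mutation sets and per-mutation cancer cell fractions.

    parent: dict clone -> parent clone (None for the founder clone).
    proportions: dict clone -> fraction of tumor cells in that clone (sums to 1).
    new_mutations: dict clone -> mutations that first arise in that clone.
    """
    children = {c: [] for c in parent}
    for c, p in parent.items():
        if p is not None:
            children[p].append(c)

    def inherited(c):
        # A clone carries its own new mutations plus all of its ancestors'.
        muts = set(new_mutations.get(c, ()))
        p = parent[c]
        return muts | inherited(p) if p is not None else muts

    def subtree_fraction(c):
        # Cells of a clone and of all its descendants carry the clone's new mutations.
        return proportions.get(c, 0.0) + sum(subtree_fraction(k) for k in children[c])

    mutations = {c: inherited(c) for c in parent}
    ccf = {}
    for c in parent:
        frac = subtree_fraction(c)
        for m in new_mutations.get(c, ()):
            ccf[m] = frac
    return mutations, ccf
```

An interactive tree view could size nodes by `proportions`, annotate edges with `new_mutations`, and show `ccf` when a mutation is selected, keeping the tree and the annotated mutations linked.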
16:00-16:20
Visual Exploration of Relationships and Structure in Low-Dimensional Embeddings
Format: Live-stream
Moderator(s): Qianwen Wang
Patrick Adelberger, Johannes Kepler University Linz, Austria
Klaus Eckelt, Johannes Kepler University Linz, Austria
Marc Streit, Johannes Kepler University Linz, Austria
Andreas Hinterreiter, Johannes Kepler University Linz, Austria
Conny Walchshofer, Johannes Kepler University Linz, Austria
Vaishali Dhanoa, Johannes Kepler University Linz and Pro2Future GmbH, Austria
Christina Humer, Johannes Kepler University Linz, Austria
Moritz Heckmann, Johannes Kepler University Linz, Austria
Christian Steinparz, Johannes Kepler University Linz, Austria
We present an interactive visual approach for the exploration and formation of structural relationships
in embeddings of high-dimensional data.
These structural relationships, such as item sequences, associations of items with groups, and
hierarchies between groups of items, define properties of many real-world datasets. Nevertheless, most
existing methods for the visual exploration of embeddings treat these structures as second-class
citizens or do not take them into account at all.
In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which
relationships between items and/or groups are visually highlighted. During their exploratory analysis,
users can externalize their insights by setting up additional groups and relationships between items
and/or groups, for example, by dividing a heterogeneous group of patients into several subgroups.
The original high-dimensional data for single items, groups of items, or differences between items and
groups are accessible through additional summary visualizations and difference visualizations that
complement the embedding with a detailed look at the high-dimensional attributes.
We carefully tailored these summary and difference visualizations to various data types and semantic
contexts.
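The tailored summary and difference visualizations are not specified in detail. As an assumption-based sketch for numerical attributes only, the function below computes, for every attribute, a standardized mean difference between a selected group and a reference (by default, all other items). Such a per-attribute score could drive a difference chart that highlights which high-dimensional attributes distinguish a group. The function name is hypothetical, and each set needs at least two items.

```python
import numpy as np


def group_difference(X, group_mask, reference_mask=None):
    """Standardized mean difference per numerical attribute (column of X).

    The difference of means is divided by a pooled standard deviation, taken
    here as the square root of the average of the two sample variances.
    Attributes that are constant in both sets get a score of 0.
    """
    X = np.asarray(X, dtype=float)
    g = np.asarray(group_mask, dtype=bool)
    ref = ~g if reference_mask is None else np.asarray(reference_mask, dtype=bool)
    a, b = X[g], X[ref]
    pooled = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        d = (a.mean(axis=0) - b.mean(axis=0)) / pooled
    return np.where(pooled > 0, d, 0.0)
```

Categorical attributes would need a different summary, for example differences in category proportions, which is in line with the abstract's point that the views are tailored to data types.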
We implemented the approach as a web application, which is open-source and publicly available at
https://jku-vds-lab.at/apps/embedding-structure-explorer.
16:20-16:24
Plaice plots - an allele-aware visualization of clonal evolution
Format: Live-stream
Moderator(s): Helena Jambor
Sarah Sandmann, Institute of Medical Informatics, University of Münster, Germany
Yvonne Lisa Behrens, Department of Human Genetics, Hannover Medical School, Germany
Gudrun Göhring, Department of Human Genetics, Hannover Medical School, Germany
Julian Varghese, Institute of Medical Informatics, University of Münster, Germany
Reconstruction of clonal evolution involves complex integrated analyses. In addition to the classical
representation by phylogenetic or clonal evolution trees, the results are commonly visualized using fish
plots. In these plots, the development of every individual clone is displayed with time on the
x-axis and cancer cell fraction on the y-axis, which yields fish-shaped objects.
Despite providing a comprehensive visualization of clonal evolution, fish plots display information only
at the clone level, not at the allele level. Biallelic mutations cannot be identified at first sight, although,
with respect to disease progression, these mutations play an essential role. To fill this gap, we introduce
plaice plots as a derivative of fish plots. The actual 'fish' become flatfish, i.e., plaice, and are
mirrored above and below the x-axis. The upper plot visualizes common clonal development, while the
lower plot shows the fraction of remaining healthy alleles. For example, in the case of a mutated TP53 and an
additional del17p affecting the remaining healthy allele, the fraction of cells with deficient TP53 is
marked in the lower plot. Similarly, X-chromosomal mutations in male samples, which lead to a loss of the
only available healthy allele, are visualized. Plaice plots thereby allow for immediate identification
of double-hit events.
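The authors' implementation is not shown here. The sketch below illustrates how the two values behind the lower half of a plaice plot could be computed for one gene at one time point, under the simplifying assumption that the fraction of cells with 0, 1, or 2 affected alleles at the locus is already known (for example, from integrated variant and copy-number analysis). The input format and function name are hypothetical.

```python
def allele_state(cells, n_alleles=2):
    """Allele-level summary for one gene at one time point.

    cells: dict mapping the number of affected alleles per cell (0 = healthy,
    1 = single hit, 2 = double hit) to the fraction of cells in that state.
    n_alleles: 2 for autosomal genes, 1 for X-chromosomal genes in male samples.
    Returns (fraction of remaining healthy alleles,
             fraction of cells without any healthy allele).
    """
    if any(h < 0 or h > n_alleles for h in cells):
        raise ValueError("number of affected alleles out of range")
    if abs(sum(cells.values()) - 1.0) > 1e-9:
        raise ValueError("cell fractions must sum to 1")
    healthy = sum(frac * (n_alleles - hits) for hits, frac in cells.items()) / n_alleles
    deficient = sum(frac for hits, frac in cells.items() if hits == n_alleles)
    return healthy, deficient
```

Evaluating this at every sampled time point gives a series per gene: the healthy-allele fraction could set the extent of the mirrored lower plot, and the deficient fraction marks cells carrying a double hit, such as a TP53 mutation combined with del17p, or a single X-chromosomal hit in a male sample.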
16:24-16:27
An R Shiny app for systematically integrating genetic and pharmacologic cancer dependency maps
Format: Live from venue
Moderator(s): Helena Jambor
Yu-Chiao Chiu, UPMC Hillman Cancer Center, University of Pittsburgh, United States
Yidong Chen, Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, United States
Tapsya Nayak, Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, United States
Li-Ju Wang, Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, United States
Michael Ning, Department of Computer Science, University of Texas at Austin, United States
The rapidly growing cancer dependency maps pave the way to precision oncology by identifying and
targeting the “Achilles’ heel” of cancer. There is a pressing need for software that systematically
links such genetic (gene knockout) and pharmacologic (small-compound) dependencies. Here we present a
web-based R Shiny app that incorporates heterogeneous data from large-scale high-throughput CRISPR
screens, pharmacologic screens, and a molecular signature library, jointly covering 17k genes, 20k drugs,
and 1k cell lines. The major goal is to match gene knockouts and drug treatments that induce similar
effects on cell viability and/or perturbations of gene expression in order to address two fundamental
questions: 1) which drugs can be potential surrogates for the knockout of a gene, and 2) which genes are
potential targets or mechanisms of action of a drug. The app has four complementary and interconnected
modules that address various query scenarios to identify potential druggable genetic vulnerabilities and
understand the mechanisms of action of a known or new drug. The results are represented by interactive
figures and networks, as well as annotated data tables. In summary, our Shiny app enables easy and
systematic navigation, visualization, and integration of the rapidly evolving genetic and pharmacologic
dependency maps of cancer.
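The similarity measure used by the app is not stated in the abstract. A minimal sketch of question 1, assuming Pearson correlation across the cell lines profiled in both screens: a gene's knockout-effect profile is correlated with each drug's response profile, and drugs are ranked by that correlation. It also assumes that the two screens use aligned sign conventions (a stronger effect points the same way in both). The function name is hypothetical; swapping the roles of genes and drugs would address question 2.

```python
import numpy as np


def rank_drugs_for_gene(gene_effect, drug_response, drug_names, top=5):
    """Rank drugs by how similarly they affect viability to a gene knockout.

    gene_effect: 1-D array of knockout effects across cell lines.
    drug_response: 2-D array (n_drugs x n_cell_lines), same cell-line order.
    NaN marks a missing measurement. Returns (drug, Pearson r, n_shared_lines)
    tuples, most similar first.
    """
    gene_effect = np.asarray(gene_effect, dtype=float)
    responses = np.asarray(drug_response, dtype=float)
    results = []
    for name, resp in zip(drug_names, responses):
        shared = ~np.isnan(gene_effect) & ~np.isnan(resp)
        n_shared = int(shared.sum())
        if n_shared < 3:
            continue  # too few shared cell lines to estimate a correlation
        with np.errstate(invalid="ignore", divide="ignore"):
            r = np.corrcoef(gene_effect[shared], resp[shared])[0, 1]
        if np.isnan(r):
            continue  # constant profile: correlation undefined
        results.append((name, float(r), n_shared))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top]
```

Reporting the number of shared cell lines alongside each score helps to spot correlations that rest on very few measurements.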
16:31-16:34
ECellDive: Exploring Biological Systems in Virtual Reality
Format: Live-stream
Moderator(s): Helena Jambor
Eliott Jacopin, RIKEN, Center for Biosystems Dynamics Research, Japan
Kozo Nishida, Genome Analytics Japan Inc., Tokyo, Japan
Kazunari Kaizu, RIKEN, Center for Biosystems Dynamics Research, Japan
Koichi Takahashi, RIKEN, Center for Biosystems Dynamics Research, Japan
ECellDive is a virtual environment where users can model, simulate and visualize biological systems in
collaboration with their colleagues. In ECellDive, everything is a module representing either data (e.g.,
a metabolic pathway) or a transformation of these data (e.g., a Flux Balance Analysis).
For demonstration purposes, we import the Escher-FBA model into our virtual scene (Zachary A. King et al.
2017, doi:10.1371/journal.pcbi.1004321) and dive into it. Diving transfers us to a new scene containing
the metabolic pathway encoded in Escher-FBA. From there on, we explore the pathway by strolling around,
a major improvement over the original web app, where we have to zoom in/out or pan to
explore the model. Then, we highlight the structure of the network by grouping modules together,
automatically or manually. This is particularly useful for contextualizing the model, for example by
visualizing cellular compartments and metabolic subsystems. Finally, we perform a Flux Balance Analysis
(FBA) of the pathway and update the simulation results by knocking out or activating reactions of interest.
Above all, ECellDive is about collaboration: any changes can be exported and shared, and users can also
join a session hosted by someone else in real time to modify the same file.
16:34-16:38
VenOmics and Cell Signaling Environment for Studies and
BioDiscoveries
Format: Live-stream
Moderator(s): Helena Jambor
Marcela Ishihara, Programa de Pós Graduação em Toxinologia, Laboratório
de Toxinologia Aplicada, Instituto Butantan, Brazil
Bruno Ferreira de Souza, Laboratório de Toxinologia Aplicada, Instituto Butantan, Brazil
Henrique Cursino Vieira, Laboratório de Toxinologia Aplicada, Instituto Butantan, Brazil
Hugo Aguirre Armelin, Laboratório de Ciclo Celular, Instituto Butantan, Brazil
Marcelo Silva Reis, Departamento de Ciência da Computação, UNICAMP, Brazil
Milton Yutaka Nishiyama-Jr, Laboratório de Toxinologia Aplicada, Instituto Butantan, Brazil
Animal venoms have long fascinated humanity, mainly due to their complex actions and effects.
Today, these substances still intrigue people and represent one of the main drivers for the
discovery of novel natural drugs with potential therapeutic, medicinal and agricultural properties.
Venoms vary widely, and their biotechnological relevance is mostly attributed to their complex
composition: a plethora of peptides, enzymes and other molecular compounds. Given the importance of
venoms, a new field, Venomics, has emerged that combines high-throughput data from different
biological levels with molecular and computational techniques. A deeper understanding of
these substances can aid the generation of more effective antivenoms and the discovery of new biomolecules.
Here, we present the VenOmics and Cell Signaling Environment for BioDiscoveries (VEnOmiCS4BD), a novel
web-based public database, currently in development, for the storage and integration of multi-level venom
-omics data, such as transcriptomics and proteomics, derived from venomous and envenomated organisms. It is
also a platform for integrative analysis that supports exploration of gene expression profiles, cross-experiment
comparisons, signaling pathways and knowledge discovery. With VEnOmiCS4BD, we hope to facilitate
Venomics research by serving as a common hub for the deposition and downstream analysis of heterogeneous
biological data.
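Integrating multi-level data essentially means joining records from different -omics layers on a shared identifier. The sketch below is a hypothetical illustration, not the VEnOmiCS4BD schema: it assumes transcript abundances keyed by toxin ID and a set of toxin IDs detected by proteomics, and labels each toxin by the evidence available for it. The function name and the toxin IDs are invented for the example.

```python
def integrate_evidence(transcripts, proteins):
    """Merge transcript abundance and protein detection into one record per toxin ID.

    transcripts: {toxin_id: expression value, e.g. TPM}
    proteins:    set of toxin IDs detected by proteomics
    """
    records = {}
    for toxin_id in sorted(set(transcripts) | set(proteins)):
        in_rna = toxin_id in transcripts
        in_protein = toxin_id in proteins
        if in_rna and in_protein:
            evidence = "transcript+protein"
        elif in_rna:
            evidence = "transcript only"
        else:
            evidence = "protein only"
        records[toxin_id] = {
            "expression": transcripts.get(toxin_id),  # None when no transcript was quantified
            "protein_detected": in_protein,
            "evidence": evidence,
        }
    return records


# Hypothetical toxin IDs from one venom gland transcriptome and one venom proteome.
merged = integrate_evidence({"PLA2_1": 120.5, "SVMP_3": 40.0}, {"SVMP_3", "CTL_7"})
for toxin_id, record in merged.items():
    print(toxin_id, record["evidence"])
```

Keeping toxins that appear in only one layer (instead of an inner join) preserves exactly the discrepancies between transcript and protein evidence that are often of interest in venom studies.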
16:38-16:41
SciViewer: An interactive browser for visualizing large single-cell
datasets
Format: Live from venue
Moderator(s): Helena Jambor
Dhawal Jain, Pulmonary Drug Discovery Laboratory, Bayer US LLC. Pharmaceuticals, Research &
Development, Boston, MA, United States
Sikander Hayat, Institute of Experimental Medicine and Systems Biology, RWTH Aachen, Germany
Michael Cho, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical
School, Boston, MA, United States
Edwin Silverman, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard
Medical School, Boston, MA, United States
Rafael Kramann, Institute of Experimental Medicine and Systems Biology, RWTH Aachen, Germany
Alexis Laux-Biehlmann, Pulmonary Drug Discovery Laboratory, Bayer US LLC. Pharmaceuticals, Research
& Development, Boston, MA, United States
Joydeep Chakraborty, Product Platform Research, Enterprise Platforms and Infrastructure, Bayer US
LLC., Morristown, NJ, United States
Xinkai Li, Data Integration and Historians Services, Enterprise Platforms and Infrastructure, Bayer
US LLC., Morristown, NJ, United States
Hobart Moore, Infrastructure Engineering Services, Enterprise Platforms and Infrastructure, Bayer US
LLC., Morristown, NJ, United States
Pooja Srinivasa, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard
Medical School, Boston, MA, United States
Single-cell sequencing improves our ability to understand biological systems at single-cell resolution
and can be used to identify novel drug targets and optimal cell types for target validation. However,
tools that can interactively visualize and provide target-centric views of these large datasets are
limited. We present SciViewer (Single-cell Interactive Viewer), a novel tool to interactively visualize,
annotate and share single-cell datasets. SciViewer allows visualization of cluster, gene and pathway
level information such as clustering annotation, differential expression, pathway enrichment, cell-type
specificity, cellular composition, normalized gene expression and comparison across datasets. Further,
we provide APIs for SciViewer to interact with publicly available pharmacogenomics databases for
systematic evaluation of potential novel drug targets. We provide a module for non-programmatic upload
of single-cell datasets. SciViewer will be a useful tool for data exploration and target discovery from
single-cell datasets. It is available on GitHub (https://github.com/Dhawal-Jain/SciViewer).
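Several of the views listed above reduce to simple aggregations over cell-level annotations. The sketch below is an illustration only and does not reproduce SciViewer's code: it computes per-cluster mean expression and fraction of expressing cells for one gene, the cell-type composition of each sample, and, as one possible cell-type specificity score, the tau index. Tau is a swapped-in example metric; the abstract does not say which specificity measure SciViewer uses. All function names and inputs are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_expression_summary(labels, expression, threshold=0.0):
    """Mean normalized expression and fraction of expressing cells per cluster for one gene."""
    totals = defaultdict(float)
    n_cells = Counter()
    n_expressing = Counter()
    for label, value in zip(labels, expression):
        totals[label] += value
        n_cells[label] += 1
        if value > threshold:
            n_expressing[label] += 1
    return {
        label: (totals[label] / n_cells[label], n_expressing[label] / n_cells[label])
        for label in n_cells
    }


def cell_type_composition(samples, labels):
    """Fraction of each cell type within each sample."""
    counts = defaultdict(Counter)
    for sample, label in zip(samples, labels):
        counts[sample][label] += 1
    return {
        sample: {label: n / sum(c.values()) for label, n in c.items()}
        for sample, c in counts.items()
    }


def tau_specificity(cluster_means):
    """Tau index: 0 for uniform expression across clusters, 1 for expression in a single cluster."""
    values = list(cluster_means)
    peak = max(values)
    if peak == 0 or len(values) < 2:
        return 0.0  # no expression, or a single cluster: specificity treated as 0
    return sum(1 - v / peak for v in values) / (len(values) - 1)
```

For example, `tau_specificity([0.0, 0.0, 5.0])` scores a gene expressed in only one of three clusters as maximally specific, which is the kind of target-centric signal a target-validation view would surface.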
16:41-16:45
Gos: a declarative (epi)genomics visualization library for
Python
Format: Live-stream
Moderator(s): Helena Jambor
Nils Gehlenborg, Harvard Medical School, United States
Trevor Manz, Harvard Medical School, United States
Existing genomic visualization tools are tailored towards specific tasks and as such are limited in
expressiveness. The Gosling visualization grammar defines a set of primitives that specify how genomic
datasets can be transformed and mapped to visual properties, providing building blocks to compose unique,
scalable and interactive genomic data visualizations. However, Gosling visualizations are defined in JSON,
which can be tedious and error-prone to edit manually, especially for complex specifications
containing many layered and repeated elements. Additionally, genomic datasets referenced in a Gosling
specification are expected to be accessible via HTTP, which poses challenges for users because a simple
web server and/or HiGlass server must be configured separately to view local data. Here we present Gos,
a Python library that includes an API designed for computational biologists to quickly compose Gosling
visualizations. Gos allows the use of familiar language features (variables, functions, for-loops, etc.)
to author validated Gosling specifications (JSON) and additionally implements data-loading utilities to
transparently load local data into visualizations, abstracting away the complexity of configuring custom
web servers. Gos is designed for interactive analysis within a computational notebook environment and
integrates into Jupyter Notebook, JupyterLab, and Google Colab.
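The benefit of authoring specifications with ordinary language features can be shown without Gos itself. The sketch below uses only the Python standard library to build a Gosling-style JSON specification in which a function and a loop replace several hand-copied track blocks. It deliberately does not use the Gos API; the field names are modeled on the Gosling grammar but the output is not validated against the Gosling schema (which Gos does), and the URLs and sample names are hypothetical placeholders.

```python
import json


def signal_track(sample_id, color, width=600, height=40):
    """One area-chart track for a single sample; field names modeled on the Gosling grammar."""
    return {
        "title": sample_id,
        "data": {
            # Hypothetical placeholder URL; Gosling expects data reachable over HTTP.
            "url": f"https://example.org/data/{sample_id}.bw",
            "type": "bigwig",
            "column": "position",
            "value": "peak",
        },
        "mark": "area",
        "x": {"field": "position", "type": "genomic"},
        "y": {"field": "peak", "type": "quantitative"},
        "color": {"value": color},
        "width": width,
        "height": height,
    }


# A comprehension replaces three JSON blocks that differ only in sample and colour.
samples = {"sample1": "#1f77b4", "sample2": "#ff7f0e", "sample3": "#2ca02c"}
spec = {
    "title": "Stacked signal tracks",
    "tracks": [signal_track(sample_id, color) for sample_id, color in samples.items()],
}

print(json.dumps(spec, indent=2))
```

Changing the track height or mark now means editing one line in `signal_track` rather than every repeated block, which is the error-prone editing that the abstract describes.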
16:45-16:48
CoSIA: An R Package that Measures and Visualizes Transcriptome
Diversity across Model Organisms and Their Tissues
Format: Live from venue
Moderator(s): Helena Jambor
Vishal H. Oza, Heersink School of Medicine, The University of Alabama at Birmingham, United States
Brittany N. Lasseigne, Heersink School of Medicine, The University of Alabama at Birmingham, United
States
Anisha Haldar, Heersink School of Medicine, The University of Alabama
at Birmingham, United States
Elizabeth J. Ramsey, Heersink School of Medicine, The University of Alabama at Birmingham, United
States
Studying patient variants in model organisms is an active area of research. The key challenge is
determining an ideal model organism for modeling and studying the patient variant phenotype. This task
requires collaboration between a diverse group of experts and involves complex evaluations across
multiple metrics such as sequence alignment, human protein, and gene expression. Although comparing the
expression variation of a gene-associated variant poses many challenges, the advent of new
databases with preprocessed expression data across species and tissues has prompted the exploration of
transcriptome diversity, aiding scientists in selecting a suitable model organism for phenotypic studies.
We are developing CoSIA (Cross-Species Investigation and Analysis), an R package that provides
researchers with multiple metrics for choosing the most suitable model organism for study by measuring
and visualizing a diverse group of gene expression-based metrics. CoSIA uses curated non-diseased
wild-type RNA-sequencing expression data from Bgee to visualize a gene’s expression across tissues and
model organisms. Additionally, CoSIA provides functions to measure and visualize transcriptome diversity
for a gene using median-based coefficients of variation and Shannon entropy calculations. Thus, CoSIA
provides researchers with tools to visualize the variation in a gene’s expression profile to determine a
suitable model organism.
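The two diversity metrics can be illustrated in a few lines. CoSIA itself is an R package; for consistency with the other examples this sketch is in Python and is not CoSIA's implementation. It assumes the median-based coefficient of variation is the median absolute deviation divided by the median (CoSIA's exact definition may differ, e.g. by a scaling constant), and computes Shannon entropy in bits over the gene's expression distribution across tissues. The expression values are hypothetical.

```python
import math
from statistics import median


def median_cv(values):
    """Median-based coefficient of variation, assumed here to be MAD / median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return mad / med if med else float("nan")  # undefined when the median is zero


def shannon_entropy(values):
    """Entropy (bits) of a gene's expression across tissues.
    Maximal (log2 of the tissue count) for uniform expression, 0 for a single expressing tissue."""
    total = sum(values)
    if total == 0:
        return 0.0  # gene not expressed anywhere
    probs = [v / total for v in values if v > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical expression (e.g. TPM) of one gene in the same four tissues of two species.
expression = {
    "species_a": [12.0, 11.0, 13.0, 12.5],
    "species_b": [40.0, 0.5, 0.0, 1.0],
}
for species, values in expression.items():
    print(species, round(median_cv(values), 3), round(shannon_entropy(values), 3))
```

Comparing these numbers across species hints at whether a model organism reproduces the broad or tissue-restricted expression pattern seen in humans; the two metrics are complementary, since entropy captures how expression is spread over tissues while the robust CV captures variability without being dominated by one outlying tissue.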
16:48-16:52
Interactive Exploration of Tissues and Cells Guided by Visual
Pattern Mining
Format: Live from venue
Moderator(s): Helena Jambor
Qianwen Wang, Harvard University, United States
Nils Gehlenborg, Harvard University, United States
Visual patterns of tissues and cells in microscopy images can reveal valuable insights for understanding
the human body and treating diseases (e.g., histopathology). Recent advances in spatial omics enable the
analysis of tissues at the cellular level and have led to an explosion of research interest. However,
current studies rarely discuss visual patterns, partly because it is difficult for humans to
interpret the generated multiplexed images, which can have more than 40 channels.
To tackle this research gap, this study proposes a visual analytics approach to facilitate the visual
exploration of tissues and cells through visual pattern mining. Specifically, the proposed method
consists of a backend data module and a frontend visualization module. The backend module employs a
beta-VAE model to extract visual patterns by simultaneously considering all channels of the
multiplexed images. The frontend module supports users in arranging and grouping items (e.g., cell
thumbnails, tissue patches) based on the identified visual patterns. Users can examine the distribution
of certain visual patterns and associate the item visual patterns with their spatial contexts and other
types of biological information. A preliminary case study on breast cancer demonstrates the
effectiveness of our proposed approach.
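The key idea of a beta-VAE is its training objective: a reconstruction term plus a KL-divergence term weighted by a factor beta, which for beta > 1 encourages more disentangled latent factors. The sketch below computes that objective for a single multiplexed patch in plain Python. It is a generic illustration of the beta-VAE loss, not the authors' backend: the squared-error reconstruction term (a Gaussian likelihood up to constants), the `[channel][row][col]` patch layout and the default `beta=4.0` are assumptions, and in practice the encoder and decoder would be neural networks in a deep learning framework.

```python
import math


def flatten_patch(patch):
    """Flatten a [channel][row][col] image patch into one vector so that all channels
    (possibly 40+) are modelled jointly rather than one at a time."""
    return [value for channel in patch for row in channel for value in row]


def kl_diag_gaussian(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and the standard normal prior."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Squared-error reconstruction term plus the KL term weighted by beta."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return reconstruction + beta * kl_diag_gaussian(mu, logvar)


# Toy two-channel 1x2 patch, a perfect reconstruction and a 1-D latent code.
x = flatten_patch([[[0.2, 0.4]], [[0.9, 0.1]]])
print(beta_vae_loss(x, x, mu=[1.0], logvar=[0.0], beta=2.0))
```

Because the loss is computed on the flattened vector of all channels, each latent dimension summarizes co-occurring signal across markers, which is what lets the learned latent space serve as a set of multi-channel visual patterns for grouping cells or tissue patches.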
16:52-16:55
Effective visualisation of the tumour microenvironment using
glyph-based approaches
Visualisation of cancer tissues is important for diagnosis, identifying driving pathological processes
and potential biomarkers. Existing visualisation methods do not represent different tissue components
and the tumour microenvironment intuitively and are therefore difficult for pathologists to interpret.
Previously, we developed ShapoGraphy (www.shapography.com), a user-friendly web app for interactive
creation of new glyph-based representations. Here we use ShapoGraphy to develop semantically relevant
representations of multiplexed tissue image data that facilitate the pathological assessment and pattern
discovery of tumour microenvironment phenotypes.
We will present the development of our representation and demonstrate its utility using several datasets
measuring protein activities in stromal, immune and cancer cells. We will also present the exploration
of various glyph design choices that use different shapes and marks to represent different tissue
compartments and tumour heterogeneity. To determine the effectiveness of our approach, we reviewed our
designs with pathologists and biologists. We found that a representation using compactly
arranged hexagons that encode variables through colour and symbols was favoured. Finally, we
will discuss general guidelines for producing effective glyph-based representations. In summary, our
approach addresses the limitations of other visualisation approaches and provides a flexible way for
summarising tissue image data.
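A compact hexagon arrangement like the one favoured in the review can be laid out with standard hexagonal-grid geometry. The sketch below is not ShapoGraphy's code: it places one pointy-top hexagon per tissue region in an odd-row-offset grid, so that all six neighbours of a hexagon sit at the same distance, and encodes the compartment as fill colour and a binary marker call as a symbol. The palette, the region list and the function names are hypothetical.

```python
import math

# Hypothetical colour assignment per tissue compartment.
PALETTE = {"tumour": "#80b1d3", "stroma": "#8dd3c7", "immune": "#fb8072"}


def hex_center(row, col, size):
    """Centre of a pointy-top hexagon in an odd-row-offset grid; neighbours touch edge to edge."""
    x = size * math.sqrt(3) * (col + 0.5 * (row % 2))
    y = size * 1.5 * row
    return x, y


def hex_corners(cx, cy, size):
    """Six vertices of a pointy-top hexagon, e.g. for an SVG polygon."""
    return [
        (cx + size * math.cos(math.radians(60 * i - 30)),
         cy + size * math.sin(math.radians(60 * i - 30)))
        for i in range(6)
    ]


def hex_glyphs(regions, n_cols, size=10.0):
    """Lay out one hexagon per tissue region, row by row.
    Colour encodes the compartment; a '+' symbol encodes a positive marker call."""
    glyphs = []
    for i, (compartment, marker_positive) in enumerate(regions):
        row, col = divmod(i, n_cols)
        cx, cy = hex_center(row, col, size)
        glyphs.append({
            "corners": hex_corners(cx, cy, size),
            "fill": PALETTE[compartment],
            "symbol": "+" if marker_positive else "",
        })
    return glyphs


regions = [("tumour", True), ("stroma", False), ("immune", True), ("tumour", False)]
for glyph in hex_glyphs(regions, n_cols=2):
    print(glyph["fill"], glyph["symbol"] or "-")
```

Offsetting every other row by half a hexagon is what makes the tiling gap-free, so adjacent regions of the tissue stay visually adjacent in the glyph.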
16:55-16:58
Using Mapper to Reveal Morphological Relationships in Passiflora
Leaves
Format: Live from venue
Moderator(s): Helena Jambor
Sarah Percival, Michigan State University, United States
As collections of data grow in size, it is increasingly important to have efficient means of analyzing
large data sets. Topological data analysis (TDA) uses concepts from the mathematical field of topology
not only to examine large data sets efficiently but also to make inferences about the "shape" of data.
In this project, we use Mapper, a tool from TDA that summarizes data into a graph, to discover an
underlying structure relating the shapes of more than 3,300 Passiflora leaves from 40 different species.
As the Mapper graph has a structure, or "shape" of its own, we think of it as a "shape of shapes" that
provides information on the interplay between the developmental processes determining leaf shape within
a single plant and the evolutionary processes between species. In particular, we examine the
interactions between leaf species and both leaf age and leaf area by constructing a Mapper graph for
each measure. For each node in the resulting graphs, we then compute the average leaf shape to obtain a
graph structure that reveals how morphometric differences between species relate to the developmental
changes that must occur for those shapes to be realized.
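The Mapper construction itself is short: cover the range of a filter (lens) function with overlapping intervals, cluster the points falling in each interval, make each cluster a node, and connect nodes that share points. The sketch below is a minimal generic implementation, not the project's pipeline. It assumes single-linkage clustering at a fixed distance threshold; the parameter values are illustrative, and `node_averages` mirrors the per-node averaging step (for leaves, the points would be shape descriptors and the lens leaf age or leaf area).

```python
import math
from itertools import combinations


def single_linkage(points, indices, eps):
    """Cluster the given point indices: points within eps of each other are merged."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(indices, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)
    clusters = {}
    for i in indices:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def mapper(points, lens, n_intervals=4, overlap=0.4, eps=0.5):
    """Minimal Mapper: cover the lens range with overlapping intervals, cluster each
    preimage, and connect clusters (nodes) that share at least one point."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    pad = overlap * length  # each interval is widened by this amount on both sides
    nodes = []
    for k in range(n_intervals):
        start, end = lo + k * length - pad, lo + (k + 1) * length + pad
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges


def node_averages(points, nodes):
    """Average point of each node (analogous to the average leaf shape per node)."""
    dim = len(points[0])
    return [tuple(sum(points[i][d] for i in node) / len(node) for d in range(dim))
            for node in nodes]


# Toy data: 16 points on a circle, with height (y) as the lens function.
circle = [(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16)) for k in range(16)]
nodes, edges = mapper(circle, lens=lambda p: p[1])
print(len(nodes), "nodes,", len(edges), "edges")
```

On the circle, Mapper recovers a single loop of 6 nodes and 6 edges: the bottom and top intervals each give one cluster, while the two middle intervals each split into a left and a right cluster. This is the sense in which the Mapper graph has a "shape" of its own.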
17:00-18:00
Keynote Presentation: Machine learning provides a new perspective on
protein modification