Yearly Archives: 2017


Learning causal networks with latent variables from multivariate information in genomic data

Our next meeting will be at 11:00 on Dec 5th, in room 4160 of the Discovery building. Our selected paper is Learning causal networks with latent variables from multivariate information in genomic data.
The abstract is as follows.

Learning causal networks from large-scale genomic data remains challenging in the absence of time series or controlled perturbation experiments. We report an information-theoretic method which learns a large class of causal or non-causal graphical models from purely observational data, while including the effects of unobserved latent variables, commonly found in many genomic datasets. Starting from a complete graph, the method iteratively removes dispensable edges, by uncovering significant information contributions from indirect paths, and assesses edge-specific confidences from randomization of available data. The remaining edges are then oriented based on the signature of causality in observational data. The approach and associated algorithm, miic, outperform earlier methods on a broad range of benchmark networks. Causal network reconstructions are presented at different biological size and time scales, from gene regulation in single cells to whole genome duplication in tumor development as well as long term evolution of vertebrates. Miic is publicly available at https://github.com/miicTeam/MIIC.
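The edge-removal step described above rests on a simple question: does an indirect path account for the information two variables share? Below is a minimal Python sketch of that test, assuming discrete data and plain plug-in estimates of mutual information. The helper names are hypothetical, and the sketch omits any finite-sample corrections, as well as the latent-variable handling and edge orientation that miic itself performs.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y), in nats, from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed pairs: p(x,y) * log(p(x,y) / (p(x) p(y))), written with counts.
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) = sum over z of p(z) * I(X;Y | Z = z)."""
    n = len(zs)
    total = 0.0
    for z, count in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        total += count / n * mutual_information([xs[i] for i in idx], [ys[i] for i in idx])
    return total


# Toy chain X -> Z -> Y (hypothetical data): Z copies X and Y copies Z.
xs = [0, 0, 1, 1, 0, 1, 0, 1]
zs = list(xs)
ys = list(zs)
print(mutual_information(xs, ys))                  # log(2), about 0.693 nats
print(conditional_mutual_information(xs, ys, zs))  # 0.0: Z explains the X-Y dependence
```

In this toy chain X and Y share information, but conditioning on Z removes all of it, so under this test the direct X–Y edge would be considered dispensable.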

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming

Our next meeting will be at 11:00 on Nov 7th, in room 4160 of the Discovery building. Our selected paper is Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming.
The abstract is as follows.

Understanding the molecular programs that guide cellular differentiation during development is a major goal of modern biology. Here, we introduce an approach, WADDINGTON-OT, based on the mathematics of optimal transport, for inferring developmental landscapes, probabilistic cellular fates and dynamic trajectories from large-scale single-cell RNA-seq (scRNA-seq) data collected along a time course. We demonstrate the power of WADDINGTON-OT by applying the approach to study 65,781 scRNA-seq profiles collected at 10 time points over 16 days during reprogramming of fibroblasts to iPSCs. We construct a high-resolution map of reprogramming that rediscovers known features; uncovers new alternative cell fates including neural- and placental-like cells; predicts the origin and fate of any cell class; highlights senescent-like cells that may support reprogramming through paracrine signaling; and implicates regulatory models in particular trajectories. Of these findings, we highlight Obox6, which we experimentally show enhances reprogramming efficiency. Our approach provides a general framework for investigating cellular differentiation.
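WADDINGTON-OT builds on optimal transport between cell populations sampled at consecutive time points. As a rough illustration only, here is entropy-regularized optimal transport solved with Sinkhorn iterations in plain Python. The cost matrix, the regularization strength `epsilon` and the iteration count are assumptions, the function name is hypothetical, and any additional modelling choices in the published framework are not reproduced.

```python
from math import exp


def sinkhorn(a, b, cost, epsilon=0.1, n_iter=500):
    """Entropy-regularized optimal transport plan between weight vectors a and b.

    a[i]: mass of cell i at the earlier time point; b[j]: mass of cell j at the later one.
    cost[i][j]: cost of moving mass from i to j (e.g. a squared expression distance).
    """
    K = [[exp(-c / epsilon) for c in row] for row in cost]  # Gibbs kernel
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan's marginals match a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]


# Two cells per time point; each early cell is closest to the later cell with the same index.
plan = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
print(plan)  # close to [[0.5, 0.0], [0.0, 0.5]]
```

Smaller values of `epsilon` give sharper, closer-to-deterministic couplings but can underflow in `exp` when costs are large relative to `epsilon`.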

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Vicus: Exploiting local structures to improve network-based analysis of biological data

Our next meeting will be at 11:00 on Oct 24th, in room 4160 of the Discovery building. Our selected paper is Vicus: Exploiting local structures to improve network-based analysis of biological data.
The abstract is as follows.

Biological networks entail important topological features and patterns critical to understanding interactions within complicated biological systems. Despite great progress in understanding their structure, much more can be done to improve our inference and network analysis. Spectral methods play a key role in many network-based applications. Fundamental to spectral methods is the Laplacian, a matrix that captures the global structure of the network. Unfortunately, the Laplacian does not take into account intricacies of the network’s local structure and is sensitive to noise in the network. These two properties are fundamental to biological networks and cannot be ignored. We propose an alternative matrix Vicus. The Vicus matrix captures the local neighborhood structure of the network and thus is more effective at modeling biological interactions. We demonstrate the advantages of Vicus in the context of spectral methods by extensive empirical benchmarking on tasks such as single cell dimensionality reduction, protein module discovery and ranking genes for cancer subtyping. Our experiments show that using Vicus, spectral methods result in more accurate and robust performance in all of these tasks.
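For readers less familiar with spectral methods, the standard graph Laplacian that Vicus is positioned against can be written down in a few lines. The notation here is assumed rather than taken from the abstract: a network with symmetric, non-negative edge weights $W_{ij}$, degree matrix $D$ and identity matrix $I$.

```latex
D_{ii} = \sum_{j} W_{ij}, \qquad L = D - W, \qquad
L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}

f^{\top} L f = \tfrac{1}{2} \sum_{i,j} W_{ij} \, (f_i - f_j)^2
```

The quadratic form shows why the Laplacian's leading eigenvectors summarize global structure: vectors $f$ that vary little across heavily weighted edges give small values, so the eigenvectors with the smallest eigenvalues are smooth over the whole network. The Vicus matrix itself is defined in the paper and is not reproduced here.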

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance

Our next meeting will be at 11:00 on Oct 10th, in room 4160 of the Discovery building. Our selected paper is Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance.
The abstract is as follows.

Background: Identification of genes whose basal mRNA expression predicts the sensitivity of tumor cells to cytotoxic treatments can play an important role in individualized cancer medicine. It enables detailed characterization of the mechanism of action of drugs. Furthermore, screening the expression of these genes in the tumor tissue may suggest the best course of chemotherapy or a combination of drugs to overcome drug resistance.

Results: We developed a computational method called ProGENI to identify genes most associated with the variation of drug response across different individuals, based on gene expression data. In contrast to existing methods, ProGENI also utilizes prior knowledge of protein–protein and genetic interactions, using random walk techniques. Analysis of two relatively new and large datasets including gene expression data on hundreds of cell lines and their cytotoxic responses to a large compendium of drugs reveals a significant improvement in prediction of drug sensitivity using genes identified by ProGENI compared to other methods. Our siRNA knockdown experiments on ProGENI-identified genes confirmed the role of many new genes in sensitivity to three chemotherapy drugs: cisplatin, docetaxel, and doxorubicin. Based on such experiments and extensive literature survey, we demonstrate that about 73% of our top predicted genes modulate drug response in selected cancer cell lines. In addition, global analysis of genes associated with groups of drugs uncovered pathways of cytotoxic response shared by each group.

Conclusions: Our results suggest that knowledge-guided prioritization of genes using ProGENI gives new insight into mechanisms of drug resistance and identifies genes that may be targeted to overcome this phenomenon.
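The abstract describes ProGENI's use of prior interaction knowledge only as "random walk techniques". One widely used technique of that kind is random walk with restart (RWR), sketched below in plain Python under stated assumptions: an undirected, unweighted toy network, a hypothetical restart probability of 0.5, and a seed vector marking genes of interest. How ProGENI actually builds its seeds and combines network scores with expression and drug-response data is described in the paper, not here.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state scores p = (1 - restart) * W p + restart * s on an undirected network.

    adj: symmetric adjacency matrix (list of lists); seed: non-negative prior scores.
    W is the column-normalized adjacency matrix; s is the seed normalized to sum to 1.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)] for i in range(n)]
    total = sum(seed)
    s = [x / total for x in seed]
    p = list(s)
    for _ in range(max_iter):
        new = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * s[i]
               for i in range(n)]
        if sum(abs(new[i] - p[i]) for i in range(n)) < tol:
            return new
        p = new
    return p


# Hypothetical path network 0 - 1 - 2 - 3 with all prior weight on gene 0.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
scores = random_walk_with_restart(adj, [1, 0, 0, 0])
print([round(x, 3) for x in scores])  # [0.578, 0.311, 0.089, 0.022]
```

Scores decay with network distance from the seed; a larger `restart` keeps more of the weight close to the seed genes.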

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Reversed graph embedding resolves complex single-cell trajectories

Our next meeting will be at 11:00 on September 26th, in room 4160 of the Discovery building. Our selected paper is Reversed graph embedding resolves complex single-cell trajectories.
The abstract is as follows.

Single-cell trajectories can unveil how gene regulation governs cell fate decisions. However, learning the structure of complex trajectories with multiple branches remains a challenging computational problem. We present Monocle 2, an algorithm that uses reversed graph embedding to describe multiple fate decisions in a fully unsupervised manner. We applied Monocle 2 to two studies of blood development and found that mutations in the genes encoding key lineage transcription factors divert cells to alternative fates.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs

Our next meeting will be at 11:00 on September 12th, in room 4160 of the Discovery building. Our selected paper is GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs.
The abstract is as follows.

The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of 3D chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps is challenging due to the multi-scale, hierarchical structure of the data and the resulting idiosyncratic properties of experimental noise. We introduce a multi-scale concordance measure called GenomeDISCO (DIfferences between Smoothed COntact maps) for assessing the similarity of a pair of contact maps obtained from chromosome capture experiments. We denoise the contact maps using random walks on the contact map graph, and integrate concordance at multiple scales of smoothing. We use simulated datasets to benchmark GenomeDISCO’s sensitivity to different types of noise typically affecting chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. Software implementing GenomeDISCO is available at http://github.com/kundajelab/genomedisco.
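The core idea above, smoothing each contact map with random walks and comparing the smoothed maps across several walk lengths, can be sketched in plain Python. This is a simplified illustration in that spirit, not GenomeDISCO's implementation: it assumes row-normalized transition matrices, divides the L1 difference by the number of nodes, and averages over walk lengths 1 to 3. The exact normalization and choice of steps in the real tool are defined in the paper and software, and the function names are hypothetical.

```python
def to_transition(contacts):
    """Row-normalize a contact matrix into random-walk transition probabilities."""
    rows = []
    for row in contacts:
        total = sum(row)
        rows.append([x / total for x in row] if total else [0.0] * len(row))
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def smoothed_concordance(m1, m2, max_steps=3):
    """1 minus the mean (over walk lengths t = 1..max_steps) of the L1 difference
    between P1^t and P2^t, divided by the number of nodes."""
    p1, p2 = to_transition(m1), to_transition(m2)
    w1, w2 = p1, p2  # current powers P1^t and P2^t, starting at t = 1
    n = len(m1)
    diffs = []
    for _ in range(max_steps):
        diffs.append(sum(abs(w1[i][j] - w2[i][j]) for i in range(n) for j in range(n)) / n)
        w1, w2 = matmul(w1, p1), matmul(w2, p2)
    return 1.0 - sum(diffs) / len(diffs)


# Hypothetical 3-bin contact maps.
rep_a = [[5, 2, 1], [2, 6, 3], [1, 3, 4]]
other = [[1, 4, 4], [4, 1, 4], [4, 4, 1]]
print(smoothed_concordance(rep_a, rep_a))  # 1.0 for identical maps
print(smoothed_concordance(rep_a, other))  # lower for dissimilar maps
```

Because every row of a transition matrix sums to 1, each per-step difference is at most 2, so this particular score stays between -1 and 1.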

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


An Expanded View of Complex Traits: From Polygenic to Omnigenic

Our next meeting will be at 2:30 on September 1st, in room 4160 of the Discovery building. Our selected paper is An Expanded View of Complex Traits: From Polygenic to Omnigenic.
The abstract is as follows.

A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome—including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an “omnigenic” model.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


LASSIM—A network inference toolbox for genome-wide mechanistic modeling

Our next meeting will be at 2:30 on August 18th, in room 4160 of the Discovery building. Our selected paper is LASSIM—A network inference toolbox for genome-wide mechanistic modeling.
The abstract is as follows.

Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. 
In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models with truly systems-level data. We demonstrate the power of this approach by inferring a mechanistically motivated, genome-wide model of the Th2 transcription regulatory system, which plays an important role in several immune related diseases.
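The two-step structure described above (fit the core network first, then fit each peripheral gene against the fixed core dynamics) can be illustrated with a toy model. The sketch below assumes a sigmoid-activated ODE integrated with forward Euler. These equations, the function names and all parameter values are illustrative assumptions, not necessarily LASSIM's exact equations, and no parameter fitting or PyGMO optimization is shown.

```python
from math import exp


def sigmoid(u):
    return 1.0 / (1.0 + exp(-u))


def simulate_core(weights, bias, decay, vmax, x0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx_i/dt = vmax_i * sigmoid(sum_j w_ij x_j + b_i) - decay_i * x_i."""
    x = list(x0)
    n = len(x)
    traj = [x[:]]
    for _ in range(steps):
        dx = [vmax[i] * sigmoid(sum(weights[i][j] * x[j] for j in range(n)) + bias[i])
              - decay[i] * x[i] for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
        traj.append(x[:])
    return traj


def simulate_peripheral(core_traj, w, b, decay, vmax, y0, dt=0.01):
    """One peripheral gene driven by an already simulated core; it never feeds back."""
    y = y0
    ys = [y]
    for x in core_traj[:-1]:
        y += dt * (vmax * sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - decay * y)
        ys.append(y)
    return ys


# Hypothetical two-gene core in which gene 0 activates gene 1.
core = simulate_core([[0.0, 0.0], [2.0, 0.0]], bias=[0.0, -1.0], decay=[1.0, 1.0],
                     vmax=[1.0, 1.0], x0=[0.0, 0.0])
# Hypothetical peripheral gene repressed by core gene 1, simulated separately.
target = simulate_peripheral(core, w=[0.0, -3.0], b=1.0, decay=1.0, vmax=1.0, y0=0.0)
print(core[-1], target[-1])
```

Because a peripheral gene in this toy model only reads the core trajectory, each peripheral gene can be simulated, and in principle fitted, independently of the others, which is what makes the second step straightforward to parallelize.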

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Context Specificity in Causal Signaling Networks Revealed by Phosphoprotein Profiling

Our next meeting will be at 2:30 on August 4th, in room 4160 of the Discovery building. Our selected paper is Context Specificity in Causal Signaling Networks Revealed by Phosphoprotein Profiling.
The abstract is as follows.

Signaling networks downstream of receptor tyrosine kinases are among the most extensively studied biological networks, but new approaches are needed to elucidate causal relationships between network components and understand how such relationships are influenced by biological context and disease. Here, we investigate the context specificity of signaling networks within a causal conceptual framework using reverse-phase protein array time-course assays and network analysis approaches. We focus on a well-defined set of signaling proteins profiled under inhibition with five kinase inhibitors in 32 contexts: four breast cancer cell lines (MCF7, UACC812, BT20, and BT549) under eight stimulus conditions. The data, spanning multiple pathways and comprising ~70,000 phosphoprotein and ~260,000 protein measurements, provide a wealth of testable, context-specific hypotheses, several of which we experimentally validate. Furthermore, the data provide a unique resource for computational methods development, permitting empirical assessment of causal network learning in a complex, mammalian setting.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning

Our next meeting will be at 3:00 on June 23rd, in room 4160 of the Discovery building. Our selected paper is Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning.
The abstract is as follows.

We present single-cell interpretation via multikernel learning (SIMLR), an analytic framework and software which learns a similarity measure from single-cell RNA-seq data in order to perform dimension reduction, clustering and visualization. On seven published data sets, we benchmark SIMLR against state-of-the-art methods. We show that SIMLR is scalable and greatly enhances clustering performance while improving the visualization and interpretability of single-cell sequencing data.
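SIMLR learns a cell-to-cell similarity by combining several kernels. As a simplified illustration, the sketch below builds Gaussian kernels at a few assumed bandwidths and averages them with fixed, uniform weights. SIMLR instead learns the kernel weights together with the similarity matrix by optimization (see the paper), which this sketch does not attempt; the function names are hypothetical.

```python
from math import exp


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_kernel(cells, sigma):
    """K[i][j] = exp(-||c_i - c_j||^2 / (2 sigma^2)) for expression profiles c_i."""
    return [[exp(-squared_distance(a, b) / (2 * sigma ** 2)) for b in cells] for a in cells]


def combined_similarity(cells, sigmas, weights=None):
    """Weighted average of Gaussian kernels at several bandwidths (uniform weights by default)."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    kernels = [gaussian_kernel(cells, s) for s in sigmas]
    n = len(cells)
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Hypothetical 2-gene profiles: cells 0 and 1 are near each other, cell 2 is far away.
cells = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
S = combined_similarity(cells, sigmas=[0.5, 1.0, 2.0])
print(S[0][1] > S[0][2])  # True
```

Mixing bandwidths keeps the similarity informative at several length scales at once, rather than committing to a single kernel width chosen in advance.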

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.