Past Discussions


Clustering gene expression time series data using an infinite Gaussian process mixture model

Our next meeting will be at 2pm on Feb 12th, in room 4160 of the Discovery building. Our Selected paper is Clustering gene expression time series data using an infinite Gaussian process mixture model.
.
The abstract is as follows.

Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


NetSig: network-based discovery from cancer genomes

Our next meeting will be at 2pm on Jan 29th, in room 4160 of the Discovery building. Our Selected paper is NetSig: network-based discovery from cancer genomes.
The abstract is as follows.

Methods that integrate molecular network information and tumor genome data could complement gene-based statistical tests to identify likely new cancer genes; but such approaches are challenging to validate at scale, and their predictive value remains unclear. We developed a robust statistic (NetSig) that integrates protein interaction networks with data from 4,742 tumor exomes. NetSig can accurately classify known driver genes in 60% of tested tumor types and predicts 62 new driver candidates. Using a quantitative experimental framework to determine in vivo tumorigenic potential in mice, we found that NetSig candidates induce tumors at rates that are comparable to those of known oncogenes and are ten-fold higher than those of random genes. By reanalyzing nine tumor-inducing NetSig candidates in 242 patients with oncogene-negative lung adenocarcinomas, we find that two (AKT2 and TFDP2) are significantly amplified. Our study presents a scalable integrated computational and experimental workflow to expand discovery from cancer genomes.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Learning causal networks with latent variables from multivariate information in genomic data

Our next meeting will be at 11:00 on Dec 5th, in room 4160 of the Discovery building. Our Selected paper is Learning causal networks with latent variables from multivariate information in genomic data.
The abstract is as follows.

Learning causal networks from large-scale genomic data remains challenging in absence of time series or controlled perturbation experiments. We report an information- theoretic method which learns a large class of causal or non-causal graphical models from purely observational data, while including the effects of unobserved latent variables, commonly found in many genomic datasets. Starting from a complete graph, the method iteratively removes dispensable edges, by uncovering significant information contributions from indirect paths, and assesses edge-specific confidences from randomization of available data. The remaining edges are then oriented based on the signature of causality in observational data. The approach and associated algorithm, miic, outperform earlier methods on a broad range of benchmark networks. Causal network reconstructions are presented at different biological size and time scales, from gene regulation in single cells to whole genome duplication in tumor development as well as long term evolution of vertebrates. Miic is publicly available at https://github.com/miicTeam/MIIC.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming.

Our next meeting will be at 11:00 on Nov 7th, in room 4160 of the Discovery building. Our Selected paper is Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming.
The abstract is as follows.

Understanding the molecular programs that guide cellular differentiation during development is a major goal of modern biology. Here, we introduce an approach, WADDINGTON-OT, based on the mathematics of optimal transport, for inferring developmental landscapes, probabilistic cellular fates and dynamic trajectories from large-scale single-cell RNA-seq (scRNA-seq) data collected along a time course. We demonstrate the power of WADDINGTON-OT by applying the approach to study 65,781 scRNA-seq profiles collected at 10 time points over 16 days during reprogramming of fibroblasts to iPSCs. We construct a high-resolution map of reprogramming that rediscovers known features; uncovers new alternative cell fates including neural- and placental-like cells; predicts the origin and fate of any cell class; highlights senescent-like cells that may support reprogramming through paracrine signaling; and implicates regulatory models in particular trajectories. Of these findings, we highlight Obox6, which we experimentally show enhances reprogramming efficiency. Our approach provides a general framework for investigating cellular differentiation.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Vicus: Exploiting local structures to improve network-based analysis of biological data

Our next meeting will be at 11:00 on Oct 24th, in room 4160 of the Discovery building. Our Selected paper is Vicus: Exploiting local structures to improve network-based analysis of biological data.
The abstract is as follows.

Biological networks entail important topological features and patterns critical to understanding interactions within complicated biological systems. Despite a great progress in understanding their structure, much more can be done to improve our inference and network analysis. Spectral methods play a key role in many network-based applications. Fundamental to spectral methods is the Laplacian, a matrix that captures the global structure of the network. Unfortunately, the Laplacian does not take into account intricacies of the network’s local structure and is sensitive to noise in the network. These two properties are fundamental to biological networks and cannot be ignored. We propose an alternative matrix Vicus. The Vicus matrix captures the local neighborhood structure of the network and thus is more effective at modeling biological interactions. We demonstrate the advantages of Vicus in the context of spectral methods by extensive empirical benchmarking on tasks such as single cell dimensionality reduction, protein module discovery and ranking genes for cancer subtyping. Our experiments show that using Vicus, spectral methods result in more accurate and robust performance in all of these tasks.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance

Our next meeting will be at 11:00 on Oct 10th, in room 4160 of the Discovery building. Our Selected paper is Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance.
The abstract is as follows.

Background: Identification of genes whose basal mRNA expression predicts the sensitivity of tumor cells to cytotoxic treatments can play an important role in individualized cancer medicine. It enables detailed characterization of the mechanism of action of drugs. Furthermore, screening the expression of these genes in the tumor tissue may suggest the best course of chemotherapy or a combination of drugs to overcome drug resistance.

Results: We developed a computational method called ProGENI to identify genes most associated with the variation of drug response across different individuals, based on gene expression data. In contrast to existing methods, ProGENI also utilizes prior knowledge of protein–protein and genetic interactions, using random walk techniques. Analysis of two relatively new and large datasets including gene expression data on hundreds of cell lines and their cytotoxic responses to a large compendium of drugs reveals a significant improvement in prediction of drug sensitivity using genes identified by ProGENI compared to other methods. Our siRNA knockdown experiments on ProGENI-identified genes confirmed the role of many new genes in sensitivity to three chemotherapy drugs: cisplatin, docetaxel, and doxorubicin. Based on such experiments and extensive literature survey, we demonstrate that about 73% of our top predicted genes modulate drug response in selected cancer cell lines. In addition, global analysis of genes associated with groups of drugs uncovered pathways of cytotoxic response shared by each group.

Conclusions: Our results suggest that knowledge-guided prioritization of genes using ProGENI gives new insight into mechanisms of drug resistance and identifies genes that may be targeted to overcome this phenomenon.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Reversed graph embedding resolves complex single-cell trajectories

Our next meeting will be at 11:00 on September 26th, in room 4160 of the Discovery building. Our Selected paper is Reversed graph embedding resolves complex single-cell trajectories.
The abstract is as follows.

Single-cell trajectories can unveil how gene regulation governs cell fate decisions. However, learning the structure of complex trajectories with multiple branches remains a challenging computational problem. We present Monocle 2, an algorithm that uses reversed graph embedding to describe multiple fate decisions in a fully unsupervised manner. We applied Monocle 2 to two studies of blood development and found that mutations in the genes encoding key lineage transcription factors divert cells to alternative fates.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs

Our next meeting will be at 11:00 on September 12th, in room 4160 of the Discovery building. Our Selected paper is GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs.
The abstract is as follows.

The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of 3D chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps is challenging due to the multi-scale, hierarchical structure of the data and the resulting idiosyncratic properties of experimental noise. We introduce a multi-scale concordance measure called GenomeDISCO (DIfferences between Smoothed COntact maps) for assessing the similarity of a pair of contact maps obtained from chromosome capture experiments. We denoise the contact maps using random walks on the contact map graph, and integrate concordance at multiple scales of smoothing. We use simulated datasets to benchmark GenomeDISCO’s sensitivity to different types of noise typically affecting chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. Software implementing GenomeDISCO is available at http://github.com/kundajelab/genomedisco.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


An Expanded View of Complex Traits: From Polygenic to Omnigenic

Our next meeting will be at 2:30 on September 1st, in room 4160 of the Discovery building. Our Selected paper is An Expanded View of Complex Traits: From Polygenic to Omnigenic.
The abstract is as follows.

A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome—including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an “omnigenic” model.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


LASSIM—A network inference toolbox for genome-wide mechanistic modeling

Our next meeting will be at 2:30 on August 18th, in room 4160 of the Discovery building. Our Selected paper is LASSIM—A network inference toolbox for genome-wide mechanistic modeling.
The abstract is as follows.

Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models with truly systems-level data. We demonstrate the power of this approach by inferring a mechanistically motivated, genome-wide model of the Th2 transcription regulatory system, which plays an important role in several immune related diseases.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.