Yearly Archives: 2016


Compact Integration of Multi-Network Topology for Functional Analysis of Genes

Our next meeting will be at 3:00 on December 19th, in room 3160 of the Discovery building. Our selected paper is Compact Integration of Multi-Network Topology for Functional Analysis of Genes.
The abstract is as follows.

The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing yet-unsolved challenge is how to combine multiple heterogeneous networks, each having different connectivity patterns, to achieve more accurate inference. Here, we describe the Mashup framework for scalable and robust network integration. In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating and diverse biological network data and can be broadly applied to other network science domains.
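For anyone who wants a feel for the two steps the abstract describes (diffusion in each network, then one low-dimensional vector per gene), here is a minimal sketch. It is not the authors' implementation: the function names `rwr_matrix` and `embed_networks`, the restart probability of 0.5, the `1/n` smoothing before the log, and the use of a plain SVD in place of the paper's own dimensionality-reduction procedure are all assumptions made for illustration.

```python
import numpy as np


def rwr_matrix(adj, restart=0.5):
    """Random walk with restart on one network.

    Row i of the result is the stationary distribution of a walk that
    jumps back to node i with probability `restart` at every step.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated nodes
    trans = adj / col_sums                 # column-stochastic transition matrix
    # Closed form: s_i = restart * (I - (1 - restart) * P)^-1 e_i, for every i at once.
    diffusion = restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * trans)
    return diffusion.T                     # rows are per-node diffusion states


def embed_networks(adjs, dim, restart=0.5):
    """Compress the diffusion states of several networks into `dim` features per node.

    Assumption: a truncated SVD of the concatenated log-diffusion states
    stands in for the optimization used in the paper.
    """
    n = np.asarray(adjs[0]).shape[0]
    blocks = [np.log(rwr_matrix(a, restart) + 1.0 / n) for a in adjs]
    stacked = np.hstack(blocks)            # shape: n x (n * number of networks)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])   # one low-dimensional vector per node
```

Calling `embed_networks([adj_a, adj_b], dim=2)` on two adjacency matrices over the same genes returns an array with one row per gene, which could then be handed to any standard classifier.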

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions

Our next meeting will be at 3:00 on December 5th, in room 3160 of the Discovery building. Our selected paper is Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions.
The abstract is as follows.

Massively parallel reporter assays (MPRAs) enable nucleotide-resolution dissection of transcriptional regulatory regions, such as enhancers, but only few regions at a time. Here we present a combined experimental and computational approach, Systematic high-resolution activation and repression profiling with reporter tiling using MPRA (Sharpr-MPRA), that allows high-resolution analysis of thousands of regions simultaneously. Sharpr-MPRA combines dense tiling of overlapping MPRA constructs with a probabilistic graphical model to recognize functional regulatory nucleotides, and to distinguish activating and repressive nucleotides, using their inferred contribution to reporter gene expression. We used Sharpr-MPRA to test 4.6 million nucleotides spanning 15,000 putative regulatory regions tiled at 5-nucleotide resolution in two human cell types. Our results recovered known cell-type-specific regulatory motifs and evolutionarily conserved nucleotides, and distinguished known activating and repressive motifs. Our results also showed that endogenous chromatin state and DNA accessibility are both predictive of regulatory function in reporter assays, identified retroviral elements with activating roles, and uncovered ‘attenuator’ motifs with repressive roles in active chromatin.


DeepChrome: Deep-learning for predicting gene expression from histone modifications

Our next meeting will be at 3:00 on November 21st, in room 3160 of the Discovery building. Our selected paper is DeepChrome: Deep-learning for predicting gene expression from histone modifications.
The abstract is as follows.

Motivation: Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing ‘epigenetic drugs’ for diseases like cancer. Previous studies for quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis. This paper develops a unified discriminative framework using a deep convolutional neural network to classify gene expression using histone modification data as input. Our system, called DeepChrome, allows automatic extraction of complex interactions among important features. To simultaneously visualize the combinatorial interactions among histone modifications, we propose a novel optimization-based technique that generates feature pattern maps from the learnt deep model. This provides an intuitive description of underlying epigenetic mechanisms that regulate genes.
Results: We show that DeepChrome outperforms state-of-the-art models like Support Vector Machines and Random Forests for gene expression classification task on 56 different cell-types from REMC database. The output of our visualization technique not only validates the previous observations but also allows novel insights about combinatorial interactions among histone modification marks, some of which have recently been observed by experimental studies.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer

Our next meeting will be at 3:00 on October 10th, in room 3160 of the Discovery building. Our selected paper is Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer.
The abstract is as follows.

To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Causal Mechanistic Regulatory Network for Glioblastoma Deciphered Using Systems Genetics Network Analysis

Our next meeting will be at 3:00 on September 26th, in room 3160 of the Discovery building. Our selected paper is Causal Mechanistic Regulatory Network for Glioblastoma Deciphered Using Systems Genetics Network Analysis.
The abstract is as follows.

We developed the transcription factor (TF)-target gene database and the Systems Genetics Network Analysis (SYGNAL) pipeline to decipher transcriptional regulatory networks from multi-omic and clinical patient data, and we applied these tools to 422 patients with glioblastoma multiforme (GBM). The resulting gbmSYGNAL network predicted 112 somatically mutated genes or pathways that act through 74 TFs and 37 microRNAs (miRNAs) (67 not previously associated with GBM) to dysregulate 237 distinct co-regulated gene modules associated with patient survival or oncogenic processes. The regulatory predictions were associated to cancer phenotypes using CRISPR-Cas9 and small RNA perturbation studies and also demonstrated GBM specificity. Two pairwise combinations (ETV6-NFKB1 and romidepsin-miR-486-3p) predicted by the gbmSYGNAL network had synergistic anti-proliferative effects. Finally, the network revealed that mutations in NF1 and PIK3CA modulate IRF1-mediated regulation of MHC class I antigen processing and presentation genes to increase tumor lymphocyte infiltration and worsen prognosis. Importantly, SYGNAL is widely applicable for integrating genomic and transcriptomic measurements from other human cohorts.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering

Our next meeting will be at 12:30 on September 12th, in room 3160 of the Discovery building. Our selected paper is Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering.
The abstract is as follows.

Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. Compared with related biclustering methods, BicMix recovers latent structure with higher precision across diverse simulation scenarios as compared to state-of-the-art biclustering methods. Further, we develop a principled method to recover context specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data, and we find tissue specific gene networks. We validate these findings by using our tissue specific networks to identify trans-eQTLs specific to one of four primary tissues.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Tensor decomposition for multiple-tissue gene expression experiments

Our next meeting will be at 12:30 on August 15th, in room 3160 of the Discovery building. Our selected paper is Tensor decomposition for multiple-tissue gene expression experiments.
The abstract is as follows.

Genome-wide association studies of gene expression traits and other cellular phenotypes have successfully identified links between genetic variation and biological processes. The majority of discoveries have uncovered cis–expression quantitative trait locus (eQTL) effects via mass univariate testing of SNPs against gene expression in single tissues. Here we present a Bayesian method for multiple-tissue experiments focusing on uncovering gene networks linked to genetic variation. Our method decomposes the 3D array (or tensor) of gene expression measurements into a set of latent components. We identify sparse gene networks that can then be tested for association against genetic variation across the genome. We apply our method to a data set of 845 individuals from the TwinsUK cohort with gene expression measured via RNA-seq analysis in adipose, lymphoblastoid cell lines (LCLs) and skin. We uncover several gene networks with a genetic basis and clear biological and statistical significance. Extensions of this approach will allow integration of different omics, environmental and phenotypic data sets.
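As background for the discussion, the sketch below shows what "decomposing the 3D array into a set of latent components" can look like in its simplest form. It is a plain CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares, not the sparse Bayesian model from the paper; the function names and the assumed axis order (individuals x genes x tissues) are ours.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R) and (J x R) -> (I*J x R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])


def unfold(tensor, mode):
    """Matricize a tensor so that `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def cp_als(tensor, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to a 3-way array by alternating least squares.

    Returns one factor matrix per axis, e.g. individuals, genes and tissues.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Hold the other two factor matrices fixed and solve for this one.
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(tensor, mode) @ kr @ np.linalg.pinv(gram)
    return factors


def reconstruct(factors):
    """Rebuild the 3-way array from its three factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)
```

In this simplified picture, each column of the gene factor matrix plays the role of a candidate gene network, and the matching column of the individual factor matrix gives per-person scores that could be tested against genotypes.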

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations

Our next meeting will be at 12:30 on August 1st, in room 3160 of the Discovery building. Our selected paper is CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations.
The abstract is as follows.

Motivation: Identifying alterations in gene expression associated with different clinical states is important for the study of human biology. However, clinical samples used in gene expression studies are often derived from heterogeneous mixtures with variable cell-type composition, complicating statistical analysis. Considerable effort has been devoted to modeling sample heterogeneity, and presently, there are many methods that can estimate cell proportions or pure cell-type expression from mixture data. However, there is no method that comprehensively addresses mixture analysis in the context of differential expression without relying on additional proportion information, which can be inaccurate and is frequently unavailable.

Results: In this study, we consider a clinically relevant situation where neither accurate proportion estimates nor pure cell expression is of direct interest, but where we are rather interested in detecting and interpreting relevant differential expression in mixture samples. We develop a method, Cell-type COmputational Differential Estimation (CellCODE), that addresses the specific statistical question directly, without requiring a physical model for mixture components. Our approach is based on latent variable analysis and is computationally transparent; it requires no additional experimental data, yet outperforms existing methods that use independent proportion measurements. CellCODE has few parameters that are robust and easy to interpret. The method can be used to track changes in proportion, improve power to detect differential expression and assign the differentially expressed genes to the correct cell type.
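To make the latent variable idea concrete before the meeting, here is a rough sketch under our own assumptions, not the CellCODE package itself. It takes the first singular vector of a set of marker genes for one cell type as a surrogate for that cell type's proportion, then includes it as a covariate when estimating a group effect. The function names `surrogate_proportion` and `group_effect`, and the choice of ordinary least squares, are hypothetical.

```python
import numpy as np


def surrogate_proportion(expr_markers):
    """Surrogate proportion variable from marker genes (genes x samples).

    Uses the first right singular vector of the row-centered marker matrix.
    """
    centered = expr_markers - expr_markers.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    spv = vt[0]
    # Singular vectors have arbitrary sign; orient with mean marker expression.
    if np.dot(spv, centered.mean(axis=0)) < 0:
        spv = -spv
    return spv


def group_effect(gene_expr, group, spv):
    """Least-squares group coefficient for one gene, adjusted for the surrogate."""
    design = np.column_stack([np.ones_like(spv), group, spv])
    coef, *_ = np.linalg.lstsq(design, gene_expr, rcond=None)
    return coef[1]
```

The point of the adjustment is that a gene whose expression merely tracks cell-type proportion should show little or no group effect once the surrogate is in the model, even when proportions differ between groups.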

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Epigenomic Co-localization and Co-evolution Reveal a Key Role for 5hmC as a Communication Hub in the Chromatin Network of ESCs

Our selected paper for this week is titled Epigenomic Co-localization and Co-evolution Reveal a Key Role for 5hmC as a Communication Hub in the Chromatin Network of ESCs, from Cell. The abstract is as follows:

Epigenetic communication through histone and cytosine modifications is essential for gene regulation and cell identity. Here, we propose a framework that is based on a chromatin communication model to get insight on the function of epigenetic modifications in ESCs. The epigenetic communication network was inferred from genome-wide location data plus extensive manual annotation. Notably, we found that 5-hydroxymethylcytosine (5hmC) is the most-influential hub of this network, connecting DNA demethylation to nucleosome remodeling complexes and to key transcription factors of pluripotency. Moreover, an evolutionary analysis revealed a central role of 5hmC in the co-evolution of chromatin-related proteins. Further analysis of regions where 5hmC co-localizes with specific interactors shows that each interaction points to chromatin remodeling, stemness, differentiation, or metabolism. Our results highlight the importance of cytosine modifications in the epigenetic communication of ESCs.

Feel free to begin our discussion in the comments section below. Our meeting will be at 12:30 PM in room 3160 of the Discovery building on July 18th.


Simultaneous Pathway Activity Inference and Gene Expression Analysis Using RNA Sequencing

Our next meeting will be at 12:30 on June 20th, in room 3160 of the Discovery building. Our selected paper is Simultaneous Pathway Activity Inference and Gene Expression Analysis Using RNA Sequencing.
The abstract is as follows.

Reporter gene assays are a venerable tool for studying signaling pathways, but they lack the throughput and complexity necessary to contribute to a systems-level understanding of endogenous signaling networks. We present a parallel reporter assay, transcription factor activity sequencing (TF-seq), built on synthetic DNA enhancer elements, which enables parallel measurements in primary cells of the transcriptome and transcription factor activity from more than 40 signaling pathways. Using TF-seq in Myd88−/− macrophages, we captured dynamic pathway activity changes underpinning the global transcriptional changes of the innate immune response. We also applied TF-seq to investigate small molecule mechanisms of action and find a role for NF-κB activation and coordination of the STAT1 response in the macrophage reaction to the anti-inflammatory natural product halofuginone. Simultaneous TF-seq and global gene expression profiling represent an integrative approach for gaining mechanistic insight into pathway activity and transcriptional changes that result from genetic and small molecule perturbations.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.