Ali


FIND: difFerential chromatin INteractions Detection using a spatial Poisson process

Our next meeting will be at 2pm on March 26th, in room 4160 of the Discovery building. Our selected paper is FIND: difFerential chromatin INteractions Detection using a spatial Poisson process.
The abstract is as follows.

Polymer-based simulations and experimental studies indicate the existence of a spatial dependency between the adjacent DNA fibers involved in the formation of chromatin loops. However, the existing strategies for detecting differential chromatin interactions assume that the interacting segments are spatially independent from the other segments nearby. To resolve this issue, we developed a new computational method, FIND, which considers the local spatial dependency between interacting loci. FIND uses a spatial Poisson process to detect differential chromatin interactions that show a significant difference in their interaction frequency and the interaction frequency of their neighbors. Simulation and biological data analysis show that FIND outperforms the widely used count-based methods and has a better signal-to-noise ratio.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs

Our next meeting will be at 11:00 on September 12th, in room 4160 of the Discovery building. Our selected paper is GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs.
The abstract is as follows.

The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of 3D chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps is challenging due to the multi-scale, hierarchical structure of the data and the resulting idiosyncratic properties of experimental noise. We introduce a multi-scale concordance measure called GenomeDISCO (DIfferences between Smoothed COntact maps) for assessing the similarity of a pair of contact maps obtained from chromosome capture experiments. We denoise the contact maps using random walks on the contact map graph, and integrate concordance at multiple scales of smoothing. We use simulated datasets to benchmark GenomeDISCO’s sensitivity to different types of noise typically affecting chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. Software implementing GenomeDISCO is available at http://github.com/kundajelab/genomedisco.
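
As a starting point for discussion, here is a minimal sketch of the idea described in the abstract: smooth each contact map with random walks on its graph, then combine the agreement between the two smoothed maps across several walk lengths. It is a simplified illustration, not the GenomeDISCO implementation. The function names, the toy matrices, the choice of 1 to 3 steps, and the plain averaging of per-step scores are all assumptions made for the example.

```python
def transition_matrix(contacts):
    """Row-normalize a square contact-count matrix into random-walk probabilities."""
    rows = []
    for row in contacts:
        total = sum(row)
        # A bin with no contacts keeps an all-zero row instead of dividing by zero.
        rows.append([value / total if total > 0 else 0.0 for value in row])
    return rows


def multiply(a, b):
    """Plain matrix product of two square matrices given as lists of lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def smoothed_maps(contacts, max_steps):
    """Yield the t-step random-walk matrix for t = 1..max_steps."""
    step = transition_matrix(contacts)
    current = step
    for _ in range(max_steps):
        yield current
        current = multiply(current, step)


def concordance(contacts_a, contacts_b, max_steps=3):
    """Average over t of 1 - (L1 difference between smoothed maps / number of bins)."""
    size = len(contacts_a)
    scores = []
    for walk_a, walk_b in zip(smoothed_maps(contacts_a, max_steps),
                              smoothed_maps(contacts_b, max_steps)):
        l1 = sum(abs(walk_a[i][j] - walk_b[i][j])
                 for i in range(size) for j in range(size))
        scores.append(1.0 - l1 / size)
    return sum(scores) / len(scores)


# Hypothetical 3-bin contact maps, for illustration only.
sample = [[0, 10, 2], [10, 0, 8], [2, 8, 0]]
other = [[0, 1, 9], [1, 0, 1], [9, 1, 0]]
same_score = concordance(sample, sample)
different_score = concordance(sample, other)
```

In this sketch a map compared with itself scores exactly 1, and less similar maps score lower, down to a minimum of -1. More random-walk steps correspond to heavier smoothing, which is what lets the comparison look at the maps at more than one scale.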

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


An Expanded View of Complex Traits: From Polygenic to Omnigenic

Our next meeting will be at 2:30 on September 1st, in room 4160 of the Discovery building. Our selected paper is An Expanded View of Complex Traits: From Polygenic to Omnigenic.
The abstract is as follows.

A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome—including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an “omnigenic” model.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


LASSIM—A network inference toolbox for genome-wide mechanistic modeling

Our next meeting will be at 2:30 on August 18th, in room 4160 of the Discovery building. Our selected paper is LASSIM—A network inference toolbox for genome-wide mechanistic modeling.
The abstract is as follows.

Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. 
In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models with truly systems-level data. We demonstrate the power of this approach by inferring a mechanistically motivated, genome-wide model of the Th2 transcription regulatory system, which plays an important role in several immune related diseases.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Reproducibility of computational workflows is automated using continuous analysis

Our next meeting will be at 3:00 on March 24th, in room 4160 of the Discovery building. Our selected paper is Reproducibility of computational workflows is automated using continuous analysis.
The abstract is as follows.

Replication, validation and extension of experiments are crucial for scientific progress. Computational experiments are scriptable and should be easy to reproduce. However, computational analyses are designed and run in a specific computing environment, which may be difficult or impossible to match using written instructions. We report the development of continuous analysis, a workflow that enables reproducible computational analyses. Continuous analysis combines Docker, a container technology akin to virtual machines, with continuous integration, a software development technique, to automatically rerun a computational analysis whenever updates or improvements are made to source code or data. This enables researchers to reproduce results without contacting the study authors. Continuous analysis allows reviewers, editors or readers to verify reproducibility without manually downloading and rerunning code and can provide an audit trail for analyses of data that cannot be shared.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways

Our next meeting will be at 3:00 on March 10th, in room 4160 of the Discovery building. Our selected paper is Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways.
The abstract is as follows.

Numerous genes and molecular pathways are implicated in neurodegenerative proteinopathies, but their inter-relationships are poorly understood. We systematically mapped molecular pathways underlying the toxicity of alpha-synuclein (α-syn), a protein central to Parkinson’s disease. Genome-wide screens in yeast identified 332 genes that impact α-syn toxicity. To “humanize” this molecular network, we developed a computational method, TransposeNet. This integrates a Steiner prize-collecting approach with homology assignment through sequence, structure, and interaction topology. TransposeNet linked α-syn to multiple parkinsonism genes and druggable targets through perturbed protein trafficking and ER quality control as well as mRNA metabolism and translation. A calcium signaling hub linked these processes to perturbed mitochondrial quality control and function, metal ion transport, transcriptional regulation, and signal transduction. Parkinsonism gene interaction profiles spatially opposed in the network (ATP13A2/PARK9 and VPS35/PARK17) were highly distinct, and network relationships for specific genes (LRRK2/PARK8, ATXN2, and EIF4G1/PARK18) were confirmed in patient induced pluripotent stem cell (iPSC)-derived neurons. This cross-species platform connected diverse neurodegenerative genes to proteinopathy through specific mechanisms and may facilitate patient stratification for targeted therapy.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Affinity regression predicts the recognition code of nucleic acid–binding proteins

Our next meeting will be at 3:00 on February 10th, in room 4160 of the Discovery building. Our selected paper is Affinity regression predicts the recognition code of nucleic acid–binding proteins.
The abstract is as follows.

Predicting the affinity profiles of nucleic acid–binding proteins directly from the protein sequence is a challenging problem. We present a statistical approach for learning the recognition code of a family of transcription factors or RNA-binding proteins (RBPs) from high-throughput binding data. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete data to learn an interaction model between proteins and nucleic acids using only protein domain and probe sequences as inputs. When trained on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, when trained on RNAcompete profiles for diverse RBPs, our model correctly predicts the binding affinities of held-out proteins and identifies key RNA-binding residues, despite the high level of sequence divergence across RBPs. We expect that the method will be broadly applicable to modeling and predicting paired macromolecular interactions in settings where high-throughput affinity data are available.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters

Our next meeting will be at 3:00 on January 27th, in room 4160 of the Discovery building. Our selected paper is Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters.
The abstract is as follows.

Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Compact Integration of Multi-Network Topology for Functional Analysis of Genes

Our next meeting will be at 3:00 on December 19th, in room 3160 of the Discovery building. Our selected paper is Compact Integration of Multi-Network Topology for Functional Analysis of Genes.
The abstract is as follows.

The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing yet-unsolved challenge is how to combine multiple heterogeneous networks, each having different connectivity patterns, to achieve more accurate inference. Here, we describe the Mashup framework for scalable and robust network integration. In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating and diverse biological network data and can be broadly applied to other network science domains.
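
To make the first step of the abstract concrete, here is a small sketch of a random walk with restart, one common way to compute a diffusion profile that summarizes the topological context of a node. It illustrates only that diffusion step on a single network. The later step of compressing profiles from several networks into low-dimensional vectors is not shown. The function name, the restart probability of 0.5, and the toy three-node network are assumptions made for the example, not details taken from the abstract.

```python
def random_walk_with_restart(adjacency, start, restart=0.5,
                             tolerance=1e-10, max_iterations=1000):
    """Return the steady-state visiting probabilities of a walk restarting at `start`."""
    size = len(adjacency)

    # Row-normalize edge weights so each row holds outgoing step probabilities.
    transition = []
    for row in adjacency:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])

    restart_vector = [1.0 if i == start else 0.0 for i in range(size)]
    state = restart_vector[:]

    for _ in range(max_iterations):
        # One step along the edges from the current distribution.
        walked = [sum(state[i] * transition[i][j] for i in range(size))
                  for j in range(size)]
        # Mix the walked distribution with a jump back to the start node.
        updated = [(1 - restart) * walked[j] + restart * restart_vector[j]
                   for j in range(size)]
        change = sum(abs(updated[j] - state[j]) for j in range(size))
        state = updated
        if change < tolerance:
            break
    return state


# Hypothetical path network: node 0 - node 1 - node 2.
path_network = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
profile = random_walk_with_restart(path_network, start=0)
```

Each node gets its own profile, and nodes with similar profiles sit in similar neighborhoods of the network. For this toy path the profile decreases with distance from the start node.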

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


DeepChrome: Deep-learning for predicting gene expression from histone modifications

Our next meeting will be at 3:00 on November 21st, in room 3160 of the Discovery building. Our selected paper is DeepChrome: Deep-learning for predicting gene expression from histone modifications.
The abstract is as follows.

Motivation: Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing ‘epigenetic drugs’ for diseases like cancer. Previous studies for quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis. This paper develops a unified discriminative framework using a deep convolutional neural network to classify gene expression using histone modification data as input. Our system, called DeepChrome, allows automatic extraction of complex interactions among important features. To simultaneously visualize the combinatorial interactions among histone modifications, we propose a novel optimization-based technique that generates feature pattern maps from the learnt deep model. This provides an intuitive description of underlying epigenetic mechanisms that regulate genes.
Results: We show that DeepChrome outperforms state-of-the-art models like Support Vector Machines and Random Forests for gene expression classification task on 56 different cell-types from REMC database. The output of our visualization technique not only validates the previous observations but also allows novel insights about combinatorial interactions among histone modification marks, some of which have recently been observed by experimental studies.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.