Yearly Archives: 2017


Predicting Causal Relationships from Biological Data: Applying Automated Causal Discovery on Mass Cytometry Data of Human Immune Cells

Our next meeting will be at 3:00 on June 9th, in room 4160 of the Discovery building. Our selected paper is Predicting Causal Relationships from Biological Data: Applying Automated Causal Discovery on Mass Cytometry Data of Human Immune Cells.
The abstract is as follows.

Learning the causal relationships that define a molecular system allows us to predict how the system will respond to different interventions. Distinguishing causality from mere association typically requires randomized experiments. Methods for automated causal discovery from limited experiments exist, but have so far rarely been tested in systems biology applications. In this work, we apply state-of-the-art causal discovery methods on a large collection of public mass cytometry data sets, measuring intra-cellular signaling proteins of the human immune system and their response to several perturbations. We show how different experimental conditions can be used to facilitate causal discovery, and apply two fundamental methods that produce context-specific causal predictions. Causal predictions were reproducible across independent data sets from two different studies, but often disagree with the KEGG pathway databases. Within this context, we discuss the caveats we need to overcome for automated causal discovery to become a part of the routine data analysis in systems biology.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Selecting the most appropriate time points to profile in high-throughput studies

Our next meeting will be at 3:00 on May 26th, in room 4160 of the Discovery building. Our selected paper is Selecting the most appropriate time points to profile in high-throughput studies.
The abstract is as follows.

Biological systems are increasingly being studied by high throughput profiling of molecular data over time. Determining the set of time points to sample in studies that profile several different types of molecular data is still challenging. Here we present the Time Point Selection (TPS) method that solves this combinatorial problem in a principled and practical way. TPS utilizes expression data from a small set of genes sampled at a high rate. As we show by applying TPS to study mouse lung development, the points selected by TPS can be used to reconstruct an accurate representation for the expression values of the non selected points. Further, even though the selection is only based on gene expression, these points are also appropriate for representing a much larger set of protein, miRNA and DNA methylation changes over time. TPS can thus serve as a key design strategy for high throughput time series experiments.
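As a starting point for discussion, here is a minimal sketch of the general idea behind time point selection: choose a subset of sampled time points so that the remaining points can be reconstructed well from the chosen ones. This is not the TPS algorithm from the paper. It assumes a simple greedy search and linear interpolation, and the function names `reconstruction_error` and `greedy_select` are hypothetical.

```python
import numpy as np


def reconstruction_error(times, values, selected):
    """Mean squared error when every time point is rebuilt by linear
    interpolation from the selected time points only.

    times:    1-D array of sampling times, sorted ascending
    values:   2-D array, shape (genes, time points)
    selected: indices of the time points that are kept
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    idx = sorted(selected)
    approx = np.vstack([np.interp(times, times[idx], row[idx]) for row in values])
    return float(np.mean((approx - values) ** 2))


def greedy_select(times, values, k):
    """Greedily pick k time points (2 <= k <= number of time points).

    Assumption for this sketch: the first and last points are always kept,
    and each step adds the point that most reduces the interpolation error.
    """
    n = len(times)
    selected = [0, n - 1]
    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        best = min(
            candidates,
            key=lambda i: reconstruction_error(times, values, selected + [i]),
        )
        selected.append(best)
    return sorted(selected)
```

For a single profile with one sharp change in the middle of the series, the greedy step picks the time point at the change, since that is the point linear interpolation cannot recover from the endpoints alone.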

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Discovering sparse transcription factor codes for cell states and state transitions during development

Our next meeting will be at 3:00 on April 28th, in room 4160 of the Discovery building. Our selected paper is Discovering sparse transcription factor codes for cell states and state transitions during development.
The abstract is as follows.

Computational analysis of gene expression to determine both the sequence of lineage choices made by multipotent cells and to identify the genes influencing these decisions is challenging. Here we discover a pattern in the expression levels of a sparse subset of genes among cell types in B- and T-cell developmental lineages that correlates with developmental topologies. We develop a statistical framework using this pattern to simultaneously infer lineage transitions and the genes that determine these relationships. We use this technique to reconstruct the early hematopoietic and intestinal developmental trees. We extend this framework to analyze single-cell RNA-seq data from early human cortical development, inferring a neocortical-hindbrain split in early progenitor cells and the key genes that could control this lineage decision. Our work allows us to simultaneously infer both the identity and lineage of cell types as well as a small set of key genes whose expression patterns reflect these relationships.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Can We Predict Gene Expression by Understanding Proximal Promoter Architecture?

Our next meeting will be at 3:00 on April 14th, in room 4160 of the Discovery building. Our selected paper is Can We Predict Gene Expression by Understanding Proximal Promoter Architecture?
The abstract is as follows.

We review computational predictions of expression from the promoter architecture – the set of transcription factors that can bind the proximal promoter. We focus on spatial expression patterns in animals with complex body plans and many distinct tissue types. This field is ripe for change as functional genomics datasets accumulate for both expression and protein–DNA interactions. While there has been some success in predicting the breadth of expression (i.e., the fraction of tissue types a gene is expressed in), predicting tissue specificity remains challenging. We discuss how progress can be achieved through either machine learning or complementary combinatorial data mining. The likely impact of single-cell expression data is considered. Finally, we discuss the design of artificial promoters as a practical application.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Reproducibility of computational workflows is automated using continuous analysis

Our next meeting will be at 3:00 on March 24th, in room 4160 of the Discovery building. Our selected paper is Reproducibility of computational workflows is automated using continuous analysis.
The abstract is as follows.

Replication, validation and extension of experiments are crucial for scientific progress. Computational experiments are scriptable and should be easy to reproduce. However, computational analyses are designed and run in a specific computing environment, which may be difficult or impossible to match using written instructions. We report the development of continuous analysis, a workflow that enables reproducible computational analyses. Continuous analysis combines Docker, a container technology akin to virtual machines, with continuous integration, a software development technique, to automatically rerun a computational analysis whenever updates or improvements are made to source code or data. This enables researchers to reproduce results without contacting the study authors. Continuous analysis allows reviewers, editors or readers to verify reproducibility without manually downloading and rerunning code and can provide an audit trail for analyses of data that cannot be shared.
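To make the "rerun whenever code or data change" idea concrete, here is a small sketch of the trigger logic only. It is not the paper's workflow, which relies on Docker and a continuous integration service. The sketch assumes inputs are available as an in-memory mapping of file names to bytes, and the names `fingerprint` and `rerun_if_changed` are hypothetical.

```python
import hashlib


def fingerprint(files):
    """Return a SHA-256 digest over a mapping {file name: file bytes}.

    Names are processed in sorted order so the digest does not depend on
    the insertion order of the mapping.
    """
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


def rerun_if_changed(files, last_fingerprint, analysis):
    """Run analysis(files) only if the inputs differ from the last run.

    Returns (current fingerprint, result), where result is None when the
    inputs are unchanged and the analysis was skipped.
    """
    current = fingerprint(files)
    if current == last_fingerprint:
        return current, None
    return current, analysis(files)
```

In a real setup the fingerprint comparison is replaced by the version control system and the CI service, and the analysis runs inside a container so the computing environment is fixed along with the code and data.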

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways

Our next meeting will be at 3:00 on March 10th, in room 4160 of the Discovery building. Our selected paper is Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways.
The abstract is as follows.

Numerous genes and molecular pathways are implicated in neurodegenerative proteinopathies, but their inter-relationships are poorly understood. We systematically mapped molecular pathways underlying the toxicity of alpha-synuclein (α-syn), a protein central to Parkinson’s disease. Genome-wide screens in yeast identified 332 genes that impact α-syn toxicity. To “humanize” this molecular network, we developed a computational method, TransposeNet. This integrates a Steiner prize-collecting approach with homology assignment through sequence, structure, and interaction topology. TransposeNet linked α-syn to multiple parkinsonism genes and druggable targets through perturbed protein trafficking and ER quality control as well as mRNA metabolism and translation. A calcium signaling hub linked these processes to perturbed mitochondrial quality control and function, metal ion transport, transcriptional regulation, and signal transduction. Parkinsonism gene interaction profiles spatially opposed in the network (ATP13A2/PARK9 and VPS35/PARK17) were highly distinct, and network relationships for specific genes (LRRK2/PARK8, ATXN2, and EIF4G1/PARK18) were confirmed in patient induced pluripotent stem cell (iPSC)-derived neurons. This cross-species platform connected diverse neurodegenerative genes to proteinopathy through specific mechanisms and may facilitate patient stratification for targeted therapy.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Mutation effects predicted from sequence co-variation

Our next meeting will be at 3:00 on February 24th, in room 4160 of the Discovery building. Our selected paper is Mutation effects predicted from sequence co-variation.
The abstract is as follows.

Many high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases. We present EVmutation, an unsupervised statistical method for predicting the effects of mutations that explicitly captures residue dependencies between positions. We validate EVmutation by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations and show that it outperforms methods that do not account for epistasis. EVmutation can be used to assess the quantitative effects of mutations in genes of any organism. We provide pre-computed predictions for ~7,000 human proteins at http://evmutation.org/.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Affinity regression predicts the recognition code of nucleic acid–binding proteins

Our next meeting will be at 3:00 on February 10th, in room 4160 of the Discovery building. Our selected paper is Affinity regression predicts the recognition code of nucleic acid–binding proteins.
The abstract is as follows.

Predicting the affinity profiles of nucleic acid–binding proteins directly from the protein sequence is a challenging problem. We present a statistical approach for learning the recognition code of a family of transcription factors or RNA-binding proteins (RBPs) from high-throughput binding data. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete data to learn an interaction model between proteins and nucleic acids using only protein domain and probe sequences as inputs. When trained on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, when trained on RNAcompete profiles for diverse RBPs, our model correctly predicts the binding affinities of held-out proteins and identifies key RNA-binding residues, despite the high level of sequence divergence across RBPs. We expect that the method will be broadly applicable to modeling and predicting paired macromolecular interactions in settings where high-throughput affinity data are available.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters

Our next meeting will be at 3:00 on January 27th, in room 4160 of the Discovery building. Our selected paper is Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters.
The abstract is as follows.

Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.