Past Discussions


Context Specificity in Causal Signaling Networks Revealed by Phosphoprotein Profiling

Our next meeting will be at 2:30 on August 4th, in room 4160 of the Discovery building. Our selected paper is Context Specificity in Causal Signaling Networks Revealed by Phosphoprotein Profiling.
The abstract is as follows.

Signaling networks downstream of receptor tyrosine kinases are among the most extensively studied biological networks, but new approaches are needed to elucidate causal relationships between network components and understand how such relationships are influenced by biological context and disease. Here, we investigate the context specificity of signaling networks within a causal conceptual framework using reverse-phase protein array time-course assays and network analysis approaches. We focus on a well-defined set of signaling proteins profiled under inhibition with five kinase inhibitors in 32 contexts: four breast cancer cell lines (MCF7, UACC812, BT20, and BT549) under eight stimulus conditions. The data, spanning multiple pathways and comprising ~70,000 phosphoprotein and ~260,000 protein measurements, provide a wealth of testable, context-specific hypotheses, several of which we experimentally validate. Furthermore, the data provide a unique resource for computational methods development, permitting empirical assessment of causal network learning in a complex, mammalian setting.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning

Our next meeting will be at 3:00 on June 23rd, in room 4160 of the Discovery building. Our selected paper is Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning.
The abstract is as follows.

We present single-cell interpretation via multikernel learning (SIMLR), an analytic framework and software which learns a similarity measure from single-cell RNA-seq data in order to perform dimension reduction, clustering and visualization. On seven published data sets, we benchmark SIMLR against state-of-the-art methods. We show that SIMLR is scalable and greatly enhances clustering performance while improving the visualization and interpretability of single-cell sequencing data.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Predicting Causal Relationships from Biological Data: Applying Automated Causal Discovery on Mass Cytometry Data of Human Immune Cells

Our next meeting will be at 3:00 on June 9th, in room 4160 of the Discovery building. Our selected paper is Predicting Causal Relationships from Biological Data: Applying Automated Causal Discovery on Mass Cytometry Data of Human Immune Cells.
The abstract is as follows.

Learning the causal relationships that define a molecular system allows us to predict how the system will respond to different interventions. Distinguishing causality from mere association typically requires randomized experiments. Methods for automated causal discovery from limited experiments exist, but have so far rarely been tested in systems biology applications. In this work, we apply state-of-the-art causal discovery methods on a large collection of public mass cytometry data sets, measuring intra-cellular signaling proteins of the human immune system and their response to several perturbations. We show how different experimental conditions can be used to facilitate causal discovery, and apply two fundamental methods that produce context-specific causal predictions. Causal predictions were reproducible across independent data sets from two different studies, but often disagree with the KEGG pathway databases. Within this context, we discuss the caveats we need to overcome for automated causal discovery to become a part of the routine data analysis in systems biology.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Selecting the most appropriate time points to profile in high-throughput studies

Our next meeting will be at 3:00 on May 26th, in room 4160 of the Discovery building. Our selected paper is Selecting the most appropriate time points to profile in high-throughput studies.
The abstract is as follows.

Biological systems are increasingly being studied by high throughput profiling of molecular data over time. Determining the set of time points to sample in studies that profile several different types of molecular data is still challenging. Here we present the Time Point Selection (TPS) method that solves this combinatorial problem in a principled and practical way. TPS utilizes expression data from a small set of genes sampled at a high rate. As we show by applying TPS to study mouse lung development, the points selected by TPS can be used to reconstruct an accurate representation for the expression values of the non selected points. Further, even though the selection is only based on gene expression, these points are also appropriate for representing a much larger set of protein, miRNA and DNA methylation changes over time. TPS can thus serve as a key design strategy for high throughput time series experiments.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Discovering sparse transcription factor codes for cell states and state transitions during development

Our next meeting will be at 3:00 on April 28th, in room 4160 of the Discovery building. Our selected paper is Discovering sparse transcription factor codes for cell states and state transitions during development.
The abstract is as follows.

Computational analysis of gene expression to determine both the sequence of lineage choices made by multipotent cells and to identify the genes influencing these decisions is challenging. Here we discover a pattern in the expression levels of a sparse subset of genes among cell types in B- and T-cell developmental lineages that correlates with developmental topologies. We develop a statistical framework using this pattern to simultaneously infer lineage transitions and the genes that determine these relationships. We use this technique to reconstruct the early hematopoietic and intestinal developmental trees. We extend this framework to analyze single-cell RNA-seq data from early human cortical development, inferring a neocortical-hindbrain split in early progenitor cells and the key genes that could control this lineage decision. Our work allows us to simultaneously infer both the identity and lineage of cell types as well as a small set of key genes whose expression patterns reflect these relationships.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Can We Predict Gene Expression by Understanding Proximal Promoter Architecture?

Our next meeting will be at 3:00 on April 14th, in room 4160 of the Discovery building. Our selected paper is Can We Predict Gene Expression by Understanding Proximal Promoter Architecture?
The abstract is as follows.

We review computational predictions of expression from the promoter architecture – the set of transcription factors that can bind the proximal promoter. We focus on spatial expression patterns in animals with complex body plans and many distinct tissue types. This field is ripe for change as functional genomics datasets accumulate for both expression and protein–DNA interactions. While there has been some success in predicting the breadth of expression (i.e., the fraction of tissue types a gene is expressed in), predicting tissue specificity remains challenging. We discuss how progress can be achieved through either machine learning or complementary combinatorial data mining. The likely impact of single-cell expression data is considered. Finally, we discuss the design of artificial promoters as a practical application.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Reproducibility of computational workflows is automated using continuous analysis

Our next meeting will be at 3:00 on March 24th, in room 4160 of the Discovery building. Our selected paper is Reproducibility of computational workflows is automated using continuous analysis.
The abstract is as follows.

Replication, validation and extension of experiments are crucial for scientific progress. Computational experiments are scriptable and should be easy to reproduce. However, computational analyses are designed and run in a specific computing environment, which may be difficult or impossible to match using written instructions. We report the development of continuous analysis, a workflow that enables reproducible computational analyses. Continuous analysis combines Docker, a container technology akin to virtual machines, with continuous integration, a software development technique, to automatically rerun a computational analysis whenever updates or improvements are made to source code or data. This enables researchers to reproduce results without contacting the study authors. Continuous analysis allows reviewers, editors or readers to verify reproducibility without manually downloading and rerunning code and can provide an audit trail for analyses of data that cannot be shared.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways

Our next meeting will be at 3:00 on March 10th, in room 4160 of the Discovery building. Our selected paper is Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways.
The abstract is as follows.

Numerous genes and molecular pathways are implicated in neurodegenerative proteinopathies, but their inter-relationships are poorly understood. We systematically mapped molecular pathways underlying the toxicity of alpha-synuclein (α-syn), a protein central to Parkinson’s disease. Genome-wide screens in yeast identified 332 genes that impact α-syn toxicity. To “humanize” this molecular network, we developed a computational method, TransposeNet. This integrates a Steiner prize-collecting approach with homology assignment through sequence, structure, and interaction topology. TransposeNet linked α-syn to multiple parkinsonism genes and druggable targets through perturbed protein trafficking and ER quality control as well as mRNA metabolism and translation. A calcium signaling hub linked these processes to perturbed mitochondrial quality control and function, metal ion transport, transcriptional regulation, and signal transduction. Parkinsonism gene interaction profiles spatially opposed in the network (ATP13A2/PARK9 and VPS35/PARK17) were highly distinct, and network relationships for specific genes (LRRK2/PARK8, ATXN2, and EIF4G1/PARK18) were confirmed in patient induced pluripotent stem cell (iPSC)-derived neurons. This cross-species platform connected diverse neurodegenerative genes to proteinopathy through specific mechanisms and may facilitate patient stratification for targeted therapy.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Mutation effects predicted from sequence co-variation

Our next meeting will be at 3:00 on February 24th, in room 4160 of the Discovery building. Our selected paper is Mutation effects predicted from sequence co-variation.
The abstract is as follows.

Many high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases. We present EVmutation, an unsupervised statistical method for predicting the effects of mutations that explicitly captures residue dependencies between positions. We validate EVmutation by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations and show that it outperforms methods that do not account for epistasis. EVmutation can be used to assess the quantitative effects of mutations in genes of any organism. We provide pre-computed predictions for ~7,000 human proteins at http://evmutation.org/.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Affinity regression predicts the recognition code of nucleic acid–binding proteins

Our next meeting will be at 3:00 on February 10th, in room 4160 of the Discovery building. Our selected paper is Affinity regression predicts the recognition code of nucleic acid–binding proteins.
The abstract is as follows.

Predicting the affinity profiles of nucleic acid–binding proteins directly from the protein sequence is a challenging problem. We present a statistical approach for learning the recognition code of a family of transcription factors or RNA-binding proteins (RBPs) from high-throughput binding data. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete data to learn an interaction model between proteins and nucleic acids using only protein domain and probe sequences as inputs. When trained on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, when trained on RNAcompete profiles for diverse RBPs, our model correctly predicts the binding affinities of held-out proteins and identifies key RNA-binding residues, despite the high level of sequence divergence across RBPs. We expect that the method will be broadly applicable to modeling and predicting paired macromolecular interactions in settings where high-throughput affinity data are available.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.