

09.03.14

Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin

Katherine A. Hoadley1, 20, Christina Yau2, 20, Denise M. Wolf3, 20, Andrew D. Cherniack4, 20, David Tamborero5, Sam Ng6, Max D.M. Leiserson7, Beifang Niu8, Michael D. McLellan8, Vladislav Uzunangelov6, Jiashan Zhang9, Cyriac Kandoth8, Rehan Akbani10, Hui Shen11, 22, Larsson Omberg12, Andy Chu13, Adam A. Margolin12, 21, Laura J. van’t Veer3, Nuria Lopez-Bigas5, 14, Peter W. Laird11, 22, Benjamin J. Raphael7, Li Ding8, A. Gordon Robertson13, Lauren A. Byers10, Gordon B. Mills10, John N. Weinstein10, Carter Van Waes18, Zhong Chen19, Eric A. Collisson15, The Cancer Genome Atlas Research Network, Christopher C. Benz2, Charles M. Perou1, 16, 17, Joshua M. Stuart6

1 Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
2 Buck Institute for Research on Aging, Novato, CA 94945, USA
3 Department of Laboratory Medicine, University of California San Francisco, 2340 Sutter St, San Francisco, CA, 94115, USA
4 The Eli and Edythe Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
5 Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona 08003, Spain
6 Department of Biomolecular Engineering, Center for Biomolecular Sciences and Engineering, University of California, Santa Cruz, 1156 High St., Santa Cruz, CA 95064, USA
7 Department of Computer Science and Center for Computational Molecular Biology, Brown University, 115 Waterman St, Providence RI 02912, USA
8 The Genome Institute, Washington University, St Louis, MO 63108, USA
9 National Cancer Institute, NIH, Bethesda, MD 20892, USA
10 UT MD Anderson Cancer Center, Bioinformatics and Computational Biology, 1400 Pressler Street, Unit 1410, Houston, TX 77030, USA
11 USC Epigenome Center, University of Southern California Keck School of Medicine, 1450 Biggy Street, Los Angeles, CA 90033, USA
12 Sage Bionetworks, 1100 Fairview Avenue North, M1-C108, Seattle, WA 98109-1024, USA
13 Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 4S6, Canada
14 Catalan Institution for Research and Advanced Studies (ICREA), Passeig Lluís Companys, 23, Barcelona 08010, Spain
15 Department of Medicine, University of California San Francisco, 450 35d St, San Francisco, CA, 94148, USA
16 Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
17 Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
18 Building 10, Room 4-2732, NIDCD/NIH, 10 Center Drive, Bethesda, MD 20892
19 Head and Neck Surgery Branch, NIDCD/NIH, 10 Center Drive, Room 5D55, Bethesda, MD 20892


08.06.2014

An Integrated Model of Multiple-Condition ChIP-Seq Data Reveals Predeterminants of Cdx2 Binding

 

Shaun Mahony*, Matthew D. Edwards*, Esteban O. Mazzoni, Richard I. Sherwood, Akshay Kakumanu, Carolyn A. Morrison, Hynek Wichterle, David K. Gifford

*equal contributor

Published: March 27, 2014    DOI: 10.1371/journal.pcbi.1003501

Abstract

Regulatory proteins can bind to different sets of genomic targets in various cell types or conditions. To reliably characterize such condition-specific regulatory binding, we introduce MultiGPS, an integrated machine learning approach for the analysis of multiple related ChIP-seq experiments. MultiGPS is based on a generalized Expectation Maximization framework that shares information across multiple experiments for binding event discovery. We demonstrate that our framework enables the simultaneous modeling of sparse condition-specific binding changes, sequence dependence, and replicate-specific noise sources. MultiGPS encourages consistency in reported binding event locations across multiple-condition ChIP-seq datasets and provides accurate estimation of ChIP enrichment levels at each event. MultiGPS’s multi-experiment modeling approach thus provides a reliable platform for detecting differential binding enrichment across experimental conditions. We demonstrate the advantages of MultiGPS with an analysis of Cdx2 binding in three distinct developmental contexts. By accurately characterizing condition-specific Cdx2 binding, MultiGPS enables novel insight into the mechanistic basis of Cdx2 site selectivity. Specifically, the condition-specific Cdx2 sites characterized by MultiGPS are highly associated with pre-existing genomic context, suggesting that such sites are pre-determined by cell-specific regulatory architecture. However, MultiGPS-defined condition-independent sites are not predicted by pre-existing regulatory signals, suggesting that Cdx2 can bind to a subset of locations regardless of genomic environment. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.
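To make "sharing information across experiments" concrete, here is a toy Python sketch, not MultiGPS itself: reads from several conditions are modeled as a mixture of a uniform background and Gaussian-shaped binding events whose positions are shared across conditions, while each condition gets its own event strengths. The function name `shared_event_em`, the Gaussian read shape, the fixed `sigma`, and the uniform background are assumptions made for illustration; the learned read distributions, sparse priors and replicate-specific noise models described above are omitted.

```python
import numpy as np

def gauss(x, mu, sigma):
    # Gaussian density used here as an assumed read distribution around an event
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def shared_event_em(reads_by_condition, init_positions, sigma=20.0,
                    region=(0.0, 1000.0), n_iter=50):
    """Toy EM: event positions shared across conditions, strengths per condition.

    reads_by_condition: list of 1-D arrays of read positions, one per condition.
    init_positions: initial guesses for K shared binding event locations.
    A uniform background component over `region` absorbs noise reads.
    """
    mu = np.asarray(init_positions, dtype=float)
    K = len(mu)
    C = len(reads_by_condition)
    width = region[1] - region[0]
    # pi[c, 0] = background weight, pi[c, 1:] = event weights in condition c
    pi = np.full((C, K + 1), 1.0 / (K + 1))
    for _ in range(n_iter):
        num = np.zeros(K)
        den = np.zeros(K)
        for c, x in enumerate(reads_by_condition):
            x = np.asarray(x, dtype=float)
            # E-step: responsibility of background and each event for each read
            lik = np.column_stack([np.full_like(x, 1.0 / width)] +
                                  [gauss(x, m, sigma) for m in mu])
            resp = lik * pi[c]
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step (condition-specific): component weights in condition c
            pi[c] = resp.sum(axis=0) / len(x)
            # Accumulate pooled statistics for the shared event positions
            num += (resp[:, 1:] * x[:, None]).sum(axis=0)
            den += resp[:, 1:].sum(axis=0)
        # M-step (shared): each event position is pooled over all conditions
        mu = np.where(den > 0, num / np.maximum(den, 1e-12), mu)
    return mu, pi
```

With synthetic data in which one event occurs in only one condition, that event keeps a single shared position but its estimated weight in the other condition should come out close to zero, which is the kind of condition-specific signal the abstract is after.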


07.09.2014 and 07.23.14

Predicting Dynamic Signaling Network Response under Unseen Perturbations

Fan Zhu 1 and Yuanfang Guan 1,2,3,*

1 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

2 Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA

3 Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA

* To whom correspondence should be addressed.

ABSTRACT

Motivation: Predicting trajectories of signaling networks under complex perturbations is one of the most valuable but challenging tasks in systems biology. Signaling networks are involved in most biological pathways, and modeling their dynamics has wide applications, including drug design and treatment outcome prediction.

Results: In this paper, we report a novel model for predicting the cell type-specific time course response of signaling proteins under unseen perturbations. This algorithm achieved the top performance in the 2013 8th Dialogue for Reverse Engineering Assessments and Methods (DREAM 8) sub-challenge: time course prediction in breast cancer cell lines. We formulate trajectory prediction as a standard regularization problem, so that finding the solution amounts to solving a discrete ill-posed problem. The algorithm has three steps: denoising, estimating regression coefficients, and modeling trajectories under unseen perturbations. We further validated the accuracy of this method on simulated and experimental data. Furthermore, this method reduces computational time by orders of magnitude compared to state-of-the-art methods, allowing genome-wide modeling of signaling pathways and time course trajectories to be carried out in practical time.
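The abstract frames trajectory prediction as a discrete ill-posed problem handled by regularization. As a generic illustration of that idea only (the paper's exact regularizer, denoising step and trajectory model are not reproduced), the sketch below solves a Tikhonov-regularized least-squares problem through the SVD; `tikhonov_solve` and the parameter `lam` are hypothetical names.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Tikhonov-regularized least squares: argmin ||A x - b||^2 + lam^2 ||x||^2.

    Solved via the SVD: each component is scaled by s / (s^2 + lam^2), which
    damps the small singular values that make a discrete ill-posed problem
    unstable. With lam = 0 this reduces to ordinary least squares.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    filt = s / (s**2 + lam**2)  # equals (s^2 / (s^2 + lam^2)) * (1 / s)
    return Vt.T @ (filt * (U.T @ b))
```

The filter form makes the trade-off explicit: components with singular values much larger than `lam` are kept almost unchanged, while those with singular values much smaller than `lam` are suppressed instead of being amplified by 1/s.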

Availability and Implementation: Source code is available at http://guanlab.ccmb.med.umich.edu/DREAM/code.html and as supplementary file online. Contact: gyuanfan@umich.edu

 


06.25.2014

Network-guided regression for detecting associations between DNA methylation and gene expression

 

Zi Wang1, Edward Curry2 and Giovanni Montana1,3,*

1 Department of Mathematics, Imperial College London, London SW7 2AZ

2 Division of Cancer, Imperial College London, Hammersmith Hospital, London, W12 0NN

3 Department of Biomedical Engineering, King’s College London, St Thomas’ Hospital, London SE1 7EH

*To whom correspondence should be addressed. Giovanni Montana, E-mail: giovanni.montana@kcl.ac.uk

 

Abstract 

Motivation: High-throughput profiling in biological research has resulted in the availability of a wealth of data cataloguing the genetic, epigenetic and transcriptional states of cells. This data could yield discoveries that lead to breakthroughs in the diagnosis and treatment of human disease, but requires statistical methods designed to find the most relevant patterns from millions of potential interactions. Aberrant DNA methylation is often a feature of cancer, and has been proposed as a therapeutic target. However, the relationship between DNA methylation and gene expression remains poorly understood.

Results: We propose Network-sparse Reduced-Rank Regression (NsRRR), a multivariate regression framework capable of using prior biological knowledge expressed as gene interaction networks to guide the search for associations between gene expression and DNA methylation signatures. We use simulations to show the advantage of our proposed model in terms of variable selection accuracy over alternative models that do not use prior network information. We discuss an application of NsRRR to TCGA datasets on primary ovarian tumours.
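NsRRR builds on reduced-rank regression. The following is a minimal sketch of classical, unpenalized reduced-rank regression only; the network-guided sparsity penalty that is the paper's contribution is deliberately left out, and the function name is hypothetical. `X` and `Y` stand for two blocks of variables measured on the same samples (for example, methylation and expression profiles).

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression with identity weighting.

    Fits B = B_ols @ V_r @ V_r.T, where V_r holds the top `rank` right
    singular vectors of the OLS fitted values X @ B_ols, so rank(B) <= rank.
    No sparsity or network penalty is applied here.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T  # top right singular vectors as columns
    return B_ols @ V_r @ V_r.T
```

The rank constraint forces all responses to be explained through a few latent combinations of the predictors; a network-sparse variant would additionally make those combinations sparse and smooth over a gene interaction graph.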

Availability: R code implementing the NsRRR model is available at http://www2.imperial.ac.uk/~gmontana/


06.11.2014

A Validated Regulatory Network for Th17 Cell Specification

Maria Ciofani1, 10, Aviv Madar3, 4, 10, Carolina Galan1, MacLean Sellars1, Kieran Mace3, Florencia Pauli5, Ashish Agarwal3, Wendy Huang1, Christopher N. Parkhurst1, Michael Muratet5, Kim M. Newberry5, Sarah Meadows5, Alex Greenfield2, Yi Yang1, Preti Jain5, Francis K. Kirigin2, Carmen Birchmeier6, Erwin F. Wagner7, Kenneth M. Murphy8, 9, Richard M. Myers5, Richard Bonneau3, 4,*, Dan R. Littman1, 9,*

*corresponding author

1 Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine, New York, NY 10016, USA

2 Computational Biology Program, The Sackler Institute, New York University School of Medicine, New York, NY 10016, USA

3 Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003 USA

4 Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, 10003 USA

5 HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA

6 Developmental Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany

7 Cancer Cell Biology Programme, Spanish National Cancer Research Centre (CNIO), E-28029 Madrid, Spain

8 Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63108, USA

9 The Howard Hughes Medical Institute

 

Summary

Th17 cells have critical roles in mucosal defense and are major contributors to inflammatory disease. Their differentiation requires the nuclear hormone receptor RORγt working with multiple other essential transcription factors (TFs). We have used an iterative systems approach, combining genome-wide TF occupancy, expression profiling of TF mutants, and expression time series to delineate the Th17 global transcriptional regulatory network. We find that cooperatively bound BATF and IRF4 contribute to initial chromatin accessibility and, with STAT3, initiate a transcriptional program that is then globally tuned by the lineage-specifying TF RORγt, which plays a focal deterministic role at key loci. Integration of multiple data sets allowed inference of an accurate predictive model that we computationally and experimentally validated, identifying multiple new Th17 regulators, including Fosl2, a key determinant of cellular plasticity. This interconnected network can be used to investigate new therapeutic approaches to manipulate Th17 functions in the setting of inflammatory disease.


04.16.14

SPINE: a framework for signaling-regulatory pathway inference from cause-effect experiments

Oved Ourfali 1, Tomer Shlomi 1, Trey Ideker 3, Eytan Ruppin 1,2 and Roded Sharan 1

1 School of Computer Science, 2 School of Medicine, Tel-Aviv University, Tel-Aviv, Israel and 3 Department of Bioengineering, University of California, San Diego, CA 92093, USA

Abstract:
Motivation: The complex program of gene expression allows the cell to cope with changing genetic, developmental and environmental conditions. The accumulating large-scale measurements of gene knockout effects and molecular interactions allow us to begin to uncover regulatory and signaling pathways within the cell that connect causal to affected genes on a network of physical interactions.

Results: We present a novel framework, SPINE, for Signaling-regulatory Pathway INferencE. The framework aims at explaining gene expression experiments in which a gene is knocked out and, as a result, multiple genes change their expression levels. To this end, an integrated network of protein–protein and protein–DNA interactions is constructed, and signaling pathways connecting the causal gene to the affected genes are searched for in this network. The reconstruction problem is translated into that of assigning an activation/repression attribute to each protein so as to explain (in expectation) a maximum number of the knockout effects observed. We provide an integer programming formulation for the latter problem and solve it using a commercial solver.
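To illustrate the sign-assignment idea on a tiny network, the sketch below labels each protein as an activator (+1) or repressor (-1) and counts how many observed knockout effects are explained by at least one candidate path. It swaps the integer program for exhaustive search (feasible only for a handful of proteins), ignores the expectation over paths, and uses an assumed sign rule: the predicted change at the target is minus the product of the signs of the upstream nodes on the path. All names are hypothetical.

```python
from itertools import product

def path_sign(path, signs):
    """Predicted expression change at path[-1] when path[0] is knocked out.

    Every node except the last passes its influence on with sign +1
    (activator) or -1 (repressor). Removing the source flips the net effect,
    hence the leading -1: knocking out a chain of activators lowers the target.
    """
    s = -1
    for node in path[:-1]:
        s *= signs[node]
    return s

def best_sign_assignment(proteins, paths_by_pair, observed):
    """Exhaustively search activator/repressor labels maximizing explained effects.

    paths_by_pair: {(knockout, target): [path, ...]} candidate network paths.
    observed: {(knockout, target): +1 (up) or -1 (down)}.
    """
    best, best_score = None, -1
    for combo in product((1, -1), repeat=len(proteins)):
        signs = dict(zip(proteins, combo))
        # An observed effect counts as explained if any candidate path matches it
        score = sum(
            any(path_sign(p, signs) == obs for p in paths_by_pair.get(pair, []))
            for pair, obs in observed.items()
        )
        if score > best_score:
            best, best_score = signs, score
    return best, best_score
```

Exhaustive search grows as 2^n in the number of proteins, which is why a genome-scale version needs a solver-based formulation rather than enumeration.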

We validate the method by applying it to a yeast subnetwork that is involved in mating. In cross-validation tests, SPINE obtains very high accuracy in predicting knockout effects (99%). Next, we apply SPINE to the entire yeast network to predict protein effects and reconstruct signaling and regulatory pathways. Overall, we are able to infer 861 paths with confidence and assign effects to 183 genes. The predicted effects are found to be in high agreement with current biological knowledge.

Availability: The algorithm and data are available at http://cs.tau.ac.il/~roded/SPINE.html


11.13.13

TREEGL: reverse engineering tree-evolving gene networks underlying developing biological lineages

Ankur P. Parikh1, Wei Wu2, Ross E. Curtis3,4 and Eric P. Xing1,3,4

1 School of Computer Science, Carnegie Mellon University, 2 Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, 3 Lane Center for Computational Biology, Carnegie Mellon University and 4 Joint Carnegie Mellon University-University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, PA, 15213

Abstract:
Motivation: Estimating gene regulatory networks over biological lineages is central to a deeper understanding of how cells evolve during development and differentiation. However, one challenge in estimating such evolving networks is that their host cells not only contiguously evolve, but also branch over time. For example, a stem cell evolves into two more specialized daughter cells at each division, forming a tree of networks. Another example is in a laboratory setting: a biologist may apply several different drugs individually to malignant cancer cells to analyze the effects of each drug on the cells; the cells treated by one drug may not be intrinsically similar to those treated by another, but rather to the malignant cancer cells they were derived from.

Results: We propose a novel algorithm, Treegl, an ℓ1 plus total variation penalized linear regression method, to effectively estimate multiple gene networks corresponding to cell types related by a tree-genealogy, based on only a few samples from each cell type. Treegl takes advantage of the similarity between related networks along the biological lineage, while at the same time exposing sharp differences between the networks. We demonstrate that our algorithm performs significantly better than existing methods via simulation. Furthermore, we explore an application to a breast cancer dataset, and show that our algorithm is able to produce biologically valid results that provide insight into the progression and reversion of breast cancer cells.
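An ℓ1 plus total variation penalty on a tree can be written as a generalized lasso and solved with ADMM. The sketch below does this for one regression per tree node (in a network setting, one such regression would be fitted per target gene). It is an illustration under these assumptions, not the Treegl implementation, and `tree_fused_lasso`, `rho` and `n_iter` are hypothetical names and defaults.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * |v|_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def tree_fused_lasso(Xs, ys, edges, lam1, lam2, rho=1.0, n_iter=500):
    """ADMM for sum_k 0.5||y_k - X_k b_k||^2 + lam1 sum_k |b_k|_1
    + lam2 sum_{(parent, child)} |b_child - b_parent|_1.

    Xs, ys: per-node design matrices (n_k x p) and responses.
    edges: list of (parent, child) node indices describing the tree.
    Requires lam1 > 0 (or full-rank designs) so the linear system is solvable.
    """
    K, p = len(Xs), Xs[0].shape[1]
    # Block-diagonal design so all node coefficients are solved jointly.
    X = np.zeros((sum(x.shape[0] for x in Xs), K * p))
    row = 0
    for k, Xk in enumerate(Xs):
        X[row:row + Xk.shape[0], k * p:(k + 1) * p] = Xk
        row += Xk.shape[0]
    y = np.concatenate(ys)
    # Penalty matrix D: lam1 * identity rows, then lam2 * (child - parent) rows.
    rows = [lam1 * np.eye(K * p)]
    for parent, child in edges:
        E = np.zeros((p, K * p))
        E[:, child * p:(child + 1) * p] = np.eye(p)
        E[:, parent * p:(parent + 1) * p] -= np.eye(p)
        rows.append(lam2 * E)
    D = np.vstack(rows)
    # ADMM on: minimize 0.5||y - X b||^2 + ||z||_1 subject to D b = z.
    M = X.T @ X + rho * D.T @ D
    Xty = X.T @ y
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])
    for _ in range(n_iter):
        b = np.linalg.solve(M, Xty + rho * D.T @ (z - u))
        Db = D @ b
        z = soft_threshold(Db + u, 1.0 / rho)
        u += Db - z
    return b.reshape(K, p)
```

Raising `lam2` pushes a child's coefficients to coincide with its parent's, while a large true change at one coefficient survives the penalty; this is the "similar along the lineage, sharp where different" behavior described above.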

Availability: Software will be available at http://www.sailing.cs.cmu.edu/.

Contact: epxing@cs.cmu.edu


12.09.13

Differential expression in RNA-seq: A matter of depth

Sonia Tarazona1,2, Fernando García-Alcalde1, Joaquín Dopazo1, Alberto Ferrer2 and Ana Conesa1,3
1Bioinformatics and Genomics Department, Centro de Investigación Príncipe Felipe, 46012 Valencia, Spain;
2Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46022 Valencia, Spain

Abstract:
Next-generation sequencing (NGS) technologies are revolutionizing genome research, and in particular, their application to transcriptomics (RNA-seq) is increasingly being used for gene expression profiling as a replacement for microarrays. However, the properties of RNA-seq data have not yet been fully established, and additional research is needed for understanding how these data respond to differential expression analysis. In this work, we set out to gain insights into the characteristics of RNA-seq data analysis by studying an important parameter of this technology: the sequencing depth. We have analyzed how sequencing depth affects the detection of transcripts and their identification as differentially expressed, looking at aspects such as transcript biotype, length, expression level, and fold-change. We have evaluated different algorithms available for the analysis of RNA-seq and proposed a novel approach, NOISeq, that differs from existing methods in that it is data-adaptive and nonparametric. Our results reveal that most existing methodologies suffer from a strong dependency on sequencing depth for their differential expression calls and that this results in a considerable number of false positives that increases as the number of reads grows. In contrast, our proposed method models the noise distribution from the actual data, can therefore better adapt to the size of the data set, and is more effective in controlling the rate of false discoveries. This work discusses the true potential of RNA-seq for studying regulation at low expression ranges, the noise within RNA-seq data, and the issue of replication.
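As a rough illustration of a data-adaptive, nonparametric test in this spirit (a simplified sketch, not the NOISeq package or its API), the code computes a log fold change M and an absolute difference D per feature, builds a noise distribution of (M, D) from replicate-vs-replicate comparisons within each condition, and scores each feature by the fraction of noise points it exceeds on both measures. The function name and the pseudo-count are assumptions.

```python
import numpy as np

def noiseq_like_prob(counts_a, counts_b, pseudo=0.5):
    """Simplified NOISeq-style differential expression probability.

    counts_a, counts_b: arrays (features x replicates) of normalized counts
    for two conditions. M = log2 fold change and D = absolute difference of
    condition means. The noise distribution of (M, D) comes from pairwise
    replicate comparisons within each condition.
    """
    def m_d(x, y):
        return np.log2((x + pseudo) / (y + pseudo)), np.abs(x - y)

    M, D = m_d(counts_a.mean(axis=1), counts_b.mean(axis=1))
    noise_M, noise_D = [], []
    for counts in (counts_a, counts_b):
        reps = counts.shape[1]
        for i in range(reps):
            for j in range(i + 1, reps):
                m, d = m_d(counts[:, i], counts[:, j])
                noise_M.append(m)
                noise_D.append(d)
    noise_M = np.abs(np.concatenate(noise_M))
    noise_D = np.concatenate(noise_D)
    # Probability = empirical fraction of noise points with smaller |M| and D
    return np.array([
        np.mean((noise_M < abs(mi)) & (noise_D < di)) for mi, di in zip(M, D)
    ])
```

Because the reference distribution is estimated from the same data, deeper sequencing enlarges the observed differences and the noise differences together, rather than only shrinking a parametric variance estimate.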


01.22.14

Perturbation Biology: Inferring Signaling Networks in Cellular Systems

Evan J. Molinelli*, Anil Korkut*, Weiqing Wang*, Martin L. Miller, Nicholas P. Gauthier, Xiaohong Jing, Poorvi Kaushik, Qin He, Gordon Mills, David B. Solit, Christine A. Pratilas, Martin Weigt, Alfredo Braunstein, Andrea Pagnani, Riccardo Zecchina, Chris Sander

*equal contributor

Abstract:

We present a powerful experimental-computational technology for inferring network models that predict the response of cells to perturbations, and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is quantified in terms of relative changes in the measured levels of proteins, phospho-proteins and cellular phenotypes such as viability. Computational network models are derived de novo, i.e., without prior knowledge of signaling pathways, and are based on simple non-linear differential equations. The prohibitively large solution space of all possible network models is explored efficiently using a probabilistic algorithm, Belief Propagation (BP), which is three orders of magnitude faster than standard Monte Carlo methods. Explicit executable models are derived for a set of perturbation experiments in SKMEL-133 melanoma cell lines, which are resistant to the therapeutically important inhibitor of RAF kinase. The resulting network models reproduce and extend known pathway biology. They empower potential discoveries of new molecular interactions and predict efficacious novel drug perturbations, such as the inhibition of PLK1, which is verified experimentally. This technology is suitable for application to larger systems in diverse areas of molecular biology.
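The abstract says the network models are built from simple non-linear differential equations. The sketch below simulates one commonly used form of such a model, dx_i/dt = ε_i tanh(Σ_j w_ij x_j + u_i) - α_i x_i, with forward Euler until steady state; treat this exact equation, the placement of the perturbation term u, and the function name as assumptions. Belief Propagation, which the authors use to search over network models, is not shown.

```python
import numpy as np

def simulate_network(W, u, eps, alpha, dt=0.01, t_max=50.0):
    """Forward-Euler integration of dx/dt = eps * tanh(W @ x + u) - alpha * x.

    x: deviations of node levels (e.g. log-ratios vs. untreated) from baseline.
    W: signed interaction weights (W[i, j] = effect of node j on node i).
    u: external perturbation applied to each node (e.g. a drug on its target).
    eps, alpha: per-node response scale and decay rate.
    """
    x = np.zeros(len(u))
    for _ in range(int(t_max / dt)):
        x = x + dt * (eps * np.tanh(W @ x + u) - alpha * x)
    return x
```

For a drug that inhibits node 0, which activates node 1, which in turn inhibits node 2, the simulated steady state has nodes 0 and 1 decreasing and node 2 increasing, the kind of propagated response such models are fit to.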

 

To comment, please see the continuation meeting post on 02.05.14.


02.05.14

Perturbation Biology: Inferring Signaling Networks in Cellular Systems
