

Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM


Charles J. Vaske1,†, Stephen C. Benz2,†, J. Zachary Sanborn2, Dent Earl2, Christopher Szeto2, Jingchun Zhu2, David Haussler1,2 and Joshua M. Stuart2,*


1 Howard Hughes Medical Institute and 2 Department of Biomolecular Engineering and Center for Biomolecular Science and Engineering, UC Santa Cruz, CA, USA

* To whom correspondence should be addressed.


Motivation: High-throughput data are providing a comprehensive view of the molecular changes in cancer tissues. New technologies allow for the simultaneous genome-wide assay of genome copy number variation, gene expression, DNA methylation and epigenetics in tumor samples and cancer cell lines.

Analyses of current data sets find that genetic alterations between patients can differ but often involve common pathways. It is therefore critical to identify relevant pathways involved in cancer progression and detect how they are altered in different patients.

Results: We present a novel method for inferring patient-specific genetic activities incorporating curated pathway interactions among genes. A gene is modeled by a factor graph as a set of interconnected variables encoding the expression and known activity of a gene and its products, allowing the incorporation of many types of omic data as evidence. The method predicts the degree to which a pathway’s activities (e.g. internal gene states, interactions or high-level ‘outputs’) are altered in the patient using probabilistic inference.
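The idea of treating a gene as a set of interconnected variables with omic data as evidence can be illustrated with a toy example. The sketch below is not the PARADIGM implementation: the four-variable chain (DNA, mRNA, protein, active protein), the three discrete states, the factor values and the function names are all assumptions chosen for illustration, and inference is done by brute-force enumeration rather than by a dedicated factor-graph algorithm.

```python
from itertools import product

STATES = (-1, 0, 1)   # assumed encoding: down, unchanged, up relative to normal
EPSILON = 0.05        # assumed probability that a variable disagrees with its parent


def follow_parent(child, parent):
    """Factor that favours a child variable taking its parent's state."""
    return 1.0 - EPSILON if child == parent else EPSILON / 2


def observation(observed, hidden):
    """Noisy evidence factor linking a measurement to a hidden variable."""
    return 0.8 if observed == hidden else 0.1


def posterior_activity(copy_number_obs, expression_obs):
    """Posterior over the 'active protein' state given two omic observations.

    Sums the product of all factors over every joint assignment of the
    hidden chain DNA -> mRNA -> protein -> active protein.
    """
    scores = {state: 0.0 for state in STATES}
    for dna, mrna, protein, active in product(STATES, repeat=4):
        weight = (
            observation(copy_number_obs, dna)      # copy number informs DNA
            * follow_parent(mrna, dna)
            * observation(expression_obs, mrna)    # expression informs mRNA
            * follow_parent(protein, mrna)
            * follow_parent(active, protein)
        )
        scores[active] += weight
    total = sum(scores.values())
    return {state: weight / total for state, weight in scores.items()}


# Example: amplified copy number and elevated expression for one patient.
print(posterior_activity(copy_number_obs=1, expression_obs=1))
```

Because both measurements enter as evidence factors on different hidden variables, agreeing observations reinforce each other while conflicting ones yield a flatter posterior, which is the intuition behind integrating several data types in one model.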

Compared with a competing pathway activity inference approach called SPIA, our method identifies altered activities in cancer-related pathways with fewer false positives in both a glioblastoma multiforme (GBM) and a breast cancer dataset. PARADIGM identified consistent pathway-level activities for subsets of the GBM patients that are overlooked when genes are considered in isolation. Further, grouping GBM patients based on their significant pathway perturbations divides them into clinically relevant subgroups having significantly different survival outcomes. These findings suggest that therapeutics might be chosen that target genes at critical points in the commonly perturbed pathway(s) of a group of patients.

Availability: Source code available at


Supplementary information: Supplementary data are available at Bioinformatics online.


Network-guided regression for detecting associations between DNA methylation and gene expression


Zi Wang1, Edward Curry2 and Giovanni Montana1,3,*

1 Department of Mathematics, Imperial College London, London SW7 2AZ

2 Division of Cancer, Imperial College London, Hammersmith Hospital, London, W12 0NN

3 Department of Biomedical Engineering, King’s College London, St Thomas’ Hospital, London SE1 7EH

*To whom correspondence should be addressed. Giovanni Montana, E-mail:



Motivation: High-throughput profiling in biological research has resulted in the availability of a wealth of data cataloguing the genetic, epigenetic and transcriptional states of cells. These data could yield discoveries that lead to breakthroughs in the diagnosis and treatment of human disease, but they require statistical methods designed to find the most relevant patterns among millions of potential interactions. Aberrant DNA methylation is often a feature of cancer, and has been proposed as a therapeutic target. However, the relationship between DNA methylation and gene expression remains poorly understood.

Results: We propose Network-sparse Reduced-Rank Regression (NsRRR), a multivariate regression framework capable of using prior biological knowledge expressed as gene interaction networks to guide the search for associations between gene expression and DNA methylation signatures. We use simulations to show the advantage of our proposed model in terms of variable selection accuracy over alternative models that do not use prior network information. We discuss an application of NsRRR to TCGA datasets on primary ovarian tumours.
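For readers unfamiliar with reduced-rank regression, the following sketch shows the classical rank-constrained least-squares estimator on which such models build. It is not NsRRR: it omits the network-guided sparsity penalties entirely, the function name is hypothetical, and the simulated data (including which data type plays the role of predictor `X` and which of response `Y`) are assumptions for illustration only.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression.

    Minimises ||Y - X C||_F subject to rank(C) <= rank by projecting the
    ordinary least-squares solution onto the leading right singular
    vectors of the fitted values.
    """
    C_ols = np.linalg.pinv(X) @ Y                  # unconstrained least squares
    fitted = X @ C_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:rank].T                              # top-`rank` response directions
    return C_ols @ V_r @ V_r.T


# Simulated data (illustrative only): 50 samples, 5 predictors, 4 responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
C_true = np.outer(rng.normal(size=5), rng.normal(size=4))   # rank-1 coefficients
Y = X @ C_true + 0.1 * rng.normal(size=(50, 4))

C_hat = reduced_rank_regression(X, Y, rank=1)
print("estimated rank:", np.linalg.matrix_rank(C_hat, tol=1e-8))
```

The rank constraint forces all responses to depend on the predictors through a small number of shared latent factors. A sparse, network-guided variant would additionally penalise the factor loadings so that selected variables tend to be connected in the prior gene interaction network.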

Availability: R code implementing the NsRRR model is available at


SPINE: a framework for signaling-regulatory pathway inference from cause-effect experiments

Oved Ourfali 1, Tomer Shlomi 1, Trey Ideker 3, Eytan Ruppin 1,2 and Roded Sharan 1

1 School of Computer Science, 2 School of Medicine, Tel-Aviv University, Tel-Aviv, Israel and 3 Department of Bioengineering, University of California, San Diego, CA 92093, USA

Motivation: The complex program of gene expression allows the cell to cope with changing genetic, developmental and environmental conditions. The accumulating large-scale measurements of gene knockout effects and molecular interactions allow us to begin to uncover regulatory and signaling pathways within the cell that connect causal to affected genes on a network of physical interactions.

Results: We present a novel framework, SPINE, for Signaling-regulatory Pathway INferencE. The framework aims at explaining gene expression experiments in which a gene is knocked out and, as a result, multiple genes change their expression levels. To this end, an integrated network of protein–protein and protein–DNA interactions is constructed, and signaling pathways connecting the causal gene to the affected genes are searched for in this network. The reconstruction problem is translated into that of assigning an activation/repression attribute to each protein so as to explain (in expectation) a maximum number of the knockout effects observed. We provide an integer programming formulation for the latter problem and solve it using a commercial solver.
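The sign-assignment problem described above can be illustrated on a toy instance. The sketch below is a simplification and not the SPINE formulation: it replaces the integer program and commercial solver with exhaustive enumeration, it counts an effect as explained when at least one candidate path has a matching sign product (rather than optimising an expectation), and the protein names, the `net_effect` encoding and the function name are hypothetical.

```python
from itertools import product
from math import prod


def best_sign_assignment(proteins, knockouts):
    """Exhaustively search activator (+1) / repressor (-1) labels.

    knockouts is a list of (candidate_paths, net_effect) pairs. Each path is
    a tuple of protein names leading from the knocked-out gene towards the
    affected gene, and net_effect is +1 if the knocked-out gene appears to
    activate the target and -1 if it appears to repress it. An effect counts
    as explained when the product of labels along at least one path equals
    net_effect.
    """
    best_labels, best_score = None, -1
    for signs in product((1, -1), repeat=len(proteins)):
        labels = dict(zip(proteins, signs))
        score = sum(
            any(prod(labels[p] for p in path) == net_effect for path in paths)
            for paths, net_effect in knockouts
        )
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


# Toy instance with three hypothetical proteins and three knockout effects.
toy_knockouts = [
    ([("A", "B")], +1),
    ([("A", "C")], -1),
    ([("A", "B"), ("A", "C")], +1),
]
labels, explained = best_sign_assignment(["A", "B", "C"], toy_knockouts)
print(labels, explained)
```

Enumeration grows as 2^n in the number of proteins, which is why a formulation that a solver can handle is needed for anything beyond toy networks.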

We validate the method by applying it to a yeast subnetwork that is involved in mating. In cross-validation tests, SPINE obtains very high accuracy in predicting knockout effects (99%). Next, we apply SPINE to the entire yeast network to predict protein effects and reconstruct signaling and regulatory pathways. Overall, we are able to infer 861 paths with confidence and assign effects to 183 genes. The predicted effects are found to be in high agreement with current biological knowledge.

Availability: The algorithm and data are available at