Network-guided regression for detecting associations between DNA methylation and gene expression
Zi Wang1, Edward Curry2 and Giovanni Montana1,3,*
1 Department of Mathematics, Imperial College London, London SW7 2AZ
2 Division of Cancer, Imperial College London, Hammersmith Hospital, London W12 0NN
3 Department of Biomedical Engineering, King’s College London, St Thomas’ Hospital, London SE1 7EH
*To whom correspondence should be addressed. Giovanni Montana, E-mail: giovanni.montana@kcl.ac.uk
Abstract
Motivation: High-throughput profiling in biological research has produced a wealth of data cataloguing the genetic, epigenetic and transcriptional states of cells. These data could yield discoveries that lead to breakthroughs in the diagnosis and treatment of human disease, but exploiting them requires statistical methods designed to find the most relevant patterns among millions of potential interactions. Aberrant DNA methylation is often a feature of cancer and has been proposed as a therapeutic target. However, the relationship between DNA methylation and gene expression remains poorly understood.
Results: We propose Network-sparse Reduced-Rank Regression (NsRRR), a multivariate regression framework that uses prior biological knowledge, expressed as gene interaction networks, to guide the search for associations between gene expression and DNA methylation signatures. Using simulations, we show that the proposed model achieves better variable selection accuracy than alternative models that do not use prior network information. We also discuss an application of NsRRR to TCGA datasets on primary ovarian tumours.
Availability: R code implementing the NsRRR model is available at http://www2.imperial.ac.uk/~gmontana/