Comparative Epigenome Studies Can Provide Insight into Both Evolution and Cancer.

The epigenome is a dynamic layer of information encoded within the genome and the chromatin superstructure of chromosomes in eukaryotic cells. It includes cytosine methylation on the DNA strand, as well as modifications and variants of the histone proteins that shape chromatin structure. This information is in addition to the protein-coding information encoded in the DNA sequence, and notably relates to the context-specific regulation of gene expression in a cell. The epigenome is critical to the differentiation of embryonic stem (ES) cells and the determination of cell fate, and it plays a fundamental role in the organization of cell-type-specific regulatory programs in complex organisms such as mammals, including humans.

Compared to the genome (DNA sequence), the epigenome is dynamic, undergoing large-scale changes on the time scale of cellular life. The classic epigenetic markers of DNA methylation and histone modification are widely studied, yet many mechanisms may remain unknown. Histone modifications are known to have both activating and repressive roles in enhancer and promoter regions of the genome, and control the availability of protein-coding genes for transcription (Ku et al., 2012; Xiao et al., 2012). In contrast, DNA methylation has a characteristically repressive regulatory role, but has recently been associated with regulatory processes at secondary promoter regions within gene bodies (Maunakea et al., 2010). Genome-wide assays for these markers, such as ChIP-seq for histone modifications like H3K27me3, have become routine, allowing large compendia of data to be generated for various epigenetic marks and transcription factors. One example is the work of the ENCODE consortium, which has produced catalogues of such data for multiple species, including Homo sapiens (human) and Mus musculus (mouse). Such data are also catalogued by the NIH Roadmap Epigenomics Mapping Consortium (Bernstein et al., 2010), and cover an array of epigenetic marks, species, and cell lines within those species (both normal and diseased).

The inter-species comparison of these data provides a novel view on the role of the epigenome in evolution. Recent work has produced a particularly rich data resource, including ChIP-seq data for a series of eight histone markers and four transcription factors in each of three mammalian species (Xiao et al., 2012). DNA methylation (MeDIP-seq and MRE-seq) and gene expression (RNA-seq) are also assayed. The species studied are Homo sapiens, Mus musculus and Sus scrofa (pig). This and similar data resources provide exciting new opportunities to study cellular processes and gene regulation through an integrative analysis of multiple genomic features and functions.

Studies of epigenomic patterns in multiple species provide insight into the role of the epigenome in evolution. Studying gene regions that are orthologous between species highlights those epigenomic features that are important on the evolutionary time scale. Such analysis (Xiao et al., 2012) indicates that, in mammals, the epigenome is more highly conserved in regions of the genome where the rate of sequence change (as measured by the nucleotide substitution rate) is elevated. The majority of interspecies variation in the epigenome is found in regions that are neutrally evolving. These results suggest that the epigenome acts as a buffer against the pressure of natural selection induced by phenotypic changes in organisms. This is a tantalizing picture of the dynamic role of the epigenome in evolution, and in complex organisms in general.
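The kind of comparison described above can be illustrated with a minimal Python sketch. It is not the method of Xiao et al. (2012); it assumes a simplified setting in which each orthologous region carries a single presence/absence call for one epigenetic mark per species, together with a nucleotide substitution rate. The function names, the record layout, the threshold, and all values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean


def mark_conservation(marks_by_species):
    """Fraction of species pairs that agree on presence/absence of a mark."""
    pairs = list(combinations(sorted(marks_by_species), 2))
    agree = sum(marks_by_species[a] == marks_by_species[b] for a, b in pairs)
    return agree / len(pairs)


def conservation_by_rate_class(regions, threshold):
    """Mean mark conservation for slowly vs. rapidly evolving regions.

    A region is labelled "fast" when its substitution rate exceeds the
    (hypothetical) threshold, and "slow" otherwise.
    """
    groups = {"slow": [], "fast": []}
    for region in regions:
        label = "fast" if region["substitution_rate"] > threshold else "slow"
        groups[label].append(mark_conservation(region["marks"]))
    return {label: mean(vals) for label, vals in groups.items() if vals}


# Toy, made-up records for illustration only (not real data).
regions = [
    {"substitution_rate": 0.02,
     "marks": {"human": True, "mouse": False, "pig": True}},
    {"substitution_rate": 0.03,
     "marks": {"human": True, "mouse": True, "pig": True}},
    {"substitution_rate": 0.20,
     "marks": {"human": True, "mouse": True, "pig": True}},
    {"substitution_rate": 0.25,
     "marks": {"human": False, "mouse": False, "pig": False}},
]

summary = conservation_by_rate_class(regions, threshold=0.1)
print(summary)
```

In a real analysis the presence/absence calls would come from peak calling on the ChIP-seq or methylation data, and the two rate classes would be compared with an appropriate statistical test rather than a difference of means.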

A natural extension of this ground-breaking work is to compare the clustering patterns and network module organization represented in these data, and to study the functional role of the epigenome in that context. A recently developed algorithm, Arboretum, allows the simultaneous comparison of gene modules across multiple species, and has been successfully applied to the study of evolution in Ascomycete fungal (yeast) species. Multi-clustering methods of this kind are part of the expertise of the Roy research group (Roy et al., 2011 and 2012). In short, the necessary tools are now available to develop such an analysis for complex organisms such as mammals.
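To make the idea of comparing gene modules across species concrete, the following Python sketch matches modules between two species by their overlap. This is not the Arboretum algorithm, which infers module assignments jointly across species; here the assignments are assumed to be given (for example, from clustering each species separately) and are compared afterwards using the Jaccard index. The ortholog group identifiers, module labels, and function names are hypothetical.

```python
def modules_from_assignment(assignment):
    """Invert {gene: module_id} into {module_id: set of genes}."""
    modules = {}
    for gene, module_id in assignment.items():
        modules.setdefault(module_id, set()).add(gene)
    return modules


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matching_modules(assign_a, assign_b):
    """For each module in species A, find the most similar module in species B.

    Genes are keyed by shared ortholog group identifiers, so that set
    overlap between species is meaningful.
    """
    mods_a = modules_from_assignment(assign_a)
    mods_b = modules_from_assignment(assign_b)
    matches = {}
    for id_a, genes_a in mods_a.items():
        id_b = max(mods_b, key=lambda m: jaccard(genes_a, mods_b[m]))
        matches[id_a] = (id_b, jaccard(genes_a, mods_b[id_b]))
    return matches


# Hypothetical module assignments for six ortholog groups (og1..og6).
human = {"og1": 0, "og2": 0, "og3": 0, "og4": 1, "og5": 1, "og6": 1}
mouse = {"og1": "a", "og2": "a", "og3": "b", "og4": "b", "og5": "b", "og6": "b"}

matches = best_matching_modules(human, mouse)
print(matches)
```

Ortholog groups whose module membership differs between species (og3 in this toy example) are the candidates of interest when asking which regulatory programs have diverged.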

Another vital topic of inquiry is the role of the epigenome in cancer. Carcinogenesis has been associated with genome-wide changes in the epigenome in adult stem cells (Aran et al., 2013), which mirror those that occur in ES cells during differentiation and proliferation (Kashyap et al., 2009). The growing compendia of data described above provide a means of examining the epigenome in cancer as well as in non-diseased states. The same comparative analyses described above for the comparison of species can be applied to cancer data. In such a context we can learn about epigenetic processes on two very different time scales: that of species differentiation and that of organismal life. Importantly, such efforts can contribute to improved medical knowledge of cancer for the welfare of patients. Multi-clustering analyses of the kind described here can yield new insights into the dynamic and functional role of the epigenome, both in evolution and in complex organisms.


Aran, D. et al. (2013) DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biology 14, R21.

Bernstein, B. et al. (2010) The NIH epigenomics mapping consortium. Nature Biotechnology 28, 1045-1048.

Kashyap, V. et al. (2009) Regulation of stem cell pluripotency and differentiation involves a mutual regulatory circuit of the NANOG, OCT4, and SOX2 pluripotency transcription factors with polycomb repressive complexes and stem cell microRNAs. Stem Cells and Development 7, 1093-1106.

Ku, M. et al. (2012) H2A.Z landscapes and dual modifications in pluripotent and multipotent stem cells underlie complex genome regulatory functions. Genome Biology 13, R85.

Maunakea, A. et al. (2010) Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253-260.

Roy, S. et al. (2011) A multiple network learning approach to capture system-wide condition-specific responses. Bioinformatics 27, 1832-1838.

Roy, S. et al. (The modENCODE Consortium) (2010) Identification of functional elements and regulatory circuits in Drosophila modENCODE. Science 24, 1787-1797.


Xiao, S. et al. (2012) Comparative epigenomic annotation of regulatory DNA. Cell 149, 1381-1392.