Segway – a tool for unsupervised segmentation of the human genome

Segway automatically segments the genome into functional regions by jointly analyzing different kinds of high-throughput data from different experiments. The approach is described in a recent paper from the Noble research lab. Segway uses a dynamic Bayesian network (DBN) to model the interdependencies between different genomic sections, and the model is trained on ChIP-seq, DNase-seq, and FAIRE-seq data from ENCODE. The authors condensed the many discovered segment types into 25 labels, which they then assigned to functional categories, including familiar terms like gene start, gene middle, gene end, and enhancer. Using this labeling, they recovered many well-known genomic features.

The authors next compared their results to genome annotations from ChromHMM. While both models produce the same sort of output, their inputs differ: ChromHMM is trained only on histone modification data, while Segway uses a variety of data types. The authors find that Segway better identifies known elements, has higher segment resolution, and handles missing data better. They focus less on differences across cell types than the ChromHMM analysis does, although their model does appear to accommodate these differences. They conclude by suggesting a hierarchical segmentation approach that could make genome annotation more comprehensible.
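To make the point about missing data concrete, here is a minimal sketch of one common way a probabilistic model can tolerate gaps: when scoring a genomic position under a candidate label, it sums the log-likelihood over only the tracks that were actually observed. The Gaussian emission, the use of NaN to mark a missing value, and the function name `gaussian_loglik_observed` are all assumptions made for illustration, not details taken from the Segway paper or software.

```python
import numpy as np

def gaussian_loglik_observed(x, mean, std):
    """Log-likelihood of one position's signal values under one label.

    x, mean, std: 1-D arrays with one entry per data track.
    NaN in x marks a track with no data at this position (an assumption
    of this sketch); such tracks are skipped rather than imputed.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    observed = ~np.isnan(x)
    z = (x[observed] - mean[observed]) / std[observed]
    terms = -0.5 * z**2 - np.log(std[observed]) - 0.5 * np.log(2 * np.pi)
    return float(np.sum(terms))

# Made-up values: three tracks, the second one missing at this position.
mean = np.array([0.0, 2.0, -1.0])
std = np.array([1.0, 0.5, 2.0])
ll_with_gap = gaussian_loglik_observed([0.3, np.nan, -0.5], mean, std)
ll_two_tracks = gaussian_loglik_observed([0.3, -0.5], mean[[0, 2]], std[[0, 2]])
```

In this sketch a missing track simply contributes nothing to the score, so `ll_with_gap` equals the score computed from the two observed tracks alone.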

ChromHMM – a tool for chromatin state segmentation of genomes

Chromatin marks are an important factor in the transcription regulatory network. A recent study from Ernst et al. uses ChIP-seq to profile nine distinct chromatin marks across nine different human cell types. They developed a tool, ChromHMM, with which they segment the human genome according to the combinatorial pattern of chromatin marks in each segment. ChromHMM applies a multivariate hidden Markov model that represents combinations of marks using Bernoulli random variables, learns a set of distinct chromatin states, and assigns each portion of the genome to one state. For their human data, they annotate 15 chromatin states, which fall into the broad categories of promoter, enhancer, insulator, transcribed, repressed, and inactive.
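The idea of a hidden Markov model with Bernoulli emissions can be sketched in a few lines. The example below is not ChromHMM's code: it skips parameter learning entirely, uses made-up parameters for two states and three marks, treats marks as independent given the state, and uses Viterbi decoding as one standard way to assign a state to each genomic bin. The function names and all numbers are hypothetical.

```python
import numpy as np

def bernoulli_log_emissions(marks, p):
    """Log-probability of each bin's mark pattern under each state.

    marks: (n_bins, n_marks) binary array, 1 = mark present in that bin.
    p:     (n_states, n_marks) array, p[k, m] = P(mark m present | state k).
    Returns an (n_bins, n_states) array.
    """
    marks = np.asarray(marks, dtype=float)
    log_p, log_q = np.log(p), np.log1p(-p)
    return marks @ log_p.T + (1.0 - marks) @ log_q.T

def viterbi(log_emit, log_trans, log_init):
    """Most likely state path for one sequence of bins."""
    n_bins, n_states = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((n_bins, n_states), dtype=int)
    for t in range(1, n_bins):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.empty(n_bins, dtype=int)
    path[-1] = score.argmax()
    for t in range(n_bins - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy, made-up parameters: 2 states, 3 marks.
p = np.array([[0.9, 0.8, 0.1],   # state 0: marks 0 and 1 usually present
              [0.1, 0.1, 0.7]])  # state 1: mark 2 usually present
log_trans = np.log(np.array([[0.9, 0.1],
                             [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))

# Six consecutive bins with binarized mark calls.
marks = np.array([[1, 1, 0],
                  [1, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1],
                  [0, 0, 1],
                  [0, 0, 1]])
states = viterbi(bernoulli_log_emissions(marks, p), log_trans, log_init)
```

With these toy inputs the first three bins are assigned to state 0 and the last three to state 1. The third bin lacks mark 1 but still lands in state 0, which shows how the model scores the whole combination of marks rather than any single one.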

They found that enhancer and promoter regions vary widely in activity level between cell types, but that the general categorization of a region as an area of regulatory potential is consistent across tissues. They clustered promoters and enhancers based on chromatin state profile, and found that clusters of promoters are general across cell types, while clusters of enhancers are cell-type specific. Next, they found a strong correlation between patterns of enhancer activity and the expression level of the nearest gene, suggesting that distal enhancers often neighbor their target genes. They mapped enhancer regions to target genes based on correlations between enhancer activity profiles, gene expression, sequence motif enrichment, and TF expression. Finally, they found that disease-associated SNPs are significantly enriched in portions of the genome associated with strong enhancer chromatin states.
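The correlation between enhancer activity and nearest-gene expression can be illustrated with a small sketch. It computes, for each enhancer, the Pearson correlation between its activity across cell types and the expression of its nearest gene across the same cell types. The choice of Pearson correlation, the function name `pearson_rows`, and every number below are assumptions for illustration; the study's actual scoring combined several signals and is not reproduced here.

```python
import numpy as np

def pearson_rows(a, b):
    """Row-wise Pearson correlation of two (n_items, n_cell_types) arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    denom = np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))
    return (a * b).sum(axis=1) / denom

# Made-up data: 3 enhancers measured in 4 cell types.
enhancer_activity = np.array([[1.0, 0.0, 1.0, 0.0],
                              [0.2, 0.8, 0.9, 0.1],
                              [1.0, 1.0, 0.0, 0.0]])
# Expression of each enhancer's nearest gene in the same 4 cell types.
nearest_gene_expr = np.array([[8.0, 1.0, 9.0, 2.0],
                              [1.0, 6.0, 7.0, 2.0],
                              [1.0, 2.0, 8.0, 9.0]])

r = pearson_rows(enhancer_activity, nearest_gene_expr)
```

In this toy data the first two enhancers track their nearest gene closely (strongly positive `r`), while the third is anti-correlated, which is the kind of case where the nearest gene would be a poor guess for the target.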

More information about this study, including the ChromHMM software, can be found on the MIT Computational Biology website.