Chromatin marks and the cell-type specificity of SNPs

This paper came out recently in Nature and combines data from ENCODE, the epigenome consortia, multiple GWAS studies and the 1000 Genomes Project to address the cell-type specificity of genetic variation affecting diseases.

The authors address two related questions: (a) which cell types are associated with a disease, judged by the chromatin activity surrounding SNPs associated with that disease, and (b) which marks confer this cell-type specificity; such marks are called informative marks. It all boils down to computing a statistic that measures how variable the strength of a mark is across the SNPs for a disease.

The authors started with SNPs associated with different diseases or traits in GWAS. The analysis was done on a per-disease basis, for example for LDL or rheumatoid arthritis. For each disease, they took the SNPs reported as associated in GWAS and added to this list further SNPs in high linkage disequilibrium with them. They then obtained chromatin mark peaks for different chromatin marks in different cell types and cell lines from ENCODE as well as the epigenome map. For each SNP, they asked to what extent it was associated with a particular mark in a particular cell type. This was quantified by a score defined as the ratio of the height of a peak to the width of the peak.
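To make the scoring step concrete, here is a minimal sketch of a height-to-width score for a single SNP, mark and cell type. The `Peak` class, the `snp_score` function, the half-open coordinate convention, and the choice to take the best-scoring peak when several overlap are all my own assumptions for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peak:
    """A chromatin mark peak in one cell type (hypothetical representation)."""
    start: int     # first base of the peak, inclusive
    end: int       # one past the last base of the peak, exclusive
    height: float  # peak signal height


def snp_score(snp_pos: int, peaks: List[Peak]) -> float:
    """Score a SNP as height / width of a peak that overlaps it.

    Returns 0.0 when no peak overlaps the SNP. If several peaks overlap,
    the largest ratio is used (an assumption made for this sketch).
    """
    best = 0.0
    for peak in peaks:
        if peak.start <= snp_pos < peak.end:
            width = peak.end - peak.start
            best = max(best, peak.height / width)
    return best


# Two overlapping peaks: a broad one and a narrow one.
example_peaks = [Peak(100, 200, 50.0), Peak(150, 160, 20.0)]
inside_narrow = snp_score(155, example_peaks)  # narrow peak gives the larger ratio
inside_broad = snp_score(120, example_peaks)   # only the broad peak overlaps
outside = snp_score(300, example_peaks)        # no overlapping peak
```

With a score like this, a tall, narrow peak sitting on a SNP counts for more than a broad, low one.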

If we think of this data as a matrix, we would have one matrix per mark, whose columns correspond to the SNPs and whose rows correspond to the different cell types. A mark is then considered informative for a disease and cell type if all or most of the SNPs exhibit a high score in only a few cell types. A mark is uninformative if the SNPs associated with the highest scores are not the same across different cell types.

To compute the informativeness of a mark, the authors defined a metric that measures the variation in the score of SNPs for a disease across cell types. Specifically, the statistic is a sum of squared differences of SNP scores, with the differences computed for each cell type and phenotype combination. If this number is small, then the mark is apparently cell-type specific. Finally, the authors use a permutation analysis to judge whether a particular value is high or low. Cell-type specificity for a disease is computed by summing the scores over all SNPs in a given cell type and assessing significance.
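The following is a rough sketch of how I read this statistic, for one mark and one disease. The function names, the dictionary layout (cell type mapped to a list of per-SNP scores), and especially the null model are assumptions of mine: the sketch draws random SNP sets of the same size from a background pool and treats smaller values as more extreme, which may differ from the permutation scheme the authors actually used.

```python
import random
from typing import Dict, List

# One mark: cell type -> score per SNP, with SNPs in the same order in every row.
Scores = Dict[str, List[float]]


def spread_statistic(scores: Scores) -> float:
    """Sum over cell types of squared deviations of SNP scores from the row mean."""
    total = 0.0
    for per_snp in scores.values():
        mean = sum(per_snp) / len(per_snp)
        total += sum((s - mean) ** 2 for s in per_snp)
    return total


def cell_type_totals(scores: Scores) -> Dict[str, float]:
    """Sum of SNP scores within each cell type."""
    return {cell_type: sum(per_snp) for cell_type, per_snp in scores.items()}


def subset(scores: Scores, idx: List[int]) -> Scores:
    """Keep only the SNP columns listed in idx."""
    return {ct: [per_snp[i] for i in idx] for ct, per_snp in scores.items()}


def permutation_p(background: Scores, disease_idx: List[int],
                  n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value for the disease SNPs against random SNP sets.

    Assumption: the null is built by sampling SNP sets of equal size from
    the background, and a statistic at most the observed one counts as a hit.
    """
    rng = random.Random(seed)
    n_snps = len(next(iter(background.values())))
    observed = spread_statistic(subset(background, disease_idx))
    hits = 0
    for _ in range(n_perm):
        idx = rng.sample(range(n_snps), len(disease_idx))
        if spread_statistic(subset(background, idx)) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


toy = {"A": [1.0, 3.0], "B": [2.0, 2.0]}
toy_stat = spread_statistic(toy)
toy_totals = cell_type_totals(toy)
```

In the toy matrix, cell type A contributes all of the spread and cell type B none, yet both have the same total score. This shows that the spread statistic and the per-cell-type sum capture different things.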

The statistic they use to decide whether a mark is informative is the sum of squared differences between each SNP’s score and the mean score of all SNPs in each disease and cell type combination. If this is small, we can assume that the mark does not vary much, but there is no control over which cell types the mark must vary over. I am not sure how the method deals with the situation where a mark does not change much across cell types.