Regulatory networks in blood cells

This recent review on mapping transcriptional regulatory networks in blood cells, specifically hematopoietic stem cells (HSCs), is very nice (1). Transcriptional regulatory networks are central to specifying context-specific gene expression patterns. Such networks specify connections between transcription factor proteins and target genes, although it is common to also include upstream regulators such as signaling proteins, non-specific DNA-binding proteins and chromatin remodelers as potential regulators. The review highlights the need to predict the connections between regulators and target genes. For HSCs in particular, about 50 regulators have already been identified, and ChIP-seq or ChIP-chip data have been measured for several of them. But our understanding of the networks in this cellular state is still far from complete. While expression-based network inference methods, including module networks, have been applied to this lineage, there is still a lot to do to map these networks. The ability to measure single-cell expression profiles opens up new opportunities because one can potentially resolve causal relationships and temporal dynamics on these networks. Furthermore, the ability to make more targeted perturbations using CRISPR offers even more opportunities to build more accurate pictures of such regulatory networks.


1. Göttgens, Berthold. “Regulatory network control of blood stem cells.” Blood 125 (April 2015): 2614-2620.

Predicting spatio-temporal patterns of expression

An open question in gene regulation is how spatio-temporal patterns of gene expression are encoded in the genome. We discussed this paper in our lab meeting. It describes a two-step approach to predicting spatial and temporal gene expression patterns in Drosophila: (a) first find cis-regulatory modules (CRMs) that exhibit spatial or temporal activity, and (b) then link the CRMs to genes to predict spatio-temporal expression. Assume space and time are captured by the term “context”. To do (a), the authors use a Bayesian network trained on known CRM-context relationships. To do (b), the authors use additional data on insulators and H3K4me1 to predict which CRMs are associated with which genes. In (a), the CRM-context information is known for only a few hundred of the total 8000 CRMs. So the authors use an EM-style idea in which the trained Bayesian network predicts the context-specific activity of each CRM. Then, using the soft labels of each CRM, they predict the expression of the gene. This second model is also a Bayesian network, but it has additional variables for the distance between CRMs and genes and for whether there is an insulator binding site.
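To make the soft-label idea in step (a) concrete, here is a minimal sketch of semi-supervised EM. It is not the authors' model: I swap in a Bernoulli naive Bayes classifier (the simplest Bayesian network) for their network, and I assume each CRM is described by hypothetical binary features (say, presence of a TF binding site) and a single binary context label. The function names and the toy data are my own.

```python
import math

def fit_nb(X, w, alpha=1.0):
    """Fit a Bernoulli naive Bayes model from rows X with soft labels w.

    w[i] is the probability that CRM i is active in the context.
    alpha is a pseudocount that keeps every probability strictly in (0, 1).
    """
    n, d = len(X), len(X[0])
    n1 = sum(w)
    n0 = n - n1
    prior = (n1 + alpha) / (n + 2 * alpha)
    p1 = [(sum(wi * x[j] for x, wi in zip(X, w)) + alpha) / (n1 + 2 * alpha)
          for j in range(d)]
    p0 = [(sum((1 - wi) * x[j] for x, wi in zip(X, w)) + alpha) / (n0 + 2 * alpha)
          for j in range(d)]
    return prior, p1, p0

def posterior(x, model):
    """P(active | features x) under the fitted model."""
    prior, p1, p0 = model
    l1, l0 = math.log(prior), math.log(1 - prior)
    for xj, a, b in zip(x, p1, p0):
        l1 += math.log(a if xj else 1 - a)
        l0 += math.log(b if xj else 1 - b)
    return 1.0 / (1.0 + math.exp(l0 - l1))

def em_soft_labels(X_lab, y_lab, X_unl, n_iter=20):
    """Train on labeled CRMs, then alternate E-steps (soft labels for the
    unlabeled CRMs) and M-steps (refit on labeled plus soft-labeled CRMs)."""
    hard = [float(y) for y in y_lab]
    model = fit_nb(X_lab, hard)
    soft = [posterior(x, model) for x in X_unl]
    for _ in range(n_iter):
        model = fit_nb(X_lab + X_unl, hard + soft)
        soft = [posterior(x, model) for x in X_unl]
    return model, soft

# Toy data: 4 labeled CRMs, 3 unlabeled CRMs, 3 binary features each.
X_lab = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y_lab = [1, 1, 0, 0]
X_unl = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
model, soft = em_soft_labels(X_lab, y_lab, X_unl)
```

The soft labels in `soft` are what would then feed the second, gene-level model; that step needs the distance and insulator variables and is not sketched here.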

Chromatin marks and the cell-type specificity of SNPs

This paper came out recently in Nature and combines data from ENCODE, the Epigenome consortia, multiple GWAS studies and the 1000 Genomes Project to address the question of the cell-type specificity of genetic variation affecting diseases.

The authors try to get at two related questions: (a) which cell types are associated with a disease, judged by the chromatin activity surrounding SNPs associated with that disease, and (b) which marks confer this cell-type specificity to a disease; such marks are called informative marks. It all boils down to computing a statistic that measures how variable the strength of a mark is across the SNPs for a disease.

The authors started with SNPs associated with different diseases in GWAS. The analysis was done on a per-disease basis, for example for LDL or rheumatoid arthritis. For each disease, the authors took the SNPs reported as associated in GWAS and added to this list SNPs that were in high linkage disequilibrium with them. Then they obtained chromatin mark peaks for different chromatin marks in different cell types and cell lines from ENCODE as well as the epigenome map. Then they asked, for each SNP, to what extent it was associated with a particular mark in a particular cell type. This was done by defining a score, which is the ratio of the height of a peak to the width of the peak.
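A small sketch of the per-SNP score as described above (peak height divided by peak width). The `Peak` record, the half-open coordinates, the zero score for SNPs outside any peak, and taking the maximum when peaks overlap are all my assumptions; the paper's exact definition may differ.

```python
from typing import List, NamedTuple

class Peak(NamedTuple):
    start: int      # first base of the peak (inclusive)
    end: int        # one past the last base of the peak (exclusive)
    height: float   # signal at the peak summit

def snp_score(snp_pos: int, peaks: List[Peak]) -> float:
    """Score a SNP for one mark in one cell type as height / width of the
    peak that overlaps it. Returns 0.0 if no peak overlaps the SNP."""
    best = 0.0
    for p in peaks:
        if p.start <= snp_pos < p.end:
            best = max(best, p.height / (p.end - p.start))
    return best

peaks = [Peak(100, 200, 50.0), Peak(150, 250, 20.0)]
inside_sharp = snp_score(160, peaks)   # overlaps both peaks, the sharper one wins
inside_broad = snp_score(210, peaks)   # overlaps only the broader, lower peak
outside = snp_score(300, peaks)        # overlaps nothing
```

Dividing by the width rewards tall, narrow peaks, so a SNP sitting in a broad, low domain gets a low score even if the mark is present.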

Thus, if we think of this data as a matrix, we have one matrix per mark, whose columns correspond to the positions of the SNPs and whose rows correspond to different cell types. A mark is then considered informative for a disease and cell type if all or most of the SNPs exhibit a high score in a few cell types. A mark is uninformative if the SNPs with the highest scores are not the same across different cell types. To compute this informativeness score for a mark, the authors defined a metric that measures the variation in the score of SNPs for a disease across cell types. Specifically, the statistic is a sum of squared differences of SNP scores, and the differences are computed for each cell type and phenotype combination. If this number is small, then the mark is apparently cell-type specific. Finally, the authors use a permutation analysis to identify whether a particular score is high or low. Cell-type specificity for a disease is computed by summing the scores over all SNPs in a given cell type and assessing significance.
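The last step (sum the scores of a disease's SNPs in one cell type, then assess significance) can be sketched as an empirical p-value. How the null is built is my assumption: here I draw random SNP sets of the same size from a hypothetical pool of background SNP scores for the same mark and cell type. The paper may construct its null differently.

```python
import random
from typing import Sequence

def cell_type_specificity_pvalue(disease_scores: Sequence[float],
                                 background_scores: Sequence[float],
                                 n_perm: int = 1000,
                                 seed: int = 0) -> float:
    """Empirical p-value for the summed score of the disease SNPs in one
    cell type, against sums for random same-size background SNP sets."""
    rng = random.Random(seed)
    observed = sum(disease_scores)
    k = len(disease_scores)
    hits = 0
    for _ in range(n_perm):
        null_sum = sum(rng.sample(background_scores, k))
        if null_sum >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy example: three disease SNPs with high scores, flat background.
p_value = cell_type_specificity_pvalue([0.9, 0.8, 0.7], [0.0] * 50)
```

Running this once per cell type gives one p-value per cell type for a given disease and mark, which would then need a multiple-testing correction.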

The statistic they use to decide whether a mark is informative is the sum of squared differences between each SNP’s score and the mean over all SNPs in each disease and cell type combination. If this is small, then we can assume that the mark does not vary too much, but there is no control over which cell types the mark must vary over. I am not sure how the method deals with the situation where a mark is not changing a lot across cell types.
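As I read it, the statistic for one disease and one mark looks like the sketch below. The dictionary layout (cell type mapped to a list of SNP scores) and the function name are my own choices for illustration.

```python
from typing import Dict, List

def within_cell_type_ss(scores: Dict[str, List[float]]) -> float:
    """Sum, over cell types, of the squared deviation of each SNP's score
    from the mean score of all the disease's SNPs in that cell type."""
    total = 0.0
    for snp_scores in scores.values():
        mean = sum(snp_scores) / len(snp_scores)
        total += sum((s - mean) ** 2 for s in snp_scores)
    return total

# Two cell types, two SNPs each.
toy = {"T cell": [1.0, 3.0], "liver": [2.0, 2.0]}
stat = within_cell_type_ss(toy)  # (1-2)^2 + (3-2)^2 + 0 + 0 = 2.0
```

Note that a mark that is flat everywhere (every SNP has the same score in every cell type) also gives a value of zero, which is exactly the case I am unsure about.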

Wisdom of crowds in network inference

The sixth DREAM consortium paper is out in Nature Methods. The consortium is a community effort for systematically comparing expression-based network inference methods. A major finding of the paper is that combining different methods gives the best performance. This year’s competition analyzed datasets from a simulated network, the E. coli regulatory network, and the yeast S. cerevisiae regulatory network. Methods do well for the simulated and E. coli networks, but performance is poor for yeast.
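One simple way to combine methods is to re-rank each candidate edge by its average rank across the individual predictions. The sketch below assumes each method returns a confidence score per (regulator, target) edge, and that an edge a method did not score gets the worst possible rank; both the data layout and the handling of missing edges are my assumptions.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)

def average_rank_consensus(method_scores: List[Dict[Edge, float]]
                           ) -> List[Tuple[Edge, float]]:
    """Return edges sorted by their mean rank across methods (best first)."""
    edges = set().union(*method_scores)
    worst = len(edges)
    totals = {e: 0.0 for e in edges}
    for scores in method_scores:
        # Rank 1 is the edge this method is most confident about.
        ranked = sorted(scores, key=scores.get, reverse=True)
        rank_of = {e: r for r, e in enumerate(ranked, start=1)}
        for e in edges:
            totals[e] += rank_of.get(e, worst)
    k = len(method_scores)
    mean_rank = {e: t / k for e, t in totals.items()}
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))

# Three toy methods scoring edges among genes A, B and C.
m1 = {("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.1}
m2 = {("A", "B"): 0.8, ("B", "C"): 0.7, ("A", "C"): 0.2}
m3 = {("A", "C"): 0.9, ("A", "B"): 0.6}
consensus = average_rank_consensus([m1, m2, m3])
```

Ranks are used instead of raw scores because confidence scores from different methods are not on a comparable scale.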

From the cover: “An Escherichia coli ‘community’ gene regulatory network, constructed by combining the predictions of several network inference methods tested in the DREAM5 challenge. Cover by Erin Dewalt, based on an image provided by Valdo Peixoto and Daniel Marbach.”