transcription factor


Defining an essential transcription factor program for naive pluripotency

S.-J. Dunn1,*, G. Martello2,*,†‡, B. Yordanov1,*, S. Emmott1, A. G. Smith2,3,†
1Computational Science Laboratory, Microsoft Research, Cambridge CB1 2FB, UK.
2Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 1QR, UK.
3Department of Biochemistry, University of Cambridge, Cambridge, UK.
‡Department of Molecular Medicine, University of Padua, 35131 Padua, Italy.
†Corresponding author. E-mail: (G.M.); (A.G.S.)
* These authors contributed equally to this work.


The gene regulatory circuitry through which pluripotent embryonic stem (ES) cells choose between self-renewal and differentiation appears vast and has yet to be distilled into an executive molecular program. We developed a data-constrained, computational approach to reduce complexity and to derive a set of functionally validated components and interaction combinations sufficient to explain observed ES cell behavior. This minimal set, the simplest version of which comprises only 16 interactions, 12 components, and three inputs, satisfies all prior specifications for self-renewal and furthermore predicts unknown and nonintuitive responses to compound genetic perturbations with an overall accuracy of 70%. We propose that propagation of ES cell identity is not determined by a vast interactome but rather can be explained by a relatively simple process of molecular computation.
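The abstract does not spell out the modeling formalism. The following is a minimal sketch that assumes a synchronous Boolean network. It shows how a candidate set of interactions can be simulated under fixed inputs, and how a compound genetic perturbation (a single or double knockout) can be tested by clamping components off and comparing the stable states that result. The component names `A` to `D`, the inputs `in1` and `in2`, and every update rule are hypothetical placeholders, not the published network.

```python
from typing import AbstractSet, Callable, Dict, Optional

State = Dict[str, bool]

# Hypothetical update rules (NOT the published ES cell network): each
# component's next value is a Boolean function of the current state.
# "in1" and "in2" are external inputs and are held constant.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: s["in1"] or s["C"],
    "B": lambda s: s["A"] and s["in2"],
    "C": lambda s: s["A"] or s["B"],
    "D": lambda s: s["B"] and s["C"],
}


def step(state: State, knockouts: AbstractSet[str]) -> State:
    """One synchronous update: every rule reads the same previous state."""
    nxt = dict(state)
    for node, rule in RULES.items():
        # A knocked-out component is clamped to False regardless of its rule.
        nxt[node] = False if node in knockouts else rule(state)
    return nxt


def simulate(inputs: State, knockouts: AbstractSet[str] = frozenset(),
             max_steps: int = 50) -> Optional[State]:
    """Start with all components off and iterate until a fixed point.

    Returns None if no fixed point is reached within max_steps
    (for example, if the trajectory enters a cycle).
    """
    state: State = {**{node: False for node in RULES}, **inputs}
    for _ in range(max_steps):
        nxt = step(state, knockouts)
        if nxt == state:
            return state
        state = nxt
    return None


if __name__ == "__main__":
    inputs = {"in1": True, "in2": True}
    for ko in [frozenset(), frozenset({"A"}), frozenset({"C"}), frozenset({"B", "C"})]:
        final = simulate(inputs, ko)
        print(sorted(ko), None if final is None else final["D"])
```

Synchronous updating is itself a modeling assumption. The same rules can reach different stable states under an asynchronous scheme, so a constraint-based search over candidate networks would need to fix the update semantics explicitly.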


Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia

Yue Li, Minggao Liang, Zhaolei Zhang


Gene expression is a combinatorial function of genetic/epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factor (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. With the maturation of microarray and sequencing technologies, large amounts of data measuring the genome-wide signals of these factors have become available from the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA). However, an integrative model that takes full advantage of these rich yet heterogeneous data is lacking. To this end, we developed RACER (Regression Analysis of Combined Expression Regulation), which fits mRNA expression as the response variable, using as explanatory variables the TF data from ENCODE and the CNV, DM and miRNA expression signals from TCGA. Briefly, RACER first infers the sample-specific regulatory activities of TFs and miRNAs, which are then used as inputs to infer specific TF/miRNA-gene interactions. This two-stage regression framework circumvents a common difficulty in integrating ENCODE data measured in a generic cell line with sample-specific TCGA measurements. As a case study, we integrated Acute Myeloid Leukemia (AML) data from TCGA with the related TF binding data measured in K562 cells from ENCODE. As a proof of concept, we first verified our model formalism by 10-fold cross-validation on predicting gene expression. We next evaluated RACER on recovering known regulatory interactions and demonstrated its superior statistical power over existing methods in detecting known miRNA/TF targets. Additionally, we developed a feature selection procedure that identified 18 regulators whose activities clustered consistently with cytogenetic risk groups. One of the selected regulators is miR-548p, whose inferred targets were significantly enriched for leukemia-related pathways, implicating a novel role for it in AML pathogenesis. Moreover, survival analysis using the inferred activities identified C-Fos as a potential AML prognostic marker. Together, we provide a novel framework that successfully integrates the TCGA and ENCODE data to reveal the AML-specific regulatory program at the global level.
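The two-stage regression idea can be illustrated on synthetic data. This is a minimal sketch with several assumptions. It uses plain least squares at both stages, while RACER's actual estimator and any regularization may differ. A random binary gene-by-regulator `binding` matrix stands in for ENCODE TF binding and miRNA target sites, and the CNV and DM values are simulated. Stage 1 regresses each sample's expression across genes on the binding matrix to obtain sample-specific regulator activities. Stage 2 regresses each gene's expression across samples on its own CNV, its DM, and the inferred activities of the regulators that bind it. All sizes, names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples, n_regs = 200, 30, 5

# Hypothetical synthetic data (not TCGA/ENCODE measurements).
binding = rng.integers(0, 2, size=(n_genes, n_regs)).astype(float)  # gene x regulator, 1 = bound/targeted
true_activity = rng.normal(size=(n_regs, n_samples))               # regulator x sample
cnv = rng.normal(size=(n_genes, n_samples))                         # gene x sample
dm = rng.normal(size=(n_genes, n_samples))                          # gene x sample
noise = 0.1 * rng.normal(size=(n_genes, n_samples))
expr = binding @ true_activity + 0.5 * cnv - 0.5 * dm + noise       # gene x sample


def infer_activities(expr: np.ndarray, binding: np.ndarray) -> np.ndarray:
    """Stage 1: for every sample, regress expression across genes on binding.

    lstsq solves all samples (columns of expr) at once and returns a
    regulator x sample matrix of inferred activities.
    """
    activity, *_ = np.linalg.lstsq(binding, expr, rcond=None)
    return activity


def fit_gene(g, expr, cnv, dm, activity, binding):
    """Stage 2: for gene g, regress its expression across samples on an
    intercept, its CNV, its DM and the activities of regulators binding it.

    Returns the indices of those regulators and the fitted coefficients
    ordered as [intercept, cnv, dm, regulator_1, ...].
    """
    regs = np.flatnonzero(binding[g])
    design = np.column_stack(
        [np.ones(expr.shape[1]), cnv[g], dm[g], activity[regs].T]
    )
    coef, *_ = np.linalg.lstsq(design, expr[g], rcond=None)
    return regs, coef


activity = infer_activities(expr, binding)
cnv_coefs = [fit_gene(g, expr, cnv, dm, activity, binding)[1][1] for g in range(n_genes)]
print("activity recovery r =", np.corrcoef(activity.ravel(), true_activity.ravel())[0, 1])
print("median CNV coefficient =", np.median(cnv_coefs))
```

The split mirrors the motivation given above. Binding measured in a single cell line is only used to define which regulators can act on which genes, while the sample-specific signal enters through the inferred activities.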


Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications

Eduardo G. Gusmao1,*, Christoph Dieterich2, Martin Zenke3,4 and Ivan G. Costa1,5,6,*


1IZKF Computational Biology Research Group, Institute for Biomedical Engineering, RWTH Aachen University Medical School, 52074 Aachen, 2Computational RNA Biology Lab and Bioinformatics Core, Max Planck Institute for Biology of Ageing, 50931 Cologne, 3Department of Cell Biology, Institute for Biomedical Engineering, RWTH Aachen University Medical School, 52074, 4Helmholtz Institute for Biomedical Engineering, 52074, 5Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen University, 52062 Aachen, Germany and 6Center of Informatics, Federal University of Pernambuco, 50740-560 Recife-PE, Brazil
*To whom correspondence should be addressed
  • Received October 28, 2013.
  • Revision received June 27, 2014.
  • Accepted July 25, 2014.


Motivation: The identification of active transcriptional regulatory elements is crucial for understanding the regulatory networks that drive cellular processes such as cell development and the onset of diseases. It has recently been shown that chromatin structure information, such as DNase I hypersensitivity (DHS) or histone modifications, significantly improves cell-specific predictions of transcription factor binding sites. However, no method has so far successfully combined both DHS and histone modification data to perform active binding site prediction.

Results: We propose here a method based on hidden Markov models to integrate DHS and histone modification occupancy for the detection of open chromatin regions and active binding sites. We have created a framework that includes treatment of genomic signals, model training and genome-wide application. In a comparative analysis, our method obtained a good trade-off between sensitivity and specificity and higher area under the curve (AUC) statistics than competing methods. Moreover, our technique does not require further training or sequence information to generate binding location predictions. Therefore, the method can easily be applied to new cell types and allows flexible downstream analysis such as de novo motif finding.
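As a minimal sketch of how a hidden Markov model can segment a genomic region into active and background states using two signals at once, the following applies Viterbi decoding to binned data. It assumes just two states, signals discretized into low/high with hypothetical thresholds, and hand-set transition and emission probabilities. The published framework's state architecture, signal treatment and trained parameters are not reproduced here, and all values are illustrative.

```python
import numpy as np

# Hypothetical two-state model: 0 = background, 1 = open/active chromatin.
LOG_START = np.log([0.9, 0.1])
LOG_TRANS = np.log([[0.95, 0.05],    # from background
                    [0.10, 0.90]])   # from active
# Emissions over 4 joint symbols, symbol = 2 * dhs_high + histone_high:
# 0 = both low, 1 = histone only, 2 = DHS only, 3 = both high.
LOG_EMIT = np.log([[0.70, 0.15, 0.10, 0.05],   # background
                   [0.05, 0.15, 0.20, 0.60]])  # active


def encode(dhs, histone, dhs_thr, histone_thr):
    """Discretize two per-bin signals into one joint symbol per bin."""
    dhs_high = (np.asarray(dhs) > dhs_thr).astype(int)
    histone_high = (np.asarray(histone) > histone_thr).astype(int)
    return 2 * dhs_high + histone_high


def viterbi(obs, log_start, log_trans, log_emit):
    """Most probable state path, computed in log space to avoid underflow."""
    n, k = len(obs), len(log_start)
    score = np.empty((n, k))
    back = np.zeros((n, k), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n):
        # cand[i, j] = best score ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    path = np.empty(n, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


dhs = [1, 1, 9, 8, 9, 7, 1, 0]        # hypothetical binned DHS counts
histone = [0, 1, 6, 7, 5, 8, 0, 0]    # hypothetical binned histone-mark counts
obs = encode(dhs, histone, dhs_thr=3, histone_thr=3)
print(obs.tolist())                                            # [0, 0, 3, 3, 3, 3, 0, 0]
print(viterbi(obs, LOG_START, LOG_TRANS, LOG_EMIT).tolist())   # [0, 0, 1, 1, 1, 1, 0, 0]
```

In a realistic setting the parameters would be estimated from data rather than set by hand. Continuous emissions would also avoid the information lost by thresholding each signal.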

Availability and implementation: Our framework is available as part of the Regulatory Genomics Toolbox. The software information and all benchmarking data are available at

Contact: or

Supplementary information: Supplementary data are available at Bioinformatics online.


A Validated Regulatory Network for Th17 Cell Specification

Maria Ciofani1, 10, Aviv Madar3, 4, 10, Carolina Galan1, MacLean Sellars1, Kieran Mace3, Florencia Pauli5, Ashish Agarwal3, Wendy Huang1, Christopher N. Parkhurst1, Michael Muratet5, Kim M. Newberry5, Sarah Meadows5, Alex Greenfield2, Yi Yang1, Preti Jain5, Francis K. Kirigin2, Carmen Birchmeier6, Erwin F. Wagner7, Kenneth M. Murphy8, 9, Richard M. Myers5, Richard Bonneau3, 4 (corresponding author), Dan R. Littman1, 9 (corresponding author)

1 Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine, New York, NY 10016, USA

2 Computational Biology Program, The Sackler Institute, New York University School of Medicine, New York, NY 10016, USA

3 Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003 USA

4 Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, 10003 USA

5 HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA

6 Developmental Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany

7 Cancer Cell Biology Programme, Spanish National Cancer Research Centre (CNIO), E-28029 Madrid, Spain

8 Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63108, USA

9 The Howard Hughes Medical Institute



Th17 cells have critical roles in mucosal defense and are major contributors to inflammatory disease. Their differentiation requires the nuclear hormone receptor RORγt working with multiple other essential transcription factors (TFs). We have used an iterative systems approach, combining genome-wide TF occupancy, expression profiling of TF mutants, and expression time series to delineate the Th17 global transcriptional regulatory network. We find that cooperatively bound BATF and IRF4 contribute to initial chromatin accessibility and, with STAT3, initiate a transcriptional program that is then globally tuned by the lineage-specifying TF RORγt, which plays a focal deterministic role at key loci. Integration of multiple data sets allowed inference of an accurate predictive model that we computationally and experimentally validated, identifying multiple new Th17 regulators, including Fosl2, a key determinant of cellular plasticity. This interconnected network can be used to investigate new therapeutic approaches to manipulate Th17 functions in the setting of inflammatory disease.
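The abstract does not state how the individual data sets were combined into one network. One simple, generic way to merge evidence for candidate regulator-target edges from several data types (labeled here as ChIP, knockout and expression) is to convert each data type's scores to rank percentiles and average them, so that no single data type dominates through its scale. The edge names, the score values and the choice of rank averaging are hypothetical illustrations, not the paper's scoring scheme.

```python
import numpy as np


def rank_percentiles(scores: np.ndarray) -> np.ndarray:
    """Map scores to ranks in (0, 1]; the highest score gets 1.0.

    Ties are broken by (stable) argsort order, a simplification.
    """
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks / len(scores)


def combine(evidence: dict) -> np.ndarray:
    """Average the rank percentiles of each data type, edge by edge.

    Every data type must score the same edges in the same order, with
    larger values meaning stronger support (an assumed orientation).
    """
    return np.mean(
        [rank_percentiles(np.asarray(v, dtype=float)) for v in evidence.values()],
        axis=0,
    )


# Hypothetical candidate edges and per-data-type support scores.
edges = ["TF1->g1", "TF1->g2", "TF2->g1", "TF2->g2"]
evidence = {
    "chip":       [3.0, 0.5, 2.0, 0.1],
    "knockout":   [2.5, 0.2, 0.4, 1.0],
    "expression": [0.8, 0.1, 0.9, 0.3],
}
combined = combine(evidence)
for i in np.argsort(-combined):
    print(edges[i], round(float(combined[i]), 3))
```

Rank averaging discards the magnitude of each score. That trade-off makes heterogeneous assays comparable but can understate a single very strong piece of evidence.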