Past Discussions


04.29.15

EBSeq-HMM: A Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments

Ning Leng1,2, Yuan Li1, Brian E. Mcintosh2, Bao Kim Nguyen2, Bret Duffin2, Shulan Tian2, James A. Thomson2,3,4, Colin Dewey5, Ron Stewart2 and Christina Kendziorski5,*

Author Affiliations

1Department of Statistics, University of Wisconsin, Madison, WI
2Regenerative Biology, Morgridge Institute for Research, Madison, WI
3Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI
4Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA
5Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI
*To whom correspondence should be addressed. Christina Kendziorski, E-mail: kendzior@biostat.wisc.edu

  • Received October 14, 2014.
  • Revision received February 23, 2015.
  • Accepted March 30, 2015.

Abstract

Motivation: With improvements in next-generation sequencing technologies and reductions in price, ordered RNA-seq experiments are becoming common. Of primary interest in these experiments is identifying genes that are changing over time or space, for example, and then characterizing the specific expression changes. A number of robust statistical methods are available to identify genes showing differential expression among multiple conditions, but most assume conditions are exchangeable and thereby sacrifice power and precision when applied to ordered data.

Results: We propose an empirical Bayes mixture modeling approach called EBSeq-HMM. In EBSeq-HMM, an auto-regressive hidden Markov model is implemented to accommodate dependence in gene expression across ordered conditions. As demonstrated in simulation and case studies, the output proves useful in identifying differentially expressed genes and in specifying gene-specific expression paths. EBSeq-HMM may also be used for inference regarding isoform expression.
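
As a rough illustration of how a hidden Markov model can carry dependence across ordered conditions, the sketch below decodes the most likely path of expression states for one gene with the standard Viterbi algorithm. The state labels (Up, Down, EE for equally expressed), the transition matrix, and the emission probabilities are hypothetical values chosen for illustration. This is a generic HMM decoder, not the EBSeq-HMM model or the API of its R package.

```python
import math

# Hypothetical labels for the change between each pair of adjacent conditions.
STATES = ["Up", "Down", "EE"]

def viterbi(emissions, start, trans):
    """Return the most likely state path.

    emissions: list of dicts, one per transition between adjacent conditions,
               mapping state -> P(observed data | state).
    start:     dict mapping state -> initial probability.
    trans:     dict mapping state -> dict of state -> transition probability.
    """
    # Work in log space to avoid underflow on long series.
    score = {s: math.log(start[s]) + math.log(emissions[0][s]) for s in STATES}
    back = []
    for em in emissions[1:]:
        prev = score
        score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(em[s])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Made-up inputs: "sticky" transitions favour staying in the same state.
start = {s: 1 / 3 for s in STATES}
trans = {s: {t: (0.8 if s == t else 0.1) for t in STATES} for s in STATES}
emissions = [
    {"Up": 0.70, "Down": 0.10, "EE": 0.20},
    {"Up": 0.40, "Down": 0.20, "EE": 0.40},
    {"Up": 0.05, "Down": 0.85, "EE": 0.10},
]
path = viterbi(emissions, start, trans)
print(path)
```

In this toy input the second transition is ambiguous between Up and EE when viewed alone; the sticky transition matrix lets the neighbouring evidence resolve it, which is the kind of borrowing across ordered conditions that an exchangeable model cannot do.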

Availability: An R package containing examples and sample datasets is available at Bioconductor.

Contact: kendzior@biostat.wisc.edu

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com


04.15.15

Integrative analysis of 111 reference human epigenomes

Roadmap Epigenomics Consortium, Anshul Kundaje, et al.

Abstract:
The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.


03.18.15

Statistics requantitates the central dogma

Jingyi Jessica Li, Department of Statistics and Department of Human Genetics, University of California, Los Angeles, CA 90095, USA.
Mark D. Biggin, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

Abstract

Mammalian proteins are expressed at ∼10^3 to 10^8 molecules per cell (1). Differences between cell types, between normal and disease states, and between individuals are largely defined by changes in the abundance of proteins, which are in turn determined by rates of transcription, messenger RNA (mRNA) degradation, translation, and protein degradation. If the rates for one of these steps differ much more than the rates of the other three, that step would be dominant in defining the variation in protein expression. Over the past decade, system-wide studies have claimed that in animals, differences in translation rates predominate (2–5). On page 1112 of this issue, Jovanovic et al. (6), as well as recent studies by Battle et al. (7) and Li et al. (1), challenge this conclusion, suggesting that transcriptional control makes the larger contribution.
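
The dominance argument can be made concrete with a simple steady-state model, assumed here for illustration and not taken from the article: protein abundance P = (transcription rate × translation rate) / (mRNA degradation rate × protein degradation rate). On a log scale the four contributions add, so if the rates vary independently across genes, the variance of log P is the sum of the four variances and the step with the largest variance dominates. The standard deviations below are made-up numbers, not measurements.

```python
# Hypothetical standard deviations of log10 rates across genes (illustrative only).
log10_sd = {
    "transcription": 1.0,
    "mRNA degradation": 0.3,
    "translation": 0.4,
    "protein degradation": 0.3,
}

def variance_shares(sd_by_step):
    """Fraction of var(log P) attributable to each step, assuming independent rates."""
    variances = {step: sd ** 2 for step, sd in sd_by_step.items()}
    total = sum(variances.values())
    return {step: v / total for step, v in variances.items()}

shares = variance_shares(log10_sd)
dominant = max(shares, key=shares.get)
for step, share in shares.items():
    print(f"{step}: {share:.0%} of variance in log protein abundance")
print("dominant step:", dominant)
```

Because shares scale with the square of the spread, a step whose rates vary only a few-fold more than the others can still account for most of the variance.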

(full article) 

Impact of regulatory variation from RNA to protein

Alexis Battle1,2
Zia Khan3
Sidney H. Wang3
Amy Mitrano3
Michael J. Ford4
Jonathan K. Pritchard1,2,5
Yoav Gilad3

1Department of Genetics, Stanford University, Stanford, CA 94305, USA.
2Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA.
3Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
4MS Bioworks, LLC, 3950 Varsity Drive, Ann Arbor, MI 48108, USA.
5Department of Biology, Stanford University, Stanford, CA 94305, USA.

Abstract

The phenotypic consequences of expression quantitative trait loci (eQTLs) are presumably due to their effects on protein expression levels. Yet the impact of genetic variation, including eQTLs, on protein levels remains poorly understood. To address this, we mapped genetic variants that are associated with eQTLs, ribosome occupancy (rQTLs), or protein abundance (pQTLs). We found that most QTLs are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, eQTLs tend to have significantly reduced effect sizes on protein levels, which suggests that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we identified a class of cis QTLs that affect protein abundance with little or no effect on messenger RNA or ribosome levels, which suggests that they may arise from differences in posttranslational regulation.

(full article)


02.18.15

Defining an essential transcription factor program for naive pluripotency

S.-J. Dunn1,*, G. Martello2,*,†‡, B. Yordanov1,*, S. Emmott1, A. G. Smith2,3,†
1Computational Science Laboratory, Microsoft Research, Cambridge CB1 2FB, UK.
2Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 1QR, UK.
3Department of Biochemistry, University of Cambridge, Cambridge, UK.
‡Department of Molecular Medicine, University of Padua, 35131 Padua, Italy.
†Corresponding author. E-mail: graziano.martello@unipd.it (G.M.); austin.smith@cscr.cam.ac.uk (A.G.S.)
* These authors contributed equally to this work.

Abstract

The gene regulatory circuitry through which pluripotent embryonic stem (ES) cells choose between self-renewal and differentiation appears vast and has yet to be distilled into an executive molecular program. We developed a data-constrained, computational approach to reduce complexity and to derive a set of functionally validated components and interaction combinations sufficient to explain observed ES cell behavior. This minimal set, the simplest version of which comprises only 16 interactions, 12 components, and three inputs, satisfies all prior specifications for self-renewal and furthermore predicts unknown and nonintuitive responses to compound genetic perturbations with an overall accuracy of 70%. We propose that propagation of ES cell identity is not determined by a vast interactome but rather can be explained by a relatively simple process of molecular computation.


02.04.15

Conditional density-based analysis of T cell signaling in single-cell data

Smita Krishnaswamy1, Matthew H. Spitzer2, Michael Mingueneau3, Sean C. Bendall2, Oren Litvin1, Erica Stone4, Dana Pe’er1,*,†, Garry P. Nolan2,†

Author Affiliations
1Department of Biological Sciences, Department of Systems Biology, Columbia University, New York, NY, USA.
2Baxter Laboratory in Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA.
3Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA, USA.
4Molecular Biology Section, Division of Biological Sciences, Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA.
*Corresponding author. E-mail: dpeer@biology.columbia.edu
† These authors contributed equally to this work.

Abstract:
Cellular circuits sense the environment, process signals, and compute decisions using networks of interacting proteins. To model such a system, the abundance of each activated protein species can be described as a stochastic function of the abundance of other proteins. High-dimensional single-cell technologies, such as mass cytometry, offer an opportunity to characterize signaling circuit-wide. However, the challenge of developing and applying computational approaches to interpret such complex data remains. Here, we developed computational methods, based on established statistical concepts, to characterize signaling network relationships by quantifying the strengths of network edges and deriving signaling response functions. In comparing signaling between naïve and antigen-exposed CD4+ T lymphocytes, we find that although these two cell subtypes had similarly wired networks, naïve cells transmitted more information along a key signaling cascade than did antigen-exposed cells. We validated our characterization on mice lacking the extracellular-regulated mitogen-activated protein kinase (MAPK) ERK2, which showed stronger influence of pERK on pS6 (phosphorylated-ribosomal protein S6), in naïve cells as compared with antigen-exposed cells, as predicted. We demonstrate that by using cell-to-cell variation inherent in single-cell data, we can derive response functions underlying molecular circuits and drive the understanding of how cells process signals.


01.21.15

CellNet: Network Biology Applied to Stem Cell Engineering

Patrick Cahan, Hu Li, Samantha A. Morris, Edroaldo Lummertz da Rocha, George Q. Daley, James J. Collins
doi:10.1016/j.cell.2014.07.020

Refers To
Samantha A. Morris, Patrick Cahan, Hu Li, Anna M. Zhao, Adrianna K. San Roman, Ramesh A. Shivdasani, James J. Collins, George Q. Daley
Dissecting Engineered Cell Types and Enhancing Cell Fate Conversion via CellNet
Cell, Volume 158, Issue 4, 14 August 2014, Pages 889-902
Referred to by
Kee-Pyo Kim, Hans R. Schöler
CellNet—Where Your Cells Are Standing
Cell, Volume 158, Issue 4, 14 August 2014, Pages 699-701

Summary
Somatic cell reprogramming, directed differentiation of pluripotent stem cells, and direct conversions between differentiated cell lineages represent powerful approaches to engineer cells for research and regenerative medicine. We have developed CellNet, a network biology platform that more accurately assesses the fidelity of cellular engineering than existing methodologies and generates hypotheses for improving cell derivations. Analyzing expression data from 56 published reports, we found that cells derived via directed differentiation more closely resemble their in vivo counterparts than products of direct conversion, as reflected by the establishment of target cell-type gene regulatory networks (GRNs). Furthermore, we discovered that directly converted cells fail to adequately silence expression programs of the starting population and that the establishment of unintended GRNs is common to virtually every cellular engineering paradigm. CellNet provides a platform for quantifying how closely engineered cell populations resemble their target cell type and a rational strategy to guide enhanced cellular engineering.

 

Dissecting Engineered Cell Types and Enhancing Cell Fate Conversion via CellNet


01.07.15

Conservation of trans-acting circuitry during mammalian regulatory evolution

 

Andrew B. Stergachis, Shane Neph, Richard Sandstrom, Eric Haugen, Alex P. Reynolds, Miaohua Zhang, Rachel Byron, Theresa Canfield, Sandra Stehling-Sun, Kristen Lee, Robert E. Thurman, Shinny Vong, Daniel Bates, Fidencio Neri, Morgan Diegel, Erika Giste, Douglas Dunn, Jeff Vierstra, R. Scott Hansen, Audra K. Johnson, Peter J. Sabo, Matthew S. Wilken, Thomas A. Reh, Piper M. Treuting, Rajinder Kaul et al.

Nature 515, 365–370 (20 November 2014) doi:10.1038/nature13972
Received 21 February 2014 Accepted 15 October 2014 Published online 19 November 2014

Abstract

The basic body plan and major physiological axes have been highly conserved during mammalian evolution, yet only a small fraction of the human genome sequence appears to be subject to evolutionary constraint. To quantify cis- versus trans-acting contributions to mammalian regulatory evolution, we performed genomic DNase I footprinting of the mouse genome across 25 cell and tissue types, collectively defining ~8.6 million transcription factor (TF) occupancy sites at nucleotide resolution. Here we show that mouse TF footprints conjointly encode a regulatory lexicon that is ~95% similar with that derived from human TF footprints. However, only ~20% of mouse TF footprints have human orthologues. Despite substantial turnover of the cis-regulatory landscape, nearly half of all pairwise regulatory interactions connecting mouse TF genes have been maintained in orthologous human cell types through evolutionary innovation of TF recognition sequences. Furthermore, the higher-level organization of mouse TF-to-TF connections into cellular network architectures is nearly identical with human. Our results indicate that evolutionary selection on mammalian gene regulation is targeted chiefly at the level of trans-regulatory circuitry, enabling and potentiating cis-regulatory plasticity.


12.10.14

Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia

Yue Li, Minggao Liang, Zhaolei Zhang

Abstract

Gene expression is a combinatorial function of genetic/epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factors (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. At the maturity of microarray/sequencing technologies, large amounts of data measuring the genome-wide signals of those factors became available from Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA). However, there is a lack of an integrative model to take full advantage of these rich yet heterogeneous data. To this end, we developed RACER (Regression Analysis of Combined Expression Regulation), which fits the mRNA expression as response using as explanatory variables, the TF data from ENCODE, and CNV, DM, miRNA expression signals from TCGA. Briefly, RACER first infers the sample-specific regulatory activities by TFs and miRNAs, which are then used as inputs to infer specific TF/miRNA-gene interactions. Such a two-stage regression framework circumvents a common difficulty in integrating ENCODE data measured in generic cell-line with the sample-specific TCGA measurements. As a case study, we integrated Acute Myeloid Leukemia (AML) data from TCGA and the related TF binding data measured in K562 from ENCODE. As a proof-of-concept, we first verified our model formalism by 10-fold cross-validation on predicting gene expression. We next evaluated RACER on recovering known regulatory interactions, and demonstrated its superior statistical power over existing methods in detecting known miRNA/TF targets. Additionally, we developed a feature selection procedure, which identified 18 regulators, whose activities clustered consistently with cytogenetic risk groups. One of the selected regulators is miR-548p, whose inferred targets were significantly enriched for leukemia-related pathway, implicating its novel role in AML pathogenesis. 
Moreover, survival analysis using the inferred activities identified C-Fos as a potential AML prognostic marker. Together, we provided a novel framework that successfully integrated the TCGA and ENCODE data in revealing AML-specific regulatory program at global level.


11.26.14

Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation

  1. Tarmo Äijö1,*,
  2. Vincent Butty2,
  3. Zhi Chen3,
  4. Verna Salo3,
  5. Subhash Tripathi3,
  6. Christopher B. Burge2,
  7. Riitta Lahesmaa3 and
  8. Harri Lähdesmäki1,3,*

Author Affiliations

1Department of Information and Computer Science, Aalto University, FI-00076 Aalto, Finland
2Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
*To whom correspondence should be addressed

Abstract

Motivation: Gene expression profiling using RNA-seq is a powerful technique for screening RNA species’ landscapes and their dynamics in an unbiased way. While several advanced methods exist for differential expression analysis of RNA-seq data, proper tools to analyze RNA-seq time-course have not been proposed.

Results: In this study, we use RNA-seq to measure gene expression during the early human T helper 17 (Th17) cell differentiation and T cell activation (Th0). To quantify Th17-specific gene expression dynamics, we present a novel statistical methodology, DyNB, for analyzing time-course RNA-seq data. We use non-parametric Gaussian processes to model temporal correlation in gene expression and combine that with negative binomial likelihood for the count data. To account for experiment-specific biases in gene expression dynamics, such as differences in cell differentiation efficiencies, we propose a method to rescale the dynamics between replicated measurements. We develop an MCMC sampling method to make inference of differential expression dynamics between conditions. DyNB identifies several known and novel genes involved in Th17 differentiation. Analysis of differentiation efficiencies revealed consistent patterns in gene expression dynamics between different cultures. We use qRT-PCR to validate differential expression and differentiation efficiencies for selected genes. Comparison of the results with those obtained via traditional time point-wise analysis shows that time-course analysis together with time rescaling between cultures identifies differentially expressed genes which would not otherwise be detected.
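
To make the count model concrete, the sketch below evaluates a negative binomial log-likelihood for a series of read counts given a mean trajectory over time, as one might after drawing that trajectory from a Gaussian process. The mean/dispersion parametrization (variance = mean + dispersion × mean²), the function names, and all numbers are assumptions for illustration; this is not the DyNB implementation.

```python
import math

def nb_logpmf(k, mean, dispersion):
    """Log-probability of count k under a negative binomial
    with variance = mean + dispersion * mean**2."""
    r = 1.0 / dispersion   # NB "size" parameter
    p = r / (r + mean)     # success probability
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

def series_loglik(counts, means, dispersion):
    """Sum of NB log-probabilities over time points (treated as independent given the means)."""
    return sum(nb_logpmf(k, m, dispersion) for k, m in zip(counts, means))

# Made-up counts for one gene at four time points, and two candidate mean trajectories.
counts = [12, 30, 55, 48]
rising = [10.0, 30.0, 50.0, 50.0]
flat = [sum(counts) / len(counts)] * len(counts)

ll_rising = series_loglik(counts, rising, dispersion=0.1)
ll_flat = series_loglik(counts, flat, dispersion=0.1)
print(f"rising: {ll_rising:.2f}  flat: {ll_flat:.2f}")
```

Comparing likelihoods of whole trajectories, instead of testing each time point separately, is the basic idea behind a time-course analysis; a full method would also place a prior on the trajectory and integrate over it.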

Availability: An implementation of the proposed computational methods will be available at http://research.ics.aalto.fi/csb/software/

Contact: tarmo.aijo@aalto.fi or harri.lahdesmaki@aalto.fi

Supplementary information: Supplementary data are available at Bioinformatics online.


11.12.14

Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications

  1. Eduardo G. Gusmao1,*,
  2. Christoph Dieterich2,
  3. Martin Zenke3,4 and
  4. Ivan G. Costa1,5,6,*

Author Affiliations

1IZKF Computational Biology Research Group, Institute for Biomedical Engineering, RWTH Aachen University Medical School, 52074 Aachen
2Computational RNA Biology Lab and Bioinformatics Core, Max Planck Institute for Biology of Ageing, 50931 Cologne
3Department of Cell Biology, Institute for Biomedical Engineering, RWTH Aachen University Medical School, 52074
4Helmholtz Institute for Biomedical Engineering, 52074
5Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen University, 52062 Aachen, Germany
6Center of Informatics, Federal University of Pernambuco, 50740560 Recife-PE, Brazil
*To whom correspondence should be addressed
  • Received October 28, 2013.
  • Revision received June 27, 2014.
  • Accepted July 25, 2014.

Abstract

Motivation: The identification of active transcriptional regulatory elements is crucial to understand regulatory networks driving cellular processes such as cell development and the onset of diseases. It has recently been shown that chromatin structure information, such as DNase I hypersensitivity (DHS) or histone modifications, significantly improves cell-specific predictions of transcription factor binding sites. However, no method has so far successfully combined both DHS and histone modification data to perform active binding site prediction.

Results: We propose here a method based on hidden Markov models to integrate DHS and histone modifications occupancy for the detection of open chromatin regions and active binding sites. We have created a framework that includes treatment of genomic signals, model training and genome-wide application. In a comparative analysis, our method obtained a good trade-off between sensitivity versus specificity and superior area under the curve statistics than competing methods. Moreover, our technique does not require further training or sequence information to generate binding location predictions. Therefore, the method can be easily applied on new cell types and allow flexible downstream analysis such as de novo motif finding.

Availability and implementation: Our framework is available as part of the Regulatory Genomics Toolbox. The software information and all benchmarking data are available at http://costalab.org/wp/dh-hmm.

Contact: ivan.costa@rwth-aachen.de or eduardo.gusmao@rwth-aachen.de

Supplementary information: Supplementary data are available at Bioinformatics online.