ppointer

08.31.15

This entry was posted in methodological network_inference on August 28, 2015 by ppointer

Deep Learning

In the last two months, a couple of groups have published papers applying deep learning to problems related to gene regulation: protein-nucleic acid binding specificity [1] and chromatin state [2]. We will be talking about these soon.
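For readers new to the convolutional models behind papers like DeepBind, the core operation is easy to sketch. You one-hot encode a DNA sequence, slide a motif-like filter along it, rectify the scores, and keep the maximum. This is a minimal, hypothetical illustration. The function names, the hand-set filter and the bias are assumptions. DeepBind itself learns many filters from data and passes the pooled scores through further network layers.

```python
# Hypothetical sketch of a DeepBind-style motif detector: one-hot encode DNA,
# slide a motif filter along it, rectify, then max-pool to a single score.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element indicator vectors (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv_relu_maxpool(seq, motif_filter, bias=0.0):
    """Score a sequence with one convolutional filter.

    motif_filter: list of L rows, each a 4-element weight vector (A, C, G, T).
    Returns the maximum over positions of ReLU(filter . window + bias).
    """
    x = one_hot(seq)
    L = len(motif_filter)
    scores = []
    for start in range(len(x) - L + 1):
        # Dot product between the filter and the one-hot window at this offset.
        s = sum(w * v
                for row_w, row_x in zip(motif_filter, x[start:start + L])
                for w, v in zip(row_w, row_x))
        scores.append(max(0.0, s + bias))   # rectification (ReLU)
    return max(scores) if scores else 0.0   # global max pooling
```

Consider `f = [[-1, -1, -1, 1], [1, -1, -1, -1], [-1, -1, -1, 1], [1, -1, -1, -1]]`, a filter that rewards T, A, T, A. With it, `conv_relu_maxpool("GGTATAGG", f, bias=-3.0)` returns 1.0 and `conv_relu_maxpool("GGGGGGGG", f, bias=-3.0)` returns 0.0.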

Before discussing these papers, we think it will be useful to give people some time to get familiar with the fundamentals of artificial neural networks and deep learning. So, this coming *Monday* at our new time of 12 noon, we’ll have a meeting to talk about deep learning and work through each other’s questions. Beforehand, please check out some of the following resources and bring questions (or expertise you’d like to share!).

At the meeting, we’ll walk through the topics in this Nature review: http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html

More resources:

Lecture slides from Mark’s machine learning class: https://www.biostat.wisc.edu/~craven/cs760/lectures/ANNs-1.pdf and ANNs-2.pdf

Intro to neural networks from a programming perspective (just skimmed this one; looks like an interesting presentation): http://karpathy.github.io/neuralnets/
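This sketch follows the spirit of the programming-oriented guide above. It runs backpropagation on a tiny circuit to show how gradients flow through "gates" by the chain rule. It is a minimal sketch. The function f(x, y, z) = (x + y) * z and the helper names are assumptions chosen for illustration. Real networks repeat the same local-gradient bookkeeping across many layers.

```python
def forward(x, y, z):
    """Evaluate f = (x + y) * z, keeping the intermediate value q."""
    q = x + y          # add gate
    f = q * z          # multiply gate
    return q, f

def backward(x, y, z):
    """Gradients of f = (x + y) * z with respect to x, y, z via the chain rule."""
    q, _ = forward(x, y, z)
    df_dq = z              # local gradient of the multiply gate w.r.t. q
    df_dz = q              # local gradient of the multiply gate w.r.t. z
    df_dx = df_dq * 1.0    # the add gate passes the gradient through unchanged
    df_dy = df_dq * 1.0
    return df_dx, df_dy, df_dz

def gradient_step(x, y, z, step=0.01):
    """Nudge the inputs a small step along the gradient (gradient ascent on f)."""
    dx, dy, dz = backward(x, y, z)
    return x + step * dx, y + step * dy, z + step * dz
```

Take x = -2, y = 5, z = -4. Then f = -12 and the gradients are (-4, -4, 3). One small step along them raises f slightly. Training a network makes the same move in reverse: it steps against the gradient of a loss.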

[1] DeepBind (Alipanahi et al., Nature Biotechnology 2015)
http://www.nature.com/nbt/journal/v33/n8/full/nbt.3300.html
[2] DeepSEA (Zhou & Troyanskaya, Nature Methods 2015)
http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3547.html

08.19.15

This entry was posted in differentiation methodological network_inference on August 14, 2015 by ppointer

Wanderlust with special guest Monocle

“Single-Cell Trajectory Detection Uncovers Progression and Regulatory Coordination in Human B Cell Development”

Bendall et al., Cell 2014

Abstract

Tissue regeneration is an orchestrated progression of cells from an immature state to a mature one, conventionally represented as distinctive cell subsets. A continuum of transitional cell states exists between these discrete stages. We combine the depth of single-cell mass cytometry and an algorithm developed to leverage this continuum by aligning single cells of a given lineage onto a unified trajectory that accurately predicts the developmental path de novo. Applied to human B cell lymphopoiesis, the algorithm (termed Wanderlust) constructed trajectories spanning from hematopoietic stem cells through to naive B cells. This trajectory revealed nascent fractions of B cell progenitors and aligned them with developmentally cued regulatory signaling including IL-7/STAT5 and cellular events such as immunoglobulin rearrangement, highlighting checkpoints across which regulatory signals are rewired paralleling changes in cellular state. This study provides a comprehensive analysis of human B lymphopoiesis, laying a foundation to apply this approach to other tissues and “corrupted” developmental processes including cancer.
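The central idea of trajectory detection can be sketched simply. Build a k-nearest-neighbor graph over the single-cell measurements, then order cells by shortest-path distance from a chosen starting cell. This is a simplified, hypothetical sketch. As we understand it, Wanderlust also averages over an ensemble of randomized neighbor graphs and refines the ordering with waypoints, and none of that is shown here. The helper names and the default k are assumptions.

```python
import heapq
import math

def knn_graph(cells, k):
    """Undirected k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(cells)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(cells[i], cells[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d          # symmetrize the neighbor relation
    return graph

def trajectory_order(cells, start, k=2):
    """Order cells by shortest-path (Dijkstra) distance from a chosen start cell."""
    graph = knn_graph(cells, k)
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                 # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return sorted(dist, key=dist.get), dist
```

For five cells on a line, `trajectory_order([(0, 0), (3, 0), (1, 0), (2, 0), (4, 0)], start=0)` orders them as [0, 2, 3, 1, 4]. Graph distance pays off mainly on curved or branching data, where straight-line distance would cut across the trajectory.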

Monocle method

(Trapnell et al., Nature Biotechnology 2014)

Abstract

Defining the transcriptional dynamics of a temporal process such as cell differentiation is challenging owing to the high variability in gene expression between individual cells. Time-series gene expression analyses of bulk cells have difficulty distinguishing early and late phases of a transcriptional cascade or identifying rare subpopulations of cells, and single-cell proteomic methods rely on a priori knowledge of key distinguishing markers. Here we describe Monocle, an unsupervised algorithm that increases the temporal resolution of transcriptome dynamics using single-cell RNA-Seq data collected at multiple time points. Applied to the differentiation of primary human myoblasts, Monocle revealed switch-like changes in expression of key regulatory factors, sequential waves of gene regulation, and expression of regulators that were not known to act in differentiation. We validated some of these predicted regulators in a loss-of-function screen. Monocle can in principle be used to recover single-cell gene expression kinetics from a wide array of cellular processes, including differentiation, proliferation and oncogenic transformation.
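Monocle's pseudotemporal ordering can be sketched in three steps. Connect the cells with a minimum spanning tree, find the tree's longest path, and order cells by their position along it. In this hypothetical sketch the tree is built directly on the input coordinates. The published method first reduces dimensionality (with independent component analysis) and also handles cells off the main path and branch points, which this sketch omits. The function names are assumptions.

```python
import math

def minimum_spanning_tree(cells):
    """Prim's algorithm on the complete Euclidean graph; returns an adjacency dict."""
    n = len(cells)
    in_tree = {0}
    adj = {i: [] for i in range(n)}
    while len(in_tree) < n:
        # Cheapest edge from the current tree to a cell not yet in it.
        d, u, v = min((math.dist(cells[u], cells[v]), u, v)
                      for u in in_tree for v in range(n) if v not in in_tree)
        adj[u].append((v, d))
        adj[v].append((u, d))
        in_tree.add(v)
    return adj

def farthest(adj, source):
    """Return (farthest node, distances, parents) from source within the tree."""
    dist, parent, stack = {source: 0.0}, {source: None}, [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                parent[v] = u
                stack.append(v)
    return max(dist, key=dist.get), dist, parent

def pseudotime(cells):
    """Order cells along the longest path (diameter) of the MST."""
    adj = minimum_spanning_tree(cells)
    a, _, _ = farthest(adj, 0)            # one end of the diameter
    b, dist, parent = farthest(adj, a)    # other end, with distances from a
    path, node = [], b
    while node is not None:               # walk back from b to a
        path.append(node)
        node = parent[node]
    path.reverse()                        # now runs from a to b
    return path, {c: dist[c] for c in path}
```

`pseudotime([(0, 0), (2, 0), (1, 0), (3, 0)])` returns the path [3, 1, 2, 0]. Which end counts as "early" must be decided with outside information, such as collection time points.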

08.05.15

This entry was posted in Abstracts discussion and tagged RNA-seq Systems Biology on August 4, 2015 by ppointer

Computational and analytical challenges in single-cell transcriptomics

Oliver Stegle¹, Sarah A. Teichmann¹,² and John C. Marioni¹,²

¹European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
²Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Correspondence to J.C.M. e-mail: marioni@ebi.ac.uk
doi:10.1038/nrg3833
Published online 28 January 2015

Abstract

The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression. Alongside the technological breakthroughs that have facilitated the large-scale generation of single-cell transcriptomic data, it is important to consider the specific computational and analytical challenges that still have to be overcome. Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.
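One analysis step that carries over from bulk RNA-seq, with adjustment, is normalizing for sequencing depth. Each gene's cell-to-cell variability can then be summarized, for example with the squared coefficient of variation (CV²). This is a minimal, hypothetical illustration. The counts-per-10,000 scaling and the function names are assumptions. The sketch leaves out the modeling of technical noise (for instance from spike-ins) and of dropouts that single-cell data call for.

```python
import statistics

def normalize_counts(counts):
    """Scale each cell's counts to counts-per-10,000 (simple library-size normalization).

    counts: list of cells, each a list of raw counts per gene (same gene order).
    """
    normed = []
    for cell in counts:
        total = sum(cell)
        normed.append([1e4 * c / total for c in cell])
    return normed

def mean_and_cv2(normed):
    """Per-gene (mean, CV^2) across cells; CV^2 is NaN for genes with zero mean."""
    stats = []
    for gene in zip(*normed):                 # iterate over genes (columns)
        mu = statistics.fmean(gene)
        var = statistics.variance(gene)       # sample variance across cells
        stats.append((mu, var / mu ** 2 if mu > 0 else float("nan")))
    return stats
```

Take two cells whose counts differ only by library size, `[[10, 0, 90], [20, 0, 180]]`. They normalize to identical profiles, so the first gene gets mean 1000.0 and CV² 0.0.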

07.22.15

This entry was posted in Abstracts discussion and tagged correlation proportionality Systems Biology WID on July 17, 2015 by ppointer

Proportionality: A Valid Alternative to Correlation for Relative Data

David Lovell
Queensland University of Technology, Brisbane, Australia
Vera Pawlowsky-Glahn
Dept. d’Informàtica, Matemàtica Aplicada i Estadística, U. de Girona, Spain
Juan José Egozcue
Dept. Applied Mathematics III, U. Politécnica de Catalunya, Barcelona, Spain
Samuel Marguerat
MRC Clinical Sciences Centre, Imperial College London, United Kingdom
Jürg Bähler
Research Department of Genetics, Evolution and Environment, University College London, United Kingdom

Abstract

In the life sciences, many measurement methods yield only the relative abundances of different components in a sample. With such relative—or compositional—data, differential expression needs careful interpretation, and correlation—a statistical workhorse for analyzing pairwise relationships—is an inappropriate measure of association. Using yeast gene expression data we show how correlation can be misleading and present proportionality as a valid alternative for relative data. We show how the strength of proportionality between two variables can be meaningfully and interpretably described by a new statistic ϕ which can be used instead of correlation as the basis of familiar analyses and visualisation methods, including co-expression networks and clustered heatmaps. While the main aim of this study is to present proportionality as a means to analyse relative data, it also raises intriguing questions about the molecular mechanisms underlying the proportional regulation of a range of yeast genes.
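For two components x and y, ϕ is the variance of their log-ratio divided by the variance of log x:

ϕ = var(log x − log y) / var(log x)

So ϕ = 0 exactly when x and y are proportional across samples. The sketch below assumes centered log-ratio (clr) transformed values, and its names are hypothetical. Note that ϕ is not symmetric in x and y.

```python
import math
import statistics

def clr(sample):
    """Centered log-ratio transform of one sample's positive relative abundances."""
    logs = [math.log(v) for v in sample]
    g = statistics.fmean(logs)
    return [l - g for l in logs]

def phi(samples, j, k):
    """Proportionality statistic phi = var(clr_j - clr_k) / var(clr_j).

    samples: list of samples, each a list of positive abundances for all components.
    phi is 0 when components j and k are exactly proportional across samples.
    """
    t = [clr(s) for s in samples]
    xj = [row[j] for row in t]
    xk = [row[k] for row in t]
    return statistics.variance([a - b for a, b in zip(xj, xk)]) / statistics.variance(xj)
```

Take `samples = [[1, 2, 5], [3, 6, 1], [2, 4, 7]]`, where the second component is always twice the first. `phi(samples, 0, 1)` is 0 up to rounding, while `phi(samples, 0, 2)` is positive.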

06.10.15

This entry was posted in Abstracts and tagged GTEx Systems Biology transcriptome on June 8, 2015 by ppointer

The human transcriptome across tissues and individuals

Mele, Ferreira, Reverter, et al., 2015

Abstract
Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes. We used RNA sequence data generated by Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show stability in postmortem samples. These signatures are dominated by a relatively small number of genes—which is most clearly seen in blood—though few are exclusive to a particular tissue and vary more across tissues than individuals. Genes exhibiting high interindividual expression variation include disease candidates associated with sex, ethnicity, and age. Primary transcription is the major driver of cellular specificity, with splicing playing mostly a complementary role; except for the brain, which exhibits a more divergent splicing program. Variation in splicing, despite its stochasticity, may play in contrast a comparatively greater role in defining individual phenotypes.


05.13.15

This entry was posted in Abstracts and tagged genomic profiling iCluster Systems Biology on May 11, 2015 by ppointer

Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis

Ronglai Shen¹,*, Adam B. Olshen² and Marc Ladanyi³

¹Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
²Department of Epidemiology and Biostatistics and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
³Department of Pathology and Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, NY, USA

*To whom correspondence should be addressed.

Received June 22, 2009.
Revision received August 25, 2009.
Accepted September 9, 2009.

Abstract
Motivation: The molecular complexity of a tumor manifests itself at the genomic, epigenomic, transcriptomic and proteomic levels. Genomic profiling at these multiple levels should allow an integrated characterization of tumor etiology. However, there is a shortage of effective statistical and bioinformatic tools for truly integrative data analysis. The standard approach to integrative clustering is separate clustering followed by manual integration. A more statistically powerful approach would incorporate all data types simultaneously and generate a single integrated cluster assignment.
Methods: We developed a joint latent variable model for integrative clustering. We call the resulting methodology iCluster. iCluster incorporates flexible modeling of the associations between different data types and the variance–covariance structure within data types in a single framework, while simultaneously reducing the dimensionality of the datasets. Likelihood-based inference is obtained through the Expectation–Maximization algorithm.
Results: We demonstrate the iCluster algorithm using two examples of joint analysis of copy number and gene expression data, one from breast cancer and one from lung cancer. In both cases, we identified subtypes characterized by concordant DNA copy number changes and gene expression as well as unique profiles specific to one or the other in a completely automated fashion. In addition, the algorithm discovers potentially novel subtypes by combining weak yet consistent alteration patterns across data types.
Availability: R code to implement iCluster can be downloaded at http://www.mskcc.org/mskcc/html/85130.cfm
Contact: shenr@mskcc.org
Supplementary information: Supplementary data are available at Bioinformatics online.
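This is a simplified sketch of the model described in the abstract, under several assumptions. It stacks the data types and fits a shared Gaussian latent variable model, X = WZ + ε, by Expectation–Maximization. It then clusters samples on the posterior mean E[Z | X] with k-means, using k − 1 latent dimensions for k clusters. In effect this is plain factor analysis on the concatenated data. It is a stand-in, not the published iCluster R code, which as we understand it also penalizes the loadings. The function name, initialization and iteration counts are hypothetical.

```python
import numpy as np

def icluster_like(data_types, k=2, n_iter=200, seed=0):
    """Simplified iCluster-style integrative clustering (no penalty on loadings).

    data_types: list of arrays, each shaped (features, samples), same sample order.
    Assumed model: X = W Z + eps, Z ~ N(0, I_{k-1}), eps ~ N(0, diag(psi)).
    Returns (cluster labels, posterior mean of Z with shape (samples, k-1)).
    """
    rng = np.random.default_rng(seed)
    # Stack data types and center each feature so the model needs no intercept.
    X = np.vstack([d - d.mean(axis=1, keepdims=True) for d in data_types])
    p, n = X.shape
    q = k - 1                                    # latent dimension for k clusters
    W = rng.normal(size=(p, q))
    psi = X.var(axis=1) + 1e-6
    for _ in range(n_iter):
        # E-step: posterior covariance and mean of Z for every sample.
        Wp = W / psi[:, None]                    # Psi^{-1} W
        sigma_z = np.linalg.inv(np.eye(q) + W.T @ Wp)
        EZ = sigma_z @ Wp.T @ X                  # shape (q, n)
        EZZ = n * sigma_z + EZ @ EZ.T            # sum over samples of E[z z^T]
        # M-step: update shared loadings, then per-feature noise variances.
        W = X @ EZ.T @ np.linalg.inv(EZZ)
        psi = np.maximum((X * X).mean(axis=1) - (W * (X @ EZ.T)).sum(axis=1) / n, 1e-6)
    # k-means on E[Z | X] with deterministic farthest-point initialization.
    Z = EZ.T
    centers = [Z[0]]
    for _ in range(1, k):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(100):
        labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, Z
```

Suppose two groups of samples shift a "copy number" block and an "expression" block in the same direction. The shared latent factor then captures that shift, and k-means on it separates the groups. Weak but consistent signals in both blocks add up in the same way.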

04.29.15

This entry was posted in differential_expression methodological and tagged differential_expression on April 24, 2015 by ppointer

EBSeq-HMM: A Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments

Ning Leng¹,², Yuan Li¹, Brian E. Mcintosh², Bao Kim Nguyen², Bret Duffin², Shulan Tian², James A. Thomson²,³,⁴, Colin Dewey⁵, Ron Stewart² and Christina Kendziorski⁵,*


¹Department of Statistics, University of Wisconsin, Madison, WI
²Regenerative Biology, Morgridge Institute for Research, Madison, WI
³Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI
⁴Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA
⁵Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI
*To whom correspondence should be addressed. Christina Kendziorski, E-mail: kendzior@biostat.wisc.edu

Received October 14, 2014.
Revision received February 23, 2015.
Accepted March 30, 2015.

Abstract

Motivation: With improvements in next-generation sequencing technologies and reductions in price, ordered RNA-seq experiments are becoming common. Of primary interest in these experiments is identifying genes that are changing over time or space, for example, and then characterizing the specific expression changes. A number of robust statistical methods are available to identify genes showing differential expression among multiple conditions, but most assume conditions are exchangeable and thereby sacrifice power and precision when applied to ordered data.

Results: We propose an empirical Bayes mixture modeling approach called EBSeq-HMM. In EBSeq-HMM, an auto-regressive hidden Markov model is implemented to accommodate dependence in gene expression across ordered conditions. As demonstrated in simulation and case studies, the output proves useful in identifying differentially expressed genes and in specifying gene-specific expression paths. EBSeq-HMM may also be used for inference regarding isoform expression.

Availability: An R package containing examples and sample datasets is available at Bioconductor.

Contact: kendzior@biostat.wisc.edu

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
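The hidden Markov part of the idea can be sketched with the Viterbi algorithm. For a single gene, it decodes the most likely sequence of Up / EE (equally expressed) / Down states across adjacent ordered conditions. In this hypothetical sketch the observations are log fold changes with Gaussian emissions. The state means, spread and transition probabilities are fixed by hand. EBSeq-HMM instead builds its emissions from an empirical Bayes model of the counts and estimates its parameters from the data, so treat this only as an illustration of the decoding step.

```python
import math

STATES = ("Up", "EE", "Down")
MEANS = {"Up": 1.0, "EE": 0.0, "Down": -1.0}   # hypothetical log-fold-change means

def log_emission(state, lfc, sd=0.5):
    """Gaussian log-density (up to a constant) of a log fold change under a state."""
    return -((lfc - MEANS[state]) ** 2) / (2 * sd ** 2)

def viterbi_path(lfcs, stay=0.5):
    """Most likely Up/EE/Down path for one gene's adjacent-condition log fold changes."""
    switch = (1.0 - stay) / (len(STATES) - 1)
    log_t = {(a, b): math.log(stay if a == b else switch) for a in STATES for b in STATES}
    # Uniform initial state probabilities.
    score = {s: math.log(1 / len(STATES)) + log_emission(s, lfcs[0]) for s in STATES}
    back = []
    for lfc in lfcs[1:]:
        new_score, pointers = {}, {}
        for b in STATES:
            # Best predecessor state for landing in state b at this step.
            best = max(STATES, key=lambda a: score[a] + log_t[(a, b)])
            new_score[b] = score[best] + log_t[(best, b)] + log_emission(b, lfc)
            pointers[b] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

`viterbi_path([2.0, 0.0, -2.0])` returns `['Up', 'EE', 'Down']`.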

04.15.15

This entry was posted in Abstracts and tagged epigenomes Kundaje Roadmap Epigenomics consortium Systems Biology on April 8, 2015 by ppointer

Integrative analysis of 111 reference human epigenomes

Roadmap Epigenomics Consortium, Anshul Kundaje, et al.

Abstract:
The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
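A simple test asks whether disease-associated variants are enriched in a tissue's epigenomic marks. It compares the fraction of variants that fall in marked regions with the fraction of the genome those regions cover. A binomial p-value then tests the null that variants land uniformly. This is a hypothetical sketch, not the consortium's analysis, and the function name is an assumption. It ignores linkage disequilibrium and the non-uniform genomic distribution of variants and marks, which real enrichment tests must handle.

```python
import math

def variant_enrichment(n_variants, n_in_marks, genome_fraction):
    """Fold enrichment and one-sided binomial p-value for variants in marked regions.

    Null model (assumed): each variant independently falls in marked regions with
    probability equal to the fraction of the genome those regions cover.
    """
    fold = (n_in_marks / n_variants) / genome_fraction
    # P(X >= n_in_marks) for X ~ Binomial(n_variants, genome_fraction).
    p = sum(math.comb(n_variants, i)
            * genome_fraction ** i
            * (1 - genome_fraction) ** (n_variants - i)
            for i in range(n_in_marks, n_variants + 1))
    return fold, p
```

Suppose 5 of 10 variants fall in marks covering 10% of the genome. The fold enrichment is then 5.0 and the one-sided p-value is about 0.0016.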

03.18.15

This entry was posted in Abstracts and tagged eQTLs mRNA Systems Biology on March 13, 2015 by ppointer

Statistics requantitates the central dogma

Jingyi Jessica Li, Department of Statistics and Department of Human Genetics, University of California, Los Angeles, CA 90095, USA.
Mark D. Biggin, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

Abstract

Mammalian proteins are expressed at ∼10³ to 10⁸ molecules per cell (1). Differences between cell types, between normal and disease states, and between individuals are largely defined by changes in the abundance of proteins, which are in turn determined by rates of transcription, messenger RNA (mRNA) degradation, translation, and protein degradation. If the rates for one of these steps differ much more than the rates of the other three, that step would be dominant in defining the variation in protein expression. Over the past decade, system-wide studies have claimed that in animals, differences in translation rates predominate (2–5). On page 1112 of this issue, Jovanovic et al. (6), as well as recent studies by Battle et al. (7) and Li et al. (1), challenge this conclusion, suggesting that transcriptional control makes the larger contribution.
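The four rates named above combine into a steady-state protein level. Suppose mRNA is made at rate k_tx and degraded at rate δ_m, and protein is made at rate k_tl per mRNA and degraded at rate δ_p. Setting both time derivatives to zero gives m* = k_tx/δ_m and p* = k_tl·m*/δ_p = k_tx·k_tl/(δ_m·δ_p). On a log scale the four steps add, so their across-gene variances can be compared. This is a minimal sketch with hypothetical names. It assumes the log-rates are uncorrelated across genes, whereas real analyses must also account for covariance and measurement error.

```python
import math
import statistics

def steady_state_protein(k_tx, d_mrna, k_tl, d_prot):
    """Steady-state protein copies: (k_tx / d_mrna) mRNAs times (k_tl / d_prot) proteins per mRNA."""
    return (k_tx / d_mrna) * (k_tl / d_prot)

def variance_shares(genes):
    """Share of across-gene variance in log10 protein level attributable to each step.

    genes: list of (k_tx, d_mrna, k_tl, d_prot) tuples.
    Assumes the four log-rates are uncorrelated, so their variances add.
    """
    logs = {
        "transcription": [math.log10(g[0]) for g in genes],
        "mRNA degradation": [-math.log10(g[1]) for g in genes],
        "translation": [math.log10(g[2]) for g in genes],
        "protein degradation": [-math.log10(g[3]) for g in genes],
    }
    var = {step: statistics.pvariance(v) for step, v in logs.items()}
    total = sum(var.values())
    return {step: v / total for step, v in var.items()}
```

`steady_state_protein(2.0, 0.1, 10.0, 0.01)` gives 20,000 proteins per cell from 20 mRNAs. If only transcription rates differ between genes, `variance_shares` attributes all of the variance to transcription.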


Impact of regulatory variation from RNA to protein

Alexis Battle¹,²
Zia Khan³
Sidney H. Wang³
Amy Mitrano³
Michael J. Ford⁴
Jonathan K. Pritchard¹,²,⁵
Yoav Gilad³

¹Department of Genetics, Stanford University, Stanford, CA 94305, USA.
²Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA.
³Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
⁴MS Bioworks, LLC, 3950 Varsity Drive, Ann Arbor, MI 48108, USA.
⁵Department of Biology, Stanford University, Stanford, CA 94305, USA.

Abstract

The phenotypic consequences of expression quantitative trait loci (eQTLs) are presumably due to their effects on protein expression levels. Yet the impact of genetic variation, including eQTLs, on protein levels remains poorly understood. To address this, we mapped genetic variants that are associated with mRNA expression levels (eQTLs), ribosome occupancy (rQTLs), or protein abundance (pQTLs). We found that most QTLs are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, eQTLs tend to have significantly reduced effect sizes on protein levels, which suggests that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we identified a class of cis QTLs that affect protein abundance with little or no effect on messenger RNA or ribosome levels, which suggests that they may arise from differences in posttranslational regulation.
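Comparing a variant's effect on mRNA with its effect on protein comes down to comparing regression slopes on genotype dosage. This minimal, hypothetical sketch estimates each effect by least squares and takes the ratio. Values below 1 are consistent with attenuation at the protein level. Battle et al. work with normalized, log-scale measurements and formal association tests, and those steps are omitted here.

```python
def ols_slope(genotypes, phenotype):
    """Least-squares slope of a phenotype on genotype dosage (0, 1 or 2 allele copies)."""
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(phenotype) / n
    sxy = sum((g - mx) * (y - my) for g, y in zip(genotypes, phenotype))
    sxx = sum((g - mx) ** 2 for g in genotypes)
    return sxy / sxx

def attenuation(genotypes, mrna, protein):
    """Ratio of protein to mRNA effect size for one variant (below 1 suggests buffering)."""
    return ols_slope(genotypes, protein) / ols_slope(genotypes, mrna)
```

Take genotypes `[0, 0, 1, 1, 2, 2]`, mRNA levels `[1, 1, 2, 2, 3, 3]` and protein levels `[1, 1, 1.5, 1.5, 2, 2]`. The mRNA effect is 1.0, the protein effect is 0.5, and the attenuation ratio is 0.5.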


02.18.15

This entry was posted in Abstracts and tagged pluripotency SysBio Systems Biology transcription factor on February 9, 2015 by ppointer

Defining an essential transcription factor program for naive pluripotency

S.-J. Dunn¹,*, G. Martello²,*,†,‡, B. Yordanov¹,*, S. Emmott¹, A. G. Smith²,³,†
¹Computational Science Laboratory, Microsoft Research, Cambridge CB1 2FB, UK.
²Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 1QR, UK.
³Department of Biochemistry, University of Cambridge, Cambridge, UK.
‡Department of Molecular Medicine, University of Padua, 35131 Padua, Italy.
†Corresponding author. E-mail: graziano.martello@unipd.it (G.M.); austin.smith@cscr.cam.ac.uk (A.G.S.)
* These authors contributed equally to this work.

Abstract

The gene regulatory circuitry through which pluripotent embryonic stem (ES) cells choose between self-renewal and differentiation appears vast and has yet to be distilled into an executive molecular program. We developed a data-constrained, computational approach to reduce complexity and to derive a set of functionally validated components and interaction combinations sufficient to explain observed ES cell behavior. This minimal set, the simplest version of which comprises only 16 interactions, 12 components, and three inputs, satisfies all prior specifications for self-renewal and furthermore predicts unknown and nonintuitive responses to compound genetic perturbations with an overall accuracy of 70%. We propose that propagation of ES cell identity is not determined by a vast interactome but rather can be explained by a relatively simple process of molecular computation.
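The "molecular computation" framing rests on Boolean network models. In these, each component is on or off and updates by a logical rule over its regulators and the inputs. The sketch below shows only synchronous simulation to a fixed point; synchronous updating is assumed here as one common choice. It runs on a hypothetical three-component toy circuit. The nodes A, B and C and the single input are invented and are not the published 12-component network. The paper's search for network models consistent with experimental constraints is not shown.

```python
def simulate(rules, state, inputs, max_steps=50):
    """Synchronously update a Boolean network until it reaches a fixed point.

    rules: dict mapping node -> function(state, inputs) -> bool.
    Returns the fixed-point state, or None if none is reached within max_steps.
    """
    for _ in range(max_steps):
        nxt = {node: rule(state, inputs) for node, rule in rules.items()}
        if nxt == state:
            return state
        state = nxt
    return None

# Hypothetical toy circuit (not the published network): A and B reinforce each other.
TOY_RULES = {
    "A": lambda s, i: i["input"] or s["B"],
    "B": lambda s, i: s["A"],
    "C": lambda s, i: s["A"] and s["B"],
}
```

Start with every component off. Switching the input on drives the toy circuit to the all-on state. Because A and B reinforce each other, that state persists even after the input is removed.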