szhang256


Network enhancement as a general method to denoise weighted biological networks

Our next meeting will be at 1pm on Dec 10th, in room 4160 of the Discovery building. Our selected paper is Network enhancement as a general method to denoise weighted biological networks.
The abstract is as follows.

Networks are ubiquitous in biology where they encode connectivity patterns at all scales of organization, from molecular to the biome. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper discovery of network patterns and dynamics. We propose Network Enhancement (NE), a method for improving the signal-to-noise ratio of undirected, weighted networks. NE uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases spectral eigengap of the input network. As a result, NE removes weak edges, enhances real connections, and leads to better downstream performance. Experiments show that NE improves gene–function prediction by denoising tissue-specific interaction networks, alleviates interpretation of noisy Hi-C contact maps from the human genome, and boosts fine-grained identification accuracy of species. Our results indicate that NE is widely applicable for denoising biological networks.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


RNA velocity of single cells

Our next meeting will be at 1pm on Nov 12th, in room 4160 of the Discovery building. Our selected paper is RNA velocity of single cells.
The abstract is as follows.

RNA abundance is a powerful indicator of the state of individual cells. Single-cell RNA sequencing can reveal RNA abundance with high quantitative accuracy, sensitivity and throughput. However, this approach captures only a static snapshot at a point in time, posing a challenge for the analysis of time-resolved phenomena such as embryogenesis or tissue regeneration. Here we show that RNA velocity—the time derivative of the gene expression state—can be directly estimated by distinguishing between unspliced and spliced mRNAs in common single-cell RNA sequencing protocols. RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours. We validate its accuracy in the neural crest lineage, demonstrate its use on multiple published datasets and technical platforms, reveal the branching lineage tree of the developing mouse hippocampus, and examine the kinetics of transcription in human embryonic brain. We expect RNA velocity to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.
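
As a warm-up for the discussion, here is a minimal sketch of the idea of estimating a time derivative from unspliced and spliced counts. It assumes a simple constant-rate model for one gene, ds/dt = u - gamma * s, with gamma fitted as a least-squares slope through the origin. That model, the function names, and the toy counts are our own illustrative assumptions; they are not taken from the abstract and are not the authors' software.

```python
import numpy as np


def estimate_gamma(unspliced, spliced):
    """Least-squares slope through the origin of unspliced vs. spliced counts.

    Assumption: cells near steady state satisfy u = gamma * s for this gene.
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    return float(u @ s / (s @ s))


def velocity(unspliced, spliced, gamma):
    """Velocity under the assumed model ds/dt = u - gamma * s."""
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    return u - gamma * s


# Hypothetical counts for one gene across four steady-state cells.
spliced = np.array([1.0, 2.0, 4.0, 8.0])
unspliced = np.array([0.5, 1.0, 2.0, 4.0])

gamma = estimate_gamma(unspliced, spliced)
steady_velocity = velocity(unspliced, spliced, gamma)

# A hypothetical cell with more unspliced mRNA than steady state predicts.
induced_velocity = velocity([3.0], [2.0], gamma)
```

In this toy setting the steady-state cells get zero velocity, while the cell with excess unspliced mRNA gets a positive velocity, meaning its spliced abundance is predicted to rise.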

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics

Our next meeting will be at 1pm on Oct 29th, in room 4160 of the Discovery building. Our selected paper is MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics.
The abstract is as follows.

Single cell experimental techniques reveal transcriptomic and epigenetic heterogeneity among cells, but how these are related is unclear. We present MATCHER, an approach for integrating multiple types of single cell measurements. MATCHER uses manifold alignment to infer single cell multi-omic profiles from transcriptomic and epigenetic measurements performed on different cells of the same type. Using scM&T-seq and sc-GEM data, we confirm that MATCHER accurately predicts true single cell correlations between DNA methylation and gene expression without using known cell correspondences. MATCHER also reveals new insights into the dynamic interplay between the transcriptome and epigenome in single embryonic stem cells and induced pluripotent stem cells.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Kipoi: accelerating the community exchange and reuse of predictive models for genomics

Our next meeting will be at 1pm on Oct 1st, in room 4130 of the Discovery building. Our selected paper is Kipoi: accelerating the community exchange and reuse of predictive models for genomics.
The abstract is as follows.

Advanced machine learning models applied to large-scale genomics datasets hold the promise to be major drivers for genome science. Once trained, such models can serve as a tool to probe the relationships between data modalities, including the effect of genetic variants on phenotype. However, lack of standardization and limited accessibility of trained models have hampered their impact in practice. To address this, we present Kipoi, a collaborative initiative to define standards and to foster reuse of trained models in genomics. Already, the Kipoi repository contains over 2,000 trained models that cover canonical prediction tasks in transcriptional and post-transcriptional gene regulation. The Kipoi model standard grants automated software installation and provides unified interfaces to apply and interpret models. We illustrate Kipoi through canonical use cases, including model benchmarking, transfer learning, variant effect prediction, and building new models from existing ones. By providing a unified framework to archive, share, access, use, and build on models developed by the community, Kipoi will foster the dissemination and use of machine learning models in genomics.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Alignment of single-cell trajectories to compare cellular expression dynamics

Our next meeting will be at 2pm on April 23rd, in room 4160 of the Discovery building. Our selected paper is Alignment of single-cell trajectories to compare cellular expression dynamics.
The abstract is as follows.

Single-cell RNA sequencing and high-dimensional cytometry can be used to generate detailed trajectories of dynamic biological processes such as differentiation or development. Here we present cellAlign, a quantitative framework for comparing expression dynamics within and between single-cell trajectories. By applying cellAlign to mouse and human embryonic developmental trajectories, we systematically delineate differences in the temporal regulation of gene expression programs that would otherwise be masked.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Using deep learning to model the hierarchical structure and function of a cell

Our next meeting will be at 2pm on April 9th, in room 4160 of the Discovery building. Our selected paper is Using deep learning to model the hierarchical structure and function of a cell.
The abstract is as follows.

Although artificial neural networks are powerful classifiers, their internal structures are hard to interpret. In the life sciences, extensive knowledge of cell biology provides an opportunity to design visible neural networks (VNNs) that couple the model’s inner workings to those of real systems. Here we develop DCell, a VNN embedded in the hierarchical structure of 2,526 subsystems comprising a eukaryotic cell (http://d-cell.ucsd.edu/). Trained on several million genotypes, DCell simulates cellular growth nearly as accurately as laboratory observations. During simulation, genotypes induce patterns of subsystem activities, enabling in silico investigations of the molecular mechanisms underlying genotype–phenotype associations. These mechanisms can be validated, and many are unexpected; some are governed by Boolean logic. Cumulatively, 80% of the importance for growth prediction is captured by 484 subsystems (21%), reflecting the emergence of a complex phenotype. DCell provides a foundation for decoding the genetics of disease, drug resistance and synthetic life.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome

Our next meeting will be at 2pm on Mar 12th, in room 4160 of the Discovery building. Our selected paper is Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome.
The abstract is as follows.

Motivation: Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific.

Results: We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that use transcription factor sequence preferences in the form of position weight matrices, predicting binding for transcription factors (accuracy > 0.99; Matthews correlation coefficient > 0.3). In at least one validation cell type, performance of Virtual ChIP-seq is higher than all participants of the DREAM Challenge for in vivo transcription factor binding site prediction in 4 of 9 transcription factors that we could compare to.
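
One point worth discussing is why the abstract reports both accuracy and the Matthews correlation coefficient (MCC). The sketch below computes both from a confusion matrix using the standard MCC formula. The counts are hypothetical numbers we made up to mimic a setting where bound regions are rare; they are not results from the paper.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def matthews_corrcoef(tp, fp, fn, tn):
    """Standard MCC from confusion-matrix counts; 0.0 if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 100,000 genomic bins, only 60 of them truly bound.
tp, fp, fn, tn = 30, 70, 30, 99_870

acc = accuracy(tp, fp, fn, tn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
print(f"accuracy = {acc:.4f}, MCC = {mcc:.3f}")
```

With so few positives, accuracy is dominated by the true negatives and stays above 0.99 even though only half of the bound bins are recovered, while the MCC remains moderate. This is one reason a metric that accounts for class imbalance is informative alongside accuracy.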

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


deepNF: Deep network fusion for protein function prediction

Our next meeting will be at 2pm on Feb 26th, in room 4160 of the Discovery building. Our selected paper is deepNF: Deep network fusion for protein function prediction.
The abstract is as follows.

The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provide a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that cannot capture complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Clustering gene expression time series data using an infinite Gaussian process mixture model

Our next meeting will be at 2pm on Feb 12th, in room 4160 of the Discovery building. Our selected paper is Clustering gene expression time series data using an infinite Gaussian process mixture model.
The abstract is as follows.

Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


NetSig: network-based discovery from cancer genomes

Our next meeting will be at 2pm on Jan 29th, in room 4160 of the Discovery building. Our selected paper is NetSig: network-based discovery from cancer genomes.
The abstract is as follows.

Methods that integrate molecular network information and tumor genome data could complement gene-based statistical tests to identify likely new cancer genes; but such approaches are challenging to validate at scale, and their predictive value remains unclear. We developed a robust statistic (NetSig) that integrates protein interaction networks with data from 4,742 tumor exomes. NetSig can accurately classify known driver genes in 60% of tested tumor types and predicts 62 new driver candidates. Using a quantitative experimental framework to determine in vivo tumorigenic potential in mice, we found that NetSig candidates induce tumors at rates that are comparable to those of known oncogenes and are ten-fold higher than those of random genes. By reanalyzing nine tumor-inducing NetSig candidates in 242 patients with oncogene-negative lung adenocarcinomas, we find that two (AKT2 and TFDP2) are significantly amplified. Our study presents a scalable integrated computational and experimental workflow to expand discovery from cancer genomes.

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.