February 2018 – SysBio Journal Club

deepNF: Deep network fusion for protein function prediction

This entry was posted in deep_learning protein_function on February 22, 2018 by szhang256

Our next meeting will be at 2pm on Feb 26th, in room 4160 of the Discovery building. Our selected paper is deepNF: Deep network fusion for protein function prediction.
The abstract is as follows.

The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provide a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that cannot capture complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity.
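For anyone who wants a concrete picture before the meeting, the architecture described in the abstract (separate early layers per network type, joined into a single bottleneck layer) can be sketched roughly as below. This is a toy forward pass only, not the authors' implementation: the layer sizes, the sigmoid activation, the random untrained weights, and the names `dense`, `init_layer`, and `encode` are all assumptions made for illustration, and the decoder half and training loop are omitted.

```python
import math
import random

def dense(x, weights, biases):
    """One fully connected layer with a sigmoid activation (activation is an assumption)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(weights, biases)
    ]

def init_layer(n_in, n_out, rng):
    """Small random weights; a real model would learn these by training."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def encode(network_rows, per_network_layers, bottleneck_layer):
    """Map one protein's per-network input vectors to a shared feature vector."""
    hidden = []
    for x, (w, b) in zip(network_rows, per_network_layers):
        # Each network type gets its own early layer; outputs are concatenated.
        hidden.extend(dense(x, w, b))
    w, b = bottleneck_layer
    # The single bottleneck layer sees all networks at once.
    return dense(hidden, w, b)

rng = random.Random(0)
# Hypothetical toy sizes, chosen only to keep the example small.
n_proteins, n_networks, hidden_size, bottleneck_size = 6, 3, 4, 2
per_network_layers = [init_layer(n_proteins, hidden_size, rng) for _ in range(n_networks)]
bottleneck_layer = init_layer(n_networks * hidden_size, bottleneck_size, rng)

# Toy input: one row per network, all describing the same protein.
protein_rows = [[rng.random() for _ in range(n_proteins)] for _ in range(n_networks)]
features = encode(protein_rows, per_network_layers, bottleneck_layer)
print(len(features))
```

In the paper's setup, the bottleneck vector is what would be passed on to a downstream classifier for function prediction; here it is just a short list of numbers produced by untrained weights.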

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.

Clustering gene expression time series data using an infinite Gaussian process mixture model

This entry was posted in clustering Gene regulation Time Series on February 7, 2018 by szhang256

Our next meeting will be at 2pm on Feb 12th, in room 4160 of the Discovery building. Our selected paper is Clustering gene expression time series data using an infinite Gaussian process mixture model.
The abstract is as follows.

Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.
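As a warm-up for the discussion, the two ingredients named in the abstract can be sketched separately: a Dirichlet process prior over cluster assignments (drawn here through its Chinese restaurant process representation) and a Gaussian process covariance over time points. This is not the DPGP software linked above. The function names, the concentration parameter `alpha`, and the choice of a squared-exponential kernel with these hyperparameters are assumptions for illustration, and no inference is performed.

```python
import math
import random

def crp_assignments(n_items, alpha, rng):
    """Draw cluster labels from a Chinese restaurant process prior."""
    labels, counts = [], []
    for _ in range(n_items):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        labels.append(k)
    return labels

def sq_exp_kernel(times, length_scale, variance):
    """Squared-exponential covariance between every pair of time points."""
    return [
        [variance * math.exp(-0.5 * ((s - t) / length_scale) ** 2) for t in times]
        for s in times
    ]

rng = random.Random(1)
# Hypothetical settings: 20 genes, unevenly spaced sampling times.
labels = crp_assignments(n_items=20, alpha=1.0, rng=rng)
cov = sq_exp_kernel(times=[0.0, 1.0, 2.0, 4.0], length_scale=1.5, variance=1.0)
print(len(set(labels)), cov[0][1] > cov[0][3])
```

The number of clusters is not fixed in advance; it grows with the data at a rate governed by `alpha`. The kernel encodes that expression values at nearby time points are more strongly correlated than values far apart.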

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.