Monthly Archives: February 2017


Mutation effects predicted from sequence co-variation

Our next meeting will be at 3:00 on February 24th, in room 4160 of the Discovery building. Our selected paper is "Mutation effects predicted from sequence co-variation."
The abstract is as follows.

Many high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases. We present EVmutation, an unsupervised statistical method for predicting the effects of mutations that explicitly captures residue dependencies between positions. We validate EVmutation by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations and show that it outperforms methods that do not account for epistasis. EVmutation can be used to assess the quantitative effects of mutations in genes of any organism. We provide pre-computed predictions for ~7,000 human proteins at http://evmutation.org/.
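EVmutation scores a mutation by the change in a statistical energy that includes pairwise couplings between positions, not only per-site terms. The Python sketch below illustrates that idea. It assumes a Potts-style energy E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j) and the convention P(s) ∝ exp(E(s)), so a negative change in energy suggests a deleterious mutation. The fields h and couplings J are hand-set toy values for a hypothetical 3-position, 2-state sequence, not parameters inferred from an alignment, and the function names are our own.

```python
import numpy as np

def potts_energy(seq, h, J):
    """E(seq) = sum_i h[i, a_i] + sum_{i<j} J[i, j, a_i, a_j] (toy Potts energy)."""
    L = len(seq)
    energy = sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy += J[i, j, seq[i], seq[j]]
    return float(energy)

def delta_e(wt, pos, new_state, h, J, epistatic=True):
    """Energy change for substituting new_state at pos in the wild-type sequence."""
    mut = list(wt)
    mut[pos] = new_state
    if not epistatic:
        # Site-independent score: only the single-site field changes.
        return float(h[pos, new_state] - h[pos, wt[pos]])
    # Epistatic score: the full energy difference, including every coupling term.
    return potts_energy(mut, h, J) - potts_energy(wt, h, J)

# Hypothetical toy model: 3 positions, 2 states, one coupling between positions 0 and 1.
L, q = 3, 2
h = np.zeros((L, q))
J = np.zeros((L, L, q, q))
J[0, 1, 1, 1] = 2.0

print(delta_e([1, 0, 0], 1, 1, h, J))                   # 2.0
print(delta_e([0, 0, 0], 1, 1, h, J))                   # 0.0
print(delta_e([1, 0, 0], 1, 1, h, J, epistatic=False))  # 0.0
```

The same substitution scores 2.0 against one background and 0.0 against another, while the site-independent score is 0.0 in both cases. That context dependence is the epistasis the abstract says conservation-only methods do not consider.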

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.


Affinity regression predicts the recognition code of nucleic acid–binding proteins

Our next meeting will be at 3:00 on February 10th, in room 4160 of the Discovery building. Our selected paper is "Affinity regression predicts the recognition code of nucleic acid–binding proteins."
The abstract is as follows.

Predicting the affinity profiles of nucleic acid–binding proteins directly from the protein sequence is a challenging problem. We present a statistical approach for learning the recognition code of a family of transcription factors or RNA-binding proteins (RBPs) from high-throughput binding data. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete data to learn an interaction model between proteins and nucleic acids using only protein domain and probe sequences as inputs. When trained on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, when trained on RNAcompete profiles for diverse RBPs, our model correctly predicts the binding affinities of held-out proteins and identifies key RNA-binding residues, despite the high level of sequence divergence across RBPs. We expect that the method will be broadly applicable to modeling and predicting paired macromolecular interactions in settings where high-throughput affinity data are available.
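As we read the paper, affinity regression is built around a bilinear model Y ≈ D W P^T, where D holds features of the probes, P holds features of the protein domain sequences, Y holds the measured binding profiles (probes by proteins), and W is the learned interaction matrix. The Python sketch below fits W by ridge regression on a small random toy problem using the identity vec(D W P^T) = (P ⊗ D) vec(W), then predicts the profile of a held-out protein from its features alone. The random matrices stand in for the features (for example k-mer counts) a real pipeline would compute, the function names are hypothetical, and the direct Kronecker solve only scales to tiny examples; it is not the paper's solver.

```python
import numpy as np

def fit_bilinear(D, P, Y, lam=1e-3):
    """Ridge fit of W in Y ≈ D @ W @ P.T via vec(D W P^T) = (P kron D) vec(W)."""
    K = np.kron(P, D)                  # shape (n_probes * n_prot, f_d * f_p)
    y = Y.reshape(-1, order="F")       # column-major vec(Y), matching the identity
    A = K.T @ K + lam * np.eye(K.shape[1])
    w = np.linalg.solve(A, K.T @ y)
    return w.reshape(D.shape[1], P.shape[1], order="F")

def predict_profile(D, W, p_new):
    """Predicted binding profile over all probes for a protein with features p_new."""
    return D @ W @ p_new

# Hypothetical toy data: random probe/protein features and a known interaction matrix.
rng = np.random.default_rng(0)
n_probes, f_d, n_prot, f_p = 20, 4, 6, 3
D = rng.normal(size=(n_probes, f_d))
P = rng.normal(size=(n_prot, f_p))
W_true = rng.normal(size=(f_d, f_p))
Y = D @ W_true @ P.T                   # noiseless training profiles

W_hat = fit_bilinear(D, P, Y, lam=1e-8)
p_new = rng.normal(size=f_p)           # features of a held-out protein
y_pred = predict_profile(D, W_hat, p_new)
```

Because W links probe features to protein features, a profile can be predicted for a protein that was never in the training set, which mirrors the held-out prediction setting described in the abstract.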

We welcome all who can join us for this discussion. Feel free to begin that discussion in the comments section below.