MutPred-LOF is a machine learning-based method and software package that integrates genetic and molecular data to reason probabilistically about the pathogenicity of frameshifting indels and stop gain variants. The model provides both a general pathogenicity prediction and a ranked list of specific molecular alterations potentially affecting phenotype. It is trained on a set of pathogenic and unlabeled (putatively neutral) variants obtained from the Human Gene Mutation Database (HGMD) [1], ClinVar [2], and ExAC [3]. The MutPred-LOF model is a bagged ensemble of 100 feed-forward neural networks, each trained on a balanced subset of pathogenic and unlabeled variants.
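As a rough illustration of the ensemble design described above, the sketch below draws one class-balanced bootstrap sample per ensemble member and averages the members' scores. It is not the MutPred-LOF implementation: the function names, the toy two-feature data, and the base learner are all assumptions made for this example. In particular, a simple centroid-distance scorer stands in for the feed-forward neural networks so that the example stays short and depends only on the Python standard library.

```python
import math
import random
from statistics import fmean


def balanced_bags(labels, n_models, seed=0):
    """Yield one list of training indices per ensemble member.

    Each bag holds the same number of pathogenic (1) and unlabeled (0)
    examples, sampled with replacement within each class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos), len(neg))
    for _ in range(n_models):
        yield rng.choices(pos, k=k) + rng.choices(neg, k=k)


def train_centroid_scorer(X, y):
    """Hypothetical stand-in base learner (NOT a neural network).

    Scores a point by how much closer it lies to the pathogenic centroid
    than to the unlabeled centroid, squashed into (0, 1).
    """
    def centroid(rows):
        return [fmean(col) for col in zip(*rows)]

    c_pos = centroid([x for x, t in zip(X, y) if t == 1])
    c_neg = centroid([x for x, t in zip(X, y) if t == 0])

    def score(x):
        d = math.dist(x, c_neg) - math.dist(x, c_pos)
        return 1.0 / (1.0 + math.exp(-d))

    return score


def train_ensemble(X, y, n_models=100, seed=0):
    """Train one base learner per balanced bag."""
    models = []
    for idx in balanced_bags(y, n_models, seed):
        bag_X = [X[i] for i in idx]
        bag_y = [y[i] for i in idx]
        models.append(train_centroid_scorer(bag_X, bag_y))
    return models


def ensemble_score(models, x):
    """Average the member scores to obtain the ensemble prediction."""
    return fmean(m(x) for m in models)


# Toy data (invented for illustration): 3 pathogenic, 5 unlabeled variants.
X = [[0.9, 0.8], [1.0, 0.7], [0.8, 1.0],
     [0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0], [0.1, 0.1]]
y = [1, 1, 1, 0, 0, 0, 0, 0]

models = train_ensemble(X, y, n_models=100)
print(ensemble_score(models, [0.9, 0.9]))
print(ensemble_score(models, [0.1, 0.1]))
```

Balancing each bag keeps the larger class from dominating any single member, while averaging over many members reduces the variance introduced by resampling.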


MutPred-LOF was developed by Kymberleigh Pagel at Indiana University Bloomington as a joint project of the Mooney group at the University of Washington and the Radivojac group at Indiana University. The Iakoucheva and Sebat groups at the University of California, San Diego provided additional validation and support. More information on the method and detailed instructions can be found on the help page.


Citing MutPred-LOF

Kymberleigh A. Pagel, Vikas Pejaver, Guan Ning Lin, Hyun-Jun Nam, Matthew Mort, David N. Cooper, Jonathan Sebat, Lilia M. Iakoucheva, Sean D. Mooney, Predrag Radivojac; When loss-of-function is loss of function: assessing mutational signatures and impact of loss-of-function genetic variants. Bioinformatics 2017; 33 (14): i389-i398. doi: 10.1093/bioinformatics/btx272


ASD variants with MutPred-LOF predictions are available here


Supported by

This work is funded by:

  • NIH R01LM009722 (PI: Mooney)

  • NIH R01MH105524 (PIs: Iakoucheva and Radivojac)

  • NIH R01MH104766 (PI: Iakoucheva)

  • NIH R01MH076431 (PI: Sebat)

  • Indiana University Precision Health Initiative (PI: Radivojac)


Training data:

The training data are available here. They do not include HGMD variants, which are licensed and cannot be redistributed. Please contact HGMD directly for these variants (we used the July 2016 version of the database).


The MutPred suite

Beyond MutPred-LOF, several other tools have been developed as part of the MutPred project. They are listed below in chronological order of their development:

  • MutPred: A random forest model to predict the effect of single amino acid substitutions. It was trained on a smaller training set with fewer features (and fewer predicted molecular mechanisms) and is the proof-of-concept predecessor to MutPred2.

  • MutPred2: A bagged ensemble of 30 feed-forward neural networks that integrates genetic and molecular data to reason probabilistically about the pathogenicity of amino acid substitutions.

  • Functional regulatory SNP predictor: An ensemble of decision trees for the prediction of SNPs in regulatory regions that impact gene expression. A new and enhanced method, called RSVP, has been developed; it uses an expanded feature set that includes information from ENCODE.

  • MutPred-Splice: A random forest-based approach to prioritize exonic variants (missense or samesense) that are likely to disrupt pre-mRNA splicing in whole-genome sequencing data sets.


References

1. Stenson PD, Mort M, Ball EV, Evans K, Hayden M, Heywood S, Hussain M, Phillips AD, Cooper DN. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet (2017).

2. Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J, Jang W, Katz K, Ovetsky M, Riley G, Sethi A, Tully R, Villamarin-Salomon R, Rubinstein W, Maglott DR. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res (2015).

3. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG, Exome Aggregation Consortium. Analysis of protein-coding genetic variation in 60,706 humans. Nature (2016) 536(7616):285-291.