The web server is intended for small-scale data sets: at most 100 substitutions, and no input sequence longer than 35,000 residues. The web site currently provides MutPred2.0. It requires protein sequences in FASTA format, a list of amino acid substitutions in the corresponding FASTA headers (separated by spaces only), and a valid email address. The protein ID cannot contain spaces, semicolons, or commas.
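The web server input rules above can be checked locally before submitting. The following Python sketch is only an illustration, not part of MutPred2: the function names are hypothetical, the substitution notation (wild-type residue, position, mutant residue, such as `A2T`) is an assumption, and so is the choice to count the 100-substitution limit across the whole input.

```python
import re

MAX_SUBSTITUTIONS = 100        # limit stated for the web server
MAX_SEQUENCE_LENGTH = 35_000   # residues, limit stated for the web server
# Assumption: a substitution looks like "A2T" (wild type, position, mutant).
SUBSTITUTION_RE = re.compile(r"[A-Z][0-9]+[A-Z]")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_web_input(text):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    total_substitutions = 0
    for header, sequence in parse_fasta(text):
        if "\t" in header:
            problems.append(f"{header!r}: fields must be separated by spaces only")
        fields = header.split()
        if not fields:
            problems.append("empty FASTA header")
            continue
        # The first space-separated field is the protein ID, so it cannot
        # contain spaces here; semicolons and commas are checked explicitly.
        protein_id, substitutions = fields[0], fields[1:]
        if ";" in protein_id or "," in protein_id:
            problems.append(f"{protein_id!r}: ID contains a semicolon or comma")
        if not substitutions:
            problems.append(f"{protein_id!r}: no substitutions listed in header")
        for substitution in substitutions:
            if not SUBSTITUTION_RE.fullmatch(substitution):
                problems.append(f"{protein_id!r}: unrecognized substitution {substitution!r}")
        if len(sequence) > MAX_SEQUENCE_LENGTH:
            problems.append(f"{protein_id!r}: sequence exceeds {MAX_SEQUENCE_LENGTH} residues")
        total_substitutions += len(substitutions)
    if total_substitutions > MAX_SUBSTITUTIONS:
        problems.append(f"{total_substitutions} substitutions exceed the limit of {MAX_SUBSTITUTIONS}")
    return problems
```

Calling `check_web_input` on the contents of a FASTA file returns a list of messages, which is empty when none of these checks fail. A comma-separated substitution list, for example, is reported as an unrecognized substitution because the header is split on whitespace only.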
The standalone executable can be used for genome-scale data sets. In addition to the standard MutPred2 input format (see above), the MutPred2 software also accepts the output file from ANNOVAR's coding_change.pl program, which makes it straightforward to move between VCF files and MutPred2. To install and run MutPred2, you will need about 50 GB of hard disk space and at least 4 GB of RAM. Click on the link below to download.
The data files include a tab-delimited file containing the freely shareable subset of MutPred2's training data, binary data files that MutPred2 depends on, and binary files for the trained machine learning models that MutPred2 uses to generate features and make predictions. The MutPred2 code requires MATLAB release 2017b or earlier to work properly. Details about the structure and setup of the code are provided in the README file.
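The installation requirements stated above (about 50 GB of disk space and at least 4 GB of RAM) can be checked before downloading. The Python sketch below is a rough pre-install check, not part of the MutPred2 distribution. The function names and the install directory are hypothetical, treating "GB" as 2^30 bytes is an assumption, and the RAM query relies on `os.sysconf`, which is not available on every platform.

```python
import os
import shutil

# Assumption: "GB" is interpreted as 2**30 bytes.
REQUIRED_DISK_BYTES = 50 * 1024**3  # "about 50 GB" of hard disk space
REQUIRED_RAM_BYTES = 4 * 1024**3    # "at least 4 GB" of RAM


def has_enough_disk(path, required_bytes=REQUIRED_DISK_BYTES):
    """Return True if the file system containing `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


def total_ram_bytes():
    """Return physical RAM in bytes, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. platforms without os.sysconf
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count


if __name__ == "__main__":
    install_dir = "."  # hypothetical install location
    print("disk ok:", has_enough_disk(install_dir))
    ram = total_ram_bytes()
    print("ram ok:", "unknown" if ram is None else ram >= REQUIRED_RAM_BYTES)
```

The check is only advisory: free space is measured at the moment the script runs, and a result of "unknown" for RAM means the platform did not expose the value, not that the requirement is unmet.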