Web server

Installation

Command-line

Output

The MutPred-Indel web server

The input data should be in a modified FASTA format in which each variant is represented by an ordered pair of sequences: first the unmodified wildtype protein sequence, then the mutant protein sequence. The sequence ID does not need to conform to any particular format. An example is provided below. The web server allows predictions on up to 100 variants per submission (the number of sequences does not matter). Every protein sequence must be longer than 30 and shorter than 30,000 residues. Computation time is proportional to the length of a sequence and the number of variants.

>NP_057295|SEC31A Y90del wildtype
MKLKEVDRTAMQAWSPAQNHPIYLATGTSAQQLDATFSTNASLEIFELDLSDPSLDMKSCATFSSSHRYHKLIWGPYKMDSKGDVSGVLIAGGENGNII
LYDPSKIIAGDKEVVIAQNDKHTGPVRALDVNIFQTNLVASGANESEIYIWDLNNFATPMTPGAKTQPPEDISCIAWNRQVQHILASASPSGRATVWDL
RKNEPIIKVSDHSNRMHCSGLAWHPDVATQMVLASEDDRLPVIQMWDLRFASSPLRVLENHARGILAIAWSMADPELLLSCGKDAKILCSNPNTGEVLY
ELPTNTQWCFDIQWCPRNPAVLSAASFDGRISVYSIMGGSTDGLRQKQVDKLSSSFGNLDPFGTGQPLPPLQIPQQTAQHSIVLPLKKPPKWIRRPVGA
SFSFGGKLVTFENVRMPSHQGAEQQQQQHHVFISQVVTEKEFLSRSDQLQQAVQSQGFINYCQKKIDASQTEFEKNVWSFLKVNFEDDSRGKYLELLGY
RKEDLGKKHIKEEKEESEFLPSSGGTFNISVSGDIDGLITQALLTGNFESAVDLCLHDNRMADAIILAIAGGQELLARTQKKYFAKSQSKITRLITAVV
MKNWKEIVESCDLKNWREALAAVLTYAKPDEFSALCDLLGTRLENEGDSLLQTQACLCYICAGNVEKLVACWTKAQDGSHPLSLQDLIEKVVILRKAVQ
LTQAMDTSTVGVLLAAKMSQYANLLAAQGSIAAALAFLPDNTNQPNIMQLRDRLCRAQGEPVAGHESPKIPYEKQQLPKGRPGPVAGHHQMPRVQTQQY
YPHGENPPPPGFIMHGNVNPNAAGQLPTSPGHMHTQVPPYPQPQPYQPAQPYPFGTGGSAMYRPQQPVAPPTSNAYPNTPYISSASSYTGQSQLYAAQH
QASSPTSSPATSFPPPPSSGASFQHGGPGAPPSSSAYALPPGTTGTLPAASELPASQRTGPQNGWNDPPALNRVPKKKKMPENFMPPVPITSPIMNPLG
DPQSQMLQQQPSAPVPLSSQSSFPQPHLPGGQPFHGVQQPLGQTGMPPSFSKPNIEGAPGAPIGNTFQHVQSLPTKKITKKPIPDEHLILKTTFEDLIQ
RCLSSATDPQTKRKLDDASKRLEFLYDKLREQTLSPTITSGLHNIARSIETRNYSEGLTMHTHIVSTSNFSETSAFMPVLKVVLTQANKLGV
>NP_057295|SEC31A Y90del mutant
MKLKEVDRTAMQAWSPAQNHPIYLATGTSAQQLDATFSTNASLEIFELDLSDPSLDMKSCATFSSSHRHKLIWGPYKMDSKGDVSGVLIAGGENGNII
LYDPSKIIAGDKEVVIAQNDKHTGPVRALDVNIFQTNLVASGANESEIYIWDLNNFATPMTPGAKTQPPEDISCIAWNRQVQHILASASPSGRATVWDL
RKNEPIIKVSDHSNRMHCSGLAWHPDVATQMVLASEDDRLPVIQMWDLRFASSPLRVLENHARGILAIAWSMADPELLLSCGKDAKILCSNPNTGEVLY
ELPTNTQWCFDIQWCPRNPAVLSAASFDGRISVYSIMGGSTDGLRQKQVDKLSSSFGNLDPFGTGQPLPPLQIPQQTAQHSIVLPLKKPPKWIRRPVGA
SFSFGGKLVTFENVRMPSHQGAEQQQQQHHVFISQVVTEKEFLSRSDQLQQAVQSQGFINYCQKKIDASQTEFEKNVWSFLKVNFEDDSRGKYLELLGY
RKEDLGKKHIKEEKEESEFLPSSGGTFNISVSGDIDGLITQALLTGNFESAVDLCLHDNRMADAIILAIAGGQELLARTQKKYFAKSQSKITRLITAVV
MKNWKEIVESCDLKNWREALAAVLTYAKPDEFSALCDLLGTRLENEGDSLLQTQACLCYICAGNVEKLVACWTKAQDGSHPLSLQDLIEKVVILRKAVQ
LTQAMDTSTVGVLLAAKMSQYANLLAAQGSIAAALAFLPDNTNQPNIMQLRDRLCRAQGEPVAGHESPKIPYEKQQLPKGRPGPVAGHHQMPRVQTQQY
YPHGENPPPPGFIMHGNVNPNAAGQLPTSPGHMHTQVPPYPQPQPYQPAQPYPFGTGGSAMYRPQQPVAPPTSNAYPNTPYISSASSYTGQSQLYAAQH
QASSPTSSPATSFPPPPSSGASFQHGGPGAPPSSSAYALPPGTTGTLPAASELPASQRTGPQNGWNDPPALNRVPKKKKMPENFMPPVPITSPIMNPLG
DPQSQMLQQQPSAPVPLSSQSSFPQPHLPGGQPFHGVQQPLGQTGMPPSFSKPNIEGAPGAPIGNTFQHVQSLPTKKITKKPIPDEHLILKTTFEDLIQ
RCLSSATDPQTKRKLDDASKRLEFLYDKLREQTLSPTITSGLHNIARSIETRNYSEGLTMHTHIVSTSNFSETSAFMPVLKVVLTQANKLGV
>NP_006588|HSPA8 T274insHQ wildtype
MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAG
RPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIF
DLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTLSSSTQASIEIDSLYEGIDFYTSI
TRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILSGDKSENVQDLLLLDV
TPLSLGIETAGGVMTVLIKRNTTIPTKQTQTFTTYSDNQPGVLIQVYEGERAMTKDNNLLGKFELTGIPPAPRGVPQIEVTFDIDANGILNVSAVDKST
GKENKITITNDKGRLSKEDIERMVQEAEKYKAEDEKQRDKVSSKNSLESYAFNMKATVEDEKLQGKINDEDKQKILDKCNEIINWLDKNQTAEKEEFEH
QQKELEKVCNPIITKLYQSAGGMPGGMPGGFPGGGAPPSGGASSGPTIEEVD
>NP_006588|HSPA8 T274insHQ mutant
MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAG
RPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIF
DLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTHQLSSSTQASIEIDSLYEGIDFYT
SITRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILSGDKSENVQDLLLL
DVTPLSLGIETAGGVMTVLIKRNTTIPTKQTQTFTTYSDNQPGVLIQVYEGERAMTKDNNLLGKFELTGIPPAPRGVPQIEVTFDIDANGILNVSAVDK
STGKENKITITNDKGRLSKEDIERMVQEAEKYKAEDEKQRDKVSSKNSLESYAFNMKATVEDEKLQGKINDEDKQKILDKCNEIINWLDKNQTAEKEEF
EHQQKELEKVCNPIITKLYQSAGGMPGGMPGGFPGGGAPPSGGASSGPTIEEVD



Results from MutPred-Indel will be sent to the email address provided.
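Before submitting, the input constraints described above (wildtype/mutant pairs, at most 100 variants, sequence lengths between 30 and 30,000 residues) can be checked locally. The following Python sketch is not part of MutPred-Indel; the function names parse_fasta and check_input are hypothetical, and the checks only mirror the limits stated on this page.

```python
MIN_LEN = 30        # every sequence must be longer than this
MAX_LEN = 30000     # ...and shorter than this
MAX_VARIANTS = 100  # web server limit on variants per submission


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_input(text):
    """Return a list of problems found; an empty list means none were found."""
    records = parse_fasta(text)
    problems = []
    # Each variant is a wildtype sequence followed by a mutant sequence.
    if len(records) % 2 != 0:
        problems.append("odd number of sequences; expected wildtype/mutant pairs")
    n_variants = len(records) // 2
    if n_variants > MAX_VARIANTS:
        problems.append(f"{n_variants} variants exceeds the limit of {MAX_VARIANTS}")
    for header, seq in records:
        if not (MIN_LEN < len(seq) < MAX_LEN):
            problems.append(f"{header}: length {len(seq)} is out of range")
    return problems
```

Calling check_input on the contents of an input file returns a list of messages; an empty list means none of these particular checks failed, not that the server is guaranteed to accept the file.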


Installing MutPred-Indel

After downloading the tarball package, unpack it:

tar -xzvf MutPredIndel_compiled.tar.gz

Running MutPred-Indel

The actual shell script that runs MutPred-Indel is called run_mutpredindel.sh. The input format is the same as that for the web application. MutPred-Indel can be run using the following command:

run_mutpredindel.sh mcr_directory input_file.fasta output_file_prefix 

Command-line arguments: the first argument, mcr_directory, is the location of the MATLAB Compiler Runtime (MCR). Running run_mutpredindel.sh without any command-line arguments displays the usage information:

run_mutpredindel.sh
   USAGE: mutpredindel MCR_directory input_file output_filename  	
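For batch use, the same three-argument call can be wrapped in a small script. The Python sketch below is an illustration only: build_command and run_mutpredindel are hypothetical helper names, and it assumes run_mutpredindel.sh is on the PATH (otherwise pass its full path as script). It uses only the three positional arguments shown in the usage message.

```python
import subprocess


def build_command(mcr_directory, input_fasta, output_prefix,
                  script="run_mutpredindel.sh"):
    """Assemble the argument list in the order given by the usage message."""
    return [script, str(mcr_directory), str(input_fasta), str(output_prefix)]


def run_mutpredindel(mcr_directory, input_fasta, output_prefix,
                     script="run_mutpredindel.sh"):
    """Run the script and raise CalledProcessError on a non-zero exit code."""
    cmd = build_command(mcr_directory, input_fasta, output_prefix, script)
    return subprocess.run(cmd, check=True)
```

For example, run_mutpredindel("/path/to/mcr", "input_file.fasta", "output_file_prefix") is equivalent to the command line shown above; the paths here are placeholders.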

Interpreting the results

The output of MutPred-Indel consists of a general score (g), i.e., the probability that the insertion or deletion variant is pathogenic. This score is the average of the scores from all neural networks in MutPred-Indel. If interpreted as a probability, a score threshold of 0.50 would suggest pathogenicity. However, in our evaluations, we have estimated that a threshold of 0.50 yields a false positive rate (FPR) of 10% and a threshold of 0.70 yields an FPR of 5%.
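As an illustration of how the general score can be used, the Python sketch below averages a set of per-network scores and compares the result against a chosen threshold. The function names are hypothetical, the per-network scores are made-up inputs rather than MutPred-Indel output, and treating a score equal to the threshold as exceeding it is an assumption.

```python
from statistics import mean

# Thresholds and the false positive rates reported for them in the text above.
REPORTED_FPR = {0.50: 0.10, 0.70: 0.05}


def general_score(network_scores):
    """Average the scores of the individual networks (illustrative inputs)."""
    return mean(network_scores)


def exceeds_threshold(score, threshold=0.50):
    """Return True if the score is at or above the threshold (assumed >=)."""
    return score >= threshold
```

A variant with g = 0.65 would be called pathogenic at the 0.50 threshold but not at the stricter 0.70 threshold, which trades sensitivity for the lower reported false positive rate.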

MutPred-Indel also outputs property scores that reflect the impact of a variant on different properties. An empirical P-value (P) is calculated as the fraction of putatively neutral variants in MutPred-Indel's training set whose number of impacted residues is greater than or equal to that of the given variant. A P-value threshold of 0.05 means that, under the null hypothesis, we expect 5% of putatively neutral variants to impact the particular property at least to the extent that the given variant does. These P-values are specific to each property.
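The empirical P-value definition above amounts to a simple counting procedure. The Python sketch below shows the calculation for a single property; the function name and the example counts are hypothetical and are not taken from MutPred-Indel's training set.

```python
def empirical_p_value(observed, neutral_values):
    """Fraction of neutral variants whose impact is >= the observed impact.

    observed       -- number of impacted residues for the variant of interest
    neutral_values -- the same quantity for each putatively neutral variant
    """
    if not neutral_values:
        raise ValueError("need at least one neutral variant")
    at_least = sum(1 for value in neutral_values if value >= observed)
    return at_least / len(neutral_values)
```

With neutral counts of 0, 1, 3 and 5 and an observed count of 3, two of the four neutral variants reach the observed impact, giving P = 0.5. Because each property has its own neutral distribution, the calculation is repeated separately for each property.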