Web server |
Installation |
Command-line |
Output |
The input data should be in a modified FASTA format with each variant represented by a set of two ordered sequences: the unmodified wildtype protein sequence and then the mutant protein sequence. The sequence ID does not need to conform to any particular format. An example is provided below. The web server allows for predictions on 100 variants (the number of sequences does not matter). Every protein sequence must be of length >30 and <30,000 residues. Computation time is proportional to the length of a sequence and the number of variants.
>NP_057295|SEC31A V88X wildtype MKLKEVDRTAMQAWSPAQNHPIYLATGTSAQQLDATFSTNASLEIFELDLSDPSLDMKSCATFSSSHRYHKLIWGPYKMDSKGDVSGVLIAGGENGNII LYDPSKIIAGDKEVVIAQNDKHTGPVRALDVNIFQTNLVASGANESEIYIWDLNNFATPMTPGAKTQPPEDISCIAWNRQVQHILASASPSGRATVWDL RKNEPIIKVSDHSNRMHCSGLAWHPDVATQMVLASEDDRLPVIQMWDLRFASSPLRVLENHARGILAIAWSMADPELLLSCGKDAKILCSNPNTGEVLY ELPTNTQWCFDIQWCPRNPAVLSAASFDGRISVYSIMGGSTDGLRQKQVDKLSSSFGNLDPFGTGQPLPPLQIPQQTAQHSIVLPLKKPPKWIRRPVGA SFSFGGKLVTFENVRMPSHQGAEQQQQQHHVFISQVVTEKEFLSRSDQLQQAVQSQGFINYCQKKIDASQTEFEKNVWSFLKVNFEDDSRGKYLELLGY RKEDLGKKHIKEEKEESEFLPSSGGTFNISVSGDIDGLITQALLTGNFESAVDLCLHDNRMADAIILAIAGGQELLARTQKKYFAKSQSKITRLITAVV MKNWKEIVESCDLKNWREALAAVLTYAKPDEFSALCDLLGTRLENEGDSLLQTQACLCYICAGNVEKLVACWTKAQDGSHPLSLQDLIEKVVILRKAVQ LTQAMDTSTVGVLLAAKMSQYANLLAAQGSIAAALAFLPDNTNQPNIMQLRDRLCRAQGEPVAGHESPKIPYEKQQLPKGRPGPVAGHHQMPRVQTQQY YPHGENPPPPGFIMHGNVNPNAAGQLPTSPGHMHTQVPPYPQPQPYQPAQPYPFGTGGSAMYRPQQPVAPPTSNAYPNTPYISSASSYTGQSQLYAAQH QASSPTSSPATSFPPPPSSGASFQHGGPGAPPSSSAYALPPGTTGTLPAASELPASQRTGPQNGWNDPPALNRVPKKKKMPENFMPPVPITSPIMNPLG DPQSQMLQQQPSAPVPLSSQSSFPQPHLPGGQPFHGVQQPLGQTGMPPSFSKPNIEGAPGAPIGNTFQHVQSLPTKKITKKPIPDEHLILKTTFEDLIQ RCLSSATDPQTKRKLDDASKRLEFLYDKLREQTLSPTITSGLHNIARSIETRNYSEGLTMHTHIVSTSNFSETSAFMPVLKVVLTQANKLGV >NP_057295|SEC31A V88X mutant MKLKEVDRTAMQAWSPAQNHPIYLATGTSAQQLDATFSTNASLEIFELDLSDPSLDMKSCATFSSSHRYHKLIWGPYKMDSKGDVSGV >NP_006588|HSPA8 Q473fs wildtype MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAG RPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIF DLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTLSSSTQASIEIDSLYEGIDFYTSI TRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILSGDKSENVQDLLLLDV TPLSLGIETAGGVMTVLIKRNTTIPTKQTQTFTTYSDNQPGVLIQVYEGERAMTKDNNLLGKFELTGIPPAPRGVPQIEVTFDIDANGILNVSAVDKST GKENKITITNDKGRLSKEDIERMVQEAEKYKAEDEKQRDKVSSKNSLESYAFNMKATVEDEKLQGKINDEDKQKILDKCNEIINWLDKNQTAEKEEFEH QQKELEKVCNPIITKLYQSAGGMPGGMPGGFPGGGAPPSGGASSGPTIEEVD >NP_006588|HSPA8 Q473fs mutant MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAG RPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIF DLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTLSSSTQASIEIDSLYEGIDFYTSI TRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILSGDKSENVQDLLLLDV TPLSLGIETAGGVMTVLIKRNTTIPTKQTQTFTTYSDNQPGVLIQVYEGERAMTKDNNLLGKFELTGIPPAPRGVPNMQ
The other information that the web server asks for is an email address. Results from MutPredLOF will be sent to the email address provided.
After downloading the tarball package, unpack it:
tar -xzvf MutPredLOF_compiled.tar.gz
The actual shell script that runs MutPredLOF is called run_mutpredlof.sh. The input format is the same as that for the web application. MutPredLOF can be run using the following command:
run_mutpredlof.sh mcr_directory input_file.fasta output_file_prefix
Command-line arguments: all argument information can be displayed by simply typing run_mutpredlof.sh without any command-line arguments.
run_mutpredlof.sh USAGE: mutpredlof MCR_directory input_file output_filename
The output of MutPredLOF consists of a general score (g), i.e., the probability that the framshifting or stop gain variant is pathogenic. This score is the average of the scores from all neural networks in MutPredLOF. If interpreted as a probability, a score threshold of 0.50 would suggest pathogenicity. However, in our evaluations, we have estimated that a threshold of 0.50 yields a false positive rate (fpr) of 10% and that of 0.70 yields an fpr of 5%.
MutPredLOF also outputs property scores that reflect the impact of a variant on different properties. An empirical P-value (P) is calculated as the fraction of putatively neutral variants in MutPredLOF's training set with an amount of impacted residues >= to that amount for the given variant. A P-value threshold of 0.05 means that, under the null hypothesis, we expect 5% of putatively neutral variants to impact the particular property to the extent that the given variant does. These P-values are specific to each property.