MutPredLOF: a predictor of impactful frameshifting and stop gain variants

Web server

Installation

Command-line

Output

The MutPredLOF web server

The input data should be in a modified FASTA format with each variant represented by a set of two ordered sequences: the unmodified wildtype protein sequence and then the mutant protein sequence. The sequence ID does not need to conform to any particular format. An example is provided below. The web server allows for predictions on 100 variants (the number of sequences does not matter). Every protein sequence must be of length >30 and <30,000 residues. Computation time is proportional to the length of a sequence and the number of variants.

>NP_057295|SEC31A V88X wildtype
MKLKEVDRTAMQAWSPAQNHPIYLATGTSAQQLDATFSTNASLEIFELDLSDPSLDMKSCATFSSSHRYHKLIWGPYKMDSKGDVSGVLIAGGENGNII
LYDPSKIIAGDKEVVIAQNDKHTGPVRALDVNIFQTNLVASGANESEIYIWDLNNFATPMTPGAKTQPPEDISCIAWNRQVQHILASASPSGRATVWDL
RKNEPIIKVSDHSNRMHCSGLAWHPDVATQMVLASEDDRLPVIQMWDLRFASSPLRVLENHARGILAIAWSMADPELLLSCGKDAKILCSNPNTGEVLY
ELPTNTQWCFDIQWCPRNPAVLSAASFDGRISVYSIMGGSTDGLRQKQVDKLSSSFGNLDPFGTGQPLPPLQIPQQTAQHSIVLPLKKPPKWIRRPVGA
SFSFGGKLVTFENVRMPSHQGAEQQQQQHHVFISQVVTEKEFLSRSDQLQQAVQSQGFINYCQKKIDASQTEFEKNVWSFLKVNFEDDSRGKYLELLGY
RKEDLGKKHIKEEKEESEFLPSSGGTFNISVSGDIDGLITQALLTGNFESAVDLCLHDNRMADAIILAIAGGQELLARTQKKYFAKSQSKITRLITAVV
MKNWKEIVESCDLKNWREALAAVLTYAKPDEFSALCDLLGTRLENEGDSLLQTQACLCYICAGNVEKLVACWTKAQDGSHPLSLQDLIEKVVILRKAVQ
LTQAMDTSTVGVLLAAKMSQYANLLAAQGSIAAALAFLPDNTNQPNIMQLRDRLCRAQGEPVAGHESPKIPYEKQQLPKGRPGPVAGHHQMPRVQTQQY
YPHGENPPPPGFIMHGNVNPNAAGQLPTSPGHMHTQVPPYPQPQPYQPAQPYPFGTGGSAMYRPQQPVAPPTSNAYPNTPYISSASSYTGQSQLYAAQH
QASSPTSSPATSFPPPPSSGASFQHGGPGAPPSSSAYALPPGTTGTLPAASELPASQRTGPQNGWNDPPALNRVPKKKKMPENFMPPVPITSPIMNPLG
DPQSQMLQQQPSAPVPLSSQSSFPQPHLPGGQPFHGVQQPLGQTGMPPSFSKPNIEGAPGAPIGNTFQHVQSLPTKKITKKPIPDEHLILKTTFEDLIQ
RCLSSATDPQTKRKLDDASKRLEFLYDKLREQTLSPTITSGLHNIARSIETRNYSEGLTMHTHIVSTSNFSETSAFMPVLKVVLTQANKLGV
>NP_057295|SEC31A V88X mutant
MKLKEVDRTAMQAWSPAQNHPIYLATGTSAQQLDATFSTNASLEIFELDLSDPSLDMKSCATFSSSHRYHKLIWGPYKMDSKGDVSGV
>NP_006588|HSPA8 Q473fs wildtype
MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAG
RPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIF
DLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTLSSSTQASIEIDSLYEGIDFYTSI
TRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILSGDKSENVQDLLLLDV
TPLSLGIETAGGVMTVLIKRNTTIPTKQTQTFTTYSDNQPGVLIQVYEGERAMTKDNNLLGKFELTGIPPAPRGVPQIEVTFDIDANGILNVSAVDKST
GKENKITITNDKGRLSKEDIERMVQEAEKYKAEDEKQRDKVSSKNSLESYAFNMKATVEDEKLQGKINDEDKQKILDKCNEIINWLDKNQTAEKEEFEH
QQKELEKVCNPIITKLYQSAGGMPGGMPGGFPGGGAPPSGGASSGPTIEEVD
>NP_006588|HSPA8 Q473fs mutant
MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAG
RPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIF
DLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTLSSSTQASIEIDSLYEGIDFYTSI
TRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILSGDKSENVQDLLLLDV
TPLSLGIETAGGVMTVLIKRNTTIPTKQTQTFTTYSDNQPGVLIQVYEGERAMTKDNNLLGKFELTGIPPAPRGVPNMQ

The other information that the web server asks for is an email address. Results from MutPredLOF will be sent to the email address provided.

Installing MutPredLOF

After downloading the tarball package, unpack it:

tar -xzvf MutPredLOF_compiled.tar.gz

Running MutPredLOF

The actual shell script that runs MutPredLOF is called run_mutpredlof.sh. The input format is the same as that for the web application. MutPredLOF can be run using the following command:

run_mutpredlof.sh mcr_directory input_file.fasta output_file_prefix

Command-line arguments: all argument information can be displayed by simply typing run_mutpredlof.sh without any command-line arguments.

run_mutpredlof.sh
   USAGE: mutpredlof MCR_directory input_file output_filename

Interpreting the results

The output of MutPredLOF consists of a general score (g), i.e., the probability that the framshifting or stop gain variant is pathogenic. This score is the average of the scores from all neural networks in MutPredLOF. If interpreted as a probability, a score threshold of 0.50 would suggest pathogenicity. However, in our evaluations, we have estimated that a threshold of 0.50 yields a false positive rate (fpr) of 10% and that of 0.70 yields an fpr of 5%.

MutPredLOF also outputs property scores that reflect the impact of a variant on different properties. An empirical P-value (P) is calculated as the fraction of putatively neutral variants in MutPredLOF's training set with an amount of impacted residues >= to that amount for the given variant. A P-value threshold of 0.05 means that, under the null hypothesis, we expect 5% of putatively neutral variants to impact the particular property to the extent that the given variant does. These P-values are specific to each property.