Polyphen-2 Classifier Model
11.9 years ago

On the polyphen-2 webpage

http://genetics.bwh.harvard.edu/pph2/bgi.shtml

There are two options for the classifier model:

HumDiv
HumVar

Could anyone explain what these are? Is there any documentation for them?

Thanks, forum.

11.9 years ago

ftp://genetics.bwh.harvard.edu/pph2/training/README

PolyPhen-2 v2.2.2 training sets statistics (2011_12):

HumDiv: 5564 deleterious + 7539 neutral mutations from the same set of 978 human proteins.

HumVar: 22196 deleterious + 21119 neutral mutations in 9679 human proteins, with no requirement that deleterious and neutral mutations come from the same proteins.

HumDiv is Mendelian disease variants vs. divergence from close mammalian homologs of human proteins (>=95% sequence identity).

HumVar is all human variants associated with some disease (except cancer mutations) or with loss of activity/function vs. common (minor allele frequency >1%) human polymorphisms with no reported association with a disease or other effect.


And in case of discrepancies, say "possibly damaging" with HumDiv and "benign" with HumVar, which prediction is more reliable for disease association? I mean, a variant cannot be damaging for a Mendelian disease and yet not be associated with any other disease... or is there something I am missing?

Thanks a lot in advance

11.9 years ago
Vikas Bansal ★ 2.4k

These are already explained here

PolyPhen-2 predicts the functional significance of an allele replacement from its individual features by a Naïve Bayes classifier trained using supervised machine learning.
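To make "Naïve Bayes posterior" concrete, here is a minimal sketch of how such a classifier combines per-feature evidence. The feature names ("conservation", "buried"), their bins, and all probabilities are hypothetical placeholders, not PolyPhen-2's real features or trained parameters; the only point is the mechanics: multiply the class prior by each feature's class-conditional likelihood (assuming independence given the class), then normalize.

```python
import math

# HYPOTHETICAL likelihood tables P(feature value | class).
# These are illustrative only, NOT PolyPhen-2's actual features or parameters.
LIKELIHOODS = {
    "damaging": {
        "conservation": {"low": 0.2, "high": 0.8},
        "buried": {"no": 0.4, "yes": 0.6},
    },
    "neutral": {
        "conservation": {"low": 0.7, "high": 0.3},
        "buried": {"no": 0.6, "yes": 0.4},
    },
}


def posterior_damaging(features, prior_damaging=0.5):
    """Naive Bayes posterior P(damaging | features), assuming the features
    are conditionally independent given the class."""
    priors = {"damaging": prior_damaging, "neutral": 1.0 - prior_damaging}
    log_scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for name, value in features.items():
            # Add the log-likelihood of each observed feature value.
            score += math.log(LIKELIHOODS[cls][name][value])
        log_scores[cls] = score
    # Normalize in log space so many features do not underflow.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return math.exp(log_scores["damaging"] - top) / total


print(posterior_damaging({"conservation": "high", "buried": "yes"}))  # ~0.8
```

With equal priors, the example gives 0.24 / (0.24 + 0.06) = 0.8 up to floating-point rounding. The choice of prior matters: a different `prior_damaging` shifts every posterior.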

Two pairs of datasets were used to train and test PolyPhen-2 prediction models. The first pair, HumDiv, was compiled from all damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProtKB database, together with differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging. The second pair, HumVar, consisted of all human disease-causing mutations from UniProtKB, together with common human nsSNPs (MAF>1%) without annotated involvement in disease, which were treated as non-damaging.

The user can choose between HumDiv- and HumVar-trained PolyPhen-2 models. Diagnosis of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, the HumVar-trained model should be used for this task. In contrast, the HumDiv-trained model should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.

For a mutation, PolyPhen-2 calculates the Naïve Bayes posterior probability that this mutation is damaging and reports estimates of the false positive rate (FPR, the chance that the mutation is classified as damaging when it is in fact non-damaging) and the true positive rate (TPR, the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively as benign, possibly damaging, or probably damaging, based on pairs of FPR thresholds optimized separately for each model (e.g., HumDiv and HumVar).
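The FPR and TPR definitions above can be illustrated on a small labeled set. This is a sketch of the definitions only, not of how PolyPhen-2 actually derived its FPR/TPR estimates; the function name `fpr_tpr`, the scores, and the 0.5 cutoff are hypothetical.

```python
def fpr_tpr(scores, labels, threshold):
    """Return (FPR, TPR) when every mutation with score >= threshold is
    called damaging. labels: True = truly damaging, False = non-damaging."""
    tp = fp = tn = fn = 0
    for score, is_damaging in zip(scores, labels):
        called_damaging = score >= threshold
        if called_damaging and is_damaging:
            tp += 1  # damaging, correctly called damaging
        elif called_damaging:
            fp += 1  # non-damaging, wrongly called damaging
        elif is_damaging:
            fn += 1  # damaging, missed
        else:
            tn += 1  # non-damaging, correctly not called
    fpr = fp / (fp + tn) if fp + tn else 0.0  # P(called damaging | non-damaging)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # P(called damaging | damaging)
    return fpr, tpr


# Toy data: three damaging and three non-damaging mutations.
scores = [0.95, 0.80, 0.40, 0.90, 0.30, 0.10]
labels = [True, True, True, False, False, False]
print(fpr_tpr(scores, labels, 0.5))  # FPR = 1/3, TPR = 2/3
```

Raising the threshold lowers the FPR but usually also lowers the TPR, which is the trade-off the FPR thresholds below are set against.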

The current version (2.1.0) of PolyPhen-2 uses 5% / 10% FPR for the HumDiv model and 10% / 20% FPR for the HumVar model as the thresholds for this ternary classification. Mutations whose posterior probability scores are associated with estimated false positive rates at or below the first (lower) FPR value are predicted to be probably damaging (more confident prediction). Mutations whose posterior probabilities are associated with false positive rates at or below the second (higher) FPR value are predicted to be possibly damaging (less confident prediction). Mutations with estimated false positive rates above the second (higher) FPR value are classified as benign.
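The ternary rule can be written as a small lookup, using the threshold pairs quoted above. This sketch takes an already-estimated FPR as input; it does not reproduce PolyPhen-2's mapping from posterior probability to FPR, and in practice each model yields its own posterior and FPR estimate for the same variant. The names `THRESHOLDS` and `classify` are hypothetical.

```python
# FPR threshold pairs (lower, higher) as quoted for PolyPhen-2 v2.1.0.
THRESHOLDS = {
    "HumDiv": (0.05, 0.10),
    "HumVar": (0.10, 0.20),
}


def classify(estimated_fpr, model):
    """Map an estimated FPR to PolyPhen-2's qualitative label for the given model."""
    lower, higher = THRESHOLDS[model]
    if estimated_fpr <= lower:
        return "probably damaging"  # more confident prediction
    if estimated_fpr <= higher:
        return "possibly damaging"  # less confident prediction
    return "benign"


print(classify(0.08, "HumDiv"))  # possibly damaging
print(classify(0.08, "HumVar"))  # probably damaging
print(classify(0.15, "HumDiv"))  # benign
```

Because the cutoffs differ per model, the same variant can receive different labels from HumDiv and HumVar, which is one source of the kind of discrepancy asked about in the question above.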

You can also find more information here.
