Need suggestions to combine the missense variants pathogenicity prediction scores generated by SIFT, CADD, REVEL, PolyPhen, MPC, etc. to prioritize/select further
3.0 years ago
Apurba ▴ 10

Hi,

I am trying to find a way to select/prioritize missense variants based on an array of pathogenicity prediction scores generated by SIFT, CADD, REVEL, PolyPhen, MPC, and M_CAP. Right now, I have transformed each score into a dichotomous variable (damaging or not, based on the tool's recommended cut-off) and summed them to get a combined score ranging from 0 to 6.

I then select only those missense variants that at least 50% of the above-mentioned tools (i.e. a combined score of at least 3) call deleterious/damaging.
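For concreteness, the scheme described above could be sketched roughly as follows. The cut-off values in `CUTOFFS` are illustrative placeholders (assumptions, not recommendations) and should be replaced with the cut-offs actually used; the function names and the choice to skip tools with a missing score are also assumptions of this sketch. Note that SIFT runs in the opposite direction to the other tools: lower scores are more damaging.

```python
# Illustrative cut-offs only (assumptions); replace with the ones you actually use.
# Each entry maps tool -> (cutoff, side of the cutoff that counts as "damaging").
CUTOFFS = {
    "SIFT": (0.05, "below"),      # lower SIFT scores are more damaging
    "PolyPhen": (0.85, "above"),
    "CADD": (20.0, "above"),
    "REVEL": (0.5, "above"),
    "MPC": (2.0, "above"),
    "M_CAP": (0.025, "above"),
}


def damaging_calls(scores):
    """Dichotomize each available score into True (damaging) or False."""
    calls = {}
    for tool, (cutoff, side) in CUTOFFS.items():
        value = scores.get(tool)
        if value is None:
            continue  # assumption: tools without a score are simply skipped
        calls[tool] = value < cutoff if side == "below" else value >= cutoff
    return calls


def combined_score(scores):
    """Number of tools calling the variant damaging (0 to 6)."""
    return sum(damaging_calls(scores).values())


def is_selected(scores, min_fraction=0.5):
    """Keep the variant if at least min_fraction of the scored tools call it damaging."""
    calls = damaging_calls(scores)
    if not calls:
        return False
    return sum(calls.values()) >= min_fraction * len(calls)


# Made-up scores for one hypothetical variant, not real data.
example = {"SIFT": 0.01, "PolyPhen": 0.95, "CADD": 25.0,
           "REVEL": 0.3, "MPC": 1.0, "M_CAP": 0.01}
print(combined_score(example), is_selected(example))
```

Taking the fraction over the tools that actually returned a score, rather than over all six, is one possible way to handle missing values; dividing by six regardless would be stricter.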

Can someone suggest other possible ways/approaches to combine these scores (which can be helpful in prioritizing the missense variants)?

Thanks in advance

Apurba

Missense PredictionTools • 1.3k views
3.0 years ago

Hi,

There is a lot of existing work in this direction.

In brief, you need a large set of truly pathogenic variants (e.g. from ClinVar) and a set of truly neutral ones (e.g. from 1000GP and other population databases). You then use these 6 scores as features and train a classifier on the labelled variants.
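As a rough illustration of that idea, the sketch below fits a small logistic regression by gradient descent, using only the standard library. Everything in it is an assumption made for illustration: the feature rows are made-up dichotomous calls (one 0/1 column per tool), the labels are invented, and logistic regression is just one possible choice of classifier. A real analysis would use thousands of labelled variants and a held-out test set.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a variant with features x is pathogenic."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy training set (made up): columns are the 0/1 calls of the six tools,
# label 1 = known pathogenic, 0 = known neutral.
X = [
    [1, 1, 1, 1, 1, 0], [1, 1, 1, 0, 1, 1], [1, 0, 1, 1, 1, 1], [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 1, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)
print(predict_proba(w, b, [1, 1, 1, 0, 0, 1]))
```

Unlike a plain vote count, the fitted weights let tools that are more informative on the labelled set count for more, and the output probability can be used to rank variants rather than only to filter them.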

CADD, if I am not mistaken, is already a combination of scores.

Or you may use tools that have already been developed (e.g. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-00775-w).


Thanks for your reply. I am aware that, in addition to CADD, other scores such as REVEL and MPC were themselves built by combining multiple other scores and features. My question is how one can use them together to select/prioritize missense variants in one's own dataset. For example, for our dataset I currently select a missense variant if at least 3 of the 6 prediction tools call it damaging, but that seems subjective; someone else may prefer a cut-off of 4 out of 6.


You can use 3 of 6, or 4 of 6, or 5 of 6, or 1 of 6, and still be right. It is a question of how accurate your threshold is. To assess that accuracy (precision and recall), you can't avoid a proper statistical approach with positive and negative examples.
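A minimal sketch of such an assessment is shown below: given the vote count (0 to 6) of each variant in a labelled set, it computes precision and recall for every possible "k of 6" threshold. The vote counts and labels are made up for illustration, and the function name is hypothetical.

```python
def precision_recall(vote_counts, labels, k):
    """Precision and recall when variants with at least k damaging calls are selected."""
    pairs = list(zip(vote_counts, labels))
    tp = sum(1 for v, lab in pairs if v >= k and lab == 1)  # pathogenic, selected
    fp = sum(1 for v, lab in pairs if v >= k and lab == 0)  # neutral, selected
    fn = sum(1 for v, lab in pairs if v < k and lab == 1)   # pathogenic, missed
    precision = tp / (tp + fp) if tp + fp else None  # undefined if nothing is selected
    recall = tp / (tp + fn) if tp + fn else None     # undefined if there are no positives
    return precision, recall


# Toy labelled set (made up): vote count per variant, 1 = pathogenic, 0 = neutral.
votes = [6, 5, 4, 3, 2, 3, 1, 0, 2, 4]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for k in range(1, 7):
    print(k, precision_recall(votes, labels, k))
```

On this toy set, raising the threshold from 3 to 4 raises precision and lowers recall, which is the trade-off the choice of cut-off has to balance.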
