15 days ago
kirill.zaslavsky
Hi all, I am analyzing variants from an old database and remapping them to hg38. I have what seems to be the same variant in two people, annotated in two different ways: as an insertion and as a duplication. Here is the VCF input:
X 41473864 . G <DUP> . . PtID=XXXXXX;SVTYPE=DUP;SVLEN=9;END=41473872 GT:DP 1:150
X 41473872 . T TGCGCCGCCT . . PtID=YYYYYY;SVTYPE=INS;END=41473873 GT:DP 1:150
Ensembl's VEP correctly classifies the <DUP> variant as protein coding, whereas SnpEff incorrectly classifies it as an intron variant.
For now, I am just converting the short <DUP> entries into INS entries as a workaround, but I am wondering what is causing this issue and how it can be fixed.
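For anyone wanting to try the same workaround, here is a minimal sketch of the DUP-to-INS conversion in Python. It assumes the convention implied by the two records above: the duplication covers POS..END inclusive (9 bases here) and the equivalent insertion is anchored on the last duplicated base. The function name `dup_to_ins` and the `dup_seq` argument are hypothetical; in practice the duplicated sequence would have to be looked up in the hg38 reference, and here it is simply taken from the INS record.

```python
def dup_to_ins(chrom, pos, end, dup_seq):
    """Rewrite a short <DUP> spanning pos..end (1-based, inclusive) as an INS.

    dup_seq is the reference sequence over pos..end (assumed to be supplied
    by the caller, e.g. from a reference FASTA lookup).
    Returns (chrom, pos, ref, alt) for the equivalent insertion record.
    """
    if len(dup_seq) != end - pos + 1:
        raise ValueError("dup_seq length does not match the POS..END span")
    anchor = dup_seq[-1]          # reference base at END
    # The extra copy is inserted right after the anchor base at END.
    return (chrom, end, anchor, anchor + dup_seq)


# Values from the <DUP> record above; the sequence comes from the INS record.
chrom, pos, ref, alt = dup_to_ins("X", 41473864, 41473872, "GCGCCGCCT")
print(chrom, pos, ".", ref, alt, sep="\t")
```

With the values from this post, the result matches the POS, REF and ALT of the INS record of the second patient, which is at least consistent with the two entries describing the same variant.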
Thank you for your help
Can you show us the output from VEP and snpEff? They could be annotating two different transcripts.
EDIT: I ran VEP quickly on this variant and there are both coding and non-coding variants at this genomic position. See results below:
I get the exact same output from VEP as you do.
However, from SnpEff I get (parsed):
It tries to do some weird realignment. From snpEff log in bash:
Unsure why it's doing this...
Thanks again for your help
It looks like snpEff does not work reliably on SVs: https://pcingola.github.io/SnpEff/snpeff/introduction/?h=structural#snpeff-features (search for "structural variant")
I guess that could be because the INS has a definite length whereas the DUP could be taken as dupG, which is why it doesn't give you accurate annotations. My point is, VEP will cover all bases so I'd pick that over snpEff any given day.
this is good to know, thanks!