Remove variants that do not map to human genome
4.9 years ago

I received an hg38 VCF file whose variants were imputed with 1000 Genomes. I've run into several problems with it: REF alleles that do not match the reference genome, ALT alleles that do not appear to be reported anywhere in the literature, and, most recently, variants that flat-out do not map to the human genome (variants on chr19 with positions above 100 million bp, when the whole chromosome is in the 50 million bp range).

I've worked out hack-y solutions to most of these issues, but this latest one has me stuck. I only detected these variants when I ran VEP and it flagged them as not mapping to the genome, so I'm more or less removing them one at a time using grep -v. I'd like a solution that removes any variant from the VCF whose position does not exist in the human genome. Bonus points if the solution also covers some of the other issues I mentioned, although I think I've already found solutions to those. Is there anything out there that does this?
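To make the request concrete, here is a minimal sketch of the kind of filter being asked for. It assumes an uncompressed VCF and a samtools-style .fai index for the hg38 FASTA (contig name in column 1, length in column 2). The function names parse_fai and filter_out_of_range are made up for illustration and are not part of any existing tool.

```python
def parse_fai(fai_text):
    """Return {contig: length} from the text of a samtools .fai index."""
    lengths = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # .fai columns: name, length, offset, line bases, line width
        lengths[fields[0]] = int(fields[1])
    return lengths


def filter_out_of_range(vcf_lines, contig_lengths):
    """Yield header lines, plus records that fit inside their contig.

    A record is kept only if its contig is in the index and the span
    POS .. POS + len(REF) - 1 lies within 1 .. contig length.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        length = contig_lengths.get(chrom)
        if length is None:
            continue  # contig not present in the reference: drop
        end = pos + len(ref) - 1
        if pos >= 1 and end <= length:
            yield line
```

With real files this could be driven as filter_out_of_range(open("input.vcf"), parse_fai(open("hg38.fa.fai").read())), where both file names are placeholders. It only checks coordinates; it says nothing about whether REF actually matches the reference sequence.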

vcf variant-calling QC • 1.3k views

Hello john.michel.rouhana!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/8629/remove-variants-that-do-not-map-to-human-genome

This is typically not recommended as it uses the finite time of volunteers in both communities.


I wasn't aware; thank you for pointing this out. I thought it would make the most sense to post it in both locations. Thanks for the etiquette lesson. Is there any way to remove my post here?


You have already received an answer, which is why we would restore a deleted post anyway, out of respect for the user who invested time in answering. Don't worry, leave the question here, but in future please consider not cross-posting: many users are active in both communities, and this avoids duplicated effort ;-)


The least you could do is link both posts to each other, so contributors on forum A will find that someone has replied on forum B.

4.9 years ago

(not tested)

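bcftools norm -c x -f reference.fa input.vcf.gz

where reference.fa (the hg38 FASTA, which the REF check needs via -f/--fasta-ref) and input.vcf.gz are placeholder file names, and -c is the option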

-c, --check-ref <e|w|x|s>         check REF alleles and exit (e), warn (w), exclude (x), or set (s) bad sites [e]
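As a rough illustration of what "exclude bad sites" amounts to, the sketch below drops every record whose REF allele does not match the reference sequence at its position. This is not bcftools' actual implementation: the in-memory reference dictionary, the case-insensitive comparison, and the names ref_matches and exclude_bad_ref are all assumptions made for the example.

```python
def ref_matches(reference, chrom, pos, ref):
    """True if REF equals the reference bases starting at 1-based POS."""
    seq = reference.get(chrom)
    if seq is None or pos < 1:
        return False  # unknown contig or impossible coordinate
    start = pos - 1  # convert to 0-based index
    # Slicing past the end of the contig returns a shorter (or empty)
    # string, so out-of-range positions fail the comparison too.
    return seq[start:start + len(ref)].upper() == ref.upper()


def exclude_bad_ref(vcf_lines, reference):
    """Yield header lines, plus records whose REF matches the reference."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        if ref_matches(reference, chrom, int(pos), ref):
            yield line
```

Because a position beyond the end of a contig can never match, a check of this kind would also discard the out-of-range chr19 records described in the question, at least in this sketch.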