Remove variants that do not map to human genome
13 months ago

I received an hg38 VCF file whose variants had been imputed using 1000 Genomes. I've run into several issues with this VCF: REF alleles that do not match the reference genome, ALT alleles that do not appear to be reported anywhere in the literature, and, most recently, variants that simply do not map to the human genome (e.g., variants on chr19 at positions above 100 million bp, even though the whole chromosome is only about 58.6 million bp long in hg38).

I've worked out hacky solutions to most of the issues I've encountered, but this latest one has given me trouble. I only detected these variants when I ran VEP, which flagged them as not mapping to the genome, so for now I'm removing them more or less one at a time with grep -v. I'd like a way to remove, in one pass, every variant in the VCF that maps to a region that does not exist in the human genome. Bonus points if the solution also covers the other issues I mentioned, although I think I've already found solutions to those. Is there anything out there that does this?


Hello john.michel.rouhana!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/8629/remove-variants-that-do-not-map-to-human-genome

This is typically not recommended as it uses the finite time of volunteers in both communities.


I wasn't aware of that; thank you for pointing it out. I thought it would make the most sense to post in both places. Thanks for the etiquette lesson. Is there any way to remove my post here?


You have already received an answer, so we would restore a deleted post anyway out of respect for the user who took the time to answer. Don't worry, leave the question here, but in the future please avoid cross-posting: many users are active in both communities, and this avoids duplicated effort ;-)


The least you could do is link the two posts to each other, so that contributors on forum A can see that someone has replied on forum B.

14 months ago
France/Nantes/Institut du Thorax - INSE…

(not tested)

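bcftools norm -f reference.fa -c x -Oz -o output.vcf.gz input.vcf.gz

where reference.fa is the hg38 FASTA the VCF should match (all file names are placeholders). --check-ref compares each REF allele with that FASTA, so it needs -f; note that -f also turns on left-alignment and normalization of indels. The relevant option: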

-c, --check-ref <e|w|x|s>         check REF alleles and exit (e), warn (w), exclude (x), or set (s) bad sites [e]
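For the out-of-range positions described in the question (chr19 records beyond the end of chr19), a minimal sketch is to compare each record's POS with the contig lengths in the FASTA index (.fai) of the same hg38 reference. The function name drop_out_of_bounds_variants is hypothetical and not part of any tool. It assumes an uncompressed VCF (or one piped in on stdin as "-") and that contig names in the VCF match the .fai exactly (chr19 vs 19); records on contigs missing from the .fai are dropped as well.

```shell
#!/bin/sh
# Hypothetical helper (not part of bcftools or any other tool):
#   drop_out_of_bounds_variants REF.fa.fai INPUT.vcf
# Prints the VCF header unchanged, plus only those records whose CHROM is
# listed in the FASTA index and whose POS lies within 1..contig length.
# INPUT.vcf must be uncompressed; pass "-" to read it from stdin.
drop_out_of_bounds_variants() {
    awk -F'\t' '
        # First file (the .fai): column 1 = contig name, column 2 = length in bp.
        FILENAME == ARGV[1] { len[$1] = $2; next }
        # Header lines ("##..." meta lines and the "#CHROM" line) pass through.
        /^#/ { print; next }
        # Data lines: keep a record only if it falls inside a known contig.
        ($1 in len) && $2 + 0 >= 1 && $2 + 0 <= len[$1] + 0 { print }
    ' "$1" "$2"
}
```

The .fai can be created with samtools faidx hg38.fa, and a bgzipped VCF can be streamed through the function, e.g. bgzip -dc input.vcf.gz | drop_out_of_bounds_variants hg38.fa.fai - | bgzip -c > filtered.vcf.gz. This only catches impossible coordinates; REF mismatches are what the bcftools norm check above is for.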