Split VCF File Into SNPs And Indels
11.8 years ago
pablo.riesgo ▴ 140

Hi there,

As recommended in the GATK best practices, Variant Quality Score Recalibration has to be done separately for SNPs and indels. However, I haven't found a clean way to do this split with the usual tools (for instance, vcftools). Does anybody know a tool that does this?

I already found a script that does the trick but I am surprised that this functionality is not included in the usual tools for processing VCF files.

The script in case it helps: http://ngsda.blogspot.com.es/2011/06/awk-script-to-seperate-snp-and-indel.html

Thanks! Pablo.

split vcf snp indel gatk • 21k views

As an update: in my use case, vcftools is not able to process my VCF files and reports errors. Specifically, the error occurs because polyploidy was found, which vcftools does not currently support.


Please do not add an answer unless it answers the top level post. This post is better suited as a comment, and I am moving it to one.

11.8 years ago

The most recent versions of vcftools have options to include or remove indels.

From http://vcftools.sourceforge.net/options.html#site_filter :

--keep-only-indels
--remove-indels

Include or exclude sites that contain an indel. For this option 'indel' means any variant that alters the length of the REF allele.

This functionality is relatively new, so if you can't use these options on your computer, it means that you are using an old version of vcftools.
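
As a rough sketch of how the two options could be combined into a full split, the function below runs vcftools twice on the same input, once per variant class. The function name `split_vcf_with_vcftools` and its two arguments are made up for illustration; the flags are the ones quoted in this thread, and this assumes a vcftools version that supports them.

```shell
# Sketch only: split one VCF into a SNP file and an indel file with vcftools.
# split_vcf_with_vcftools is a hypothetical helper name, not part of vcftools.
split_vcf_with_vcftools() {
    in_vcf=$1      # path to an uncompressed VCF
    out_prefix=$2  # prefix for the output files

    # Sites without indels; vcftools writes <prefix>.snp.recode.vcf
    vcftools --vcf "$in_vcf" --remove-indels \
        --recode --recode-INFO-all --out "${out_prefix}.snp" &&
    # Sites with indels; vcftools writes <prefix>.indel.recode.vcf
    vcftools --vcf "$in_vcf" --keep-only-indels \
        --recode --recode-INFO-all --out "${out_prefix}.indel"
}
```

Calling `split_vcf_with_vcftools X.vcf X` would then be expected to leave `X.snp.recode.vcf` and `X.indel.recode.vcf` next to the input.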


Hey, I'm facing problems with --remove-indels. Though I have the latest version of vcftools (vcftools_0.1.12a.tar.gz) installed, I get an error.

Command:

vcftools --gzvcf LR1_sorted_snp.vcf.gz --remove-indels --recode --recode-INFO-all --out LR1_SNP_ONLY
Error: Unknown option: --remove-indels

Can you help me out?

Thank you!

11.8 years ago
pablo.riesgo ▴ 140

Thanks!!

It suits my needs perfectly. My version was up to date, but I had missed this bit of the documentation.

This command creates a new VCF file keeping only indels and leaving the INFO field untouched:

vcftools --vcf X.vcf --keep-only-indels --out X.indel --recode --recode-INFO-all

Regarding the script I posted: it does not work well when there are several SNPs at the same position. REF=A ALT=C,G is recognised as an indel, while it is actually two SNPs.
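
For anyone who still wants a script-based split, here is a minimal awk sketch that avoids this multi-allelic problem by checking each comma-separated ALT allele on its own. It is not the linked script; the function name and output file names are hypothetical. It borrows the vcftools definition of an indel (any ALT allele whose length differs from REF), so MNPs land in the SNP file, and symbolic alleles such as `<DEL>` or `*` are not handled specially.

```shell
# Sketch only: split an uncompressed VCF by comparing ALT allele lengths to REF.
# split_vcf_by_length is a hypothetical helper name.
split_vcf_by_length() {
    in_vcf=$1      # path to an uncompressed VCF
    out_prefix=$2  # prefix for <prefix>.snp.vcf and <prefix>.indel.vcf

    awk -v snp="${out_prefix}.snp.vcf" -v indel="${out_prefix}.indel.vcf" '
        BEGIN { FS = "\t" }
        # Header lines are copied to both output files
        /^#/ { print > snp; print > indel; next }
        {
            # Column 4 is REF, column 5 is ALT (possibly comma-separated)
            n = split($5, alts, ",")
            is_indel = 0
            for (i = 1; i <= n; i++)
                if (length(alts[i]) != length($4)) is_indel = 1
            # A site is an indel if ANY ALT allele changes the length of REF
            if (is_indel) print > indel
            else print > snp
        }
    ' "$in_vcf"
}
```

With this check, a record with REF=A ALT=C,G goes to the SNP file, while REF=A ALT=AT goes to the indel file.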

Pablo.
