Split VCF into separate VCFs by SNP count
3.1 years ago

Is there a way to use bcftools or another program to split a VCF of a single scaffold/chromosome into chunks by SNP count? For example, how could I best split a VCF with 10,581 SNPs into ten chunks of 1,000 SNPs each (SNPs 1 - 1,000, SNPs 1,001 - 2,000, ...) plus an 11th chunk holding SNPs 10,001 - 10,581? Is there a simple way to do this?

Thanks!

vcf bcftools next-gen • 3.9k views
3.1 years ago
4galaxy77 2.8k

There may be a more elegant way, but this should work. The first line writes the SNP positions into chunk files of whatever size you want (-l), and the second line loops over each chunk file and subsets the VCF accordingly.

bcftools query -f '%CHROM\t%POS\n' in.bcf | split -l 1000 - chunk.
for file in chunk.*; do bcftools view -T "$file" -Ob -o "in.$file.bcf" in.bcf; done
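
As an alternative to the two-step approach above, here is a minimal sketch that splits in a single pass over plain-text VCF, repeating the header in every chunk. It is not part of the original answer: the function name `split_vcf_by_count`, its two arguments (records per chunk, output prefix), and the `prefix.NNN.vcf` naming are assumptions made for illustration. It counts VCF records, so it only matches the SNP count if the input contains nothing but SNPs.

```shell
# Hypothetical helper: reads VCF text on stdin and writes
# <prefix>.001.vcf, <prefix>.002.vcf, ... with the header copied into each.
# $1 = records per chunk, $2 = output prefix
split_vcf_by_count() {
  awk -v n="$1" -v prefix="$2" '
    # Collect all header lines (## meta lines and the #CHROM line).
    /^#/ { header = header $0 "\n"; next }
    # Every n-th record starts a new chunk file that begins with the header.
    count % n == 0 {
      if (out != "") close(out)
      out = sprintf("%s.%03d.vcf", prefix, ++chunk)
      printf "%s", header > out
    }
    # Append the current record to the open chunk.
    { print > out; count++ }
  '
}
```

It could be fed from bcftools, for example `bcftools view in.bcf | split_vcf_by_count 1000 in.chunk`; adding `-v snps` to `bcftools view` would restrict the input to SNP records first. The chunks come out as uncompressed VCF, so compress or convert them afterwards if needed.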
-O b
3.1 years ago

I wrote a tool: http://lindenb.github.io/jvarkit/Biostar497922.html

$ gunzip -c src/test/resources/rotavirus_rf.vcf.gz | java -jar dist/biostar497922.jar -o TMP -n 20 -m jeter.mf
[INFO][Biostar497922]Writing TMP/split.000001.vcf.gz
[INFO][Biostar497922]Writing TMP/split.000002.vcf.gz
[INFO][Biostar497922]Writing TMP/split.000003.vcf.gz
[INFO][Biostar497922]. Completed. N=45. That took:0 second

TMP/split.000001.vcf.gz RF01    970 A   RF04    1900    A   20
TMP/split.000002.vcf.gz RF04    1920    A   RF09    317 C   20
TMP/split.000003.vcf.gz RF09    414 T   RF11    74  CAAAAA  5

That's great! Thank you for doing that.
