Fastest way to switch out sites on one BCF for sites in another BCF?
3.9 years ago
curious ▴ 750

I want to replace sites in BCF A with the matching sites from BCF B.

BCF A:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chrX    2781512 chrX:2781512:A:G        A       G       .       PASS        GT:    0|0
chrX    2781514 chrX:2781514:C:A        C       A       .       PASS        GT:    0|1
chrX    2781518 chrX:2781518:A:G        A       G       .       PASS        GT:    0|1

BCF B:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chrX    2781514 chrX:2781514:C:A        C       A       .       PASS        GT:    0|0

I want BCF C:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chrX    2781512 chrX:2781512:A:G        A       G       .       PASS        GT:    0|0
chrX    2781514 chrX:2781514:C:A        C       A       .       PASS        GT:    0|0
chrX    2781518 chrX:2781518:A:G        A       G       .       PASS        GT:    0|1

Right now I remove chrX:2781514:C:A from BCF A, then I think I have to concatenate the filtered BCF A with BCF B to get BCF C, and then sort BCF C, roughly like this:

bcftools view -e ID=@{remove_snps_list} {BCF A} -Ob > {BCF A_filtered}

bcftools concat {BCF A_filtered} {BCF B} -Ob > {BCF C}

bcftools sort {BCF C} -Ob > {BCF C_sorted}

This is going to take forever given the size of my files. Is there a better way?

bcftools bcf vcf • 852 views

Pipe the bcftools commands to save on IO time.

bcftools view -e ID=@{remove_snps_list} {BCF A} -Ou | bcftools concat - {BCF B} -Ou | bcftools sort -Ob - > {BCF C_sorted}

Other than that, does the three-step approach seem reasonable, and should it have the desired effect?

Also, would the BCF be loaded completely into memory before the sort step? I think sorting can only be done on a complete BCF rather than on a stream of sites.


Yeah, the steps seem good. Multiple self-contained steps are better than one squashed-together, vague operation or script.

I'm not sure whether the entire BCF will be loaded into memory. It doesn't seem necessary in your case: one could stream one VCF, seek to locations in the other using the index, and replace entries there. But I don't know how bcftools works internally.
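
For what it's worth, the streaming idea can be sketched outside bcftools with plain awk on text VCF. This is only a rough sketch under some assumptions: the replacement file (BCF B here) is small enough to hold in memory, every site in B also exists in A, both files have the same samples and compatible headers, and a site is identified by CHROM, POS, REF and ALT. The function name replace_sites is made up for illustration and is not a bcftools command.

```shell
#!/bin/sh
# replace_sites REPLACEMENTS.vcf BASE.vcf
# Hypothetical helper (not part of bcftools): streams BASE.vcf and, for every
# site that also appears in REPLACEMENTS.vcf, prints the replacement record
# instead. Output keeps the order of BASE.vcf, so no sort step is needed.
# Assumes tab-separated text VCF and that REPLACEMENTS.vcf fits in memory.
replace_sites() {
    awk -F'\t' '
        # First file (replacements): store each data line under its site key.
        NR == FNR {
            if ($0 !~ /^#/) repl[$1 FS $2 FS $4 FS $5] = $0
            next
        }
        # Second file (base): pass header lines through unchanged.
        /^#/ { print; next }
        # Data line: print the replacement if one exists, else the original.
        {
            key = $1 FS $2 FS $4 FS $5
            if (key in repl) print repl[key]; else print
        }
    ' "$1" "$2"
}
```

With real BCFs you would first write B out as text VCF with bcftools view, then stream A through the function by piping bcftools view output into it with - as the second argument, and convert the result back to BCF with bcftools view -Ob. Sites that exist only in B are silently dropped by this sketch, so it is not a general replacement for the concat and sort route.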
