Editing a VCF file using bcftools
6.6 years ago
janhuang.cn ▴ 210

I want to use bcftools (Version: 1.0 (using htslib 1.0)) to edit a VCF file and then export an updated VCF file or BED file (BED is preferred).

There are several things I want to do, and I found some relevant options at https://samtools.github.io/bcftools/bcftools.html. But I don't know how to put them together. In particular, I don't even know how to load the original VCF file.

I also found a previous post (Extract subset of samples from multigenome vcf file) on a similar topic, but I still do not understand the command given there:

bcftools view -Oz -S sample.txt $file > /get/inthis/dir/output_"${i##*/}"_.vcf.gz

1) Subset the samples based on a txt file. This txt file lists the samples I want to keep in the VCF.

-S, --samples-file FILE

2) Keep only the SNPs

-v, --types snps

3) Keep only SNPs with MAF > 0.05

I did not find a relevant option for this.

4) Remove duplicate SNPs

-d, --rm-dup snps

or

-c, --collapse snps

So what's the problem? Are you getting an error? bcftools filter is the command you'll need to filter by MAF, assuming it's one of your INFO fields. Any particular reason you're using such an old version of the tools? The current version is 1.5.

You won't be able to output in BED format with bcftools, you'll need to use something like BEDOPS' vcf2bed tool to make that conversion.
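
If installing BEDOPS is not an option and the file contains only simple records such as SNPs, a small awk filter can approximate the conversion. This is only a rough sketch, not a replacement for vcf2bed: the function name vcf_to_bed is made up for illustration, it expects uncompressed VCF text on stdin, and it ignores symbolic alleles and INFO/END.

```shell
# Hypothetical helper: convert VCF records on stdin to 4-column BED on stdout.
# Columns written: chromosome, 0-based start, end, variant ID.
vcf_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }                 # skip meta-information and header lines
       {
         start = $2 - 1              # VCF POS is 1-based, BED start is 0-based
         print $1, start, start + length($4), $3   # end covers the REF allele
       }'
}

# Possible usage with the file names from this thread (assumed, not tested here):
# bcftools view -H rmvdup_EUR.genotypes.vcf.gz | vcf_to_bed > rmvdup_EUR.bed
```

For a SNP at position 100 this writes the interval 99 to 100, which is how BED represents a single base.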


I used this command to subset the European samples from the all-sample VCF (ALL.genotypes.vcf.gz) and export a VCF of the European samples (EUR.genotypes.vcf.gz).

bcftools view --samples-file EUR.txt --force-samples --types snps --exclude "MAF[0]<0.05" --output-file EUR.genotypes.vcf.gz ALL.genotypes.vcf.gz

But I also want to remove duplicate SNPs, so I used the command below. However, bcftools did not return anything.

bcftools norm --remove-duplicates snps --output rmvdup_EUR.genotypes.vcf.gz EUR.genotypes.vcf.gz

So to clarify, after the first command, you still have output, but lose it after the second? What is snps doing in that command? The --remove-duplicates parameter doesn't require you to specify the type of record if I remember correctly.

You could also use the vcfuniq command from vcflib to do this.


Thank you. You are right, I should not put snps after --remove-duplicates. The command below works.

bcftools norm --remove-duplicates --output rmvdup_EUR.genotypes.vcf.gz EUR.genotypes.vcf.gz

I now use bcftools view to export a VCF file for the European population, then use bcftools norm to remove duplicates and export a second VCF file. But can I use bcftools view and bcftools norm in the same command? I do not actually need the first VCF file.


Yes, you can do both commands in one line with UNIX piping:

bcftools view --samples-file EUR.txt --force-samples --types snps --exclude "MAF[0]<0.05" ALL.genotypes.vcf.gz | bcftools norm --remove-duplicates --output rmvdup_EUR.genotypes.vcf.gz -

That should work and remove the need for the intermediate file. Glad you got it working.
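
Two things may be worth checking in that pipeline. First, no output type is given, so the file named .vcf.gz is written as plain, uncompressed VCF. Second, as I read the bcftools manual, -i/-e filtering in bcftools view is performed before sample removal, so the MAF test may be evaluated on all samples rather than on the European subset. Below is a sketch of one way to spell the order out explicitly. It assumes a recent bcftools release (not 1.0) that supports --rm-dup and --min-af with the :minor suffix, and the function name subset_common_snps is made up for illustration.

```shell
# Hypothetical wrapper; the name and argument order are illustrative only.
#   $1 = file listing samples to keep, one name per line
#   $2 = input VCF/BCF
#   $3 = output path (written as bgzip-compressed VCF)
# Stage 1 subsets the samples, which also recalculates INFO/AC and INFO/AN.
# Stage 2 keeps SNPs with minor allele frequency of at least 0.05,
#         computed from the recalculated AC/AN.
# Stage 3 removes duplicate SNP records and writes compressed output.
# Uncompressed BCF (--output-type u) is used between stages for speed.
subset_common_snps() {
  bcftools view --samples-file "$1" --force-samples --output-type u "$2" |
    bcftools view --types snps --min-af 0.05:minor --output-type u - |
    bcftools norm --rm-dup snps --output-type z --output "$3" -
}

# Possible usage with the file names from this thread:
# subset_common_snps EUR.txt ALL.genotypes.vcf.gz rmvdup_EUR.genotypes.vcf.gz
```

Splitting the subsetting and the frequency filter into separate stages costs little because nothing is written to disk in between.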


I see. Thank you so much!
