Say your VCF contains per-sample depth and genotype quality annotations, and you want to include only sites where one or more samples have sufficient coverage (DP>10) and genotype quality (GQ>20). The expression -i 'FMT/DP>10 & FMT/GQ>20' selects sites where both conditions are satisfied within the same sample:
bcftools query -i'FMT/DP>10 & FMT/GQ>20' -f'%POS[\t%SAMPLE:DP=%DP GQ=%GQ]\n' file.bcf
49979 SampleA:DP=10 GQ=50 SampleB:DP=20 GQ=40
On the other hand, if you need to include sites where both conditions are met, but not necessarily in the same sample, use the && operator rather than &:
bcftools query -i'FMT/DP>10 && FMT/GQ>20' -f'%POS[\t%SAMPLE:DP=%DP GQ=%GQ]\n' file.bcf
31771 SampleA:DP=10 GQ=50 SampleB:DP=40 GQ=20
49979 SampleA:DP=10 GQ=50 SampleB:DP=20 GQ=40
This example is taken from http://samtools.github.io/bcftools/howtos/filtering.html
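To make the difference concrete, here is a minimal Python sketch of how the two operators can be thought of. It is a toy model, not bcftools code: the function names same_sample_and and site_level_and are made up for illustration, and the data are just the two sites shown in the outputs above.

```python
# Toy model of '&' vs '&&' for FORMAT fields; this is NOT how bcftools is implemented.
# Per-sample values copied from the two example outputs above.
sites = {
    31771: {"SampleA": {"DP": 10, "GQ": 50}, "SampleB": {"DP": 40, "GQ": 20}},
    49979: {"SampleA": {"DP": 10, "GQ": 50}, "SampleB": {"DP": 20, "GQ": 40}},
}

def same_sample_and(samples):
    """Model of '&': at least one single sample satisfies both conditions."""
    return any(s["DP"] > 10 and s["GQ"] > 20 for s in samples.values())

def site_level_and(samples):
    """Model of '&&': each condition holds in some sample, not necessarily the same one."""
    return (any(s["DP"] > 10 for s in samples.values())
            and any(s["GQ"] > 20 for s in samples.values()))

single = [pos for pos, smp in sites.items() if same_sample_and(smp)]
double = [pos for pos, smp in sites.items() if site_level_and(smp)]
print(single)  # sites kept under the '&' model
print(double)  # sites kept under the '&&' model
```

In this model, site 31771 fails the '&' test because neither sample passes both thresholds on its own, but it passes the '&&' test because SampleB has DP>10 and SampleA has GQ>20.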
EDIT: (inserted by a mod)
Answer given on github:
Well, sorry to demonstrate the difference on & and && instead of | and ||, but it's the same principle.
The manual page says it all:
QUAL>10 | FMT/GQ>10 .. true for sites with QUAL>10 or a sample with GQ>10, but selects only samples with GQ>10
QUAL>10 || FMT/GQ>10 .. true for sites with QUAL>10 or a sample with GQ>10, plus selects all samples at such sites
Or you can try running it yourself:
$ bcftools query -f'[%POS %SAMPLE %DP\n]\n' -i'FMT/DP=19 | FMT/DP="."' test/view.filter.vcf
3162006 A 19
3162007 A .
3162007 B .
$ bcftools query -f'[%POS %SAMPLE %DP\n]\n' -i'FMT/DP=19 || FMT/DP="."' test/view.filter.vcf
3162006 A 19
3162006 B 1
3162007 A .
3162007 B .
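The "selects only samples" versus "selects all samples" wording can be sketched in Python as well. Again, this is a toy model under stated assumptions, not the bcftools implementation: the helper names matches and select are hypothetical, None stands in for a missing value ("."), and the data contain only the two sites visible in the outputs above.

```python
# Toy model of '|' vs '||' on FORMAT fields; NOT the actual bcftools logic.
# DP values per sample, taken from the two outputs above; None models ".".
sites = {
    3162006: {"A": 19, "B": 1},
    3162007: {"A": None, "B": None},
}

def matches(dp):
    """The per-sample condition: FMT/DP=19 or FMT/DP="."."""
    return dp == 19 or dp is None

def select(sites, keep_all_samples):
    """Return (pos, sample, DP) rows for sites where any sample matches.

    keep_all_samples=False models '|'  (only matching samples are reported);
    keep_all_samples=True  models '||' (all samples at a passing site are reported).
    """
    rows = []
    for pos, samples in sites.items():
        if not any(matches(dp) for dp in samples.values()):
            continue  # the site itself does not pass
        for name, dp in samples.items():
            if keep_all_samples or matches(dp):
                rows.append((pos, name, "." if dp is None else dp))
    return rows

single_pipe = select(sites, keep_all_samples=False)
double_pipe = select(sites, keep_all_samples=True)
for row in single_pipe + double_pipe:
    print(*row)
```

Both variants keep the same sites here; the only difference is that sample B at 3162006 (DP=1) is reported under the '||' model and dropped under the '|' model.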
Sounds like the single | works like the Linux pipe symbol on a command line, and the double || like the 'or' operator.
Hm, if it worked like the Linux pipe, where would the difference to && be?
I guess I also have a problem with the description "selects only/all samples". Normally I select a whole VCF line and not just a sample. Which subcommand of bcftools is able to select specific samples based on an expression?
Furthermore, the man page says && (same as &), but in the examples this does not seem to be the same.
I'm confused. The only thing that seems clear to me is that | vs. || or & vs. && makes a difference in multi-sample VCFs.
fin swimmer
Could this be related to short-circuiting by any chance?
Hm, looks quite good. But I'm still not sure.
I decided to crosspost directly on bcftools github.
Let's wait and see.
Tagging: lh3