Good afternoon,
I'm trying to filter a VCF file that has the following dummy flag values:
PASS: All filters passed
Fa: Failed filter a
Fb: Failed filter b
Fc: Failed filter c
Fd: Failed filter d
Variants can fail one or more filters. Variants that fail multiple filters are annotated with the corresponding flags separated by semicolons. Thus, the FILTER column can hold either one of the five values above or any number of F* values separated by ;.
I'd like to select all variants that either PASSed or failed only filter a. How can I do this in bcftools? The -f option skips locations that do not contain one of the listed filters, so it keeps locations that contain any of the listed filters. When I use
bcftools view -f PASS,Fa ...
I get rows that failed filter a along with other filters as well. That is, the above expression matches both Fa and Fa;Fb. I tried excluding the delimiter, but that didn't work:
bcftools view -f 'PASS,Fa,;' ... #didn't work
Does anyone know how to include or exclude exactly a list of filters? Nothing in the -i or -e EXPRESSIONS seems to help either.
This is what I'm using right now, which is awk mimicking bcftools:
zcat vcf_file.vcf.gz | awk -F"\t" -vOFS="\t" '$0 ~ /^#/ {print} $7=="PASS" || $7=="Fa" {print}'
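To make the awk approach concrete, here is a small self-contained sketch of the exact-match behaviour I'm after. The dummy_vcf helper and its records are made up purely for illustration (the header is not a complete, valid VCF header), and the awk program is a condensed form of the one above. It is a workaround sketch, not a bcftools feature.

```shell
# Hypothetical helper: emits a tiny tab-separated dummy VCF (illustration only).
dummy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf '1\t200\t.\tC\tT\t50\tFa\t.\n'
  printf '1\t300\t.\tG\tA\t50\tFa;Fb\t.\n'
  printf '1\t400\t.\tT\tC\t50\tFb\t.\n'
}

# Keep header lines, plus records whose FILTER column (field 7)
# is exactly "PASS" or exactly "Fa". A pattern with no action prints the line.
kept=$(dummy_vcf | awk -F'\t' '/^#/ || $7 == "PASS" || $7 == "Fa"')
printf '%s\n' "$kept"
```

With these records, only the PASS and Fa rows survive alongside the two header lines. Fa;Fb and Fb are dropped because the whole FILTER field has to equal one of the two strings, rather than merely contain one of them.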
Update: I tried this, but it picked up an entry that it was not supposed to pick up:
It picked up an entry where the FILTER value was Fa;Fb.
Update #2: I've opened an issue on the bcftools GitHub: https://github.com/samtools/bcftools/issues/1285