Entering edit mode
5.2 years ago
Pierre Lindenbaum
160k
I've been asked to find the status of some variants (= overlaping a bed file) in a '1000 genomes' data set, including the HOM_REF genotypes (0/0).
For now I'm downloading each bam from 1000G and I call bcftools, (on sample per bam) , I played with --gvcf
but I cannot find the correct way to get the 0/0 genotypes.
here a snippet of my current nextflow (it doesn't work: there is no '0/0' in my vcf)
${samtools} view -u -b "${sample}.bam" "${C}" |\
${bcftools} mpileup --output-type b --targets "${C}.bed" --gvcf 0 -a 'FORMAT/DP' -a 'FORMAT/AD' -o "tmp.bcf" --fasta-ref "${REF}" -
${bcftools} index -f "tmp.bcf"
${bcftools} call --output-type z --ploidy GRCh37 --regions-file "${C}.bed" -o "tmp.${C}.vcf.gz" --gvcf 0 --multiallelic-caller "tmp.bcf"
${bcftools} index -t -f "tmp.${C}.vcf.gz"
Hey Pierre,
is there a reason why you use
samtools view
at the beginning? Also saving the mpileup result and indexing it is usually unnecessary.Until now I wasn't able to reproduce your problem. I was able to get 0/0 genotype with the
--gvcf
option and without. Then you get a record for each covered position.I've tried the long way like you did. And a short one (see below). The only difference to your code is, that I haven't use a
bed
file for the region.Result is:
splitting the analysis by chromosome. It's not clear to me how to do this: When using
bcftools --gvcf
I think one need to use the streaming filter to get the BED (NOT random-access)Thanks ! I'm away from the lab, I'll try tomorrow. About the BED: I'd like to avoid to extract the variants for the whole BAM...
If I understand your code correct, you have a
bed
file for each chromosome? If so, you can omit thesamtools view
and use the--regions-file
parameter forbcftools mpileup
instead of--targets
.bcftools
have then random access to the regions.If you want to query remote files (like I did in my test example) it might be useful to use
samtools view
to speed up things. Then apply the-L
parameter along with yourbed
file and use-M
to use the multi-region iterator.But that's all about optimization and not resolving your problem ...
Hi @finswimmer, I have a question regarding the command line to create
gvcf
file usingbcftools
. I used:I want to keep the calls with read depth more than
5
. But the gvcf shows some calls with DP lower than 5. I am not sure why is this error happening. Thank you!