Question

getting the HOM_REF genotypes using bcftools mpileup+call ?

1

Entering edit mode

5.2 years ago

Pierre Lindenbaum 161k

I've been asked to find the status of some variants (= overlaping a bed file) in a '1000 genomes' data set, including the HOM_REF genotypes (0/0).

For now I'm downloading each bam from 1000G and I call bcftools, (on sample per bam) , I played with --gvcf but I cannot find the correct way to get the 0/0 genotypes.

here a snippet of my current nextflow (it doesn't work: there is no '0/0' in my vcf)

${samtools} view -u -b "${sample}.bam" "${C}" |\
${bcftools} mpileup --output-type b --targets "${C}.bed" --gvcf 0 -a 'FORMAT/DP'  -a 'FORMAT/AD'  -o "tmp.bcf" --fasta-ref "${REF}" -

${bcftools} index  -f "tmp.bcf"

${bcftools} call --output-type z --ploidy GRCh37 --regions-file "${C}.bed" -o "tmp.${C}.vcf.gz" --gvcf 0 --multiallelic-caller "tmp.bcf"

${bcftools} index -t -f "tmp.${C}.vcf.gz"

bcftools call mpileup variant genotype • 2.7k views

ADD COMMENT • link updated 5.2 years ago by Ram 43k • written 5.2 years ago by Pierre Lindenbaum 161k

1

Entering edit mode

Hey Pierre,

is there a reason why you use samtools view at the beginning? Also saving the mpileup result and indexing it is usually unnecessary.

Until now I wasn't able to reproduce your problem. I was able to get 0/0 genotype with the --gvcf option and without. Then you get a record for each covered position.

I've tried the long way like you did. And a short one (see below). The only difference to your code is, that I haven't use a bedfile for the region.

$ bcftools mpileup -Ou --gvcf 0 -f Homo_sapiens.GRCh37.dna.primary_assembly.fa -r 1:977320-977340 ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/high_coverage_alignment/HG00096.wgs.ILLUMINA.bwa.GBR.high_cov_pcr_free.20140203.bam | bcftools call -m --gvcf 0 -Oz -o 1kg.vcf.gz

Result is:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  HG00096
1   977320  .   C   .   .   .   END=977329;MinDP=14 GT:DP   0/0:14
1   977330  .   T   C   214 .   DP=28;VDB=0.652362;SGB=-0.688148;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,11,4;MQ=60 GT:PL:DP    1/1:244,45,0:15
1   977331  .   C   .   .   .   END=977340;MinDP=15 GT:DP   0/0:15

ADD REPLY • link 5.2 years ago by finswimmer 16k

0

Entering edit mode

is there a reason why you use samtools view at the beginning? Also saving the mpileup result and indexing it is usually unnecessary.

splitting the analysis by chromosome. It's not clear to me how to do this: When using bcftools --gvcf I think one need to use the streaming filter to get the BED (NOT random-access)

I've tried the long way like you did. And a short one (see below). The only difference to your code is, that I haven't use a bedfile for the region.

Thanks ! I'm away from the lab, I'll try tomorrow. About the BED: I'd like to avoid to extract the variants for the whole BAM...

ADD REPLY • link 5.2 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

If I understand your code correct, you have a bed file for each chromosome? If so, you can omit the samtools view and use the --regions-file parameter for bcftools mpileup instead of --targets. bcftools have then random access to the regions.

If you want to query remote files (like I did in my test example) it might be useful to use samtools view to speed up things. Then apply the -L parameter along with your bed file and use -M to use the multi-region iterator.

But that's all about optimization and not resolving your problem ...

ADD REPLY • link 5.2 years ago by finswimmer 16k

0

Entering edit mode

Hi @finswimmer, I have a question regarding the command line to create gvcf file using bcftools. I used:

bcftools mpileup -Ov --gvcf 5 -f ref.fa example.sorted.bam | bcftools call -m --gvcf 5 -o example.vcf

I want to keep the calls with read depth more than 5. But the gvcf shows some calls with DP lower than 5. I am not sure why is this error happening. Thank you!

ADD REPLY • link 4.5 years ago by evelyn ▴ 230