Biostar Beta. Not for public use.
getting the HOM_REF genotypes using bcftools mpileup+call ?
1
Entering edit mode
14 months ago
France/Nantes/Institut du Thorax - INSE…

I've been asked to find the status of some variants (= overlaping a bed file) in a '1000 genomes' data set, including the HOM_REF genotypes (0/0).

For now I'm downloading each bam from 1000G and I call bcftools, (on sample per bam) , I played with --gvcf but I cannot find the correct way to get the 0/0 genotypes.

here a snippet of my current nextflow (it doesn't work: there is no '0/0' in my vcf)

${samtools} view -u -b "${sample}.bam" "${C}" |\
${bcftools} mpileup --output-type b --targets "${C}.bed" --gvcf 0 -a 'FORMAT/DP'  -a 'FORMAT/AD'  -o "tmp.bcf" --fasta-ref "${REF}" -

${bcftools} index  -f "tmp.bcf"

${bcftools} call --output-type z --ploidy GRCh37 --regions-file "${C}.bed" -o "tmp.${C}.vcf.gz" --gvcf 0 --multiallelic-caller "tmp.bcf"

${bcftools} index -t -f "tmp.${C}.vcf.gz"
ADD COMMENTlink
1
Entering edit mode

Hey Pierre,

is there a reason why you use samtools view at the beginning? Also saving the mpileup result and indexing it is usually unnecessary.

Until now I wasn't able to reproduce your problem. I was able to get 0/0 genotype with the --gvcf option and without. Then you get a record for each covered position.

I've tried the long way like you did. And a short one (see below). The only difference to your code is, that I haven't use a bedfile for the region.

$ bcftools mpileup -Ou --gvcf 0 -f Homo_sapiens.GRCh37.dna.primary_assembly.fa -r 1:977320-977340 ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/high_coverage_alignment/HG00096.wgs.ILLUMINA.bwa.GBR.high_cov_pcr_free.20140203.bam | bcftools call -m --gvcf 0 -Oz -o 1kg.vcf.gz

Result is:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  HG00096
1   977320  .   C   .   .   .   END=977329;MinDP=14 GT:DP   0/0:14
1   977330  .   T   C   214 .   DP=28;VDB=0.652362;SGB=-0.688148;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,11,4;MQ=60 GT:PL:DP    1/1:244,45,0:15
1   977331  .   C   .   .   .   END=977340;MinDP=15 GT:DP   0/0:15
ADD REPLYlink
0
Entering edit mode

is there a reason why you use samtools view at the beginning? Also saving the mpileup result and indexing it is usually unnecessary.

splitting the analysis by chromosome. It's not clear to me how to do this: When using bcftools --gvcf I think one need to use the streaming filter to get the BED (NOT random-access)

I've tried the long way like you did. And a short one (see below). The only difference to your code is, that I haven't use a bedfile for the region.

Thanks ! I'm away from the lab, I'll try tomorrow. About the BED: I'd like to avoid to extract the variants for the whole BAM...

ADD REPLYlink
0
Entering edit mode

If I understand your code correct, you have a bed file for each chromosome? If so, you can omit the samtools view and use the --regions-file parameter for bcftools mpileup instead of --targets. bcftools have then random access to the regions.

If you want to query remote files (like I did in my test example) it might be useful to use samtools view to speed up things. Then apply the -L parameter along with your bed file and use -M to use the multi-region iterator.

But that's all about optimization and not resolving your problem ...

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1