How can I get Allele Depth of my vcf file ?
1
2
Entering edit mode
5.0 years ago
apl00028 ▴ 90

I am attempting to get the Allele Depth (AD) from my vcf file.

I got this file from bcftools file using the function:

bcftools view -c -g scaffold_filt_PV003_q10_variant.bcf > scaffold_filt_PV003_q10_variant.vcf

but when I see the vcf generated in the column called format does not appear AD, only appear GT:PL:GQ I do not another alternative way to see Allele Depth.

Thanks in advance

SNP bcftools vcf • 6.6k views
ADD COMMENT
0
Entering edit mode

how did you call the original bcf file ?

ADD REPLY
0
Entering edit mode

I called it scaffold_filt_PV003_q10_variant.bcf, is it answer your question?

ADD REPLY
0
Entering edit mode

what is the tool used to call the variants ? samtools ? bcftools ?gatk ? what was the command line ?

ADD REPLY
0
Entering edit mode

He means: how did you perform the variant calling? - GATK?; SAMtools?

ADD REPLY
0
Entering edit mode

I used samtools to get that file, the command line is

bcftools view -c -g input.bcf > output.vcf
ADD REPLY
0
Entering edit mode

the question was:

how did you perform the variant calling

and not

how to you convert a vcf to bcf

ADD REPLY
0
Entering edit mode

begining with a input file:

samtools mpileup -g -f reference.txt input.bam > output.bcf

I got the bcf then I convert a bcf to vcf with this command:

bcftools view input.bcf > output.vcf

And the output.vcf has not a record of Allele Depth. Is there another alternative way to get it? How?

ADD REPLY
5
Entering edit mode
5.0 years ago

Try this:

bcftools mpileup \
    --redo-BAQ \
    --min-BQ 30 \
    --per-sample-mF \
    --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR \
    -f reference.txt \
    input.bam | \
  bcftools call \
    --multiallelic-caller \
    --variants-only \
    -Ov \
  > output.vcf ;

I am not sure why your reference is TXT format - usually it is FASTA (and indexed with samtools faidx). Also, why the -g option?

ADD COMMENT
0
Entering edit mode

well, I have follow your code:

      samtools faidx cmv3.fasta    to index the fasta file 

       bcftools mpileup --redo-BAQ --min-BQ 30 --per-sample-mF --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR -f cmv3.fasta scaffold_nofilt_PV158.bam bcftools call --multiallelic-caller --variants-only -Ov > scaffold_nofilt_PV158_q10_variant.vcf   to create the vcf file

But I get only this message: [main] Unrecognized command.

What did I do wrong? Thanks

ADD REPLY
1
Entering edit mode

your version of bcftools is just too old.

ADD REPLY
0
Entering edit mode

Okey, I updated the bcftools version and I got another issue:

   [E::fai_build_core] Format error, unexpected "g" at line 1

What is that mean? Thanks in advantage

ADD REPLY
0
Entering edit mode

You are missing the pipe symbol (|) just before bcftools call; however, this is my fault because I did not include it in my earlier comment (I have now added it).

ADD REPLY
1
Entering edit mode

GREAT! IT WORKS! THANKS!

ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Upvote|Bookmark|Accept

ADD REPLY
0
Entering edit mode

Can I see those SNP with lower allele frequency? I have seen that vcf file records those SNP with higher frequency.

ADD REPLY
0
Entering edit mode

To be clear, to which parameter are you referring? - AF?

ADD REPLY
0
Entering edit mode

No, I mean Coveage allele-fraction https://bioinformatics-core-shared-training.github.io/intro-to-IGV/InspectingVariantsInIGV.html when I change the threshold to 0.05 in IGV I see more SNP that in VCF file I can not see. How can I change the bcftools paramethers to see them? Thanks in advantage.

ADD REPLY
0
Entering edit mode

If you are expecting variants at those low allele fractions and these are supposed to be germline variants, then that alludes to some problem with the sequencing run. If you wish to try to call these using BCFtools, then add the following parameter to bcftools call:

--pval-threshold 1.0

Note that this will result in a lot of junk being called, too.

BCFtools mpileup is excellent at calling germline variants that are expected at frequencies of 50% or 100%. Something else like GATK may better if you are expecting other types of variants.

ADD REPLY
0
Entering edit mode

I get the same number of records using --pval-threshold 1.0 and whitout it, is there another way?

Thanks in advantage.

ADD REPLY
0
Entering edit mode

so, most probably these SNPs are sequecing errors, are not them?

ADD REPLY
0
Entering edit mode

Without seeing all information in your output VCF and having access to your BAM files, I cannot really comment further - sorry. What other information is in your VCF? - take a look at the QUAL score, for example. Also, how were your BAM files produced?

ADD REPLY
0
Entering edit mode

My BAM files were produced following these steps:

Firstly I did an alignment using bowtie2

  bowtie2 -f -x /cmv3.fasta /scaffold_nofilt_PV002.fa -S  /scaffold_nofilt_PV002.sam

After that I converted sam file in bam file

 samtools view -bS /scaffold_nofilt_PV002.sam > /scaffold_nofilt_PV002.bam

Then, I removed no mapped reads:

 samtools view -F 0x04 -b /scaffold_nofilt_PV003.bam >/scaffold_nofilt_PV003.bam

and I select only those data with a map quality of 10

 samtools view -q 10 -b /scaffold_nofilt_PV003.bam > /scaffold_nofilt_PV003.bam

Finally I sort my bam file and I index them:

 samtools sort /scaffold_nofilt_PV003.bam /scaffold_nofilt_PV003
 samtools index /scaffold_nofilt_PV003.bam
ADD REPLY
0
Entering edit mode

I am not sure that the removing of un-mapped reads is necessary, as these will not be included in the variant calling stage. I don't see any major issue by removing reads with MAPQ < 10.

The results can perhaps be explained by the experimental set-up: From where are the reads derived?; are you aligning reads to cytomegalovirus?

ADD REPLY
0
Entering edit mode

My results come from wild plant species of different host, I want to see the variation among cucumber mosaic virus RNA3 of differnt hosts or same host and different habitats.

ADD REPLY
0
Entering edit mode

and my bcftools is the 1.9 the latest

ADD REPLY

Login before adding your answer.

Traffic: 2594 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6