Biostar Test Site

This is site is used for testing only. Visit: https://www.biostars.org to ask a question.

How can I get Allele Depth of my vcf file ?
1
0
Entering edit mode
2.1 years ago
apl00028 ▴ 70

I am attempting to get the Allele Depth (AD) from my vcf file.

I got this file from bcftools file using the function:

bcftools view -c -g scaffold_filt_PV003_q10_variant.bcf > scaffold_filt_PV003_q10_variant.vcf


but when I see the vcf generated in the column called format does not appear AD, only appear GT:PL:GQ I do not another alternative way to see Allele Depth.

SNP bcftools vcf • 2.5k views
0
Entering edit mode

how did you call the original bcf file ?

0
Entering edit mode

0
Entering edit mode

what is the tool used to call the variants ? samtools ? bcftools ?gatk ? what was the command line ?

0
Entering edit mode

He means: how did you perform the variant calling? - GATK?; SAMtools?

0
Entering edit mode

I used samtools to get that file, the command line is

bcftools view -c -g input.bcf > output.vcf

0
Entering edit mode

the question was:

how did you perform the variant calling

and not

how to you convert a vcf to bcf

0
Entering edit mode

begining with a input file:

samtools mpileup -g -f reference.txt input.bam > output.bcf


I got the bcf then I convert a bcf to vcf with this command:

bcftools view input.bcf > output.vcf


And the output.vcf has not a record of Allele Depth. Is there another alternative way to get it? How?

4
Entering edit mode
2.1 years ago

Try this:

bcftools mpileup \
--redo-BAQ \
--min-BQ 30 \
--per-sample-mF \
-f reference.txt \
input.bam | \
bcftools call \
--multiallelic-caller \
--variants-only \
-Ov \
> output.vcf ;


I am not sure why your reference is TXT format - usually it is FASTA (and indexed with samtools faidx). Also, why the -g option?

0
Entering edit mode

      samtools faidx cmv3.fasta    to index the fasta file



But I get only this message: [main] Unrecognized command.

What did I do wrong? Thanks

1
Entering edit mode

your version of bcftools is just too old.

0
Entering edit mode

Okey, I updated the bcftools version and I got another issue:

   [E::fai_build_core] Format error, unexpected "g" at line 1


What is that mean? Thanks in advantage

0
Entering edit mode

You are missing the pipe symbol (|) just before bcftools call; however, this is my fault because I did not include it in my earlier comment (I have now added it).

1
Entering edit mode

GREAT! IT WORKS! THANKS!

0
Entering edit mode

0
Entering edit mode

Can I see those SNP with lower allele frequency? I have seen that vcf file records those SNP with higher frequency.

0
Entering edit mode

To be clear, to which parameter are you referring? - AF?

0
Entering edit mode

No, I mean Coveage allele-fraction https://bioinformatics-core-shared-training.github.io/intro-to-IGV/InspectingVariantsInIGV.html when I change the threshold to 0.05 in IGV I see more SNP that in VCF file I can not see. How can I change the bcftools paramethers to see them? Thanks in advantage.

0
Entering edit mode

If you are expecting variants at those low allele fractions and these are supposed to be germline variants, then that alludes to some problem with the sequencing run. If you wish to try to call these using BCFtools, then add the following parameter to bcftools call:

--pval-threshold 1.0


Note that this will result in a lot of junk being called, too.

BCFtools mpileup is excellent at calling germline variants that are expected at frequencies of 50% or 100%. Something else like GATK may better if you are expecting other types of variants.

0
Entering edit mode

I get the same number of records using --pval-threshold 1.0 and whitout it, is there another way?

0
Entering edit mode

so, most probably these SNPs are sequecing errors, are not them?

0
Entering edit mode

Without seeing all information in your output VCF and having access to your BAM files, I cannot really comment further - sorry. What other information is in your VCF? - take a look at the QUAL score, for example. Also, how were your BAM files produced?

0
Entering edit mode

My BAM files were produced following these steps:

# Firstly I did an alignment using bowtie2

  bowtie2 -f -x /cmv3.fasta /scaffold_nofilt_PV002.fa -S  /scaffold_nofilt_PV002.sam


# After that I converted sam file in bam file

 samtools view -bS /scaffold_nofilt_PV002.sam > /scaffold_nofilt_PV002.bam


# Then, I removed no mapped reads:

 samtools view -F 0x04 -b /scaffold_nofilt_PV003.bam >/scaffold_nofilt_PV003.bam


# and I select only those data with a map quality of 10

 samtools view -q 10 -b /scaffold_nofilt_PV003.bam > /scaffold_nofilt_PV003.bam


# Finally I sort my bam file and I index them:

 samtools sort /scaffold_nofilt_PV003.bam /scaffold_nofilt_PV003
samtools index /scaffold_nofilt_PV003.bam

0
Entering edit mode

I am not sure that the removing of un-mapped reads is necessary, as these will not be included in the variant calling stage. I don't see any major issue by removing reads with MAPQ < 10.

The results can perhaps be explained by the experimental set-up: From where are the reads derived?; are you aligning reads to cytomegalovirus?

0
Entering edit mode

My results come from wild plant species of different host, I want to see the variation among cucumber mosaic virus RNA3 of differnt hosts or same host and different habitats.

0
Entering edit mode

and my bcftools is the 1.9 the latest