Id And Ref Are Empty
12.8 years ago
Zhshqzyc ▴ 520

Hi, I ran a samtools command to produce a VCF file so that I can find SNPs in a given region, but the result in the VCF file doesn't make sense to me.

My command:

samtools mpileup -C50 -r chr21:start-end -Buf ref.fa 1.bam 2.bam 3.bam 4.bam 5.bam 6.bam 7.bam 8.bam 9.bam 10.bam 11.bam 12.bam 13.bam 14.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf  > var.flt.vcf

The header line and the last data line in the VCF are:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    1    2    3    4    5    5    7    8    9    10    11    12    13    14
chr21    41801273    .    A    T    39.4    .    DP=26;AF1=0.195;CI95=0.07143,0.3929;DP4=9,5,2,1;MQ=49;FQ=40.3;PV4=1,1,1.7e-12,1    GT:PL:GQ    0/0:0,0,0:5    0/0:0,12,101:16    0/1:27,3,0:7    0/1:65,6,0:5    0/0:0,6,59:10    0/0:0,9,71:13    0/0:0,0,0:5    0/0:0,3,29:7    0/0:0,6,51:10    0/0:0,6,53:10    0/0:0,0,0:5    0/0:0,0,0:5    0/0:0,0,0:5    0/0:0,0,0:5

Why are the ID and REF columns empty? I don't understand it. Is anything wrong with my command or data preparation? Thanks.


samtools is notorious for ignoring additional files. I don't think anything but 1.bam is being processed here.
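
One way to check whether more than one BAM file made it into the output is to count the per-sample columns that follow FORMAT. The sketch below is only an illustration: it uses the data line pasted in the question, and it assumes that a PL of 0,0,0 means the sample had no informative reads at that site (which is how I read it, but it is an assumption).

```python
# Data line copied from the question. Real VCF is tab-separated; the pasted
# text is whitespace-separated, so split() on any whitespace is used here.
record = (
    "chr21 41801273 . A T 39.4 . "
    "DP=26;AF1=0.195;CI95=0.07143,0.3929;DP4=9,5,2,1;MQ=49;FQ=40.3;PV4=1,1,1.7e-12,1 "
    "GT:PL:GQ "
    "0/0:0,0,0:5 0/0:0,12,101:16 0/1:27,3,0:7 0/1:65,6,0:5 0/0:0,6,59:10 "
    "0/0:0,9,71:13 0/0:0,0,0:5 0/0:0,3,29:7 0/0:0,6,51:10 0/0:0,6,53:10 "
    "0/0:0,0,0:5 0/0:0,0,0:5 0/0:0,0,0:5 0/0:0,0,0:5"
)

# The first 9 columns are fixed: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT.
FIXED_COLUMNS = 9

fields = record.split()
format_keys = fields[8].split(":")        # ["GT", "PL", "GQ"]
sample_fields = fields[FIXED_COLUMNS:]    # one entry per sample

# Assumption: PL == "0,0,0" marks a sample without informative reads here.
pl_index = format_keys.index("PL")
informative = [s for s in sample_fields if s.split(":")[pl_index] != "0,0,0"]

print(len(sample_fields), "sample columns")
print(len(informative), "samples with a non-zero PL")
```

For the line in the question this reports 14 sample columns, 8 of them with a non-zero PL, so the record itself carries genotype fields for more than one sample.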


In your example, REF is not empty: it is 'A'.

Samtools doesn't annotate the VCF file, that is to say, it will not scan a database to find the known SNPs at a given location. That is why the ID column contains only '.', the VCF placeholder for a missing value.
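
To make the column layout concrete, here is a minimal Python sketch that splits the fixed columns of the record from the question and treats '.' as missing. The function name `parse_fixed` is made up for this illustration, and the INFO value is shortened to its first key for brevity.

```python
# Names of the 8 fixed VCF columns that precede FORMAT and the samples.
COLUMN_NAMES = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_fixed(line):
    """Return the fixed VCF columns as a dict, mapping '.' to None.

    Hypothetical helper for illustration; it is not part of samtools.
    """
    values = line.rstrip("\n").split("\t")[: len(COLUMN_NAMES)]
    return {name: (None if value == "." else value)
            for name, value in zip(COLUMN_NAMES, values)}


# Record from the question, tab-separated, INFO shortened to "DP=26".
line = "chr21\t41801273\t.\tA\tT\t39.4\t.\tDP=26"
rec = parse_fixed(line)

print("ID:", rec["ID"])     # missing, because the file was never annotated
print("REF:", rec["REF"])   # the reference base
print("ALT:", rec["ALT"])   # the alternate base
```

Here ID and FILTER come back as None (they were '.'), while REF is 'A' and ALT is 'T'.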

To annotate a VCF, you could have a look at the Ensembl Variant Effect Predictor: http://www.ensembl.org/Homo_sapiens/UserData/UploadVariations

Swbarnes2 ★ 1.6k

Remember that VCF is a general format, and that samtools isn't the only way of making a VCF file. Many kinds of information can go into a VCF file, but samtools doesn't necessarily know enough to fill all of it in.

In theory, you could sequence a bunch of humans and find known SNPs with rs numbers, and those rs numbers would be appropriate to put in the ID column. But I don't think it's possible to give samtools a list of rs numbers and expect it to put them in there. You'd either do that yourself, or find some software that would do it for you.

It's like the binary flags in your .bam file. Just because a flag bit can be set to indicate failed QC or a duplicate read doesn't mean that the software you used actually calculated it.
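
As a rough idea of what "doing it yourself" could look like, the Python sketch below fills the ID column from a lookup table keyed on (chromosome, position, REF, ALT). Everything here is an assumption for illustration: the function name `annotate_id` is made up, and "rsPLACEHOLDER" is not a real rs number, just a stand-in for whatever your known-SNP list would provide.

```python
def annotate_id(vcf_line, known_ids):
    """Fill an empty ID column ('.') from a dict of known variants.

    known_ids maps (chrom, pos, ref, alt) -> identifier string.
    Hypothetical helper; header lines (starting with '#') are returned as-is.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    cols = vcf_line.rstrip("\n").split("\t")
    key = (cols[0], int(cols[1]), cols[3], cols[4])
    if cols[2] == ".":
        # Leave '.' in place when the variant is not in the table.
        cols[2] = known_ids.get(key, ".")
    return "\t".join(cols)


# Placeholder table: the identifier is invented, not a real dbSNP entry.
known = {("chr21", 41801273, "A", "T"): "rsPLACEHOLDER"}

line = "chr21\t41801273\t.\tA\tT\t39.4\t.\tDP=26"
print(annotate_id(line, known))
```

A variant that is not in the table keeps '.' in its ID column, so the output stays valid VCF either way.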
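
To illustrate the flag analogy, here is a small Python sketch that tests the two bits mentioned above. The bit values 0x200 (read fails quality checks) and 0x400 (PCR or optical duplicate) come from the SAM specification; the function name `describe_flag` and the example flag values are made up for this illustration.

```python
# Bit values defined by the SAM specification.
FLAG_QC_FAIL = 0x200    # read fails platform/vendor quality checks
FLAG_DUPLICATE = 0x400  # read is a PCR or optical duplicate


def describe_flag(flag):
    """Report whether the QC-fail and duplicate bits are set in a SAM flag."""
    return {
        "qc_fail": bool(flag & FLAG_QC_FAIL),
        "duplicate": bool(flag & FLAG_DUPLICATE),
    }


# 99 = paired, proper pair, mate reverse, first in pair: neither bit set.
print(describe_flag(99))
# 1123 = 99 + 1024: same read with the duplicate bit set.
print(describe_flag(1123))
```

An unset duplicate bit only tells you that nothing marked the read as a duplicate. If no duplicate-marking step was run on the BAM file, every read will look like flag 99 in this respect, whether or not it really is a duplicate.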
