1000g database with wrong variant annotation?
7.0 years ago
agata88 ▴ 870

Hi all,

I want to validate my pipeline for SNV discovery. To do that, I downloaded the exome FASTQ files and the per-chromosome VCF files for sample HG00119 from http://www.internationalgenome.org/data-portal/sample/HG00119.

In the end I wanted to compare the SNPs and indels in both result sets, restricted to exonic sequences downloaded from the UCSC database. To my surprise, a lot of the SNPs listed by 1000g are not visible in IGV at all (and my pipeline does not call them either)!
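A comparison like the one described above could be sketched as follows. This is a minimal illustration, not the poster's actual pipeline: the function names (`load_variants`, `load_exons`, `in_exon`, `compare`) and all example records are invented, and the sketch assumes both VCFs and the BED file use the same chromosome naming (UCSC files typically use `chr1`, while 1000g VCFs may use `1`) and the same reference build. Indels are matched by exact position and alleles, so differently normalized representations would be reported as mismatches.

```python
def load_variants(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF records, one per ALT allele."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def load_exons(bed_lines):
    """Map chrom -> list of (start, end) intervals (BED: 0-based, half-open)."""
    exons = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        exons.setdefault(chrom, []).append((int(start), int(end)))
    return exons


def in_exon(exons, chrom, pos):
    """True if the 1-based VCF position falls inside any exon interval."""
    zero_based = pos - 1
    # Linear scan: simple and correct, but slow for large BED files.
    return any(s <= zero_based < e for s, e in exons.get(chrom, []))


def compare(truth, calls, exons):
    """Split exonic variants into shared, truth-only and calls-only sets."""
    t = {v for v in truth if in_exon(exons, v[0], v[1])}
    c = {v for v in calls if in_exon(exons, v[0], v[1])}
    return {"shared": t & c, "truth_only": t - c, "calls_only": c - t}


# Invented example data, for illustration only.
bed = ["chr1\t999\t1100\n"]
truth_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.\n",
    "chr1\t1050\t.\tC\tT\t.\tPASS\t.\n",
    "chr1\t5000\t.\tG\tA\t.\tPASS\t.\n",  # outside the exon interval
]
calls_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.\n",
    "chr1\t1080\t.\tT\tC\t.\tPASS\t.\n",
]

result = compare(load_variants(truth_vcf), load_variants(calls_vcf), load_exons(bed))
for label, variants in result.items():
    print(label, sorted(variants))
```

The `truth_only` set is the one worth inspecting in IGV: those are the sites the reference call set lists but the pipeline did not report.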

In addition, I downloaded the BAM files for the same sample from the same 1000g page, http://www.internationalgenome.org/data-portal/sample/HG00119. The result is the same: a lot of the listed variants are not visible in the BAM file.

I am very confused... Is there another way I can validate my pipeline? Also, would you recommend emailing 1000g about this problem?

Validating against wrong annotations is useless. Maybe there is a well-checked reference that I can use to validate the pipeline?

PS. I selected the VCF records for only one sample, so I am sure that those annotations relate to that patient. Thanks in advance,

Agata

1000g • 2.3k views

is there other way I can validate my pipeline? [...] Maybe there is a reference that is well checked and I can use it for validation of pipeline?

My whole life changed after reading this paper: https://www.ncbi.nlm.nih.gov/pubmed/27535533


Thanks, I will definitely read that. Best, Agata


I strongly suggest you do so, and try to understand every word (it took me weeks). There's a lot of knowledge in there.


Hi agata88,

Why was this thread deleted?
Not really nice after you have received a helpful suggestion...

Cheers, Wouter


Sorry, I did that by mistake... thanks for bringing it back :)


I see that a lot of my SNPs and indels have AC=0, which means there is no alternate allele at that site, yet the record is still listed in the VCF file... That is why I got confused. After I filtered those records out, everything looks good. Now I feel embarrassed because I missed this during the analysis... Still, I hope this post will help somebody with a similar problem in the future :) Best, Agata
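For anyone hitting the same issue, the AC=0 filtering could look roughly like the sketch below. It is only an illustration under some assumptions: the helper names (`parse_info`, `drop_ac_zero`) and the example records are invented, and it assumes the `AC` tag in the INFO column was recalculated for the selected sample when the VCF was subset (if `AC` still reflects the whole cohort, the sample's genotype column would have to be checked instead).

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=0;AN=2' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def drop_ac_zero(vcf_lines):
    """Yield header lines and records whose AC is non-zero for some ALT allele."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ac = parse_info(fields[7]).get("AC")  # INFO is the 8th VCF column
        if ac is None:
            yield line  # no AC tag: keep the record rather than guess
            continue
        # AC holds one count per ALT allele; non-numeric values are kept as-is.
        if any(not c.isdigit() or int(c) > 0 for c in ac.split(",")):
            yield line


# Invented example records, for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00119\n",
    "1\t1000\t.\tA\tG\t.\tPASS\tAC=0;AN=2\tGT\t0|0\n",
    "1\t2000\t.\tC\tT\t.\tPASS\tAC=1;AN=2\tGT\t0|1\n",
]

kept = list(drop_ac_zero(example))
print("".join(kept), end="")
```

In this example only the record at position 2000 survives, since the record at position 1000 has AC=0 and a homozygous-reference genotype.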


Did you filter the output of samtools mpileup without doing bcftools call (just a guess)? When you call, only the positions that actually have something to say are kept.


Yes, I know. I am actually using VarScan for variant detection :) Thanks

6.9 years ago
agata88 ▴ 870

Hi all!

After a few days of digging, I unfortunately have to say that the annotations in the 1000g database for patient HG00119 are wrong. There are a lot of SNPs and indels that do not appear in the BAM file, which is stored in the same database. I think I will report this to the database coordinators.

So, since my work can go in the trash... can anyone suggest a database where I can find a patient's SNVs together with the raw FASTQ files?

Best,

Agata


You could give ICGC or TCGA a try (I see that you did not specify a sample type). You just need to apply for access to the data. They contain BAMs, VCFs, etc.


Thanks, I'll try that.
