Comparing methods to genotype
9.1 years ago
Jautis ▴ 560

Hi, I have two VCF files produced by different genotyping pipelines (one we believe is reliable, the other is experimental). I ran GenotypeConcordance in GATK, which returns the number of mismatches, but now I'd like to know which sites those mismatches are at.

Does anybody have any suggestions?

RNA-Seq genotyping vcf RRBS • 1.5k views
9.1 years ago

I wrote a tool to compare two VCFs produced by two methods. https://github.com/lindenb/jvarkit/wiki/VcfCompareCallers

$ java -jar dist-1.128/vcfcomparecallers.jar  Proj1.samtools.vcf.gz  Proj1.varscan.vcf.gz
#Sample unique_to_file_1    unique_to_file_1_snp    unique_to_file_1_indel  unique_to_file_2    unique_to_file_2_snp    unique_to_file_2_indel  both_missing    common_context  common_context_snp  common_context_indel    common_context_discordant_id    called_and_same called_and_same_hom_ref called_and_same_hom_var called_and_same_het called_but_discordant   called_but_discordant_hom1_het2 called_but_discordant_het1_hom2 called_but_discordant_hom1_hom2 called_but_discordant_het1_het2 called_but_discordant_others
B00G5XG 43739   15531   27518   0   10773   11730   2182    558753  535010  22508   55052   1043356 0   26920   41136   3047    698 1993    152 204 0
B00G74M 43629   15445   27503   0   10739   11747   2346    558716  534939  22526   55092   1043355 0   27962   40295   2910    742 1823    164 181 0
B00G5XF 43542   15344   27515   0   10742   11691   2185    559017  535236  22533   55089   1044311 0   26842   40961   2960    809 1821    157 173 0
B00G74L 43705   15461   27543   0   10765   11745   2356    558606  534872  22509   55053   1041955 0   26849   42430   2989    725 1904    175 185 0
B00G5XE 43589   15393   27515   0   10764   11708   2425    558691  534970  22481   55052   1042648 0   27088   41698   2974    746 1906    152 170 0
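
The table above gives per-sample counts. To list the individual sites where the two call sets disagree, a minimal Python sketch could look like the following. It is not part of VcfCompareCallers; the function names (`read_genotypes`, `discordant_sites`) are hypothetical, and it assumes both files use the same sample names, have a GT field in FORMAT, and represent each variant with identical CHROM/POS/REF/ALT.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (helper for real files)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def normalise(gt):
    """Ignore phasing and allele order, so 1|0 and 0/1 compare equal."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def read_genotypes(lines):
    """Map (chrom, pos, ref, alt) -> {sample: genotype} from VCF lines."""
    samples, calls = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_idx = fields[8].split(":").index("GT")  # assumes GT is present
        calls[(chrom, int(pos), ref, alt)] = {
            sample: normalise(column.split(":")[gt_idx])
            for sample, column in zip(samples, fields[9:])
        }
    return calls


def discordant_sites(calls_a, calls_b):
    """Yield sites present in both call sets whose genotypes differ."""
    for key in sorted(set(calls_a) & set(calls_b)):
        for sample, gt_a in calls_a[key].items():
            gt_b = calls_b[key].get(sample)
            if gt_b is not None and gt_a != gt_b:
                yield key + (sample, gt_a, gt_b)
```

With real files this would be called as `discordant_sites(read_genotypes(open_vcf(path_a)), read_genotypes(open_vcf(path_b)))`, where `path_a` and `path_b` are placeholders for the two VCFs. Sites found in only one file are not reported by this sketch.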
9.1 years ago

You could just use bedtools for this. A simple bedtools intersect with the -v option reports the records in one file that have no overlap in the other. Make sure to use the -header and -sorted options as well (-sorted requires both files to be sorted by position). This will probably count different SNPs at the same site as being the same, but (A) that will be rare within the same sample and (B) such calls would have to be low quality to begin with.
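
To illustrate what the -v style comparison amounts to, here is a rough Python sketch that reports the sites present in one VCF but absent from the other. It is not bedtools itself: the function names are hypothetical, and it compares only CHROM and start POS, which matches the behaviour described above for SNPs but is only an approximation for indels.

```python
def site_positions(lines):
    """Collect (chrom, pos) for every record line of a VCF."""
    sites = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        sites.add((chrom, int(pos)))
    return sites


def unique_to_first(lines_a, lines_b):
    """Return sorted sites in A that have no record at the same position in B."""
    return sorted(site_positions(lines_a) - site_positions(lines_b))
```

As with the bedtools approach, two different alleles called at the same position are treated as the same site here.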
