How to calculate Imputation Accuracy Estimates like concordance with BEAGLE?
7.9 years ago
Shab86 ▴ 310

Hi all,

I have an imputed output file from BEAGLE (IMPUTED FILE). Rows are SNPs and columns are individuals. I would now like to calculate imputation accuracy estimates such as concordance or Rsq, and then plot them across MAF (minor allele frequency). How do I calculate them? Are there any tools that can generate such statistics?

Any help is highly appreciated.

genome SNP R impute sequencing • 5.6k views
6.3 years ago
vskale135 ▴ 10

Hello All,

I would also like to determine imputation accuracy in our GBS dataset. Here is what I did:

  1. Randomly selected 1% of the SNPs: zcat all.vcf.gz | awk '$1~/^#/ || rand()<=0.01' | bgzip -c > eval.vcf.gz

  2. Excluded the evaluation sites from the original VCF: bcftools isec -C all.vcf.gz eval.vcf.gz -Oz > impute.vcf.gz

  3. Imputed the missing data using Beagle 4.1: java -Xmx100g -jar beagleV4.1.jar gt=impute.vcf.gz out=imputed window=100 overlap=30 niterations=10

When I compared imputed.vcf.gz and eval.vcf.gz using vcf-compare, I got the following output:

SN Number of REF matches: 0
SN Number of ALT matches: 0
SN Number of REF mismatches: 0
SN Number of ALT mismatches: 0
SN Number of samples in GT comparison: 0

I request you to please help.

Thanking you with best regards

Sandip


To estimate the quality of imputation, I think imputed.vcf.gz should be compared with all.vcf.gz rather than with eval.vcf.gz :)
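One possible explanation for the all-zero counts, offered here as an assumption rather than something confirmed in the thread: step 2 removes the evaluation sites from the imputation input entirely, and without a reference panel Beagle fills in missing genotypes only at sites that are present in its input, so imputed.vcf.gz and eval.vcf.gz may simply share no sites. A minimal Python sketch to check this by counting shared (CHROM, POS) pairs follows; the function names are hypothetical helpers, not part of any tool mentioned above.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def site_keys(lines):
    """Collect (CHROM, POS) for every record line, skipping headers."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        keys.add((chrom, int(pos)))
    return keys


def count_shared_sites(lines_a, lines_b):
    """Return (sites in A, sites in B, sites present in both)."""
    a, b = site_keys(lines_a), site_keys(lines_b)
    return len(a), len(b), len(a & b)


# Usage (file names taken from the post above):
# with open_vcf("imputed.vcf.gz") as fa, open_vcf("eval.vcf.gz") as fb:
#     print(count_shared_sites(fa, fb))
```

If the third number is 0, a comparison tool has nothing to compare, which would be consistent with the output shown above.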

7.9 years ago

It's a three-step process.

  1. Drop some percentage of your genotype calls.

  2. Impute.

  3. Measure the genotype concordance between the original and imputed VCF files.

I've done this. You should experiment with the percentage of genotypes you remove and look at how accuracy varies with MAF.
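Step 3 can be sketched in Python as follows. This is a minimal illustration, not the method of any particular tool: it assumes diploid, biallelic genotypes, that you kept track of which sample indices were masked at each site, and that MAF is computed from the original (unmasked) genotypes. The r² here is the empirical squared correlation between true and imputed ALT-allele counts on the masked genotypes, not a value reported by the imputation software. All function names and the bin edges are hypothetical.

```python
import bisect
from collections import defaultdict


def dosage(gt):
    """ALT-allele count of a diploid, biallelic GT like '0/1' or '1|1'; None if missing."""
    alleles = gt.split(":")[0].replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def minor_allele_freq(gts):
    """MAF from the non-missing genotypes at one site (assumes diploid calls)."""
    doses = [d for d in map(dosage, gts) if d is not None]
    if not doses:
        return None
    p = sum(doses) / (2 * len(doses))
    return min(p, 1 - p)


def pooled_metrics(pairs):
    """Return (n, concordance, r2) for a list of (true, imputed) dosage pairs."""
    n = len(pairs)
    concordance = sum(t == m for t, m in pairs) / n
    mean_t = sum(t for t, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_m = sum((m - mean_m) ** 2 for _, m in pairs)
    # r2 is undefined when either side has no variance.
    r2 = cov * cov / (var_t * var_m) if var_t > 0 and var_m > 0 else None
    return n, concordance, r2


def accuracy_by_maf(sites, edges=(0.0, 0.05, 0.1, 0.2, 0.3, 0.5)):
    """sites: iterable of (truth_gts, imputed_gts, masked_sample_indices).

    Pools masked genotypes within each MAF bin and returns
    {(low, high): (n, concordance, r2)}.
    """
    pooled = defaultdict(list)
    for truth_gts, imputed_gts, masked in sites:
        maf = minor_allele_freq(truth_gts)
        if maf is None:
            continue
        # Bins are [low, high); the last bin also includes its upper edge.
        b = min(bisect.bisect_right(edges, maf) - 1, len(edges) - 2)
        for i in masked:
            t, m = dosage(truth_gts[i]), dosage(imputed_gts[i])
            if t is not None and m is not None:
                pooled[(edges[b], edges[b + 1])].append((t, m))
    return {bin_: pooled_metrics(pairs) for bin_, pairs in sorted(pooled.items())}
```

The returned dictionary has one entry per MAF bin, so the concordance and r² values can be plotted against the bin midpoints with whatever plotting tool you prefer. Pooling within bins is a design choice made here because per-site r² is very noisy when only a few genotypes are masked at each site.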


Thanks, Zev, for your reply. I have already done step 1, where I removed bad-quality calls and then masked the genotype file, which I then used in BEAGLE for imputation. Now I have the imputed file and the original one, and from these files I would like to get accuracy estimates such as concordance. The imputed file is the one I attached in the original post. Any idea how to get those estimates?


To test the accuracy, you remove high-quality variant calls, not low-quality ones. The masked calls serve as the truth set, so they need to be reliable.
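A minimal Python sketch of this kind of masking is shown below. It is an illustration under stated assumptions, not a reference implementation: it works on uncompressed VCF text lines, treats GQ as the quality measure (your VCF may not carry a GQ field, in which case leave `min_gq` unset), and the function name, parameters, and default values are all hypothetical.

```python
import random


def mask_genotypes(vcf_lines, fraction=0.05, min_gq=None, seed=1):
    """Randomly set a fraction of called genotypes to missing.

    Returns (masked_lines, masked), where masked is a list of
    (chrom, pos, sample_index) so the same calls can be evaluated later.
    If min_gq is given, only calls with GQ >= min_gq are eligible.
    """
    rng = random.Random(seed)
    out, masked = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            out.append(line)
            continue
        f = line.split("\t")
        keys = f[8].split(":")
        gt_idx = keys.index("GT")
        gq_idx = keys.index("GQ") if "GQ" in keys else None
        for col in range(9, len(f)):
            vals = f[col].split(":")
            if "." in vals[gt_idx]:
                continue  # already missing, nothing to mask
            if min_gq is not None:
                has_gq = gq_idx is not None and gq_idx < len(vals) and vals[gq_idx].isdigit()
                if not has_gq or int(vals[gq_idx]) < min_gq:
                    continue  # not confident enough to serve as truth
            if rng.random() < fraction:
                # Drop the whole sample field so no per-sample values remain.
                f[col] = "./."
                masked.append((f[0], int(f[1]), col - 9))
        out.append("\t".join(f))
    return out, masked
```

Keeping the returned `masked` list matters: accuracy should be measured only at the genotypes that were hidden from the imputation, since unmasked genotypes are passed through and would inflate concordance.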


I have done steps 1 and 2; my main question is about step 3. Are there any tools I can use for it?


You can use vcf-compare or bcftools stats to get statistics, which you can plot using plot-vcfstats. Could you please let me know how you performed steps 1 and 2? I don't have a reference panel.

Thanks and regards

Sandip


I'm trying to do the same thing, except that my data is multi-allelic. How do we calculate imputation accuracy for multi-allelic data? I'd appreciate any help on that. Thanks!
