Beagle imputation results quality control
8.8 years ago
eyb ▴ 250

My region of interest is ~120 kb. I have 300 samples, all genotyped at 14 SNPs in the region. I tried imputing with Beagle to get more SNPs, using the CEU population (the closest one) as a reference. After filtering the CEU samples I got about 350 SNPs per sample to use as a reference.

After imputation I got the same number of SNPs in my population as in CEU. Is it legit to use them all? I have a gut feeling that I have to filter them according to the imputation quality or something. How do I do that? The output VCF looks something like this:

1       110187031       rs113581509     C       T       .       PASS    AR2=0.468;DR2=0.514;AF=0.06     GT:DS:GP        0|1:0.759:0.242,0.758,0 0|0:0.018:0.982,0.018,0 0|0:0.018:0.982,0.018,0 0|0:0.001:0.999,0.001,0

Can anyone give me a clue on how to filter the results? Or should I use different software?

imputation beagle vcf • 7.7k views

Hi eyb,

I know it's been several years, but I'm now facing the same problem you had back then: I've just managed to impute my data with Beagle, and I would like to know how to filter out the poor-quality SNPs.

I suspect it is related to the DR2 field, but I'm not quite sure about it... Did you resolve your problem in the end? Thank you very much in advance!

5.1 years ago

There is no consensus on how best to filter the post-imputation results. You can use a combination of AR2 (allelic R-squared), DR2 (dosage R-squared), and MAF. Take a look at this pre-print and subsequent publication, where they actually did not do any filtering post-imputation:

Very low depth whole genome sequencing in complex trait association studies (peer reviewed publication).

Variant-level QC

Beagle provides two position level imputation metrics, allelic R-squared and dosage R-squared. Both measures are highly correlated (Supplementary Fig. S8a). Values between 0.3 and 0.8 are typically used for filtering (Brian Browning, personal communication).
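As a rough illustration of filtering on DR2 and MAF, here is a minimal Python sketch that works on plain VCF text lines like the one in the question. The function names and the MAF cutoff of 0.01 are my own placeholders, the DR2 default of 0.8 is just the strict end of the range quoted above, and the code assumes biallelic records (one DR2 and one AF value per line). Treat it as a starting point rather than a recommended pipeline.

```python
def parse_info(info_field):
    """Turn 'AR2=0.468;DR2=0.514;AF=0.06' into a dict of strings."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        # Flag-style entries without '=' are stored as True.
        info[key] = value if sep else True
    return info


def passes_qc(vcf_line, min_dr2=0.8, min_maf=0.01):
    """Return True if a VCF data line meets both thresholds.

    Assumes a biallelic record, so DR2 and AF each hold a single number.
    """
    # Split on any whitespace; column 8 (index 7) is INFO.
    fields = vcf_line.rstrip("\n").split(None, 8)
    info = parse_info(fields[7])
    dr2 = float(info["DR2"])
    af = float(info["AF"])
    maf = min(af, 1.0 - af)  # minor allele frequency from the ALT frequency
    return dr2 >= min_dr2 and maf >= min_maf


def filter_vcf_lines(lines, min_dr2=0.8, min_maf=0.01):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes_qc(line, min_dr2, min_maf):
            yield line
```

With the example record from the question (DR2=0.514, AF=0.06), the variant is kept at `min_dr2=0.3` and dropped at `min_dr2=0.8`, so the choice of cutoff within that range matters.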

Kevin


Great, thanks, I'll keep that in mind!


Update: Beagle 5.0 only has DR2.
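If you need a script that works on output from both older and newer Beagle versions, one option is to read DR2 when it is there and fall back to AR2 otherwise. A small sketch, assuming biallelic records; the helper name `imputation_r2` is hypothetical:

```python
def imputation_r2(info_field):
    """Return (metric_name, value), preferring DR2 over AR2.

    Returns None if the INFO string carries neither tag.
    Assumes a biallelic record with a single value per tag.
    """
    # Keep only key=value entries; flag-style entries are ignored.
    tags = dict(
        item.split("=", 1) for item in info_field.split(";") if "=" in item
    )
    for name in ("DR2", "AR2"):
        if name in tags:
            return name, float(tags[name])
    return None
```

For the INFO string in the original question, `imputation_r2("AR2=0.468;DR2=0.514;AF=0.06")` picks the DR2 value.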


Hi Kevin,

java -Djava.io.tmpdir=./temp/ -Xmx32g -jar beagle.16May19.351.jar impute=false gt=Exome.vcf out=Exome.vcf.phasing

As in the example above, suppose we don't use the map and reference parameters. How accurate is Beagle's phasing in that case?

Thanks.
