Association Tests Using Impute2 Output
10.7 years ago

I have .gprobs, .metrics and .sample output files from IMPUTE2 and am trying to run an association test using PLINK. I have uploaded the first 5 lines of chromosome 12 here:

I have tried to run a --dosage analysis using the .gprobs and .sample files to do the association test at the chromosome level.

But I am getting several warnings for:

- "Duplicate individual found"

and error:

- ERROR: Badly aligned columns for: SNP A1 A2

I have also tried to convert the .gprobs and .sample files to native ped and fam files using gtools and to run the association test in PLINK, but the converted files did not work with the --assoc command either. Is any file conversion required before taking IMPUTE2 output to PLINK, or would you recommend another tool for association testing with IMPUTE2 output?

PS. I have tried to ask this question on the IMPUTE2 mailing list, but my membership has still not been approved more than 24 hours after I confirmed my email.

imputation gwas plink • 11k views
10.7 years ago
zx8754 11k

The following usually works for me:

Make a map file from the Chr12_head_5.gprobs file (note: the 12 here stands for chr12):

awk '{print 12,$2,0,$3}' Chr12_head_5.gprobs > Chr12_head_5.map

Make a fam file from the Chr12_head_5.sample file: drop the two header rows and add the missing fam columns (paternal and maternal IDs, set to 0).

awk 'NR>2 {print $1,$2,0,0,$4,$5}' Chr12_head_5.sample > Chr12_head_5.fam

As there are 3 samples, I cut the gprobs file down to those samples. The columns are chr, snp, bp, a1, a2, then 3 columns per individual giving the probabilities of AA, AB and BB for each SNP:

--- 12-60076 60076 A C 0.603 0.346 0.050 0.171 0.506 0.323 0.248 0.659 0.094
--- 12-60252 60252 A G 0.989 0.011 0 0.935 0.065 0 0.898 0.101 0
--- 12-60317 60317 C T 0.998 0.002 0 1 0 0 0.991 0.009 0
--- 12-60474 60474 G A 0.987 0.013 0 0.923 0.076 0 0.848 0.149 0.003
--- 12-60628 60628 T C 0.996 0.004 0 1 0 0 0.985 0.015 0
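
A quick sanity check of that layout, as a minimal sketch: every row should have 5 leading columns plus 3 probabilities per sample, and the number of samples is the line count of the .sample file minus its two header rows. The file names demo.sample and demo.gprobs and the sample IDs are made up so the snippet runs on its own; the sample header layout shown is an assumption and may differ in your files.

```shell
#!/bin/sh
# Hypothetical demo files, created here only so the sketch is self-contained.
cat > demo.sample <<'EOF'
ID_1 ID_2 missing sex pheno
0 0 0 D B
fam1 ind1 0 1 1
fam2 ind2 0 2 2
fam3 ind3 0 1 1
EOF

cat > demo.gprobs <<'EOF'
--- 12-60076 60076 A C 0.603 0.346 0.050 0.171 0.506 0.323 0.248 0.659 0.094
--- 12-60252 60252 A G 0.989 0.011 0 0.935 0.065 0 0.898 0.101 0
EOF

# Samples = lines in the .sample file minus its two header rows.
n_samples=$(awk 'NR>2 {n++} END {print n+0}' demo.sample)

# Each SNP row should hold 5 leading columns plus 3 probabilities per sample.
expected=$((5 + 3 * n_samples))

# Count rows whose number of fields differs from the expectation.
bad_rows=$(awk -v want="$expected" 'NF != want {n++} END {print n+0}' demo.gprobs)

echo "samples=$n_samples expected_columns=$expected bad_rows=$bad_rows"
```

A non-zero bad_rows count would suggest that the .gprobs and .sample files do not describe the same set of individuals, which is worth ruling out before running --dosage.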

Then run the plink command:

plink \
--noweb \
--dosage Chr12_head_5.gprobs \
format=3 skip0=1 skip1=1 noheader \
--map Chr12_head_5.map \
--fam Chr12_head_5.fam \
--out Chr12_head_5

I run these commands to get "rough" associations, as --dosage doesn't accept the --covar and --within options to correct for covariates and strata. I then convert the data to MACH format (see Conversion of ped/map or bim/bim/ fam files to dosage for GWAs mit Probable and comparison with imputated genotypes) and run the associations in R.

Regarding the errors: the first one means that the FID and IID combination is not unique, and the log file should list the duplicated individuals. The second one is probably caused by badly formatted headers in the map file.
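
To track down the "Duplicate individual found" warnings without reading through the log, one option is to list every FID/IID pair that occurs more than once in the fam file. This is only a sketch: demo.fam and its IDs are invented for illustration, so substitute your own fam file.

```shell
#!/bin/sh
# Hypothetical fam file with one repeated FID/IID pair, for illustration only.
cat > demo.fam <<'EOF'
fam1 ind1 0 0 1 1
fam1 ind1 0 0 2 2
fam2 ind2 0 0 1 1
EOF

# Keep only FID and IID, sort, and print each pair that appears more than once.
dups=$(awk '{print $1, $2}' demo.fam | sort | uniq -d)

echo "$dups"
```

An empty result means every FID/IID pair is unique, so the warnings would have to come from somewhere else.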

SNPTEST is supposed to work with IMPUTE output "seamlessly", but in my experience it doesn't, so I avoid it.


Thanks zx8754 for a great answer. Do you know whether the .gen format that you mentioned and the .gprobs format that I have are the same?


Not sure what a .gprobs file looks like, but I have added how my .gen files from IMPUTE2 output look.


This is the first line from the file. I think it's the same format.

--- 16-60180 60180 G C 1 0 0 0.894 0.106 0 0.993 0.007 0 1 0 0 1 0 0 1 ...

Google tells me that .gprobs is a BEAGLE output?


True. It's a native BEAGLE format. I downloaded this dataset from dbGaP. According to the phenotype description the data were imputed using IMPUTE2, but the output files are described as "chromosome-specific genotype probabilities files".


From your dropbox files, I created map and fam files and cut the gprobs file for 3 samples (as there were 3 samples in the .sample file), and --dosage did work.


That's great! Can you please add that part to your answer as well?


The answer is updated according to the data provided.


That's great! In the meantime, I was able to run SNPTEST2 on my datasets seamlessly. I will post a detailed reply here so that Biostars users with IMPUTE2 data can try both ways.

10.0 years ago
Kantale ▴ 140

I know you said that you want to perform the association analysis with plink, but I would recommend trying SNPTEST. Since you did the imputation with IMPUTE2, SNPTEST can process the IMPUTE2 output files (relatively) nicely. You can use QCTOOL to convert IMPUTE2 output to SNPTEST input. Check this: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html#input_file_formats
