GWAS analysis using PLINK
9.3 years ago
ngsgene ▴ 380

I have a GenomeStudio genotype file with missing genotypes denoted by -

Using this file, I generated the map, fam and lgen files for each chromosome and converted them to ped format with the --recode option in plink. To get past the plink Error: Locus has >2 alleles, I used the --missing-genotype option with -

After the ped files for each chromosome were generated successfully, I am facing a couple of issues:

My lgen file corresponds to the map file, but after recode the ped file has far more columns than the map file has rows. I expect the number of columns to be twice the number of rows in the map file (two alleles per SNP).

When I try to merge all the chromosomes to evaluate summary statistics, the - in the data doesn't seem to be excluded and continues to cause errors.

Would converting all the - to 0 be the solution here? I am trying to understand how to exclude such data, and what the best practices are.

Thanks for any suggestions/feedback.

plink gwas merge missing-genotype • 4.7k views
9.3 years ago
  1. You probably want to use both --missing-genotype - and --output-missing-genotype 0 during your conversion; this tells PLINK that the input fileset uses -, but you want the output fileset to use 0 so you don't have more headaches down the line.
  2. Can you explain what you mean by the "ped file has way more columns than [you expected]"? How many columns does it have? How many rows does the map file have?
  3. Is there any particular reason you are converting to .ped/.map instead of PLINK's preferred .bed/.bim/.fam format?
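
As a rough sketch of point 1, the conversion command for one chromosome could be assembled like this. The --lfile input flag is an assumption based on the lgen/map/fam fileset described in the question, and the chr1 / chr1_recoded prefixes are hypothetical. The function only builds the argument list; actually running it needs PLINK on the PATH.

```python
def recode_command(in_prefix, out_prefix, missing_in="-", missing_out="0"):
    """Build a PLINK argument list that reads an LGEN fileset and writes .ped/.map."""
    return [
        "plink",
        "--lfile", in_prefix,                       # assumption: in_prefix.lgen/.map/.fam exist
        "--missing-genotype", missing_in,           # missing code used in the input
        "--output-missing-genotype", missing_out,   # missing code written to the output
        "--recode",                                 # write text .ped/.map
        "--out", out_prefix,
    ]

cmd = recode_command("chr1", "chr1_recoded")  # hypothetical prefixes
print(" ".join(cmd))
# To run it for real: import subprocess; subprocess.run(cmd, check=True)
```

Swapping --recode for --make-bed in the list would write .bed/.bim/.fam directly instead of the text format.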

Thanks for your response chrchang523, will give --output-missing-genotype 0 a try to get the format working.

The map files have varying numbers of rows, one per SNP on each chromosome. For example, I have ~180000 for chr1, so I expect the ped file to have 180000 * 2 columns.

The only reason for .ped is to be able to see what data I am generating; the aim is to work with the .bed/.bim format once the file formatting is taken care of.
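
One way to check the column count without opening the files is to count fields directly. A .ped line carries six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) before the genotypes, so the expected width is 6 + 2 * (number of .map rows) rather than 2 * rows. The sketch below assumes whitespace-delimited files with the two alleles of each genotype separated by whitespace; the file names are hypothetical.

```python
import os

def count_map_snps(map_path):
    """Number of non-empty lines (variants) in a .map file."""
    with open(map_path) as fh:
        return sum(1 for line in fh if line.strip())

def check_ped_width(ped_path, map_path):
    """Return (line_number, found, expected) for each .ped line with the wrong field count."""
    expected = 6 + 2 * count_map_snps(map_path)
    bad = []
    with open(ped_path) as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # skip blank lines
            n_fields = len(line.split())
            if n_fields != expected:
                bad.append((lineno, n_fields, expected))
    return bad

# chr1.ped / chr1.map are hypothetical file names; nothing happens if they are absent.
if os.path.exists("chr1.ped") and os.path.exists("chr1.map"):
    for lineno, found, expected in check_ped_width("chr1.ped", "chr1.map"):
        print(f"line {lineno}: {found} fields, expected {expected}")
```

An empty result means every sample line has exactly 6 + 2 * SNPs fields.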


How many columns does the .ped actually have?

You might want to try converting to .tped/.tfam (--recode --transpose) instead; that text format might be easier to read (and it's definitely more convenient for PLINK to work with).


The --output-missing-genotype 0 option has replaced all - with 0. But in either case the --merge option (which I am using to merge data from all chromosomes) still reports ERROR: Problem with MAP file line. There doesn't seem to be a way for me to track down which SNP in particular is causing the issue, as the message shows the first 6 columns (the sample identifiers) followed by genotype info from the lgen file.

The .ped file now has ~180000 * 2 + 6 columns, so that seems to have been generated correctly. Thanks for the tip on transpose. Are there other advantages to transposing the data, or is this a preferred file format? I plan to impute this using 1000 Genomes; none of the info on Shapeit/Impute2 has suggested a .tped file yet, but please let me know if you have experience with that.

ERROR: Problem with MAP file line:
0 ###-# 0 0 1 -9 G G A A A A C C C C A G A A C C C C G G A G C T T C A A C C G G A A T T A A C T C C A G G G C C C T T C T T T T T T A A C T C C C C G G G G G A T C C C C T A G G G C C A G G G A A A A G G A A T T T T T T G G A A C C C C C C G G G G A A C

The "problematic MAP file line" is a properly formatted .ped file line. Try swapping the order of the arguments you're passing to --merge.

.tped files have fewer columns than .ped files, so I find them easier to work with in a text editor. If you're using --merge, though, .ped/.map lets you avoid an extra conversion step.
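
To make the size difference concrete: a .ped file has one row per sample and 6 + 2 * SNPs columns, while a .tped file has one row per SNP and 4 + 2 * samples columns (chromosome, SNP ID, genetic distance, position, then two alleles per sample). A small sketch follows; the sample count of 500 is made up for illustration, and 180000 is the approximate chr1 SNP count mentioned earlier in the thread.

```python
def ped_shape(n_samples, n_snps):
    """(rows, columns) of a .ped file: one row per sample."""
    return (n_samples, 6 + 2 * n_snps)

def tped_shape(n_samples, n_snps):
    """(rows, columns) of a .tped file: one row per SNP."""
    return (n_snps, 4 + 2 * n_samples)

# 500 samples is a hypothetical figure; 180000 SNPs is the chr1 estimate from the thread.
print(ped_shape(500, 180000))   # (500, 360006)
print(tped_shape(500, 180000))  # (180000, 1004)
```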


Thanks chrchang523! I am able to merge the files successfully; it seems the order of .map and .ped in the file list was causing the issue. Take-home message: each fileset to be merged should be listed in the order .ped .map (or .bed .bim .fam).
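
For many chromosomes, the same ordering applies to the file passed to --merge-list, where each line names one fileset as .ped then .map (or .bed .bim .fam). Below is a minimal sketch that writes such a list. The chrN prefixes and the output names are hypothetical, and the first fileset is left out of the list on the assumption that it is given to PLINK as the base fileset via --file.

```python
def write_merge_list(prefixes, list_path, binary=False):
    """Write one line per fileset, with files in the order PLINK expects."""
    exts = (".bed", ".bim", ".fam") if binary else (".ped", ".map")
    lines = [" ".join(prefix + ext for ext in exts) for prefix in prefixes]
    with open(list_path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines

# Hypothetical prefixes chr1 ... chr22; chr1 is the base fileset, the rest go in the list.
prefixes = [f"chr{i}" for i in range(1, 23)]
write_merge_list(prefixes[1:], "merge_list.txt")
merge_cmd = ["plink", "--file", prefixes[0],
             "--merge-list", "merge_list.txt",
             "--recode", "--out", "all_chr"]
print(" ".join(merge_cmd))
```

Generating the list programmatically avoids hand-typing 21 lines and keeps the .ped .map order consistent on every line.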
