Question

"Unrecognized token: C" error when using emmax-kin to make kinship matrix for GWAS

0

6.2 years ago

michael.nagle ▴ 100

I first used PLINK to convert the .vcf genotype file for our GWAS population into .tped and .tfam files. I'm now trying to use those .tped and .tfam files to build a kinship matrix with EMMAX.

I'm getting the error below, which I'm not familiar with, and I can't find any relevant discussion of it online.

Input: emmax-kin -v -s -d 10 [prefix for input .tped and .tfam]

The input files (obtained via PLINK) appear to be consistent with how .tped and .tfam files are supposed to look: https://www.cog-genomics.org/plink2/formats

Bottom 5 rows, first 12 columns of input .tped file:
scaffold_338 . 0 19212 0 0 0 0 0 0 0 0
scaffold_338 . 0 19274 0 0 0 0 0 0 0 0
scaffold_338 . 0 19312 0 0 0 0 0 0 0 0
scaffold_338 . 0 19426 0 0 0 0 0 0 T T
scaffold_338 . 0 19428 0 0 0 0 0 0 C C

Bottom 5 rows of input .tfam file:
852 1015268 0 0 0 -9
852 1015271 0 0 0 -9
852 1015274 0 0 0 -9
852 1015277 0 0 0 -9
852 1015280 0 0 0 -9

Output:
Reading TFAM file [my input file prefix].tfam ....
Reading TPED file [my input file prefix].tped ....
Unrecognized token C

Desired output: A .kinf file (kinship matrix)

I'm at a loss as to how to address this problem, so any help is greatly appreciated. Thanks for your time.

GWAS emmax kinship genomics • 3.1k views

ADD COMMENT • link 6.2 years ago by michael.nagle ▴ 100

1

I have no experience with emmax-kin, but the source code pasted below suggests it expects genotypes encoded as 0, 1, 2, and it instead encountered the letter base code 'C' in your TPED. Does that make sense?

== emmax-kin.c lines 430-

// if zero_miss_flag is set, assume the genotypes are encoded 0,1,2
    // Additively encodes the two genotypes in the following way
    // when (j-nheadercols) is even, 0->MISSING, add 1->0, 2->1
    // when (j-nheadercols) is odd, check 0-0 consistency, and add 1->0, 2->1
    else {
      ctoken = (unsigned char)(token[0]-'0');

      if ( ctoken > 2 ) {
        fprintf(stderr,"Unrecognized token %s\n",token);
        abort();
      }

== end code
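
To make the check above concrete, here is a minimal Python sketch of what the quoted C lines appear to do with the first character of each genotype token. The function names are hypothetical and exist only for illustration; the arithmetic mirrors `(unsigned char)(token[0]-'0')` followed by the `> 2` test, and nothing here is taken from the rest of emmax-kin.c.

```python
def emmax_token_value(token: str) -> int:
    """Mimic `(unsigned char)(token[0] - '0')` from the quoted excerpt."""
    # Subtract the character code of '0' and wrap into 0..255,
    # the way a cast to unsigned char would.
    return (ord(token[0]) - ord("0")) % 256


def is_recognized(token: str) -> bool:
    # The quoted check aborts when the value is greater than 2.
    return emmax_token_value(token) <= 2


if __name__ == "__main__":
    # '0', '1', '2' pass; letter alleles such as 'C' and 'T' do not.
    for tok in ["0", "1", "2", "C", "T"]:
        status = "ok" if is_recognized(tok) else "Unrecognized token"
        print(tok, emmax_token_value(tok), status)
```

Under this reading, any allele written as a letter (A, C, G, T) maps to a value far above 2, so the first letter allele reached in the TPED triggers the abort.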

ADD REPLY • link 6.2 years ago by Ahill ★ 1.9k

0

I've looked at this part of the source code, both on its own and in its broader context, and I don't understand why it would want genotypes encoded as 0, 1 or 2 (or how that is possible) when a .tped file has A/C/G/T/0 for each allele.

Hope somebody can clarify...

ADD REPLY • link 6.2 years ago by michael.nagle ▴ 100

2

When generating your .tped, did you use the PLINK options --recode12 --output-missing-genotype 0? I'm going by the EMMAX web page and the PLINK documentation on recoding:

https://genome.sph.umich.edu/wiki/EMMAX#Preparing_Input_Genotype_Files
http://zzz.bwh.harvard.edu/plink/dataman.shtml#recode

--recode12 will recode the alleles as 1 and 2.
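
As a rough sketch of how those options might be assembled into a full PLINK call, the Python snippet below builds the argument list without running anything. Several parts are assumptions rather than something stated in this thread: the helper name `build_recode_command` is hypothetical, the file names are placeholders, `--transpose` is included on the understanding that it is what produces .tped/.tfam output alongside `--recode12`, and `--vcf` as the input flag assumes a PLINK version that reads VCF directly (1.9 or later). Check each flag against your own PLINK version before using it.

```python
import shlex


def build_recode_command(input_flag, input_path, out_prefix, plink="plink"):
    """Assemble a PLINK argument list for 1/2-coded transposed output.

    input_flag and input_path describe the source data (for example
    "--vcf" with a VCF file, or "--bfile" with a binary fileset prefix).
    """
    return [
        plink,
        input_flag, input_path,
        "--recode12",                      # recode alleles as 1 and 2
        "--output-missing-genotype", "0",  # write missing calls as 0
        "--transpose",                     # assumed: emit .tped/.tfam
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute your own.
    cmd = build_recode_command("--vcf", "genotypes.vcf", "gwas_recoded")
    print(shlex.join(cmd))
```

The resulting prefix (here the placeholder `gwas_recoded`) would then be the one passed to emmax-kin.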

ADD REPLY • link 6.2 years ago by Ahill ★ 1.9k

0

This solved the problem. Thank you!

ADD REPLY • link 6.1 years ago by michael.nagle ▴ 100