Plink - Non Human Data
12.4 years ago

I have read that PLINK supports some non-human organisms: mouse, dog, etc.

I, however, am not lucky enough to have a "model" organism. Is there any way to force PLINK to accept scaffolds instead of chromosomes (and only a limited number of those at that)? I have several hundred.

plink population format • 5.7k views
12.4 years ago

I wanted to know about Plink's support for different organisms too, but could not find the information in the documentation. Looking at the source code was quicker:

grep -E 'define.+Chromosome' helper.cpp

void defineHorseChromosomes()
void defineSheepChromosomes()
void defineRiceChromosomes()
void defineDogChromosomes()
void defineMouseChromosomes()
void defineCowChromosomes()
void defineHumanChromosomes()

So the answer to your question is no, unless you want to write some C++ to extend Plink. Looking into these functions, I found code that sets up the chromosomes in a 'par' (options) object.

To find where these are called from:

grep defineDogChromosomes *

plink.cpp:  if (par::species_dog) defineDogChromosomes();

In plink.cpp:

  if (par::species_dog) defineDogChromosomes();
  else if (par::species_sheep) defineSheepChromosomes();
  else if (par::species_cow) defineCowChromosomes();
  else if (par::species_horse) defineHorseChromosomes();
  else if (par::species_rice) defineRiceChromosomes();
  else if (par::species_mouse) defineMouseChromosomes();
  else defineHumanChromosomes();

You may be able to get away with writing one extra function, but I would also look at the code for the specific Plink analysis functions to see how your new scaffold definitions would be used.

A very pertinent point at the end. The dog genome, for example, has much longer high-LD stretches and haplotypes than the human genome, and that would need to be taken into account.

12.4 years ago
Caddymob ▴ 1000

I've run across this problem too...

My solution was to simply code all the chromosomes as 1 and positions as 1,2,3,4, etc. just to get through PLINK. Since the SNP IDs should be unique and thus map to a specific chromosome/scaffold/contig and position, you can create a lookup table (I do this in R) and map the SNPs back to their correct coordinate.

Unfortunately this will kill any LD type stuff in PLINK, but if you are only doing single SNP analyses, this works for me.
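The recoding described above could be sketched roughly as follows, here in Python rather than R. This is a minimal sketch, not my actual script: it assumes the markers come from a standard four-column PLINK .map file (chromosome, SNP ID, genetic distance, bp position), and the function name `flatten_map` is made up for illustration. Row order is left untouched because the genotype columns in the .ped file follow the .map order.

```python
def flatten_map(rows):
    """Recode every marker onto chromosome 1 with positions 1, 2, 3, ...

    rows: iterable of (chrom, snp_id, genetic_distance, bp) string tuples,
          in the same order as the lines of the .map file.
    Returns (recoded_rows, lookup), where lookup maps each SNP ID back to
    its original (scaffold, bp) coordinate.
    """
    recoded = []
    lookup = {}
    for index, (chrom, snp_id, cm, bp) in enumerate(rows, start=1):
        if snp_id in lookup:
            # The approach only works if SNP IDs are unique.
            raise ValueError(f"duplicate SNP ID: {snp_id}")
        lookup[snp_id] = (chrom, int(bp))
        recoded.append(("1", snp_id, cm, str(index)))
    return recoded, lookup


# Hypothetical markers on two scaffolds.
rows = [
    ("scaffold_12", "snpA", "0", "4501"),
    ("scaffold_12", "snpB", "0", "9120"),
    ("scaffold_301", "snpC", "0", "77"),
]
recoded, lookup = flatten_map(rows)
```

After running PLINK on the recoded file, `lookup[snp_id]` gives the real scaffold and position for any SNP in the results.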

I also took this approach, although I was bummed because, as you mentioned, it destroyed subsequent analyses.

I was thinking about this too. If LD analyses were what you were after, what about binning your contigs into 22 pseudo-chromosomes and, within each one, separating consecutive contigs by a large gap, say 5 Mb, so you don't get spurious LD results between them? I haven't tested this, but perhaps it would work. I don't think PLINK has any hard limits on chromosome length, so even if a pseudo-chromosome ends up 1e10 bp long this might work. Just an idea...
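To make the idea concrete, here is a rough Python sketch of the binning. It is untested against PLINK itself, and everything in it is an assumption for illustration: the function names, the round-robin assignment of contigs to pseudo-chromosomes, and the input being a dict of contig lengths in bp.

```python
GAP = 5_000_000  # spacer between contigs on the same pseudo-chromosome (5 Mb)


def assign_pseudo_chromosomes(contig_lengths, n_chrom=22, gap=GAP):
    """Place each contig on one of n_chrom pseudo-chromosomes.

    contig_lengths: dict of contig name -> length in bp (insertion order used).
    Returns a dict of contig name -> (pseudo_chrom, offset), where offset is
    added to a within-contig position to get the pseudo-chromosome position.
    """
    next_start = [0] * n_chrom  # next free offset on each pseudo-chromosome
    placement = {}
    for i, (name, length) in enumerate(contig_lengths.items()):
        c = i % n_chrom  # round-robin; any other assignment would do
        placement[name] = (c + 1, next_start[c])
        next_start[c] += length + gap
    return placement


def to_pseudo_position(placement, contig, pos):
    """Translate a (contig, pos) coordinate to (pseudo_chrom, pseudo_pos)."""
    chrom, offset = placement[contig]
    return chrom, offset + pos


# Hypothetical contigs, binned into 2 pseudo-chromosomes to keep it short.
lengths = {"contig_1": 1000, "contig_2": 2000, "contig_3": 500}
placement = assign_pseudo_chromosomes(lengths, n_chrom=2)
```

Keeping `placement` around lets you subtract the offsets again afterwards and report results in the original contig coordinates.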
