Extract Samples With Specific Rsid And Genotype Using Plink Or Similar Tools
11.6 years ago

I have a PLINK-formatted database (bed/bim/fam files) and the corresponding recoded files (hh/ped/map). I am looking for an efficient way to extract samples with specific genotypes from this database. I have looked through the PLINK manual and found that I can extract a set of samples using the "--keep" parameter and a set of SNPs using "--extract". I am wondering whether this can be done in a single step using another parameter or tool.

My input is a list of rsIDs and genotypes; I need to get sample IDs and genotypes as output.

INPUT

rs1800562 AA

OUTPUT

Sample1 AA
Sample5 AA
Sample22 AA
...

Is there an option in PLINK to do this, or should I use Unix 'grep' and/or a custom script to extract the data? Suggestions for other computational genomics tools that can do a similar task are also welcome.

plink genomics genotyping gwas • 7.1k views
11.6 years ago
Stephen 2.8k

You can combine --keep and --extract in a single command, but you want to condition --keep on the genotypes returned by --extract, which PLINK can't do, to my knowledge. If you want a single ped file for each SNP, you could do something like:

awk '{print $1}' INPUT > mysnps
plink --bfile data --extract mysnps --recode transpose --out mysnps
# (PLINK 1.07 spells this --recode --transpose; either way it writes mysnps.tped and mysnps.tfam)
# (some code here to loop through each line of mysnps.tped, pull out the column index where the genotype matches, and write a list of samples for each SNP)
# (some code here to run plink --keep for each list of samples)
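
The looping step outlined above might be sketched in Python roughly as follows. This is only an illustration, not a tested pipeline. It assumes a transposed fileset named mysnps.tped and mysnps.tfam, the usual layout of those files (four leading columns in the .tped followed by two allele columns per sample, with samples in .tfam order), and an INPUT file with one "rsID genotype" pair per line. The function names are made up for this example.

```python
from collections import defaultdict


def read_targets(path):
    """Read 'rsID genotype' pairs (e.g. 'rs1800562 AA') into a dict of sets."""
    targets = defaultdict(set)
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                # Sort the two alleles so that 'AG' and 'GA' compare equal.
                targets[fields[0]].add("".join(sorted(fields[1])))
    return targets


def read_samples(tfam_path):
    """Return (FID, IID) tuples in .tfam order, which is the .tped column order."""
    with open(tfam_path) as handle:
        return [tuple(line.split()[:2]) for line in handle if line.strip()]


def matching_samples(tped_path, tfam_path, targets):
    """Yield (rsid, FID, IID, genotype) for each sample with a requested genotype."""
    samples = read_samples(tfam_path)
    with open(tped_path) as handle:
        for line in handle:
            fields = line.split()
            # Columns: chromosome, SNP id, genetic distance, position, alleles...
            if len(fields) < 4 or fields[1] not in targets:
                continue
            rsid = fields[1]
            alleles = fields[4:]  # two allele columns per sample
            for i, (fid, iid) in enumerate(samples):
                genotype = "".join(sorted(alleles[2 * i:2 * i + 2]))
                if genotype in targets[rsid]:
                    yield rsid, fid, iid, genotype
```

Calling matching_samples("mysnps.tped", "mysnps.tfam", read_targets("INPUT")) and printing the IID and genotype of each result would give a listing like the OUTPUT in the question. Writing the FID and IID of each match to a text file, one pair per line, gives a list that can be passed to plink --keep.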

... but you probably already knew this, and just need an implementation. Sorry this wasn't much help.


Thanks a lot Stephen. I worked out a solution based on your suggestion - tped was the hat-tip :). Please see if you can add this as an answer for future reference.
