Convert Genotype Matrix Into Plink Format
10.3 years ago

I have a genotype matrix (700,000 SNPs in rows and 2,000 samples in columns), coded as 0/1/2 or NA. I want to convert it into PLINK-format PED and MAP files. What's the best way to do this?

Thanks for the help!

It looks like:

         Sample1   Sample2   Sample3   Sample N
  SNP1   0         1         0         2
  SNP2   0         NA        0         0
  SNP3   0         0         0         0
  SNP4   0         NA        0         0
  SNP5   0         1         0         2
  SNP6   0         NA        0         0
  SNP7   2         1         0         2
  SNP8   NA        NA        NA        NA

Can you show us a sample of the file?


zx8754 is right; an example is a must.


Can you tell me how to do this in R? I have a genotype matrix (nearly 3,000 animals, with 50,000 SNPs in columns), coded as 0/1/2 or NA. I want to convert it into PLINK's allelic format, for example 0 to 0 0, 1 to 1 1, and 2 to 2 2, so that I can run quality control on my data in PLINK. What's the best way to do this in R?
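
For a matrix with animals in rows and SNPs in columns, the PED file can be written directly, one line per animal. The following is a minimal sketch in awk rather than R, to stay with the tools used elsewhere in this thread. It assumes a tab-delimited file with the animal ID in the first column and an empty first header cell; the file name toy_animals.txt, the IDs, and the placeholder alleles A and B are made up for illustration. Note that PLINK reads 0 as the missing allele by default, so coding genotype 0 as "0 0" would mark it missing; the sketch uses "A A", "A B", "B B" for 0, 1, 2 and "0 0" only for NA.

```shell
# Toy input (assumed layout): tab-delimited, animals in rows, SNPs in columns,
# first column holds the animal ID and the first header cell is empty.
printf '\tsnp1\tsnp2\tsnp3\n'  > toy_animals.txt
printf 'cow1\t0\t1\tNA\n'     >> toy_animals.txt
printf 'cow2\t2\t0\t1\n'      >> toy_animals.txt

awk -F'\t' '
BEGIN { code["0"] = "A A"; code["1"] = "A B"; code["2"] = "B B"; code["NA"] = "0 0" }
NR == 1 {
    # MAP: chromosome, SNP id, genetic distance, position
    # (everything except the SNP id is a placeholder)
    for (i = 2; i <= NF; i++) print 1, $i, 0, i - 1 > "toy_animals.map"
    next
}
{
    # PED: FID IID father mother sex phenotype, then two alleles per SNP.
    # Sex 0 means unknown and phenotype -9 means missing.
    line = $1 " " $1 " 0 0 0 -9"
    # Any value other than 0, 1, 2, NA is written as missing ("0 0").
    for (i = 2; i <= NF; i++) line = line " " (($i in code) ? code[$i] : "0 0")
    print line > "toy_animals.ped"
}' toy_animals.txt
```

The resulting pair of files can then be loaded with plink --file toy_animals. Real chromosome and position values would need to replace the placeholders before any position-based analysis.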

10.3 years ago
zx8754 11k

Based on your example data, saved as a tab-delimited file named raw.txt, you can make TPED-format files, then use plink to convert them to PED/MAP format:

#make tped
awk 'NR != 1 {print 1,$1,0,NR}' raw.txt > temp_snp.txt
cut -f2- raw.txt | sed '1,1d' | sed 's/0/A A/g' | sed 's/1/A B/g' | sed 's/2/B B/g' | sed 's/NA/0 0/g' > temp_geno.txt
paste temp_snp.txt temp_geno.txt > plink.tped

#make tfam
head -n1 raw.txt | tr '\t' '\n' | sed '1,1d' | awk '{print $1,$1,0,0,1,1}' > plink.tfam

#convert to PedMap
plink --noweb \
--tfile plink \
--recode \
--out plink
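
The same TPED/TFAM construction can also be sketched as a single awk pass, which avoids the temporary files and maps each genotype through a lookup table instead of chained sed substitutions. This is only an illustration under assumptions: the toy file toy_raw.txt, its sample and SNP names, and the output prefix toy are invented here, the input is assumed to be tab-delimited with an empty first header cell, and the position column is a placeholder row index.

```shell
# Toy input (assumed layout): tab-delimited, SNPs in rows, samples in columns,
# empty first header cell.
printf '\tS1\tS2\tS3\n'   > toy_raw.txt
printf 'rs1\t0\t1\t2\n'  >> toy_raw.txt
printf 'rs2\tNA\t0\t0\n' >> toy_raw.txt

# One awk pass: header row -> toy.tfam, every other row -> toy.tped.
awk -F'\t' '
BEGIN { code["0"] = "A A"; code["1"] = "A B"; code["2"] = "B B"; code["NA"] = "0 0" }
NR == 1 {
    # FID IID father mother sex phenotype (sex and phenotype are placeholders)
    for (i = 2; i <= NF; i++) print $i, $i, 0, 0, 1, 1 > "toy.tfam"
    next
}
{
    # chromosome 1, SNP id, genetic distance 0, placeholder position
    line = "1 " $1 " 0 " (NR - 1)
    for (i = 2; i <= NF; i++) {
        # Stop on anything that is not 0, 1, 2 or NA rather than guess.
        if (!($i in code)) { print "unexpected genotype: " $i > "/dev/stderr"; exit 1 }
        line = line " " code[$i]
    }
    print line > "toy.tped"
}' toy_raw.txt

# Each TPED row should have 4 + 2 * (number of samples) fields.
n=$(wc -l < toy.tfam)
awk -v n="$n" 'NF != 4 + 2 * n { bad = 1 } END { exit bad }' toy.tped && echo "dimensions agree"
```

The final check catches a header that is shifted by one column, which is easy to get when the first header cell is missing. The toy files convert the same way as above, using --tfile toy and --out toy.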

Or I would just use R for analysis.
