Remove Duplicate Individuals From Plink Ped File Using Plink
10.3 years ago
venks ▴ 740

I have a PLINK ped file and want to remove duplicate individuals from it using PLINK.

Is there an option in PLINK to do this? If not, is there any other tool/option to do this?

Thank You

plink genotype gwas ped • 12k views

Hi zx8754, I am trying to remove patients by ID from my data, working from the original PED file. I created a .txt file listing the family ID and patient ID of each individual I want to remove, in two columns, but it still doesn't work. The analysis seems to run to the end of the process (creating temporary files), and then this message appears: Error: duplicates ID.

My command is: $ ./plink --file name --remove IDlist.txt --out subset2 --make-bed

And my IDlist.txt is:

1 2204
2 1146

So I know I have a few duplicates, but I don't understand why their presence prevents the removal from working.
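
One possible explanation, assuming the error refers to repeated family ID / individual ID pairs in the input: PLINK expects each pair to be unique, so it may stop while loading the file, before --remove is applied. Below is a minimal Python sketch (not a PLINK feature; the function name `find_duplicate_ids` is made up for illustration) that lists which ID pairs occur on more than one line of a .ped file, so they can be renamed or dropped by hand first.

```python
from collections import Counter


def find_duplicate_ids(ped_lines):
    """Return the (FID, IID) pairs that appear on more than one .ped line."""
    counts = Counter()
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Columns 1 and 2 of a .ped file are family ID and individual ID.
        counts[(fields[0], fields[1])] += 1
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Tiny made-up example; in practice pass an open file, e.g. open("name.ped").
    example = [
        "1 2204 0 0 1 1 A A G G",
        "1 2204 0 0 1 1 A A G G",
        "2 1146 0 0 2 1 A C G G",
    ]
    for fid, iid in find_duplicate_ids(example):
        print(fid, iid)
```

Because the function only needs an iterable of lines, it reads large .ped files one line at a time without loading everything into memory.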

10.3 years ago
zx8754 11k

From the PLINK manual:

The IDs are alphanumeric: the combination of family and individual ID should uniquely identify a person

How are you defining your duplicates?

I suggest you run an IBS/IBD analysis to identify duplicates, then choose which copy to remove based on missingness (i.e. keep the individual with the most genotyped SNPs).
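
As a rough sketch of that approach, assuming PLINK 1.x has already produced a pairwise IBD report (the .genome file from --genome) and a per-individual missingness report (the .imiss file from --missing): the Python below flags pairs whose PI_HAT is close to 1 and, within each pair, marks the sample with the higher F_MISS for removal. The function names and the 0.9 cutoff are illustrative assumptions, not PLINK defaults, and the cutoff should be adjusted to your data.

```python
def parse_table(lines):
    """Parse whitespace-delimited text with a header row into a list of dicts."""
    rows = [ln.split() for ln in lines if ln.strip()]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, r)) for r in body]


def pick_duplicates_to_remove(genome_lines, imiss_lines, pi_hat_min=0.9):
    """For each pair with PI_HAT >= pi_hat_min, drop the sample with more missing calls."""
    f_miss = {
        (r["FID"], r["IID"]): float(r["F_MISS"])
        for r in parse_table(imiss_lines)
    }
    remove = set()
    for r in parse_table(genome_lines):
        if float(r["PI_HAT"]) < pi_hat_min:
            continue
        a = (r["FID1"], r["IID1"])
        b = (r["FID2"], r["IID2"])
        if a in remove or b in remove:
            continue  # one member of this pair is already being dropped
        # On a tie, the second sample of the pair is dropped (arbitrary choice).
        remove.add(a if f_miss[a] > f_miss[b] else b)
    return sorted(remove)


if __name__ == "__main__":
    # Made-up rows in the .genome / .imiss column layout.
    genome = [
        "FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO",
        "1 2204 1 2204b UN NA 0.0 0.0 1.0 1.0 -1 1.0 1.0 NA",
        "1 2204 2 1146 UN NA 1.0 0.0 0.0 0.0 -1 0.7 0.5 2.0",
    ]
    imiss = [
        "FID IID MISS_PHENO N_MISS N_GENO F_MISS",
        "1 2204 N 5 1000 0.005",
        "1 2204b N 20 1000 0.02",
        "2 1146 N 0 1000 0",
    ]
    # Print in the two-column FID IID layout that --remove expects.
    for fid, iid in pick_duplicates_to_remove(genome, imiss):
        print(fid, iid)
```

The printed two-column list can be saved to a text file and passed to --remove.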


Thank you. By duplicates I mean cases where the person in the lab who genotyped the samples may have pipetted the same sample twice.

So the same sample, with the same genotypes, appears twice in the .ped file (the IDs may or may not be the same).

Hope it makes sense.
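
For duplicates defined this way, a quick first pass could be to group .ped lines whose genotype columns are exactly identical, whatever their IDs. This is a minimal Python sketch under that assumption (the function name `group_identical_genotypes` is hypothetical). It only catches exact matches, so replicates that differ by a few missing or miscalled genotypes would be missed; an IBS/IBD check is more robust for those.

```python
from collections import defaultdict


def group_identical_genotypes(ped_lines):
    """Group samples whose genotype columns (field 7 onward) match exactly."""
    groups = defaultdict(list)
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank lines and lines without genotype columns
        # The first six .ped columns are FID, IID, father, mother, sex, phenotype.
        groups[tuple(fields[6:])].append((fields[0], fields[1]))
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Made-up example: samples 2204 and 9 carry the same genotypes under different IDs.
    example = [
        "1 2204 0 0 1 1 A A G G",
        "2 1146 0 0 2 1 A C G G",
        "3 9 0 0 1 1 A A G G",
    ]
    for ids in group_identical_genotypes(example):
        print(ids)
```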


If you know the IDs to keep or remove, look into the --keep or --remove options.
