I have converted a VCF file to PLINK bed files, and some SNP IDs appear more than once. I would like to remove the duplicates but keep one record for each SNP. For example, if rs1234 appears 5 times, I want to keep only one record (maybe the first one).
Right now I use --write-snplist to get the SNP list of the bed file, then use R to count how often each SNP occurs and to generate a list of duplicated SNPs. With that list, I use --extract to get a bed file containing only the duplicated SNPs, and --exclude to get a bed file without any duplicated SNPs.
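For reference, the counting step I do in R amounts to something like the following minimal sketch, written here in Python. It assumes the .snplist file holds one SNP ID per line; the function names `read_snplist` and `duplicate_ids` are just illustrative and are not part of PLINK.

```python
from collections import Counter


def read_snplist(path):
    """Read one SNP ID per line, skipping blank lines (assumed .snplist layout)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def duplicate_ids(snp_ids):
    """Return the IDs that occur more than once, in order of first appearance."""
    counts = Counter(snp_ids)
    seen = set()
    duplicates = []
    for snp in snp_ids:
        if counts[snp] > 1 and snp not in seen:
            seen.add(snp)
            duplicates.append(snp)
    return duplicates


# Small in-memory example; with a real file, pass read_snplist("mydata.snplist")
# (a hypothetical file name) instead of this list.
example_ids = ["rs1234", "rs5678", "rs1234", "rs9999", "rs5678", "rs1234"]
example_duplicates = duplicate_ids(example_ids)
```

The resulting list, written one ID per line, is what I pass to --extract and --exclude.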
But how can I keep exactly one record for each duplicated SNP?
Also, is there a way to do the above steps within PLINK itself, without switching to R to generate the list of duplicated SNPs?
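One workaround I can think of outside PLINK is to rename the second and later occurrences of each ID in the .bim file, so that --exclude can then drop only those renamed records. Below is a rough sketch of that idea, not a tested pipeline. It assumes the usual six-column, whitespace-delimited .bim layout with the variant ID in the second column, and that the hypothetical `_dup` suffix does not collide with any existing ID. The function name `mark_later_duplicates` is made up for illustration.

```python
def mark_later_duplicates(bim_lines, tag="_dup"):
    """Rename every occurrence of a variant ID after the first one.

    Returns (new_lines, renamed_ids). The number and order of lines are
    unchanged, so the rewritten .bim should still line up with the .bed file.
    """
    seen_counts = {}
    new_lines = []
    renamed_ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 2:
            # Leave unexpected lines untouched rather than guessing.
            new_lines.append(line.rstrip("\n"))
            continue
        snp = fields[1]
        previous = seen_counts.get(snp, 0)
        seen_counts[snp] = previous + 1
        if previous > 0:
            # Second and later occurrences get a unique suffix.
            fields[1] = f"{snp}{tag}{previous}"
            renamed_ids.append(fields[1])
        new_lines.append("\t".join(fields))
    return new_lines, renamed_ids


# In-memory example with made-up records.
example_bim = [
    "1\trs1234\t0\t1000\tA\tG",
    "1\trs5678\t0\t2000\tC\tT",
    "1\trs1234\t0\t1000\tA\tG",
]
example_lines, example_renamed = mark_later_duplicates(example_bim)
```

Writing `example_renamed` to a file, one ID per line, and passing it to --exclude together with the rewritten .bim should leave only the first record of each SNP, if I understand the file formats correctly. I would still prefer a way to do this directly in PLINK.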