defining reference alleles in 23andMe file to create plink .ped file
9.0 years ago
gulcek ▴ 20

I created a .ped/.map pair for each individual from the 23andMe raw text format using plink. As far as I know,

plink --bfile binary_fileset --recode 12 --out new_text_fileset

the --recode 12 option creates a genomic 0/1/2 matrix from the .bed, .bim and .fam files (if the reference allele is A, genotype AA gives 2, AT gives 1 and TT gives 0). However, we do not know which allele is the reference for each SNP. Does plink define the reference allele for us, or do we have to pass external reference allele data to plink? If so, how do we integrate the reference allele data into the plink .ped and .map files? The 23andMe txt file only contains the genotype pair for each SNP.

Example 23andMe raw data format:

# rsid    chromosome    position    genotype
rs4477212    1    82154    AA
rs3094315    1    752566    AA
rs3131972    1    752721    GG
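
To make the 0/1/2 coding described above concrete, here is a minimal Python sketch that counts copies of a reference allele in 23andMe genotype calls. The reference alleles in `ASSUMED_REF` are hypothetical values invented for illustration (the raw file does not contain them, which is the point of the question), and the helper names are not part of plink. As far as I can tell from the PLINK 1.9 documentation, an additive 0/1/2 export corresponds to `--recode A`, while `--recode 12` rewrites allele symbols as 1 and 2, so treat this only as an illustration of the counting idea.

```python
def parse_23andme(text):
    """Yield (rsid, chromosome, position, genotype) from 23andMe raw text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        rsid, chrom, pos, genotype = line.split()
        yield rsid, chrom, int(pos), genotype


def count_reference_copies(genotype, ref_allele):
    """Return 0, 1 or 2 copies of ref_allele, or None if the call is not diploid."""
    if len(genotype) != 2 or "-" in genotype:
        return None  # no-call ("--") or a haploid call
    return sum(1 for allele in genotype if allele == ref_allele)


RAW = """\
# rsid    chromosome    position    genotype
rs4477212    1    82154    AA
rs3094315    1    752566    AA
rs3131972    1    752721    GG
"""

# Hypothetical reference alleles, for illustration only; in practice they
# would come from an external source such as a reference VCF.
ASSUMED_REF = {"rs4477212": "A", "rs3094315": "G", "rs3131972": "A"}

dosages = {
    rsid: count_reference_copies(genotype, ASSUMED_REF[rsid])
    for rsid, _chrom, _pos, genotype in parse_23andme(RAW)
}
print(dosages)
```

The same genotype AA yields 2 for one SNP and 0 for another, depending only on which allele is assumed to be the reference.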
ped plink SNP • 3.1k views
9.0 years ago

When it's necessary to have correct reference alleles, you can use the --a2-allele flag, which is designed to scrape them from e.g. a VCF file.

Note that --a2-allele should be present during the final --recode 12 export step. If you use it earlier, but not during the final export, PLINK may swap your reference and alternate alleles in between. (Especially if you save to .ped/.map format in between--that doesn't track reference vs. alternate alleles at all!)
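
As a rough illustration of that workflow, the Python sketch below extracts a two-column list (variant ID, REF allele) from uncompressed VCF text and assembles a plink command line that applies `--a2-allele` in the same run as `--recode 12`. The sample VCF records, the file names and the helper functions are all hypothetical, and the command is only built as an argument list, not executed. The trailing `2 1` arguments tell `--a2-allele` that the allele is in column 2 and the variant ID in column 1 of the list file; to my understanding of the PLINK 1.9 documentation, an uncompressed VCF can also be passed directly as `--a2-allele file.vcf 4 3 '#'`.

```python
import os
import tempfile


def write_a2_allele_list(vcf_text, out_path):
    """Write 'variant_id<TAB>REF' lines from VCF text; return how many were written."""
    written = 0
    with open(out_path, "w") as out:
        for line in vcf_text.splitlines():
            if not line or line.startswith("#"):
                continue  # skip VCF meta and header lines
            fields = line.split("\t")
            variant_id, ref = fields[2], fields[3]  # VCF columns 3 (ID) and 4 (REF)
            if variant_id == ".":
                continue  # no ID to match against the .bim file
            out.write(f"{variant_id}\t{ref}\n")
            written += 1
    return written


def build_recode_command(bfile, a2_list, out_prefix):
    """Build a plink call that sets A2 alleles during the final export."""
    return ["plink", "--bfile", bfile,
            "--a2-allele", a2_list, "2", "1",
            "--recode", "12", "--out", out_prefix]


# Hypothetical VCF records; the REF alleles here are invented for the example.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t82154\trs4477212\tA\tG\t.\t.\t.\n"
    "1\t752566\trs3094315\tG\tA\t.\t.\t.\n"
)

list_path = os.path.join(tempfile.mkdtemp(), "ref_alleles.txt")
n_written = write_a2_allele_list(SAMPLE_VCF, list_path)
cmd = build_recode_command("binary_fileset", list_path, "new_text_fileset")
print(n_written, " ".join(cmd))
```

If plink is installed, the list in `cmd` could be handed to `subprocess.run(cmd, check=True)`; it is worth checking the plink log afterwards for variants whose alleles did not match the reference list.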


How is the --a2-allele flag used? What are the input and output file formats? Could you post an example? I now have a .bim, .fam and .bed fileset for my 23andMe individuals; how do I pass these to plink with --a2-allele?
