Is this an OK approach for lifting over a mixture of b36 & b37 to b38 variants using rsids only?
3.2 years ago
curious ▴ 750

I am working with an old but widely used "mixed" dataset that contains SNPs mapped to a mixture of b36 and b37 coordinates.

I don't know which build each SNP refers to, but each is labeled with an rsid. So I essentially tried to lift to b38 by rsid only like this:

  1. I updated the positions of the "mixed" dataset to b38 positions by merging with dbSNP141 on rsid to create a "lifted" set.

  2. I downloaded the 30x 1000 Genomes data, which were called de novo on b38, and updated the ID field to include the rsid.

  3. I used Beagle's conform-gt utility to make a "harmonized lifted" set, using 1000 Genomes as the reference. This should ensure that alleles and strand are harmonized between the datasets, using allele frequency and LD to resolve ambiguous sites.
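To make step 1 concrete, here is a minimal sketch of the rsid-only position update. It assumes the dbSNP b38 positions have already been loaded into a dictionary keyed by rsid; the function name `lift_by_rsid`, the record layout, and all rsids and coordinates below are hypothetical placeholders, not real data or the exact code I ran.

```python
def lift_by_rsid(sites, dbsnp_b38):
    """Replace each site's coordinates with the b38 position found for its rsid.

    sites:      list of dicts with at least an 'rsid' key (build unknown).
    dbsnp_b38:  dict mapping rsid -> (chrom, pos) on b38 (assumed preloaded).
    Returns (lifted_sites, unmatched_rsids).
    """
    lifted, unmatched = [], []
    for site in sites:
        hit = dbsnp_b38.get(site["rsid"])
        if hit is None:
            # No b38 record for this rsid: keep track of it rather than guessing.
            unmatched.append(site["rsid"])
            continue
        chrom, pos = hit
        # Copy the record and overwrite only the coordinates.
        lifted.append({**site, "chrom": chrom, "pos": pos})
    return lifted, unmatched


# Hypothetical toy input: old coordinates from an unknown build.
mixed = [
    {"rsid": "rs0000001", "chrom": "1", "pos": 1000},
    {"rsid": "rs0000002", "chrom": "2", "pos": 2000},
    {"rsid": "rs0000003", "chrom": "3", "pos": 3000},
]
# Hypothetical b38 lookup table; rs0000003 is deliberately missing.
dbsnp_b38 = {
    "rs0000001": ("1", 1100),
    "rs0000002": ("2", 2250),
}

lifted, unmatched = lift_by_rsid(mixed, dbsnp_b38)
print(len(lifted), "lifted;", len(unmatched), "unmatched:", unmatched)
```

Sites that end up in `unmatched` may be worth a second look, since dbSNP merges and retires rsids over time, so an old identifier can be absent from a given release even when the variant itself is still catalogued under another id.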

I realize this isn't ideal, but does this approach seem OK, or are there better alternatives? I was able to "lift" 7022 of my original 7281 mixed-build sites this way. Plotting allele frequencies against a b38 reference like TOPMed also looks really clean, so I think it worked.
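The two sanity checks mentioned above can be sketched roughly as follows: the fraction of sites lifted, and a simple scan for sites whose frequency matches the reference only after swapping alleles. The function names, the `tol` threshold, and the toy frequencies are assumptions for illustration; only the 7022 and 7281 counts come from my data.

```python
def lift_rate(n_lifted, n_total):
    """Fraction of original sites that received a b38 position."""
    return n_lifted / n_total


def flag_possible_flips(study_freq, ref_freq, tol=0.1):
    """Return ids whose frequency disagrees with the reference as given,
    but agrees once the study allele frequency f is replaced by 1 - f.

    study_freq, ref_freq: dicts mapping variant id -> allele frequency.
    tol: maximum absolute difference treated as agreement (an arbitrary choice).
    """
    flagged = []
    for vid, f in study_freq.items():
        r = ref_freq.get(vid)
        if r is None:
            continue  # variant absent from the reference panel
        if abs(f - r) > tol and abs((1 - f) - r) <= tol:
            flagged.append(vid)
    return flagged


rate = lift_rate(7022, 7281)
dropped = 7281 - 7022
print(f"lifted {rate:.1%} of sites, {dropped} dropped")

# Hypothetical frequencies: 'b' looks like a swapped allele.
study = {"a": 0.20, "b": 0.85, "c": 0.50}
reference = {"a": 0.22, "b": 0.12, "c": 0.48}
flips = flag_possible_flips(study, reference)
print("possible allele swaps:", flips)
```

A check like this cannot catch swaps at sites with frequencies near 0.5, where f and 1 - f are indistinguishable, so it complements rather than replaces an LD-based check.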

liftover • 1.2k views