Snps In Linkage Disequilibrium For Ncbi Build 37 (Hg19)
10.9 years ago
cruzpedro ▴ 100

Hi all

I'd like to know if anyone has, or knows where to get, a list of SNPs in high LD for the latest human genome build (GRCh37). I'm currently working with the Brazilian population. I'm trying to do PCA for population structure analysis and therefore need to remove high-LD regions from my dataset to get less biased PC axes.

Thanks a lot in advance for your attention!

Best regards, Pedro.

PS: The --indep-pairwise option in PLINK alone won't solve my problem.
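Once a list of high-LD regions is available, removing them before PCA is essentially an interval filter. A minimal sketch, assuming the regions come as (chromosome, start, end) tuples with inclusive 1-based GRCh37 coordinates; the Region/Snp types and drop_snps_in_regions are hypothetical names, and the coordinates below are made-up toy values, not real high-LD regions:

```python
from typing import Dict, Iterable, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive, 1-based (assumed convention)
    end: int    # inclusive, 1-based (assumed convention)


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int    # 1-based GRCh37 position


def drop_snps_in_regions(snps: Iterable[Snp], regions: Iterable[Region]) -> List[Snp]:
    """Return the SNPs that do not fall inside any excluded region."""
    # Group regions by chromosome so each SNP is only checked against its own chromosome.
    by_chrom: Dict[str, List[Region]] = {}
    for r in regions:
        by_chrom.setdefault(r.chrom, []).append(r)
    kept = []
    for s in snps:
        inside = any(r.start <= s.pos <= r.end for r in by_chrom.get(s.chrom, []))
        if not inside:
            kept.append(s)
    return kept


# Toy, made-up coordinates for illustration only (NOT real high-LD regions).
regions = [Region("6", 1000, 2000), Region("8", 500, 800)]
snps = [Snp("rsA", "6", 1500), Snp("rsB", "6", 2500), Snp("rsC", "8", 100), Snp("rsD", "2", 600)]
print([s.snp_id for s in drop_snps_in_regions(snps, regions)])
```

PLINK can also exclude SNPs by position range itself; check the documentation of your PLINK version for the exact syntax.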

ld snp gwas plink • 5.5k views
10.9 years ago

The LD data for HapMap (hg18) are available here: http://hapmap.ncbi.nlm.nih.gov/downloads/ld_data/?N=D

You could run liftOver to map those positions to hg19.
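liftOver works on BED intervals, so the HapMap LD records first need converting. A minimal sketch, assuming each line of an LD file holds whitespace-separated columns pos1, pos2, population, rs1, rs2, D', r², LOD, fbin with 1-based hg18 positions, and that the chromosome is known from the file name; please verify this layout against the files you download. The r² cutoff of 0.8 is an arbitrary illustrative choice and ld_lines_to_bed is a hypothetical helper:

```python
from typing import Iterable, Iterator


def ld_lines_to_bed(lines: Iterable[str], chrom: str, min_r2: float = 0.8) -> Iterator[str]:
    """Yield one BED line per marker that appears in a pair with r^2 >= min_r2.

    ASSUMED input columns (whitespace-separated):
        pos1 pos2 pop rs1 rs2 D' r2 LOD fbin   (1-based hg18 positions)
    BED is 0-based and half-open, so a SNP at position p becomes [p-1, p).
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or short lines
        try:
            pos1, pos2 = int(fields[0]), int(fields[1])
            r2 = float(fields[6])
        except ValueError:
            continue  # skip header or other non-numeric lines
        if r2 < min_r2:
            continue
        for pos, rs in ((pos1, fields[3]), (pos2, fields[4])):
            if rs not in seen:  # emit each marker only once
                seen.add(rs)
                yield f"{chrom}\t{pos - 1}\t{pos}\t{rs}"


# Made-up example records, not real HapMap data.
example = [
    "1000 2000 CEU rs1 rs2 1.0 0.95 10.0 0",
    "1000 3000 CEU rs1 rs3 0.5 0.10 1.0 0",
]
for bed_line in ld_lines_to_bed(example, "chr1"):
    print(bed_line)
```

The resulting file can then be passed to the UCSC liftOver tool together with the hg18-to-hg19 chain file (liftOver oldFile map.chain newFile unMapped).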


Thanks a lot for your answer, it really shed some light on my problem!


Do you know if the same data are available for the newest release of the 1000 Genomes Project?


LD data for phase 3 on GRCh37 are available in the 1000 Genomes browser, powered by Ensembl. More details on that view can be found here.


Thank you very much. But is there also an FTP or whole-genome download site?


I had a chat with my colleague Laura Clarke from the 1000 Genomes Project here at EMBL-EBI, and this is what she said:

There are no bulk downloads for LD information. Doing pairwise comparisons of 80 million sites across 2,500 individuals is impossible. Our recommendation is to convert the files to PLINK format using our VCF to PED tool or vcftools:

http://www.1000genomes.org/faq/can-i-convert-vcf-files-plinkped-format

http://vcftools.sourceforge.net/documentation.html#plink

Then look at the region of interest in Haploview or an equivalent tool. There is no feasible way to get this in bulk for any large number of 1000 Genomes sites or individuals.
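For a single region of interest, an r² value can also be computed directly from genotypes. A minimal sketch, assuming genotypes coded as 0/1/2 allele dosages with None for missing calls; it computes the squared genotype correlation, which can differ from haplotype-based estimates such as those Haploview reports, and genotype_r2 is a hypothetical helper. The last lines show why the all-pairs computation described in the quote is out of reach: 80 million sites give about 3.2 × 10^15 SNP pairs.

```python
import math
from typing import Optional, Sequence


def genotype_r2(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> float:
    """Squared Pearson correlation of 0/1/2 dosages at two SNPs.

    Individuals missing either genotype (None) are skipped. Returns nan when
    fewer than two individuals remain or either SNP is monomorphic.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


print(genotype_r2([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 2, 2]))

# Number of distinct SNP pairs for 80 million sites: n * (n - 1) / 2.
n_pairs = math.comb(80_000_000, 2)
print(f"{n_pairs:.2e}")
```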

10.9 years ago
Tky ★ 1.0k

Well, I guess what is true for other populations might not hold for the Brazilian population, so you may need to calculate pairwise LD in your own sample set and construct LD blocks (scanning consecutive windows to locate high-LD regions). Please refer to this paper for more information: A. Price et al., Long-Range LD Can Confound Genome Scans in Admixed Populations, AJHG 2008 (https://www.sciencedirect.com/science/article/pii/S0002929708003534)
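The window scan suggested above can be sketched as follows. This is a simple heuristic, assuming 0/1/2 dosage genotypes for SNPs already sorted by position; the window size, step and mean-r² threshold are arbitrary illustrative parameters, high_ld_windows is a hypothetical helper, and this is not the procedure used by Price et al.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Genotypes = Sequence[Optional[int]]


def genotype_r2(g1: Genotypes, g2: Genotypes) -> float:
    """Squared correlation of 0/1/2 dosages; nan if undefined (monomorphic SNP)."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return float("nan") if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def high_ld_windows(genos: Sequence[Genotypes], window: int, step: int,
                    threshold: float) -> List[Tuple[int, int, float]]:
    """Slide a window of `window` consecutive SNPs forward by `step` SNPs and
    report (first_index, last_index, mean_r2) for windows whose mean pairwise
    r^2 exceeds `threshold`. Pairs with undefined r^2 are ignored."""
    hits = []
    for start in range(0, max(len(genos) - window, 0) + 1, step):
        idx = range(start, min(start + window, len(genos)))
        vals = [genotype_r2(genos[i], genos[j]) for i, j in combinations(idx, 2)]
        vals = [v for v in vals if v == v]  # drop nan values
        if vals:
            mean_r2 = sum(vals) / len(vals)
            if mean_r2 > threshold:
                hits.append((idx[0], idx[-1], mean_r2))
    return hits


# Toy data: three identical SNPs (strong LD), then three uncorrelated SNPs.
block = [0, 1, 2, 0, 1, 2, 0, 1]
genos = [block, block, block,
         [0, 0, 0, 0, 2, 2, 2, 2],
         [0, 0, 2, 2, 0, 0, 2, 2],
         [0, 2, 0, 2, 0, 2, 0, 2]]
print(high_ld_windows(genos, window=3, step=3, threshold=0.5))
```

Overlapping windows (step smaller than window) give finer localisation at the cost of more r² computations.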


Hey! Thanks a lot for your answer! I was planning to do this later as well, since I was first getting a "test" set for running EIGENSTRAT. I think I'll use the "--ld-window-kb" option in PLINK on my sample: http://pngu.mgh.harvard.edu/~purcell/plink/ld.shtml#ld2. Do you think this will suffice for controlling LD in a 60-individual dataset? I have data on another 50 individuals that I can include to infer LD patterns in my population.

Best regards!
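Restricting LD calculations to nearby SNPs, which is the idea behind PLINK's --ld-window-kb option, keeps the number of pairs growing roughly linearly with the number of SNPs at a fixed marker density, instead of quadratically. A minimal sketch that only enumerates the candidate pairs, assuming sorted base-pair positions on one chromosome; pairs_within_kb is a hypothetical helper illustrating the concept, not PLINK's implementation (PLINK additionally limits pairs by SNP count with --ld-window and filters output by --ld-window-r2):

```python
from bisect import bisect_right
from typing import Iterator, Sequence, Tuple


def pairs_within_kb(positions: Sequence[int], window_kb: float) -> Iterator[Tuple[int, int]]:
    """Yield index pairs (i, j), i < j, of SNPs at most window_kb kilobases apart.

    `positions` must be sorted base-pair coordinates on a single chromosome.
    """
    max_dist = window_kb * 1000
    for i, p in enumerate(positions):
        # Index of the first SNP lying more than max_dist beyond position p.
        stop = bisect_right(positions, p + max_dist)
        for j in range(i + 1, stop):
            yield i, j


print(list(pairs_within_kb([1000, 1500, 3000, 10000], window_kb=2)))
```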


Yeah, you should check the EIGENSTRAT plot first (against other HapMap populations). And more samples give more accurate LD estimates.
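A quick simulation illustrates why more samples help: for two SNPs that are truly independent, the sample r² is not zero but carries sampling noise on the order of 1/n. A minimal sketch, assuming Hardy-Weinberg genotypes at an arbitrary allele frequency of 0.3; the sample sizes 60 and 110 mirror the 60 individuals plus the 50 extra ones mentioned in the thread, and mean_null_r2 is a hypothetical helper:

```python
import random
from typing import List


def r2(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation of two dosage vectors (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def draw_genotypes(rng: random.Random, n: int, freq: float) -> List[int]:
    """n genotypes as the sum of two independent allele draws (Hardy-Weinberg)."""
    return [int(rng.random() < freq) + int(rng.random() < freq) for _ in range(n)]


def mean_null_r2(n: int, reps: int = 2000, freq: float = 0.3, seed: int = 1) -> float:
    """Average r^2 between two independently simulated SNPs in n individuals.

    The true LD is zero, so any r^2 above zero is pure sampling noise.
    """
    rng = random.Random(seed)
    total, used = 0.0, 0
    while used < reps:
        value = r2(draw_genotypes(rng, n, freq), draw_genotypes(rng, n, freq))
        if value == value:  # skip the rare monomorphic draws (nan)
            total += value
            used += 1
    return total / used


print(mean_null_r2(60), mean_null_r2(110))
```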
