Question

How can I download SNP genotype files from 1000 Genomes?

Asked 9.2 years ago by evo_genomics

How can I download the genotypes of specific SNPs (SNPs in coding regions) for the African population from the 1000 Genomes Project?

Thanks

Tags: SNP genotype population Genetics

Last edited 2.0 years ago by Ram.

Ram · Answer 1 · 2015-02-08

Answered 9.2 years ago by wangyi2412

Visit ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ for the actual data.

The same FTP site also has files describing which population each sample belongs to. I don't have my notes on the exact directory right now, but browsing the site and its documentation should turn them up without much effort.

Hope this helps.

Last edited 2.0 years ago by Ram.

You can also check ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree, which lists the files on the FTP site, if you want to find other things.

Reply by Tommy Carstensen, written 9.2 years ago; last edited 2.0 years ago by Ram.

Ram · Answer 2 · 2015-02-08

Answered 9.2 years ago by rbagnall

You can use tabix to query the remote files directly if you prefer not to download the large VCF files of the actual data.

To retrieve a single SNP, say chromosome 6 position 7580958 (1-based GRCh37 coordinates, as used in the 1000 Genomes phase 3 data), the format is: tabix name-of-vcf-file chr:start-end

tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.sites.vcf.gz 6:7580958-7580959
6    7580958    rs2076299    A    G    100    PASS    AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||

So the frequency of the alternate allele (G) of rs2076299 in the African (AFR) samples of the 1000 Genomes data is AFR_AF=0.3139.
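If you want to pull these values out programmatically, here is a minimal Python sketch (not part of the original answer; the function names are hypothetical) that splits the INFO column of a record like the one above and reads a population frequency key such as AFR_AF. It assumes tab-separated tabix output and a biallelic site: at multiallelic sites the *_AF values are comma-separated lists, and the float() conversion would fail.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column (key=value pairs separated by ';') into a dict.
    Flag entries without '=' are stored with the value True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def population_alt_frequency(vcf_line: str, pop: str = "AFR") -> float:
    """Return the <pop>_AF value (alternate-allele frequency) from a sites-VCF record.
    Assumes a biallelic site, so the value is a single number."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    return float(info[f"{pop}_AF"])
```

For example, passing the rs2076299 line to population_alt_frequency(line, "EUR") reads EUR_AF instead.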

Last edited 2.0 years ago by Ram.

Ah, I now see that I showed how to get the allele frequency when genotypes were asked for. You can still use tabix, but you need to query the chromosome-specific VCF files of the 1000 Genomes data, which contain the genotypes. (Note the ALL.chr6. part of the file path; change it to your chromosome of choice.)

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz 6:7580958-7580959
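If you need to repeat this for several chromosomes, here is a minimal Python sketch that builds the per-chromosome file name and runs the same tabix command through subprocess. It assumes tabix is installed and on your PATH, and that every autosome follows the file-name pattern of the chr6 example. The version tag is a parameter because the files on the server may since have been renamed (check the directory listing). The X and Y chromosome files are named differently and are not handled here. Both helper names are hypothetical.

```python
import subprocess

RELEASE_DIR = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/"


def genotype_vcf_url(chrom: int, version: str = "v5") -> str:
    """Hypothetical helper: phase 3 per-chromosome genotype VCF URL for an autosome.
    Assumes the chr6 naming pattern holds for chromosomes 1-22."""
    if not 1 <= chrom <= 22:
        raise ValueError("this sketch only handles autosomes 1-22")
    return (f"{RELEASE_DIR}ALL.chr{chrom}.phase3_shapeit2_mvncall_integrated_"
            f"{version}.20130502.genotypes.vcf.gz")


def tabix_query(vcf_url: str, region: str, header: bool = True) -> str:
    """Hypothetical helper: run tabix on a bgzipped, indexed VCF and return its stdout."""
    cmd = ["tabix"]
    if header:
        cmd.append("-h")  # also print the VCF header, including the sample IDs
    cmd += [vcf_url, region]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling tabix_query(genotype_vcf_url(6), "6:7580958-7580959") should print the same text as the command above.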

In the tabix command above I included the -h option, which prints the VCF header, including the sample IDs (e.g. NA21122, NA21123, NA21124, NA21125, etc.). After the header lines comes the variant record, including the genotypes:

6    7580958    rs2076299    A    G    100    PASS    AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||    GT    0|0    0|0    0|0    0|0    0|1    0|1    0|0    0|0    0|0    0|0    0|0    0|1    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0
...etc
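To match each genotype with its sample, here is a minimal Python sketch (a hypothetical helper, not from the original answer). It reads the sample IDs from the #CHROM header line and pairs each one with its call from the first variant record. It assumes the output is tab-separated, as tabix prints it, and it finds the GT position from the FORMAT column instead of assuming GT is the only field.

```python
def genotypes_by_sample(vcf_text: str) -> dict:
    """Pair each sample ID with its GT call for the first variant record in the text."""
    samples = None
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank and ## meta-information lines
        columns = line.split("\t")
        if line.startswith("#CHROM"):
            samples = columns[9:]  # sample IDs follow the 9 fixed VCF columns
            continue
        if samples is None:
            raise ValueError("no #CHROM line found; was tabix run with -h?")
        gt_index = columns[8].split(":").index("GT")  # position of GT in FORMAT
        calls = columns[9:]
        return {s: call.split(":")[gt_index] for s, call in zip(samples, calls)}
    raise ValueError("no variant record found")
```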

Next you need to know the population of each sample ID, which you can find in this Excel file:

http://www.1000genomes.org/sites/1000genomes.org/files/documents/20101214_1000genomes_samples.xls

From this file I can see that samples NA19092 to NA19266 are YRI (Yoruba in Ibadan, Nigeria).
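Once you know which samples are African, you can restrict the genotypes to them. Below is a minimal Python sketch. It assumes the sample-to-population table is a tab-separated file with a header row containing sample and super_pop columns. The 20130502 release directory appears to have such a panel file, integrated_call_samples_v3.20130502.ALL.panel, but check its header first. If you use the Excel file instead, export the relevant columns to that layout. AFR is the super-population label used in the AFR_AF field above. Both function names are hypothetical. Here genotypes is a dict of sample ID to GT string, such as one built from the -h output.

```python
import csv
import io


def samples_in_super_population(panel_text: str, super_pop: str = "AFR") -> set:
    """Return the sample IDs whose super_pop column equals super_pop.
    Assumes a tab-separated panel with a header row naming 'sample' and 'super_pop'."""
    reader = csv.DictReader(io.StringIO(panel_text), delimiter="\t")
    return {row["sample"] for row in reader if row["super_pop"] == super_pop}


def alt_allele_count(genotypes: dict, samples: set) -> tuple:
    """Count non-reference allele codes and called alleles among the given samples.
    Missing alleles ('.') are skipped; phased '|' and unphased '/' are both accepted."""
    alt = called = 0
    for sample, gt in genotypes.items():
        if sample not in samples:
            continue
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call
            called += 1
            if allele != "0":
                alt += 1
    return alt, called
```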

Reply by rbagnall, written 9.2 years ago; last edited 4.5 years ago by Ram.

Thank you

Reply by evo_genomics, written 9.2 years ago; last edited 2.0 years ago by Ram.