Accessing population-specific mutation data from the 1000 Genomes Project
5.2 years ago

I am trying to download mutation data from the 1000 Genomes Project. For my study I am interested in the white, not Hispanic or Latino population. I have a few questions regarding the data.

  1. From the population categories listed by the 1000 Genomes Project (http://www.internationalgenome.org/faq/which-populations-are-part-your-study/), I think CEU is the best suited for my study. Is that right?

  2. Under that assumption, I downloaded VCF files from http://www.internationalgenome.org/data-portal/sample/NA06985. The VCF files only categorize by super-population (EUR in the case of CEU), but EUR is not exclusive to CEU. Is there a way, or some other metadata that I missed, to filter the VCF files down to mutations from CEU only?

1000gnome VCF mutation • 1.2k views
5.2 years ago

Yes, CEU fits, but you would mainly want the following EUR populations:

  • CEU, Utah Residents (CEPH) with Northern and Western European Ancestry
  • TSI, Toscani in Italia
  • GBR, British in England and Scotland

The other EUR populations are:

  • FIN, Finnish in Finland
  • IBS, Iberian Population in Spain

The IBS samples come from Iberia, so they count as Hispanic. The FIN samples have been shown to be statistically significantly distinct from the other European populations.

On your point 2, there is indeed a better metadata file. Take a look at Step 2 of this tutorial: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2
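As a rough sketch of how per-sample metadata can be used for the filtering, the Python below picks out sample IDs by population code. It assumes a tab-separated panel file with the columns sample, pop, super_pop and gender, which is the layout of the Phase 3 file integrated_call_samples_v3.20130502.ALL.panel; the metadata file used in the tutorial may be laid out differently. The function name and the EXAMPLE_* rows are made up for illustration, and only the NA06985 ID comes from the question.

```python
import csv
import io


def samples_for_populations(panel_text, populations):
    """Return sample IDs whose 'pop' column is one of the given population codes.

    Assumes a tab-separated panel with a header row containing at least
    the columns 'sample' and 'pop' (an assumption about the file layout).
    """
    wanted = set(populations)
    reader = csv.DictReader(io.StringIO(panel_text), delimiter="\t")
    return [row["sample"] for row in reader if row["pop"] in wanted]


# Illustrative rows only: the EXAMPLE_* IDs and the gender values are placeholders.
PANEL = (
    "sample\tpop\tsuper_pop\tgender\n"
    "NA06985\tCEU\tEUR\tfemale\n"
    "EXAMPLE_1\tGBR\tEUR\tmale\n"
    "EXAMPLE_2\tTSI\tEUR\tfemale\n"
    "EXAMPLE_3\tFIN\tEUR\tfemale\n"
    "EXAMPLE_4\tIBS\tEUR\tmale\n"
)

# CEU only, as in the question.
ceu_only = samples_for_populations(PANEL, ["CEU"])

# CEU, TSI and GBR, as suggested in this answer.
selected = samples_for_populations(PANEL, ["CEU", "TSI", "GBR"])

# One ID per line, ready to be saved as a sample list.
print("\n".join(selected))
```

With a real panel file you would read its contents into panel_text and write the resulting IDs to a text file, one per line. A list like that can then be given to a tool such as bcftools view through its -S (--samples-file) option to subset the VCF to those samples.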

Kevin
