Using VCFtools to find which population codes individuals in a vcf file belong to
7.8 years ago
severalorks ▴ 110

EDIT: The population code for each individual is listed separately on the 1000 Genomes site, so the question has essentially been answered.

I've been looking through phased vcf data from 1000 Genomes, specifically these files: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

This has been a helpful resource so far for deciphering vcf format: https://samtools.github.io/hts-specs/VCFv4.2.pdf

However, I still have questions about identifying which population code each individual in the data belongs to. For example, what population code does individual HG00096 belong to? Is it possible to find all individuals from group GBR? The INFO column lists the allele frequency for each super population at each recorded position, but I'm looking for the more specific population codes.

I think VCFtools may allow me to accomplish this. I've looked through the manual here: https://vcftools.github.io/man_latest.html and haven't found the answer to my question yet, but I'm still searching through it.

So how can I use VCFtools to find which population codes individuals in a vcf file belong to? If VCFtools can't do this, where else can I get this information?

vcf 1000genomes population-code vcftools • 3.8k views
7.8 years ago
LauferVA 4.2k

Hello,

The population labels are not stored in the VCF file itself. The only way to get them from the VCF alone would be to infer them from the genotype data.

Rather, the mapping is contained in a separate file. First, have a look here: http://www.1000genomes.org/category/population/

You might also look here for an alternative way to extract subpopulations of a given type: http://browser.1000genomes.org/Homo_sapiens/UserData/SelectSlice

Now then, the mapping you are looking for can be found here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_sample_info.xlsx which can be accessed from this page: http://www.1000genomes.org/faq/can-i-get-phenotype-gender-and-family-relationship-information-samples/
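Once the sample information is available as tab-separated text (for example, exported from the spreadsheet), both lookups from the question take only a few lines of Python. This is a minimal sketch rather than part of the original answer. The column names `Sample` and `Population` are assumptions, so check the header of the file you actually download. The inline table is illustrative stand-in data, and `EXAMPLE01` is a made-up ID.

```python
import csv
import io
from collections import defaultdict


def load_population_map(handle, sample_col="Sample", pop_col="Population"):
    """Return {sample_id: population_code} from a tab-separated table with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[sample_col]: row[pop_col] for row in reader}


def samples_by_population(pop_map):
    """Invert the mapping to {population_code: [sample_id, ...]}."""
    groups = defaultdict(list)
    for sample, pop in pop_map.items():
        groups[pop].append(sample)
    return dict(groups)


# Illustrative stand-in for the exported spreadsheet (assumed column names).
# EXAMPLE01 is a made-up sample ID used only to show a second population.
EXAMPLE_TABLE = "Sample\tPopulation\nHG00096\tGBR\nHG00097\tGBR\nEXAMPLE01\tFIN\n"

pop_map = load_population_map(io.StringIO(EXAMPLE_TABLE))
by_pop = samples_by_population(pop_map)

print(pop_map["HG00096"])  # population code of one individual
print(by_pop["GBR"])       # all individuals in one population
```

For a real file, replace the `io.StringIO(...)` handle with `open("your_export.tsv", newline="")` and pass the actual column names if they differ.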

You mention VCFtools. A next step could be to extract only the samples from the population you are interested in and write them to a smaller VCF file, or to run whatever analysis you want on that subset using VCFtools.
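One way to do that subsetting might look like the sketch below. It writes the chosen sample IDs to a text file, one per line, and assembles a VCFtools call using its `--keep`, `--recode`, `--recode-INFO-all` and `--out` options. The sample list, the input name `input.vcf.gz` and the output prefix `GBR_subset` are placeholders, not values from this thread. The command is only printed, not run.

```python
import os
import shlex
import tempfile


def write_keep_file(sample_ids, path):
    """Write one sample ID per line, the format VCFtools reads for --keep."""
    with open(path, "w") as out:
        for sample_id in sample_ids:
            out.write(sample_id + "\n")


def build_vcftools_command(vcf_gz, keep_file, out_prefix):
    """Assemble a VCFtools call that keeps only the listed samples."""
    return [
        "vcftools",
        "--gzvcf", vcf_gz,      # gzipped VCF input
        "--keep", keep_file,    # file listing the samples to retain
        "--recode",             # write a new VCF containing the subset
        "--recode-INFO-all",    # carry the INFO fields over unchanged
        "--out", out_prefix,    # output prefix
    ]


# Placeholder inputs: substitute your own sample list and file names.
gbr_samples = ["HG00096", "HG00097"]
keep_path = os.path.join(tempfile.mkdtemp(), "GBR.keep.txt")
write_keep_file(gbr_samples, keep_path)

command = build_vcftools_command("input.vcf.gz", keep_path, "GBR_subset")
print(shlex.join(command))  # inspect the command before running it yourself
```

Note that `--recode-INFO-all` copies the INFO values as they are, so fields such as allele frequencies still describe the full sample set rather than the subset.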

I recently did something similar, but using Plink2 and then Plink1. If you would like that code for reference, I can append it. Does this answer your question?

Yes, thank you. I'd already found the population codes elsewhere but your additional information was very helpful too.
