1000Genomes population allele frequencies for list of SNPs
22 months ago

Hi all,

I have a list of more than 100 SNPs (rsXXXXXX) and I would like to obtain the allele frequencies of each of them in each of the 1000Genomes populations (if possible, not manually...). Is there any tool, R package, etc. that can download them all at once? I thought I could maybe get those frequencies from the UCSC database or directly from the 1000Genomes database, but I'm open to any suggestions. Thank you very much in advance!

snps 1000Genomes • 641 views
19 months ago
United States

Solution 1: The raw variant call data can be downloaded from . Once you have those files, the INFO column in each VCF file contains superpopulation allele frequencies, and there are a bunch of tools which can look up the INFO column entry for a particular rsID. The one complication is that, since there's a separate VCF file per chromosome, you may first need to figure out which rsIDs are on which chromosomes.
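
For the lookup step in Solution 1, here is a minimal awk sketch, not a definitive tool. It assumes a tab-delimited VCF with the rsID in column 3 and INFO in column 8, and that the frequency keys look like AF, EAS_AF, EUR_AF and so on; the function name `extract_superpop_af` and the file names are made up for illustration.

```shell
# Hypothetical helper: print the allele-frequency INFO entries for a list of rsIDs.
# $1 = text file with one rsID per line, $2 = uncompressed VCF (or "-" for stdin).
extract_superpop_af() {
    rslist=$1
    vcf=$2
    awk -F'\t' '
        NR == FNR { want[$1] = 1; next }   # first file: remember the requested rsIDs
        /^#/ { next }                      # skip VCF header lines
        ($3 in want) {
            out = $3
            n = split($8, kv, ";")         # INFO is a ";"-separated list of KEY=VALUE
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^([A-Z]+_)?AF=/)   # AF= itself or <POP>_AF= (assumed key names)
                    out = out "\t" kv[i]
            print out
        }
    ' "$rslist" "$vcf"
}
```

For a compressed per-chromosome file (placeholder name), something like `gzip -dc chr1.vcf.gz | extract_superpop_af rslist.txt -` should work, since awk reads standard input when the file name is `-`. Looping over every chromosome file avoids having to know in advance which rsID is on which chromosome, at the cost of reading all the files.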

Solution 2: With plink 2.0, provides a single merged dataset containing all chromosomes (download the boldfaced links, then rename phase3_corrected.psam to all_phase3.psam). Then,

plink2 --pfile all_phase3 vzs --extract [your list of rsIDs] --export vcf

exports a VCF containing only the rsIDs you care about; the precomputed superpopulation allele frequencies will be in the INFO column of this freshly generated VCF. You can also define your own populations with --keep and compute allele frequencies on the fly with --freq.
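
As a rough sketch of the --keep / --freq route: the helper below builds a sample list for one population from the .psam file. It assumes the .psam header has a `#IID` column and a column named `SuperPop`; check the header of your own file, since that column name is an assumption here. `make_keep_list`, `eur.keep`, `rslist.txt` and `eur_freq` are hypothetical names.

```shell
# Hypothetical helper: print the sample IDs belonging to one superpopulation.
# $1 = .psam file, $2 = superpopulation label (e.g. EUR).
make_keep_list() {
    psam=$1
    superpop=$2
    awk -v pop="$superpop" '
        NR == 1 {                          # locate the columns by name in the header
            for (i = 1; i <= NF; i++) {
                if ($i == "#IID") id = i
                if ($i == "SuperPop") sp = i
            }
            next
        }
        id && sp && $sp == pop { print $id }
    ' "$psam"
}

# Usage sketch (needs plink2 and the all_phase3 files; not run here):
#   make_keep_list all_phase3.psam EUR > eur.keep
#   plink2 --pfile all_phase3 vzs --keep eur.keep --extract rslist.txt --freq --out eur_freq
```

If I remember the plink2 behaviour correctly, a single-column --keep file is read as a list of IIDs and --freq writes its table to `<out>.afreq`; treat both as things to verify against the plink 2.0 documentation.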

Sorry, I don't know what might be happening but my plink2 returns that it doesn't recognise the "--pfile" option... I'm launching the command in a folder with the files all_phase3.pgen.zst, all_phase3.psam, all_phase3.pvar.zst and my list of rsIDs... is it possible that it is not recognising some of the files? Or is it more likely a problem of my plink2 installation?

Thanks again :)

  1. You need to decompress the .pgen.zst file first; see the instructions at the top of the resources page.

  2. This requires plink 2.0, not 1.9. What do you get when you type "plink2 --version"?
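
To make the first point easy to check, here is a small preflight sketch. It assumes that `--pfile <prefix> vzs` wants an uncompressed `<prefix>.pgen` next to `<prefix>.psam` and `<prefix>.pvar.zst`; the function name `check_pfile_inputs` is hypothetical, and the actual decompression command is the one given on the resources page.

```shell
# Hypothetical preflight check before running `plink2 --pfile <prefix> vzs`.
# Prints what is missing and returns non-zero if any expected file is absent.
check_pfile_inputs() {
    prefix=$1
    status=0
    for ext in pgen psam pvar.zst; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext"
            status=1
        fi
    done
    # A .pgen.zst with no .pgen beside it suggests the decompression step was skipped.
    if [ ! -f "$prefix.pgen" ] && [ -f "$prefix.pgen.zst" ]; then
        echo "hint: decompress $prefix.pgen.zst first"
    fi
    return $status
}
```

Running `check_pfile_inputs all_phase3` in the folder described above would report the missing `all_phase3.pgen` and print the hint.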

18 months ago
France/Nantes/Institut du Thorax - INSE…
$ cat rslist.txt | while read R ; do wget -q -O - "${R}?download=frequency" | grep -E '^(#Study|1000Genomes)' | sed "s/^/${R}\t/" ; done

rs25  #Study       Population   Group       Samplesize  Ref Allele  Alt Allele  BioProject ID  BioSample ID
rs25  1000Genomes  Global       Study-wide  5008        T=0.485     C=0.515     PRJEB6930      SAMN07490465
rs25  1000Genomes  African      Sub         1322        T=0.493     C=0.507                    SAMN07486022
rs25  1000Genomes  East Asian   Sub         1008        T=0.474     C=0.526                    SAMN07486024
rs25  1000Genomes  Europe       Sub         1006        T=0.521     C=0.479                    SAMN07488239
rs25  1000Genomes  South Asian  Sub         978         T=0.52      C=0.48                     SAMN07486027
rs25  1000Genomes  American     Sub         694         T=0.38      C=0.62                     SAMN07488242
rs26  #Study       Population   Group       Samplesize  Ref Allele  Alt Allele  BioProject ID  BioSample ID
rs26  1000Genomes  Global       Study-wide  5008        T=0.335     C=0.665     PRJEB6930      SAMN07490465
rs26  1000Genomes  African      Sub         1322        T=0.404     C=0.596                    SAMN07486022
rs26  1000Genomes  East Asian   Sub         1008        T=0.291     C=0.709                    SAMN07486024
rs26  1000Genomes  Europe       Sub         1006        T=0.341     C=0.659                    SAMN07488239
rs26  1000Genomes  South Asian  Sub         978         T=0.36      C=0.64                     SAMN07486027
rs26  1000Genomes  American     Sub         694         T=0.22      C=0.78                     SAMN07488242
rs27  #Study       Population   Group       Samplesize  Ref Allele  Alt Allele  BioProject ID  BioSample ID
rs27  1000Genomes  Global       Study-wide  5008        G=0.283     C=0.717     PRJEB6930      SAMN07490465
rs27  1000Genomes  African      Sub         1322        G=0.355     C=0.645                    SAMN07486022
rs27  1000Genomes  East Asian   Sub         1008        G=0.284     C=0.716                    SAMN07486024
rs27  1000Genomes  Europe       Sub         1006        G=0.261     C=0.739                    SAMN07488239
rs27  1000Genomes  South Asian  Sub         978         G=0.29      C=0.71                     SAMN07486027
rs27  1000Genomes  American     Sub         694         G=0.16      C=0.84                     SAMN07488242
rs28  #Study       Population   Group       Samplesize  Ref Allele  Alt Allele  BioProject ID  BioSample ID
rs28  1000Genomes  Global       Study-wide  5008        C=0.517     T=0.483     PRJEB6930      SAMN07490465
rs28  1000Genomes  African      Sub         1322        C=0.601     T=0.399                    SAMN07486022
rs28  1000Genomes  East Asian   Sub         1008        C=0.476     T=0.524                    SAMN07486024
rs28  1000Genomes  Europe       Sub         1006        C=0.517     T=0.483                    SAMN07488239
rs28  1000Genomes  South Asian  Sub         978         C=0.53      T=0.47                     SAMN07486027
rs28  1000Genomes  American     Sub         694         C=0.40      T=0.60                     SAMN07488242
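
If you want that output in a form that is easier to load into R or a spreadsheet, the tab-separated stream produced by the loop can be reshaped to one line per rsID, population and allele. This is only a sketch: it assumes the column order shown above and exactly one `ALLELE=frequency` entry in each of the Ref and Alt columns, and `freq_to_long` is a hypothetical name.

```shell
# Hypothetical helper: reshape the loop output (read from stdin) into
# rsID <TAB> population <TAB> allele <TAB> frequency.
freq_to_long() {
    awk -F'\t' '
        $2 != "1000Genomes" { next }       # drops the repeated "#Study" header lines
        {
            for (i = 6; i <= 7; i++) {     # columns 6 and 7 hold "ALLELE=frequency"
                split($i, a, "=")
                print $1 "\t" $3 "\t" a[1] "\t" a[2]
            }
        }
    '
}
```

Appending `| freq_to_long > freqs.tsv` after the `done` of the loop would write the long table to a file (the file name is a placeholder). Multi-allelic SNPs would need extra handling of the Alt column.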