1000Genomes population allele frequencies for list of SNPs
22 months ago

Hi all,

I have a list of more than 100 SNPs (rsXXXXXX) and I would like to obtain the allele frequencies of each of them in each of the 1000Genomes populations (if possible, not manually...). Is there any tool, R package, etc. that can download them all at once? I thought I could maybe obtain those frequencies from the UCSC database or directly from the 1000Genomes database, but I'm open to any suggestions. Thank you very much in advance!

snps 1000Genomes • 641 views
19 months ago
United States

Solution 1: The raw variant call data can be downloaded from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ . Once you have those files, the INFO column in each VCF file contains the superpopulation allele frequencies (the EAS_AF, AMR_AF, AFR_AF, EUR_AF and SAS_AF fields, alongside the global AF), and several tools, such as bcftools, can look up the INFO column entry for a particular rsID. The one complication is that, since there's a separate VCF file per chromosome, you may first need to figure out which rsIDs are on which chromosomes.
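
As an illustration of the INFO lookup described in Solution 1, here is a minimal sketch using only awk. The function name extract_af and the file names in the usage note are hypothetical; it assumes a plain list with one rsID per line and a standard VCF layout (ID in column 3, INFO in column 8).

```shell
# Print each requested rsID followed by its AF and *_AF INFO fields.
# $1: text file with one rsID per line; the VCF text is read from stdin.
extract_af() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { want[$1] = 1; next }   # first input: the rsID list
    /^#/      { next }                 # skip VCF header lines
    ($3 in want) {
      out = $3
      n = split($8, kv, ";")           # INFO entries are separated by ";"
      for (i = 1; i <= n; i++)
        if (kv[i] ~ /^(AF|[A-Z]+_AF)=/) out = out OFS kv[i]
      print out
    }
  ' "$1" -
}
```

It could be run once per chromosome file, for example `zcat chr22.vcf.gz | extract_af rslist.txt` (hypothetical file names), which also sidesteps the question of which rsID sits on which chromosome, at the cost of scanning every file.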

Solution 2: With plink 2.0, http://www.cog-genomics.org/plink/2.0/resources#1kg_phase3 provides a single merged dataset containing all chromosomes (download the boldfaced links, decompress the .pgen.zst file as described at the top of that page, then rename phase3_corrected.psam to all_phase3.psam). The command

plink2 --pfile all_phase3 vzs --extract [your list of rsIDs] --export vcf

exports a VCF containing only the rsIDs you care about; the precomputed superpopulation allele frequencies will be in the INFO column of this freshly generated VCF. You can also define your own populations with --keep and compute allele frequencies on the fly with --freq.
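
To make the --keep / --freq route concrete, here is a rough sketch of building a sample list for one population from the .psam file. The helper name samples_in_pop, the column name SuperPop, and the file names eur.txt and rslist.txt are assumptions for illustration; check the header of your own .psam before relying on them.

```shell
# Print the sample IDs (first column) of every row whose column named $2
# has the value $3 in the tab-delimited .psam file $1.
samples_in_pop() {
  awk -F'\t' -v col="$2" -v pop="$3" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }  # locate the column by header name
    c && $c == pop { print $1 }
  ' "$1"
}

# Hypothetical usage (needs plink2 and the downloaded dataset):
#   samples_in_pop all_phase3.psam SuperPop EUR > eur.txt
#   plink2 --pfile all_phase3 vzs --keep eur.txt --extract rslist.txt --freq --out eur
```

With --freq, plink2 writes the frequencies to a .afreq file named after the --out prefix (eur.afreq here). Looking the column up by name rather than by position keeps the helper usable if the .psam columns are ordered differently.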


Sorry, I don't know what is going wrong, but my plink2 reports that it doesn't recognise the "--pfile" option. I'm running the command in a folder containing the files all_phase3.pgen.zst, all_phase3.psam, all_phase3.pvar.zst and my list of rsIDs. Is it possible that it is not recognising some of the files? Or is it more likely a problem with my plink2 installation?

Thanks again :)

  1. You need to decompress the .pgen.zst file first; see the instructions at the top of the resources page.

  2. This requires plink 2.0, not 1.9. What do you get when you type "plink2 --version"?

18 months ago
France/Nantes/Institut du Thorax - INSE…
$ cat rslist.txt | while read R ; do wget -q -O - "https://www.ncbi.nlm.nih.gov/snp/${R}?download=frequency" | grep -E '^(#Study|1000Genomes)' | sed "s/^/${R}\t/" ; done

rs25    #Study  Population  Group   Samplesize  Ref Allele  Alt Allele  BioProject ID   BioSample ID
rs25    1000Genomes Global  Study-wide  5008    T=0.485 C=0.515 PRJEB6930   SAMN07490465
rs25    1000Genomes African Sub 1322    T=0.493 C=0.507     SAMN07486022
rs25    1000Genomes East Asian  Sub 1008    T=0.474 C=0.526     SAMN07486024
rs25    1000Genomes Europe  Sub 1006    T=0.521 C=0.479     SAMN07488239
rs25    1000Genomes South Asian Sub 978 T=0.52  C=0.48      SAMN07486027
rs25    1000Genomes American    Sub 694 T=0.38  C=0.62      SAMN07488242
rs26    #Study  Population  Group   Samplesize  Ref Allele  Alt Allele  BioProject ID   BioSample ID
rs26    1000Genomes Global  Study-wide  5008    T=0.335 C=0.665 PRJEB6930   SAMN07490465
rs26    1000Genomes African Sub 1322    T=0.404 C=0.596     SAMN07486022
rs26    1000Genomes East Asian  Sub 1008    T=0.291 C=0.709     SAMN07486024
rs26    1000Genomes Europe  Sub 1006    T=0.341 C=0.659     SAMN07488239
rs26    1000Genomes South Asian Sub 978 T=0.36  C=0.64      SAMN07486027
rs26    1000Genomes American    Sub 694 T=0.22  C=0.78      SAMN07488242
rs27    #Study  Population  Group   Samplesize  Ref Allele  Alt Allele  BioProject ID   BioSample ID
rs27    1000Genomes Global  Study-wide  5008    G=0.283 C=0.717 PRJEB6930   SAMN07490465
rs27    1000Genomes African Sub 1322    G=0.355 C=0.645     SAMN07486022
rs27    1000Genomes East Asian  Sub 1008    G=0.284 C=0.716     SAMN07486024
rs27    1000Genomes Europe  Sub 1006    G=0.261 C=0.739     SAMN07488239
rs27    1000Genomes South Asian Sub 978 G=0.29  C=0.71      SAMN07486027
rs27    1000Genomes American    Sub 694 G=0.16  C=0.84      SAMN07488242
rs28    #Study  Population  Group   Samplesize  Ref Allele  Alt Allele  BioProject ID   BioSample ID
rs28    1000Genomes Global  Study-wide  5008    C=0.517 T=0.483 PRJEB6930   SAMN07490465
rs28    1000Genomes African Sub 1322    C=0.601 T=0.399     SAMN07486022
rs28    1000Genomes East Asian  Sub 1008    C=0.476 T=0.524     SAMN07486024
rs28    1000Genomes Europe  Sub 1006    C=0.517 T=0.483     SAMN07488239
rs28    1000Genomes South Asian Sub 978 C=0.53  T=0.47      SAMN07486027
rs28    1000Genomes American    Sub 694 C=0.40  T=0.60      SAMN07488242