Question: Download variant CSVs for hundreds of genes from gnomAD

I have 500 genes of interest whose variants I want to download from gnomAD for SNP analysis.

It would take forever to type each gene name and click the "Export to csv" button.

How can I do that in batches?

written 10 months ago by Qingyang Xiao • 130 • updated 10 months ago by Pierre Lindenbaum 120k
wget -O - "https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz" | gunzip -c | grep -E '(^#|\|(GENE1|GENE2|GENE3|GENE4)\|)' > genes.vcf
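The command above writes a VCF, while the question asked for CSV. Below is a minimal sketch of flattening the filtered VCF (e.g. genes.vcf) into a simple CSV, assuming you only need CHROM, POS, ID, REF, ALT, FILTER and the INFO key AF. It does not reproduce the gnomAD browser's "Export to csv" columns. The function name vcf_to_csv and the demo record are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: flatten a filtered VCF into a simple CSV.
# Assumption: only the fixed columns plus the INFO key AF are needed.
set -euo pipefail

# vcf_to_csv VCF: hypothetical helper, prints CSV to stdout.
vcf_to_csv() {
  awk -F'\t' 'BEGIN { OFS = ","; print "chrom,pos,id,ref,alt,filter,af" }
    /^#/ { next }                                # skip VCF header lines
    {
      af = "NA"
      n = split($8, info, ";")                   # INFO is a ;-separated KEY=VALUE list
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^AF=/) { af = substr(info[i], 4); break }
      print $1, $2, $3, $4, $5, $7, af
    }' "$1"
}

# Tiny made-up VCF so the sketch runs without downloading anything.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo_genes.vcf
printf '21\t100\t.\tA\tG\t.\tPASS\tAC=2;AF=0.001;vep=G|missense_variant|MODERATE|GENE1|\n' >> demo_genes.vcf

vcf_to_csv demo_genes.vcf > demo_genes.csv
cat demo_genes.csv
```

On real output you would run vcf_to_csv genes.vcf > genes.csv. Matching ^AF= keeps population-specific keys such as AF_afr out of the af column.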
written 10 months ago by Pierre Lindenbaum 120k

If you are interested in specific genes, you probably want to use the gnomAD exomes rather than the genomes: the exome dataset is based on more samples, and the file is substantially smaller.

written 10 months ago by igor 7.7k

Small suggestion: if you have the disk space (something on the order of ~1 TB), you could first save the wget output to a local file (e.g. wget -O - "https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz" > gnomad.vcf.bgz) and then query that file with gunzip + grep. That way you can look at different genes later, or rerun after fixing a typo, without downloading everything again. You could also download per-chromosome files instead and grep only for the genes located on those chromosomes (see the download page).
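A minimal sketch of the "download once, query many times" idea: fetch_once downloads only if the local file is missing or empty, and query_local reruns gunzip + grep on the local copy. Both function names are hypothetical. The demo uses a tiny made-up gzipped VCF in place of the real file, so the download step is skipped and nothing is fetched.

```shell
#!/usr/bin/env bash
# Sketch: keep one local copy of the sites VCF and run several queries against it.
set -euo pipefail

# fetch_once URL FILE: download only when FILE is missing or empty. The data is
# written to FILE.part first so an interrupted download never looks finished.
fetch_once() {
  local url=$1 file=$2
  if [ ! -s "$file" ]; then
    wget -O "$file.part" "$url"
    mv "$file.part" "$file"
  fi
}

# query_local FILE PATTERN: decompress the local copy, keep lines matching PATTERN.
query_local() {
  gunzip -c "$1" | grep -E "$2"
}

# Real use (very large; see the disk-space note above):
# fetch_once "https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz" gnomad.vcf.bgz

# Offline demo: a made-up gzipped VCF stands in for the real file.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo_sites.vcf
printf '21\t100\t.\tA\tG\t.\tPASS\tvep=G|missense_variant|MODERATE|GENE1|\n' >> demo_sites.vcf
printf '21\t200\t.\tG\tA\t.\tPASS\tvep=A|intron_variant|MODIFIER|GENE2|\n' >> demo_sites.vcf
gzip -c demo_sites.vcf > demo_sites.vcf.bgz
fetch_once "https://example.invalid/not-used" demo_sites.vcf.bgz   # file exists, so no download

query_local demo_sites.vcf.bgz '(^#|\|GENE1\|)' > demo_gene1.vcf
query_local demo_sites.vcf.bgz '(^#|\|GENE2\|)' > demo_gene2.vcf  # second query, same local file
```

gunzip -c can read the bgzip-compressed .bgz file because BGZF is gzip-compatible. The ^# alternative keeps the header, so grep always has at least one match and the pipeline does not fail under pipefail.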

Since you have 500 genes, you could also put them in a text file (one gene per line) and give that file to grep as the list of search patterns, by modifying the grep part to: gunzip -c gnomad.vcf.bgz | grep -E -f mygenes.txt.
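A plain list of symbols given to grep -f matches them anywhere in the line and drops the VCF header. Below is a sketch of building a pattern file that keeps header lines (^#) and wraps each symbol in literal pipes, following the same idea as the pattern in the first answer. The helper make_patterns and all file names are hypothetical, and the inputs are made up.

```shell
#!/usr/bin/env bash
# Sketch: turn a gene list (one symbol per line) into a grep -E pattern file.
set -euo pipefail

# make_patterns GENE_LIST: print one extended regex per line to stdout.
make_patterns() {
  printf '%s\n' '^#'                          # keep VCF header lines
  tr -d '\r' < "$1" |                         # tolerate Windows line endings
    grep -v '^[[:space:]]*$' |                # drop blank lines
    sed -e 's/[.]/\\./g' -e 's/.*/\\|&\\|/'   # escape dots; GENE1 -> \|GENE1\|
}

# Made-up inputs: GENE1 is wanted, GENE12 merely contains it as a substring.
printf 'GENE1\n' > mygenes_demo.txt
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo_all.vcf
printf '21\t100\t.\tA\tG\t.\tPASS\tvep=G|missense_variant|MODERATE|GENE1|\n' >> demo_all.vcf
printf '21\t200\t.\tG\tA\t.\tPASS\tvep=A|intron_variant|MODIFIER|GENE12|\n' >> demo_all.vcf

make_patterns mygenes_demo.txt > patterns_demo.txt
grep -E -f patterns_demo.txt demo_all.vcf > demo_selected.vcf

# Real use: make_patterns mygenes.txt > patterns.txt
#           gunzip -c gnomad.vcf.bgz | grep -E -f patterns.txt > genes.vcf
```

In an extended regex, \| is a literal pipe, so the GENE12 record is not selected here. This narrows matches but does not prove the symbol sits in the gene-symbol field, so checking the output is still worthwhile.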

Also keep in mind that grep will match whatever text is present: if you search for gene symbols and one of them is a substring of something unrelated (for example, a longer gene symbol), those lines will be matched too, so you should definitely check your output for correct matches.
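One way to check the output, sketched under the assumption that a "correct" match means the gene appears as a whole token between the usual VCF/VEP delimiters (tab, |, comma, semicolon, =): print every data line from the grep output in which none of the requested genes appears as a whole token. The helper flag_substring_hits and the demo inputs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: after a plain "grep -E -f mygenes.txt", report lines that were matched
# only because a requested symbol is a substring of some other text.
set -euo pipefail

# flag_substring_hits GENE_LIST VCF: print suspicious data lines from VCF.
flag_substring_hits() {
  awk 'NR == FNR { sub(/\r$/, ""); if ($0 != "") want[$0] = 1; next }  # read gene list
       /^#/ { next }                                                     # skip VCF header
       {
         n = split($0, tok, "[|,;=\t]")        # split on common VCF/VEP delimiters
         for (i = 1; i <= n; i++)
           if (tok[i] in want) next            # exact token found: line looks fine
         print                                 # substring-only match: report it
       }' "$1" "$2"
}

# Made-up inputs: GENE1 is requested; GENE12 only contains it as a substring.
printf 'GENE1\n' > mygenes_d.txt
printf '21\t100\t.\tA\tG\t.\tPASS\tvep=G|missense_variant|MODERATE|GENE1|\n' > demo_raw_in.vcf
printf '21\t200\t.\tG\tA\t.\tPASS\tvep=A|intron_variant|MODIFIER|GENE12|\n' >> demo_raw_in.vcf

grep -E -f mygenes_d.txt demo_raw_in.vcf > demo_raw_hits.vcf   # plain grep: both lines match
flag_substring_hits mygenes_d.txt demo_raw_hits.vcf > suspicious.txt
cat suspicious.txt
```

This is a token-level check only. It does not confirm that the token is in the gene-symbol field of the annotation; that needs the field layout from the VCF header.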

Finally, do you have gene symbols or gene identifiers (e.g. Ensembl or RefSeq)? I would first download the smallest file (the chr21 sites VCF, 6.12 GiB), check that your inputs work with what the gnomAD VCF provides, and then run on the whole dataset.
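A sketch of that chr21 check. It assumes the VEP annotation is stored in an INFO field named vep whose header Description ends in "Format: Allele|Consequence|...">. Check your own header, and pass a different field ID if it differs. list_vep_fields numbers the annotation fields so you can see which position holds the symbol and which holds the Ensembl gene ID. genes_found then counts, for each gene in your list, the records where that field matches exactly. A count of 0 points to a typo, a gene not on that chromosome, or an identifier type the field does not contain. All function names, file names and IDs below are made up.

```shell
#!/usr/bin/env bash
# Sketch: check on one chromosome whether your identifiers match the VCF annotation.
# Assumption: the annotation INFO field is called "vep" (override with the last argument).
set -euo pipefail

# list_vep_fields VCF [INFO_ID]: number the fields listed after "Format: " in the header.
list_vep_fields() {
  grep -m1 "^##INFO=<ID=${2:-vep}," "$1" |
    sed -e 's/.*Format: //' -e 's/">$//' |
    tr '|' '\n' |
    awk '{ print NR ": " $0 }'
}

# genes_found VCF GENE_LIST FIELD_POS [INFO_ID]: for each listed gene, count records
# where annotation field FIELD_POS (1-based) equals it exactly. Output: gene<TAB>count.
genes_found() {
  awk -F'\t' -v pos="$3" -v id="${4:-vep}" '
    NR == FNR { sub(/\r$/, ""); if ($0 != "") cnt[$0] = 0; next }   # read gene list
    /^#/ { next }                                                    # skip VCF header
    {
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        if (index(info[i], id "=") != 1) continue
        m = split(substr(info[i], length(id) + 2), ann, ",")       # one entry per transcript
        split("", seen)                                            # count each gene once per record
        for (j = 1; j <= m; j++) {
          split(ann[j], f, "|")
          if ((f[pos] in cnt) && !(f[pos] in seen)) { cnt[f[pos]]++; seen[f[pos]] = 1 }
        }
      }
    }
    END { for (g in cnt) print g "\t" cnt[g] }' "$2" "$1" | sort
}

# Made-up chr21-like VCF and a gene list mixing symbols and an ID.
printf '%s\n' '##fileformat=VCFv4.2' \
  '##INFO=<ID=vep,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene">' > demo_chr21.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo_chr21.vcf
printf '21\t100\t.\tA\tG\t.\tPASS\tAF=0.001;vep=G|missense_variant|MODERATE|GENE1|ENSGDEMO01,G|upstream_gene_variant|MODIFIER|GENE1|ENSGDEMO01\n' >> demo_chr21.vcf
printf '21\t200\t.\tG\tA\t.\tPASS\tAF=0.002;vep=A|intron_variant|MODIFIER|GENE12|ENSGDEMO12\n' >> demo_chr21.vcf
printf 'GENE1\nENSGDEMO12\nGENE3\n' > mygenes_e.txt

list_vep_fields demo_chr21.vcf                 # here SYMBOL is field 4 and Gene is field 5
genes_found demo_chr21.vcf mygenes_e.txt 4 | tee found_e.tsv
```

In the demo, ENSGDEMO12 gets 0 against field 4 because it is an ID rather than a symbol. Rerunning with field 5 would count it. For the real compressed file, tabix -H (htslib, needs the .tbi index) prints just the header to feed to list_vep_fields, and in bash genes_found <(gunzip -c your_chr21.vcf.bgz) mygenes.txt 4 avoids writing a decompressed copy. Whether RefSeq IDs appear in any annotation field is something to confirm from that header.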

written 10 months ago by manuel.belmadani • 830

