Download variant CSVs for hundreds of genes from gnomAD
13 months ago
Qingyang Xiao • 130
Stockholm

I have 500 genes of interest whose variants I want to download from gnomAD for SNP analysis.

It will take forever if I type each gene name and click the "Export to csv" button one at a time.

How can I do this in batches?

genome SNP • 350 views
12 months ago
France/Nantes/Institut du Thorax - INSE…
wget -O - "https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz" | gunzip -c | grep -E '(^#|\|(GENE1|GENE2|GENE3|GENE4)\|)' > genes.vcf

If you are interested in specific genes, you would probably want to use gnomAD exomes, not genomes. It's based on more samples and the file is substantially smaller.


Small suggestion: if you have the disk space (on the order of ~1TB), you could save the wget output to a file first (i.e. wget -O - "https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz" > gnomad.vcf.bgz) and then query that file with gunzip + grep afterwards, in case you want to look at different genes, notice a typo, etc. You could also work per chromosome and only grep for the genes located on the chromosomes you need (see the download page).

Since you have 500 genes, you could also put them in a text file (one gene per line) and pass that file as your list of search patterns by modifying the grep part to gunzip -c gnomad.vcf.bgz | grep -E -f mygenes.txt.

Also keep in mind that grep will match whatever text is present: if one of your gene symbols is a substring of something unrelated, that will be matched too, so you should definitely check your output for correct matches.
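One way to reduce the substring problem is to wrap every symbol in the same | delimiters used in the answer's pattern and to add a ^# pattern so the VCF header lines are kept. This is only a sketch: make_gene_patterns is a hypothetical helper name, and it assumes the symbols contain no regex metacharacters such as dots.

```shell
# Hypothetical helper: turn a gene list (one symbol per line) into
# grep -E patterns that match a symbol only when it sits between two "|".
make_gene_patterns() {
    # Keep VCF header lines (they start with "#").
    printf '%s\n' '^#'
    # Drop carriage returns and blank lines, then wrap each symbol
    # as \|SYMBOL\| so that e.g. BRCA1 does not match BRCA10.
    tr -d '\r' < "$1" | sed -e '/^[[:space:]]*$/d' -e 's/.*/\\|&\\|/'
}
```

It could then be used as make_gene_patterns mygenes.txt > patterns.txt followed by gunzip -c gnomad.vcf.bgz | grep -E -f patterns.txt > genes.vcf. The output still deserves a manual check, since a delimited symbol could in principle appear in an annotation field other than the gene symbol.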

Finally, do you have gene symbols or gene identifiers (e.g. Ensembl or RefSeq)? I would download the smallest file first (the chr21 sites VCF, 6.12 GiB), check that your inputs work with what the gnomAD VCF provides, and then run on the whole dataset.
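A rough way to run that check is to list which entries of your gene file never appear between | delimiters in a given VCF. The sketch below assumes grep supports -o (GNU and BSD grep do); report_missing_genes is a hypothetical name, and the VCF text is read from standard input.

```shell
# Hypothetical helper: print the genes from the list ($1) that never occur
# as |GENE| in the VCF text supplied on standard input.
report_missing_genes() {
    wanted=$(mktemp)
    found=$(mktemp)
    # Normalise the gene list: no carriage returns, no blank lines, sorted.
    tr -d '\r' < "$1" | sed '/^[[:space:]]*$/d' | sort -u > "$wanted"
    # Fixed-string patterns of the form |GENE|.
    sed 's/.*/|&|/' "$wanted" > "$wanted.pat"
    # Skip header lines, extract every matched |GENE|, and strip the pipes.
    grep -v '^#' | grep -o -F -f "$wanted.pat" | tr -d '|' | sort -u > "$found"
    # Lines only in the wanted list are the genes that were never seen.
    comm -23 "$wanted" "$found"
    rm -f "$wanted" "$wanted.pat" "$found"
}
```

For example, gunzip -c chr21.vcf.bgz | report_missing_genes mygenes.txt, where chr21.vcf.bgz stands for whatever name you saved the chr21 file under. On a single-chromosome file most of your genes will of course be reported as missing; the point is to confirm that the ones you know are on chr21 are found.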
