How to download dbSNP153 VCF files for hg19/GRCh37
4.6 years ago
Shicheng Guo ★ 9.4k

Hi All,

I noticed that dbSNP152 has been updated to dbSNP153 when I searched for rs533316401:

https://www.ncbi.nlm.nih.gov/snp/rs533316401

Released July 9, 2019

Does anyone know where to get the dbSNP153 VCF files for the hg19 genome assembly?

Thanks.

Okay, with help from xx and xx, the problem is solved:

Here is hg19:

wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.gz
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.gz.tbi
wget https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.25_GRCh37.p13_assembly_report.txt

Here is hg38:

wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz.tbi

NB: GCF_000001405.25 is hg19 (GRCh37.p13) and GCF_000001405.38 is hg38 (GRCh38).
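These files are large, so it may be worth checking a download against its checksum. Below is a minimal sketch, assuming a `.md5` file sits next to each VCF (as it does for the b154 archive files listed later in this thread) and that its first whitespace-separated field is the hex digest. `verify_md5` is a hypothetical helper name, and the demo file is a stand-in so the sketch runs without a real download.

```shell
set -eu

# verify_md5 FILE MD5FILE
# Hypothetical helper: compares FILE's MD5 digest with the first field of MD5FILE.
verify_md5() {
    file=$1
    md5file=$2
    expected=$(awk '{ print $1; exit }' "$md5file")
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$expected" = "$actual" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}

# Stand-in data (not a real VCF), used only to exercise the helper.
workdir=$(mktemp -d)
printf 'not a real VCF\n' > "$workdir/demo.gz"
md5sum "$workdir/demo.gz" > "$workdir/demo.gz.md5"

verify_md5 "$workdir/demo.gz" "$workdir/demo.gz.md5"
```

For the real files the call would look like `verify_md5 GCF_000001405.25.gz GCF_000001405.25.gz.md5`, provided the checksum file follows the layout assumed above.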

dbsnp153 vcf • 13k views

Thank you Shicheng Guo for this information.

It's also important to note that the chromosome names in this VCF version are different. They are in NCBI RefSeq format (something like NC_... or NT_...). Also, I didn't find any variants from the MT chromosome in those VCF files.

If you want these chromosome names in UCSC (hg19) format (or another format), you may need extra steps. You should read this and this.

For now, this is the best solution I have found, as I need chromosome names in UCSC hg19 format.
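To see which naming style a given VCF uses, you can count the records per contig name. The sketch below relies on bgzip output being readable with plain `gzip -dc`. It runs on a tiny stand-in file, since the real one is very large, and the positions and rs IDs in it are made up.

```shell
set -eu
workdir=$(mktemp -d)
vcf="$workdir/demo.vcf.gz"

# Tiny stand-in for GCF_000001405.25.gz; positions and rs IDs are made up.
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'NC_000001.10\t100\trs1\tA\tG\t.\t.\t.\n'
    printf 'NC_000001.10\t200\trs2\tC\tT\t.\t.\t.\n'
    printf 'NC_000002.11\t300\trs3\tG\tA\t.\t.\t.\n'
} | gzip > "$vcf"

# Skip header lines, then count how many records each contig name has.
contigs=$(gzip -dc "$vcf" \
    | awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c, n[c] }' \
    | LC_ALL=C sort)
echo "$contigs"
```

On the full dbSNP file this scans every record, so it will take a while. If tabix is installed, `tabix -l GCF_000001405.25.gz` should list the contig names straight from the index.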

Hope this helps!


Hello Sir, I have downloaded the latest release (dbSNP154, hg38) following the steps you mention above. This dbSNP file does not have the SAO ID in the INFO column. Is there any way to include this ID in the INFO column?


Is there a reason to use the "redesign" link rather than just the latest release?

4.6 years ago
igor 13k

Both versions are available on the FTP site. GCF_000001405.25 is the RefSeq assembly accession corresponding to GRCh37.p13.

RefSNP VCF files for GRC (Genome Reference Consortium) human assembly 37 (GCF_000001405.25) and 38 (GCF_000001405.38). Files are compressed by bgzip and with the tabix index.

Source: https://ftp.ncbi.nih.gov/snp/archive/b153/00readme.txt

4.6 years ago

I don't think there is one available at the moment, but you can first get one for hg38:

wget -4 -c https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz

wget -4 -c https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz.tbi

Then lift it over to GRCh37/hg19 using CrossMap: http://crossmap.sourceforge.net/

python CrossMap.py vcf hg38ToHg19.over.chain.gz GCF_000001405.38.gz hg19.fa GCF_000001405.hg19.vcf
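One thing to check before a liftover like this: as far as I know, CrossMap matches contigs by name, so the NCBI accession names in the dbSNP VCF would likely not be found in a UCSC-style chain file unless the VCF is renamed first. The sketch below lists VCF contig names that are missing from the chain file. It assumes the usual UCSC chain header layout, where the third field of each `chain` line is the source assembly's sequence name. Both demo files are made-up stand-ins.

```shell
set -eu
export LC_ALL=C
workdir=$(mktemp -d)

# Made-up stand-ins for hg38ToHg19.over.chain.gz and GCF_000001405.38.gz.
printf 'chain 1000 chr1 248956422 + 0 1000 chr1 249250621 + 0 1000 1\n1000\n\n' \
    | gzip > "$workdir/demo.over.chain.gz"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'NC_000001.11\t100\trs1\tA\tG\t.\t.\t.\n'
} | gzip > "$workdir/demo.vcf.gz"

# Source-assembly sequence names from the chain headers (assumed to be field 3).
gzip -dc "$workdir/demo.over.chain.gz" \
    | awk '$1 == "chain" { print $3 }' | sort -u > "$workdir/chain_names.txt"

# Contig names actually used by the VCF records.
gzip -dc "$workdir/demo.vcf.gz" \
    | awk -F'\t' '!/^#/ { print $1 }' | sort -u > "$workdir/vcf_names.txt"

# Names present in the VCF but absent from the chain file.
comm -23 "$workdir/vcf_names.txt" "$workdir/chain_names.txt" > "$workdir/unmatched.txt"
cat "$workdir/unmatched.txt"
```

Any name printed here would need to be renamed to the chain file's convention before those records can be lifted over.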
2.9 years ago
Shicheng Guo ★ 9.4k

dbSNP154 has arrived, so I am sharing a script for preprocessing:

## 05/09/2021: 2020-05-26 13:48 -- dbSNP154
# Download the b154 VCFs, tabix indexes and md5 checksums for GRCh37 (.25) and GRCh38 (.38)
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.25.gz
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.25.gz.md5
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.25.gz.tbi
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.25.gz.tbi.md5
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz.md5
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz.tbi
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz.tbi.md5
# Assembly reports, used to map RefSeq accessions to UCSC-style names
wget https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.25_GRCh37.p13_assembly_report.txt
wget https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.38_GRCh38.p12_assembly_report.txt
# Build "RefSeq-accession UCSC-name" maps ($7 -> $10), falling back to the GenBank accession ($5) when the UCSC name is "na".
# The regex RS strips Windows line endings and needs an awk that supports a regex RS, such as gawk.
awk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.25_GRCh37.p13_assembly_report.txt > dbSNP-to-UCSC-GRCh37.p13.map
awk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.38_GRCh38.p12_assembly_report.txt > dbSNP-to-UCSC-GRCh38.p12.map
# Optional: convert the UCSC names to numeric chromosome codes
#sed -i '{s/chrX/23/g}' dbSNP-to-UCSC-GRCh37.p13.map
#sed -i '{s/chrY/24/g}' dbSNP-to-UCSC-GRCh37.p13.map
#sed -i '{s/chrM/25/g}' dbSNP-to-UCSC-GRCh37.p13.map
#sed -i '{s/chr//g}' dbSNP-to-UCSC-GRCh37.p13.map
# Rename the contigs; -Oz writes bgzip-compressed output to match the .vcf.gz name.
# ~/bin/sbatch.sh is a local wrapper script that is not shown here.
sbatch --job-name=dbsnp154 --output=dbsnp154.out ~/bin/sbatch.sh 'bcftools annotate --threads 48 --rename-chrs dbSNP-to-UCSC-GRCh37.p13.map GCF_000001405.25.gz -Oz -o dbSNP154.hg19.vcf.gz'
sbatch --job-name=hg38 --mem=24G --output=hg38 ~/bin/sbatch.sh 'bcftools annotate --threads 48 --rename-chrs dbSNP-to-UCSC-GRCh38.p12.map GCF_000001405.38.gz -Oz -o dbSNP154.hg38.vcf.gz'
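To make the map-building step easier to follow, here is a small sketch that runs the same column logic on a two-row stand-in report. It swaps the regex `RS` for a `sub()` call that strips the carriage return, which should behave the same in any POSIX awk. The first data row mirrors chromosome 1 of GRCh37.p13. The second row uses made-up accessions purely to show the "na" fallback.

```shell
set -eu
workdir=$(mktemp -d)
report="$workdir/demo_assembly_report.txt"

# Stand-in assembly report with Windows line endings.
# Row 1 mirrors chr1 of GRCh37.p13; row 2 uses made-up accessions.
{
    printf '# Sequence-Name\tSequence-Role\tAssigned-Molecule\tAssigned-Molecule-Location/Type\tGenBank-Accn\tRelationship\tRefSeq-Accn\tAssembly-Unit\tSequence-Length\tUCSC-style-name\r\n'
    printf '1\tassembled-molecule\t1\tChromosome\tCM000663.1\t=\tNC_000001.10\tPrimary Assembly\t249250621\tchr1\r\n'
    printf 'DEMO_PATCH\tfix-patch\t1\tChromosome\tXX000000.1\t=\tNW_000000000.1\tPATCHES\t1000\tna\r\n'
} > "$report"

# Column 7 = RefSeq accession, column 10 = UCSC-style name, column 5 = GenBank accession.
# Strip the trailing carriage return, skip comment lines, and fall back to
# the GenBank accession when the UCSC name is "na".
awk 'BEGIN { FS = "\t" }
     { sub(/\r$/, "") }
     !/^#/ { print $7, ($10 != "na" ? $10 : $5) }' "$report" > "$workdir/demo.map"

cat "$workdir/demo.map"
# NC_000001.10 chr1
# NW_000000000.1 XX000000.1
```

Each output line is an "old_name new_name" pair, which is the layout `bcftools annotate --rename-chrs` expects.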