How to find the SNP in promoter regions
8.7 years ago
camelbbs ▴ 710

Hi all,

I would like to ask whether there is a database that stores human disease-related SNPs. I want to obtain the SNPs located in gene promoter regions. Can anyone help with this?

Thanks very much.

Cam

promoter SNP • 3.5k views
8.7 years ago

Say you're working with hg19.

Grab SNP entries from NCBI and convert them to sorted BED with convert2bed (the vcf2bed script is a wrapper around it):

$ wget -qO- ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz \
    | gunzip -c - \
    | convert2bed --input=vcf --output=bed --sort-tmpdir=${PWD} - \
    > hg19.snp151.bed

Or use whatever subset or other source of SNPs desired, and use the command-line to turn it into a sorted BED file.

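Grab gene annotations of interest (e.g., GENCODE) for the same assembly as the SNPs, and filter them for genes into a sorted BED file with gff2bed. GENCODE release 19 is the last release annotated on GRCh37/hg19; later releases use GRCh38:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gff3.gz \
    | gunzip -c - \
    | gff2bed - \
    | awk '$8=="gene"' - \
    > genes.bed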

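Say we define proximal promoters as the 1 kb upstream of each gene's start site. We can process the file genes.bed per-strand to generate promoter regions, clamping start coordinates at zero. The result is re-sorted because the per-strand arithmetic can leave intervals out of order, and bedmap requires sorted input:

$ awk 'BEGIN { OFS="\t" } {
        if ($6=="+") {
            start = ($2 > 1000) ? $2 - 1000 : 0;
            print $1, start, $2, $4, $5, $6;
        }
        else {
            print $1, $3, $3 + 1000, $4, $5, $6;
        }
    }' genes.bed \
    | sort-bed - \
    > promoters.bed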

Finally, we map SNP IDs to promoters with bedmap:

$ bedmap --echo --echo-map-id-uniq --delim '\t' promoters.bed hg19.snp151.bed > snps_over_promoters.bed
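
If the output shows no SNP IDs for any promoter, one thing worth checking is whether the two inputs name chromosomes the same way (for example `1` in one file versus `chr1` in the other), since overlaps are only found when the names match exactly. A minimal sketch of such a check follows; the helper name `chroms_missing_from` is made up for illustration and is not part of BEDOPS.

```shell
# Hypothetical helper (not a BEDOPS tool): print chromosome names that occur
# in the first BED file but never in the second.
chroms_missing_from() {
    # First pass reads the second file ($2) and records its chromosome names;
    # second pass reads the first file ($1) and prints each unseen name once.
    awk -F'\t' '
        FNR == NR { seen[$1] = 1; next }
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$2" "$1"
}
```

Running `chroms_missing_from hg19.snp151.bed promoters.bed` should print nothing when every SNP chromosome also appears among the promoters. If it lists names such as `1` or `X`, the SNP file probably needs a `chr` prefix added to its first column (or the prefix removed from the annotation file) before running bedmap.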

This would just find all SNPs in promoter regions, though. To get disease-associated SNPs, you would have to draw your SNPs from the NHGRI-EBI GWAS Catalog (formerly the Catalog of Published GWAS) or from ClinVar.
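
As a rough sketch of what drawing SNPs from ClinVar could look like, the function below turns a ClinVar-style VCF into a four-column BED file that could stand in for hg19.snp151.bed in the bedmap step. The function name is made up, and the filter is an assumption about what counts as disease-associated: it keeps only single-base substitutions whose INFO field carries exactly `CLNSIG=Pathogenic`, so adjust the pattern to the significance values you care about.

```shell
# Hypothetical helper: read an uncompressed ClinVar-style VCF on stdin and
# print pathogenic single-nucleotide variants as BED (chrom, start, end, ID).
clinvar_pathogenic_snvs_to_bed() {
    # Header lines are skipped. A record is kept only if REF and ALT are one
    # base each and INFO contains the exact key/value CLNSIG=Pathogenic.
    # VCF positions are 1-based, so the 0-based BED start is POS - 1.
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/ { next }
        $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ && $8 ~ /(^|;)CLNSIG=Pathogenic(;|$)/ {
            print $1, $2 - 1, $2, $3
        }'
}
```

Assuming a downloaded file named clinvar.vcf.gz from the GRCh37 release, usage would be along the lines of `gunzip -c clinvar.vcf.gz | clinvar_pathogenic_snvs_to_bed | sort-bed - > clinvar.snvs.bed`. Note that the ID column in ClinVar's VCF is a ClinVar identifier rather than an rs number.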

Thanks Alexander, does ClinVar include the cancer-associated SNPs?

ClinVar includes SNPs from any disease/phenotypic response observed by the researchers who upload them. If you're looking specifically for cancer SNPs, COSMIC might be a better choice, though it catalogues somatic mutations.

Thanks Steven. I want to check the SNPs in promoter sequences, but the SNP database doesn't include strand information. So how do I know whether a SNP is on the forward or the reverse strand?
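
One way to look at this: in VCF files the REF and ALT alleles are reported on the forward strand of the reference assembly, so the SNP record itself carries no strand. The strand that matters is the promoter's, which is column 6 of promoters.bed in the answer above. A minimal sketch, using a made-up helper name, of expressing a forward-strand allele relative to a gene's strand:

```shell
# Hypothetical helper: given a forward-strand allele and a gene strand,
# print the allele as it reads on that gene's strand.
allele_on_strand() {
    allele=$1
    strand=$2
    if [ "$strand" = "-" ]; then
        # Minus-strand genes read the reverse complement of the forward allele.
        printf '%s\n' "$allele" | rev | tr 'ACGTacgt' 'TGCAtgca'
    else
        # Plus-strand genes read the allele as reported.
        printf '%s\n' "$allele"
    fi
}
```

For example, `allele_on_strand A -` prints `T`, while `allele_on_strand A +` prints `A` unchanged.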
