Identification of SNPs in regulatory regions
7.2 years ago
valerie ▴ 100

Hi guys,

I have a list of coordinates for my SNPs (from a converted and pre-processed VCF file). I used it to identify which SNPs are located inside genes, using a GTF file for the mouse genome. Now I want to identify the SNPs that are located in regulatory regions, e.g. transcription factor binding sites and promoter regions. Is there a simple way to get a GTF file for such regions?

I looked through genome.ucsc.edu and found that I can generate a GTF file based on ORegAnno. Is this the right approach?

Thanks!

7.2 years ago
Chun-Jie Liu ▴ 280

You may want to check the paper "The Ensembl Regulatory Build".

Ensembl calls regulatory regions based on ChIP-seq and other genome-wide, protein-specific measurements of DNA binding and histone modifications from ENCODE and other consortia.

The Ensembl Regulatory Build contains human and mouse data, which you can download from here.

The RegulomeDB database may already do what you want to do, so it is worth checking as well.


This is what the VEP will check your variants against. It is definitely easier to just use the VEP than to download the data and cross-reference it yourself.


Thanks for the information. I had missed this tool until now.


Thank you so much for your advice! May I ask one more question: is it possible to identify which genes these regulatory regions affect?


We don't know this, I'm afraid. If they're promoters, you can usually make a good inference based on position, but for enhancers and insulators we have no idea.


Understood! Thank you very much!


As Emily_Ensemb mentioned, we can infer a promoter's target gene from its position. It is very hard to identify the targets of other regulatory elements.

FANTOM has also identified regulatory regions. It provides PrESSto for viewing regulatory element targets and enhancer-promoter associations.

Hi-C data could also give you a hint about long-range chromatin interactions, based on interactions between DNA-bound proteins.
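The position-based inference for promoters can be sketched as a nearest-TSS lookup. This is only a rough illustration: the gene names and coordinates are made up, the function name `nearest_gene` is hypothetical, and nearest-gene assignment is a heuristic that is known to be unreliable for enhancers and insulators.

```python
from bisect import bisect_left

def nearest_gene(tss_by_chrom, chrom, pos):
    """Return (gene, distance) for the TSS closest to pos, or None.

    tss_by_chrom maps a chromosome name to a list of (tss_position, gene)
    tuples sorted by position.
    """
    sites = tss_by_chrom.get(chrom, [])
    if not sites:
        return None
    positions = [p for p, _ in sites]
    i = bisect_left(positions, pos)
    # Only the TSS just before and just after pos can be the closest one.
    candidates = sites[max(i - 1, 0):i + 1]
    tss, gene = min(candidates, key=lambda s: abs(s[0] - pos))
    return gene, abs(tss - pos)

# Hypothetical TSS positions, for illustration only.
TSS = {"chr1": [(1000, "GeneA"), (5000, "GeneB"), (9000, "GeneC")]}

print(nearest_gene(TSS, "chr1", 4800))
```

In practice you would probably also apply a distance cutoff, so that a region far from any TSS is reported as unassigned rather than forced onto a gene.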

7.2 years ago

Regulatory regions are often represented as BED files. Promoters are roughly +/-500 bp around an annotated TSS.
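As a minimal sketch of that promoter definition, the snippet below builds BED-style windows around TSS positions and reports which SNPs fall inside them. The function names, gene name and coordinates are hypothetical. It assumes all positions are already 0-based (VCF positions are 1-based, so subtract 1 first) and treats windows as half-open, as BED does.

```python
def promoter_windows(tss_list, flank=500):
    """Turn (chrom, tss, gene) records into half-open (chrom, start, end, gene) windows."""
    return [(chrom, max(tss - flank, 0), tss + flank, gene)
            for chrom, tss, gene in tss_list]

def snps_in_windows(snps, windows):
    """Return (chrom, pos, gene) for every SNP that falls inside a window."""
    hits = []
    for chrom, pos in snps:
        for w_chrom, start, end, gene in windows:
            if chrom == w_chrom and start <= pos < end:
                hits.append((chrom, pos, gene))
    return hits

# Hypothetical data, for illustration only.
tss_list = [("chr1", 10000, "GeneA")]
snps = [("chr1", 9600), ("chr1", 10500), ("chr1", 9499), ("chr2", 9600)]

windows = promoter_windows(tss_list)
print(snps_in_windows(snps, windows))
```

The nested loop is fine for small inputs. For a whole-genome SNP list, an interval tool such as bedtools would be the more usual choice.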

Distal regulatory elements, however, are cell-type specific, so you may need to get data from the ENCODE portal for your cell type of interest or a closely related one.

Overlapping SNPs with regulatory regions is often done as an enrichment analysis, because a simple overlap might occur purely by chance.
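One simple way to make "by chance" concrete is a one-sided binomial test. This is a sketch under a strong assumption, namely that SNPs would otherwise land uniformly across the genome, which real variant sets do not do, so a permutation test against matched background regions is generally safer. The function name and the numbers below are hypothetical.

```python
from math import comb

def binomial_enrichment_p(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: number of SNPs overlapping regulatory regions
    n: total number of SNPs
    p: fraction of the genome covered by those regions
       (the chance of a hit under the uniform assumption)
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 30 of 100 SNPs overlap regions covering 10% of the genome.
p_value = binomial_enrichment_p(30, 100, 0.10)
print(p_value)
```

A small p-value here only says the overlap is larger than uniform placement would predict. It does not account for SNP density varying with gene density, GC content or mappability.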

From the ENCODE portal you can get all the annotations you are looking for. You need to download the peak files and choose the appropriate cell type.

https://www.encodeproject.org/data/annotations/

7.2 years ago
Emily 23k

If you run them through the Ensembl VEP, it will identify the regulatory features the variants hit, and will also give you score changes for hits to TF motifs.
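A minimal sketch of how such a run could be assembled from Python is shown below. It only builds and prints the command, it does not execute it. The file names are placeholders and `build_vep_command` is a hypothetical helper. It also assumes a local VEP installation with a mouse cache already downloaded, and the option names should be checked against the documentation for your installed VEP version.

```python
import shlex

def build_vep_command(vcf_path, out_path, species="mus_musculus"):
    """Build a VEP command line that includes regulatory feature annotation."""
    return [
        "vep",
        "--input_file", vcf_path,
        "--output_file", out_path,
        "--species", species,
        "--cache",        # use a locally installed annotation cache
        "--regulatory",   # report overlaps with regulatory features
        "--tab",          # tab-delimited output
    ]

# Placeholder file names, for illustration only.
cmd = build_vep_command("snps.vcf", "snps.vep.txt")
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.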
