Question

Promoter in gene body region of another gene - feedback on definition or consensus

0

4.9 years ago

lshepard ▴ 470

Hi everyone,

I am wondering whether there is a standard or consensus on how to define promoter regions that overlap the gene body of another gene. To expand: say I define a promoter as the region within 1000 bp of the TSS, and I want to intersect CpG methylation sites with regulatory annotations (gene body, promoter, etc.). Now assume that a site falls within the gene body of gene "x" but also within the promoter region of a nearby gene, "y". (I mention one case, but in reality we observe many such cases across mammalian genomes.)
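To make the promoter definition above concrete, here is a minimal sketch of a strand-aware promoter window. It assumes the window is taken upstream of the TSS (as in the GNG5 example), that coordinates are 0-based and half-open, and that the TSS is simply the 5' end of the gene record. The `Gene` type and `promoter_window` function are hypothetical names for illustration, not part of any library.

```python
from typing import NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def promoter_window(gene: Gene, upstream: int = 1000) -> Tuple[int, int]:
    """Return the half-open (start, end) window upstream of the TSS.

    Assumption: the TSS is the 5' end of the gene record, i.e. `start`
    on the plus strand and `end` on the minus strand.
    """
    if gene.strand == "+":
        tss = gene.start
        # Clamp at 0 so the window never runs off the chromosome start.
        return max(0, tss - upstream), tss
    # On the minus strand, "upstream" means higher coordinates.
    tss = gene.end
    return tss, tss + upstream
```

The window for a minus-strand gene lies to the right of the gene record, which is why a promoter can easily land inside a neighbouring gene on either strand.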

Given that the site directly intersects the gene body of gene "x", should we exclude it from the promoter region of the nearby gene "y"? My guess is that there is no real "right" or "wrong" here, but I am wondering whether there is a standard that the community generally follows. In the end, I am sure that only extensive validation of a regulatory mechanism between this site and gene "x" and/or "y" would actually tell us something, but again, I am looking to see whether there is a general standard we should follow in bioinformatics annotation workflows.

To illustrate, attached is an image showing this example from UCSC. For the sake of a visual, assume that the site falls within the red rectangle I highlighted. In this case, the site falls directly within an intron of SPATA1, but also in the upstream region we may define as the promoter of GNG5. (In this particular case the two genes overlap a bit, but there are many others where the genes do not directly overlap and the promoter regions still would.)

What do you all think?

Image from specific example below:

annotation • 1.4k views

ADD COMMENT • link updated 4.9 years ago by Emily 23k • written 4.9 years ago by lshepard ▴ 470

3

There's no community standard here, though in most cases the region you've marked in red would be kept as part of the promoter.

ADD REPLY • link 4.9 years ago by Devon Ryan 104k

1

To add to Devon's comment: from what I understand, these situations (and variations of them) are quite common across the genome. I imagine that the 2 genes in the diagram would be expressed in different tissues and/or under different cellular states, i.e., as the chromatin responds to stimuli and opens or closes such that transcription of one is favoured over the other. If both were transcribed at the same time, some form of polymerase 'blockage' could occur, or something along those lines...

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k

1

Very true, Kevin. Thank you for that additional thought, and thank you, Devon.

ADD REPLY • link 4.9 years ago by lshepard ▴ 470

score 4 · Accepted Answer · 2019-06-12

We've done quite a lot of work in Ensembl (docs and links to papers here) on defining regulatory features, such as promoters, using epigenomic data. This is based on ChIP-seq data for transcription factors, histone modifications and RNA polymerase, plus open chromatin data such as DNase and FAIRE (ENCODE, RoadMap and Blueprint). We find that epigenomic evidence of promoter activity extends a long way into neighbouring genes and into the gene itself. This is not just because the footprint of a ChIP-seq peak is much larger than the actual, tiny footprint of a transcription factor. When we compare this to transcription factor motifs (SELEX), we find that there are validated motifs (i.e., motifs with ChIP-seq binding in at least one cell type) within coding exons.

Overall, I think there is no community standard here because, biologically, promoters do overlap genes.
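In an annotation workflow, one way to reflect this is to let a site carry every overlapping label instead of forcing a single one. The following is only a sketch under stated assumptions: the coordinates are made up, intervals are 0-based and half-open, and `annotate_site` and the `features` list are hypothetical names, not from Ensembl or any other tool.

```python
from typing import List, Tuple

# (gene, feature kind, start, end), half-open; coordinates are invented.
Feature = Tuple[str, str, int, int]

features: List[Feature] = [
    ("geneX", "gene_body", 10_000, 60_000),
    ("geneY", "gene_body", 40_000, 58_500),
    ("geneY", "promoter", 58_500, 59_500),
]


def annotate_site(pos: int, feats: List[Feature]) -> List[Tuple[str, str]]:
    """Return every (gene, kind) whose interval contains pos.

    Nothing is excluded: a site inside geneX's body and geneY's promoter
    gets both labels, and the choice is left to downstream analysis.
    """
    return [(gene, kind) for gene, kind, start, end in feats
            if start <= pos < end]


print(annotate_site(59_000, features))
# [('geneX', 'gene_body'), ('geneY', 'promoter')]
```

A linear scan like this is fine for a handful of features. For genome-wide sets of CpG sites, an interval tree or a dedicated interval-intersection tool would be the more practical choice.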