Question

Snps In Promoter Regions In Exome Sequencing?



12.9 years ago

Kevin ▴ 640

I have exome sequencing data on hand. I was asked a question from a biological standpoint that I can't really answer.

1) For SNPs that are not in coding regions but are in promoter regions, can I analyze them as I would with WGS data? 2) Is there a good way to get a BED file of the promoter regions? Should I instead be looking in the exome target BED file for targeted regions that lie outside the exons? Would I be doing anything grossly wrong if I analyzed reads that lie outside the targeted regions, provided there's good coverage?

Part of the answer: http://biostar.stackexchange.com/questions/888/how-to-get-promoter-sequences-for-human-genes

http://genome.ucsc.edu/ENCODE/downloads.html

If you define X kb upstream of a gene to be a promoter, you can get this using the UCSC table browser as follows: http://biostar.stackexchange.com/questions/8230/hg19-promoters-bed-file
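The "X kb upstream" definition can be sketched in a few lines. This is only an illustration: the function name `promoter_interval` and the 2 kb default are assumptions, not anything taken from the linked threads. Coordinates follow the BED convention (0-based start, exclusive end), and the upstream side depends on the strand.

```python
def promoter_interval(chrom, tx_start, tx_end, strand, upstream=2000):
    """Return a BED-style (chrom, start, end, strand) window upstream of the TSS.

    tx_start/tx_end are BED-style transcript coordinates (0-based, end-exclusive).
    `upstream` is the assumed promoter size in bp (hypothetical default of 2 kb).
    """
    if strand == "+":
        # TSS is the first base of the transcript; the promoter ends right before it.
        start = max(0, tx_start - upstream)
        end = tx_start
    elif strand == "-":
        # On the minus strand the TSS is the last base, so "upstream" is to the right.
        start = tx_end
        end = tx_end + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return (chrom, start, end, strand)


print(promoter_interval("chr1", 10000, 20000, "+"))
print(promoter_interval("chr1", 10000, 20000, "-"))
```

Note that this sketch clips at position 0 but not at the chromosome end, so minus-strand windows near the end of a chromosome would need an extra check against chromosome sizes.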

exome promoter next-gen sequencing snp • 5.8k views

updated 6.9 years ago by Biostar 20 • written 12.9 years ago by Kevin ▴ 640

score 6 · Answer 1 · 2011-06-08

I would argue that no, you cannot (reliably) analyze anything that aligns outside the targeted regions in exon capture data. If you see a dense region of coverage outside a targeted region, one of two things has happened: a) the reads really came from there and were captured due to off-target binding, or b) the reads really came from somewhere else and were misaligned.

The makers of the exon capture kit took great care to reduce the chance of off-target binding, so I would expect almost everything you see to be the result of misalignment. Because the aligner is just following a set of rules, you will often get large collections of reads that have all been systematically aligned to the wrong place. Furthermore, due to differences between the human reference genome and the actual genome of the individual that was sequenced, it can be impossible to tell whether a read was properly aligned or not.

That being said, I have noticed that the coverage often spills out ~100bp on either side of the bounds of the officially targeted region, so if your promoter of interest is super close to a targeted region, you may be in luck.
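To check whether a promoter position falls inside that spill-over zone, one could pad each target interval and test for membership. This is a rough sketch: the helper names are made up, and the 100 bp padding is just the approximate figure mentioned above, not a property of any particular capture kit.

```python
def pad_targets(targets, pad=100):
    """Extend each BED-style (chrom, start, end) interval by `pad` bp on both sides."""
    return [(chrom, max(0, start - pad), end + pad) for chrom, start, end in targets]


def near_target(chrom, pos, targets, pad=100):
    """True if 0-based position `pos` lies inside any padded target interval."""
    return any(
        c == chrom and start <= pos < end
        for c, start, end in pad_targets(targets, pad)
    )


# Hypothetical target taken from an exome kit's BED file.
targets = [("chr1", 1000, 1200)]
print(near_target("chr1", 950, targets))   # 50 bp upstream of the target
print(near_target("chr1", 500, targets))   # 500 bp upstream of the target
```

For a real target file with tens of thousands of intervals, a sorted or indexed lookup (or a tool such as bedtools) would be a better fit than this linear scan.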

By the way, in case you are not convinced that you can't tell good alignments from bad ones, consider the following case. The aligner says "this read maps perfectly to exactly one location". In fact, the gene it aligned to has a paralog which differs by one base. Furthermore, the aligner can't know this, but in reality the individual you sequenced does not have the SNP that distinguishes the two paralogs in the canonical reference. So in the end the read should really have mapped ambiguously, because it aligns to two locations equally well.
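The paralog case can be made concrete with a toy example. The sequences and names below are invented purely for illustration, and exact string matching stands in for a real aligner.

```python
def perfect_hits(read, loci):
    """Return the names of loci whose sequence matches the read exactly."""
    return [name for name, seq in loci.items() if seq == read]


# Toy reference: the two paralogs differ at a single base (position 4).
reference = {"paralog_A": "ACGTACGT", "paralog_B": "ACGTTCGT"}

# Toy individual: lacks the distinguishing SNP, so both copies are identical.
individual = {"paralog_A": "ACGTACGT", "paralog_B": "ACGTACGT"}

# A read that truly originates from the individual's paralog_B.
read = individual["paralog_B"]

print(perfect_hits(read, reference))   # what the aligner sees
print(perfect_hits(read, individual))  # what is true for this genome
```

Against the reference the read has a single perfect hit, at the wrong paralog, while against the individual's own genome it matches both copies equally well.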

score 2 · Answer 2 · 2011-06-05


12.9 years ago

Ryan Thompson ★ 3.6k

I just treat exome capture & seq data as WGS data with a really uneven coverage distribution. As long as you are sufficiently confident that your reads are mapped correctly and you have sufficient coverage in your region of interest, you can draw all the same inferences as from WGS data.
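One way to make "sufficient coverage" and "sufficiently confident" explicit is a per-site filter on depth and mapping quality. The sketch below is only an illustration: the function name and all three thresholds are arbitrary assumptions, and the input is simply the list of MAPQ values of the reads covering a site.

```python
def is_callable(read_mapqs, min_depth=20, min_mapq=30, min_good_fraction=0.9):
    """Decide whether a site has enough well-mapped reads to be analyzed.

    read_mapqs: MAPQ values of the reads overlapping the site.
    All thresholds are hypothetical defaults, not recommendations.
    """
    depth = len(read_mapqs)
    if depth == 0 or depth < min_depth:
        return False
    # Count reads whose mapping quality meets the cutoff.
    good = sum(1 for q in read_mapqs if q >= min_mapq)
    return good / depth >= min_good_fraction


print(is_callable([60] * 25))             # deep, confidently mapped
print(is_callable([60] * 10 + [0] * 15))  # deep, but mostly ambiguous reads
```

Keep in mind that a high MAPQ only reflects the aligner's confidence given the reference, so a filter like this cannot catch every systematic misalignment.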


score 1 · Answer 3 · 2011-06-08

Don't forget about alternative splicing, in which one mRNA's exon is part of the promoter region of the alternatively spliced mRNA. There are not many cases of this, but there are enough to justify keeping it in mind. Consider a 10-exon mRNA with an alternate version where transcription starts from a promoter in intron 5, giving a transcript consisting of exons 6 through 10. The exome data for exons 1 through 5 then cover pieces of the promoter for the shorter transcript.