Identify reads that span intron-exon junctions in RNA-Seq
2
1
Entering edit mode
5.8 years ago

Dear group, Given RNA-Seq data for either 1 or more than 1 different samples/conditions, I am interested in finding

A. an exon-intron or intron-exon junction with reads spanning these junctions.
B. Another similar question is identify intron-retention, where reads map exon-intron-exon.

A differs from B, where A coverage ends halfway in intron. There are algorithms such as iREAD for identifying intron-retention similar to B above.

Is there a way, say samtools or other tool where I can find intron-exon or exon-intron read overhang extension.

As an example, image is given for easy understanding. Thanks for your help.

-A

Image showing intron-exon read overhang for top sample, whereas for the rest of samples below no such read overhang or coverage is observed.

enter image description here

RNA-Seq • 5.7k views
ADD COMMENT
0
Entering edit mode

and what about your previous questions. e.g: Obtaining exon-intron and intron-exon reads Mapping and annotating DNA binding regions from ChIP-Seq to nearby gene did you leave a comment or marked them as "accepted" ?

ADD REPLY
0
Entering edit mode

Hi Pierre, Thanks for the reply. I will check this:Question: Extracting Intron-Exon Reads from bam files

For you second question, I did not post that question - but that is helpful.

thanks!

ADD REPLY
0
Entering edit mode

Dear Alex, Thank you so much. worked beautifully!! Adrian

ADD REPLY
0
Entering edit mode

IMHO, identifying and estimating the abundance of intron retention using exon-intron reads are unreliable at best and often lead to overestimation. In addition to introns having one or both splice sites alternatively spliced, such an approach is sensitive to parameters like overhang and mismatches used during the alignment step. I reckon a better way is to construct an intron database by using both the reference and any novel event identified from sequencing, and count/quantify reads in the bam against it.

ADD REPLY
3
Entering edit mode
5.8 years ago

Steps via awk and BEDOPS.

Build a BED file of exons:

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gtf.gz \
    | gunzip -c - \
    | awk '($3=="exon")' - \
    | gtf2bed - \
    | cut -f1-6 - \
    > gencode.v28.annotation.exons.c1t6.bed

Convert transcripts to merged exons:

$ awk -f transcripts2mergedExons.awk gencode.v28.annotation.exons.c1t6.bed > gencode.v28.annotation.mergedExons.c1t6.bed

Make an exon-intron list:

$ awk -f mergedExons2exonIntronList.awk gencode.v28.annotation.mergedExons.c1t6.bed > gencode.v28.annotation.exonsAndIntrons.c1t6.bed

Convert these to junctions:

$ awk -f exonIntronList2JunctionList.awk gencode.v28.annotation.exonsAndIntrons.c1t6.bed > gencode.v28.annotation.exonsIntronJunctions.c1t6.bed

Optionally, pad them, e.g. by 25 nt around the junction (adjust as needed):

$ bedops --everything --range 25 gencode.v28.annotation.exonsIntronJunctions.c1t6.bed > gencode.v28.annotation.exonsIntronJunctions.pad25.c1t6.bed

Map reads to the padded junctions:

$ bedmap --echo --count --delim '\t' gencode.v28.annotation.exonsIntronJunctions.pad25.c1t6.bed <(bam2bed < reads.bam) > answer.bed

The file answer.bed will contain the junction and the number of reads that map to — overlap with — the optionally-padded junction, by one or more bases.

If you want the actual reads that map to the exon-intron junctions:

$ bedmap --echo-map --delim '\t' gencode.v28.annotation.exonsIntronJunctions.pad25.c1t6.bed <(bam2bed < reads.bam) | sort-bed - > answer.bed

In this second example, the file answer.bed will contain the reads that map to the junction, by one or more bases.

Some links to Github Gists with listed awk scripts:

  • (transcripts2mergedExons.awk)
  • (mergedExons2exonIntronList.awk)
  • (exonIntronList2JunctionList.awk)
ADD COMMENT
0
Entering edit mode

Dear Alex, thanks for your answer. It is very helpful. If for e.g. I want to find only those reads spanning exon-exon junction then, can I directly jump from step "Convert transcripts to merged exons" to "Convert these to junctions" skipping an intermediate step "Make an exon-intron list"? Will it make sense?

ADD REPLY
1
Entering edit mode
5.1 years ago
Malcolm.Cook ★ 1.5k

I have used featureCounts function from the Rsubread package for this express purpose and found it worked perfectly. I recommend it if you work in R because:

  • it gives you precise control over the extent of overlap you want to require for a read to be counted as overlapping your intron-exon junction
  • it interoperates with spliced alignment BAM files as produced most excellently and quickly by the STAR aligner
  • you can profitably pass it acceptor and/or donor coordinates using the SAF format (if you do, you will probably want to replace GENEID with JUNCTIONID)
  • it is quite fast, and supports multi-threading
ADD COMMENT

Login before adding your answer.

Traffic: 1940 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6