Count Junctions In A Sam/Bam File
4
5
Entering edit mode
11.5 years ago
Dave Bridges ★ 1.4k

When considering transcript variants, I like using DEXseq to count reads mapped to exons, but while that is useful, I thought it would be more useful to be able to count junctions.

For example a given gene structure could be

1-2-3-4-5

But there might also be these structures

1-2-4-5

1-3-4-5

Although DEXseq seems like it could provide insight on this, in the above example, if both transcripts are about the same expression DEXseq would give similar counts for exons 2 and 3 and it would be hard to prove that these transcripts exist.

What I think would be better is if there would be a way to count junctions in a BAM file, such that you could say something like:

80% of transcripts have junctions from 1-2

20% of transcripts have junctions from 1-3

For a given junction. I think cufflinks tries to do something like this but it is not clear how they get their whole transcript numbers from these small data sets. Currently, for genes of interest I am loading the bam files into IGV and counting exon junctions by hand, but is there a script or tool that does this, maybe only counting junction transcripts with reasonable overlaps to both sides of the junction?

bam samtools rna-seq • 10k views
ADD COMMENT
4
Entering edit mode
11.5 years ago
Dan D 7.4k

I think FastMISO is what you want. It was a bear to install for me, but I've been happy with it thus far.

ADD COMMENT
3
Entering edit mode
11.5 years ago

If you ran TopHat to produce your BAM file it will have produced a 'junctions.bed' file. The score column of this junctions file contains the read counts for each junction. Since it is a BED file it can be manipulated with existing tools such as BEDTools. It can also be loaded in IGV alongside your BAM file.

By extracting junction coordinates from this file you can easily do things like:

  • Determining which junctions involve canonical GT-AG splice sites or other splice sites
  • Determine intron size
  • Identify junctions that correspond to known transcripts, novel junctions that are anchored to a known acceptor or donor site, novel junctions that involve known acceptor and donor sites, but in a novel combination (i.e. exon skipping)
  • For exon skipping events determine the number of exons skipped
  • etc.
ADD COMMENT
1
Entering edit mode
11.5 years ago
Ryan Thompson ★ 3.6k

You might want to try the spliceGraph function in the GenomicFeatures package of BioConductor. Here is a PDF files that gives a detailed example of how to use it with DEXSeq:

http://www.bioconductor.org/packages//2.10/bioc/vignettes/GenomicFeatures/inst/doc/spliceGraph.pdf

I haven't tried it myself yet, but I plan to.

ADD COMMENT
0
Entering edit mode
9.2 years ago
madk00k ▴ 360

New version of Qualimap tool counts junctions in RNA-seq Quality Control mode.

QualiMap can downloaded from here.

Here's an example of report.

ADD COMMENT

Login before adding your answer.

Traffic: 2698 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6