SAM file and alignment
6.2 years ago
qudrat ▴ 100

Hello everyone,
I have a SAM file from TopHat and I want to extract the reads that share the same intron boundary, based on the CIGAR string.

RNA-Seq alignment • 1.5k views

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the advisable choice for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.


I am not running TopHat. I already have a SAM file from TopHat, and I want to extract the reads that share the same intron boundary, based on the CIGAR string.

6.2 years ago

If I understood the question correctly, you'd like to access reads that span the exon/intron boundary and contain the exon, which makes it a bit trickier than a simple intersect.

You can't quite use the CIGAR string alone, since it does not contain the coordinate; it only describes the alignment relative to the start position stored in the POS field. Working the boundary out from the position would take some custom programming effort and would duplicate existing functionality in other libraries.
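For what it's worth, here is a rough sketch of what that custom effort could look like in plain Python. It is not a tested tool, just an illustration under some assumptions: the SAM lines are tab-separated with the standard column order, an intron is any `N` operation in the CIGAR, and coordinates are reported 1-based and inclusive. The function names `introns_from_cigar` and `group_reads_by_intron` are made up for this example.

```python
import re
from collections import defaultdict

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def introns_from_cigar(pos, cigar):
    """Return (start, end) of every N-skip, 1-based inclusive.

    pos is the 1-based leftmost mapping position (SAM column 4).
    """
    introns = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def group_reads_by_intron(sam_lines):
    """Map (chrom, intron_start, intron_end) -> list of read names."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":  # unmapped or no CIGAR available
            continue
        for start, end in introns_from_cigar(pos, cigar):
            groups[(rname, start, end)].append(qname)
    return groups
```

To keep only junctions supported by multiple reads, filter the result for lists of length two or more. Keying on `(rname, start)` or `(rname, end)` alone would instead group reads that share just one side of the intron.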

If you are able to use pysam, the pileup method on the last coordinate of the exon might work. The documentation states:

An alternative way of accessing the data in a SAM file is by iterating over each base of a specified region using the pileup() method. Each iteration returns a PileupColumn which represents all the reads in the SAM file that map to a single base in the reference sequence.

http://pysam.readthedocs.io/en/latest/api.html

You will still need to check that the end of the alignment is past the coordinate.
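A minimal sketch of that check is below. To keep it short it uses pysam's `fetch()` on a single base rather than `pileup()`, which is a swap on my part. It assumes a coordinate-sorted, indexed BAM file (a plain SAM file would need converting first, since region queries need an index) and a 0-based `exon_end` for the last base of the exon. The helper names are invented for this example.

```python
def reads_past_coordinate(reads, exon_end):
    """Keep reads that start at or before exon_end and end beyond it.

    exon_end is the 0-based position of the last exon base.
    reads can be any iterable of objects exposing reference_start
    (0-based) and reference_end (0-based, exclusive), as pysam's
    AlignedSegment does. Note this only compares alignment spans; it
    does not verify that exon_end itself is an aligned base.
    """
    kept = []
    for read in reads:
        if read.reference_end is None:  # unmapped read
            continue
        starts_in_exon = read.reference_start <= exon_end
        ends_past_exon = read.reference_end > exon_end + 1
        if starts_in_exon and ends_past_exon:
            kept.append(read)
    return kept


def names_of_reads_past_exon_end(bam_path, contig, exon_end):
    """Fetch reads overlapping exon_end and return the names that pass."""
    import pysam  # third-party; imported here so the filter works without it

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        candidates = bam.fetch(contig, exon_end, exon_end + 1)
        return [r.query_name for r in reads_past_coordinate(candidates, exon_end)]
```

Whether a read continues into the intron or is spliced over it can then be decided from its CIGAR (an `N` operation means a splice).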
