Tophat 1.4 Output (Accepted_Hits.Bam)
1
0
Entering edit mode
11.6 years ago

I have read the Tophat 1.4 manual, however I do not seem to understand the outputs very well.

I used a reference genome for my alignment with Tophat and I got a file called "accepted_hits.bam"

This file's cigar strings don't contain any soft clips (S). Can someone explain me why this is happening ?

Also, this was for the reference genome. If I wanted to align to a junction database, should I just use the junction reference instead of the genome reference ? Or does Tophat align to junctions automatically ?

Very confused with the documentation.

tophat output bam • 4.2k views
ADD COMMENT
1
Entering edit mode
11.6 years ago
Ido Tamir 5.2k
  • tophat splits the reads, but still requires everything to match. You could have mismatches in the beginning or end, but these would be scored as M in the CIGAR string. You have to look in the MD tag for mismatches. Old tophat versions dont have the optional MD tag. You can generate it with samtools calmd.

  • You always need a reference sequence. You can use a junction file or a GTF file in addition to this. Tophat automatically tries to align reads that can not be mapped completely by aligning them in a split fashion, thus discovering junctions. You can turn this off, modify it with parameters or supply your own junctions. But you should also have a junctions.bed file in your output and see some reads with a CIGAR like 20M500N50M.

ADD COMMENT
0
Entering edit mode

Can the 500N part of the CIGAR contain soft clips ?

ADD REPLY
0
Entering edit mode

500N is a gap tophat is marking an intron

ADD REPLY

Login before adding your answer.

Traffic: 2547 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6