How are the TopHat deletions and insertions identified? How does this relate to the mismatch and other alignment options used? I'm a little confused and would appreciate an explanation. Thanks!
Hi, I found an answer to a similar question, which also links to the relevant documentation.
Taking a look there might help.
The relation to the gap length we allow is straightforward: reported insertions and deletions have a maximum length, which is set by the maximum gap length allowed in the alignment.
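To make the idea above concrete, here is a minimal sketch of how indels can be read off a gapped alignment, assuming the alignment is given as a SAM-style CIGAR string. This is not TopHat's actual implementation; the function name `indels_from_cigar` and the parameter `max_indel_len` are hypothetical and only stand in for whatever gap-length limit the aligner was run with.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def indels_from_cigar(pos, cigar, max_indel_len=3):
    """Return (kind, ref_position, length) for each short indel in a CIGAR.

    pos is the 0-based leftmost reference position of the alignment.
    max_indel_len is a hypothetical cap standing in for the aligner's
    allowed gap length; longer gaps are simply skipped here.
    """
    ref = pos
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            # Insertion: report the last reference base before the inserted bases.
            if n <= max_indel_len:
                found.append(("insertion", ref - 1, n))
        elif op == "D":
            # Deletion: report the first deleted reference base.
            if n <= max_indel_len:
                found.append(("deletion", ref, n))
            ref += n
        elif op in "MN=X":
            # These operations consume reference bases without being indels.
            ref += n
        # S, H and P do not consume reference bases.
    return found

calls = indels_from_cigar(100, "10M2D5M1I20M")
```

With a cap of 3, both the 2-base deletion and the 1-base insertion in this made-up CIGAR are kept; lowering the cap to 1 would keep only the insertion.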
1) So if a read aligns best with a gap, we consider the gap to be a deletion, and conversely for an insertion?
2) Maybe I'm missing something but this is all I found about insertions and deletions on the TopHat man page:
insertions.bed and deletions.bed. UCSC BED tracks of insertions and deletions reported by TopHat. Insertions.bed - chromLeft refers to the last genomic base before the insertion. Deletions.bed - chromLeft refers to the first genomic base of the deletion.
Maybe what you said is all there is to the insertions and deletions - that a high scoring alignment with a gap indicates the presence of a deletion in that region?
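For anyone who wants to look inside these files, the following is a rough sketch of reading the first three BED columns. The sample lines are made up for illustration, not real TopHat output, and it assumes standard half-open BED coordinates, so that end minus start gives the deletion length; the helper name `parse_bed` is hypothetical.

```python
from typing import NamedTuple

class BedRecord(NamedTuple):
    chrom: str
    start: int  # chromLeft in the manual's wording
    end: int

def parse_bed(lines):
    """Parse the first three tab-separated BED columns, skipping header lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2])))
    return records

# Made-up example lines in the shape of a deletions.bed file.
sample = ["track name=deletions", "chr1\t110\t112", "chr2\t500\t503"]
records = parse_bed(sample)

# Assuming half-open coordinates, the interval width is the deletion length.
lengths = [r.end - r.start for r in records]
```

For insertions.bed the interval width may not equal the insertion length, so check the extra columns of your own file before relying on this.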
Could you be more specific and add some information? At least to me your question is unclear...
Sure - TopHat produces a deletions.bed and insertions.bed file.
1) How are these insertions and deletions identified?
2) When mapping reads to the genome, we allow for a certain number of mismatches and gaps - how do these relate to how TopHat identifies deletions and insertions?