Question

HISAT2 not reporting all splice sites?

7.0 years ago

kapoozy • 0

Does HISAT2 apply some criterion that filters splice sites out of its final output?

I'm using HISAT2 (version 2.0.5) to find novel splice sites. Going through the SAM file it produces, I see a number of alignments that support splice junctions which the program does not report (i.e. they aren't in the file written by the --novel-splicesite-outfile option).

For example, I've got a read aligned like this:

SRR360120.14138165      83      V       12814114        60      9M1297N91M      =       12813953        -261    ATCCCATGTCTTAATTAAACTTGTGGTAACTTTTAATGAATTAAACTTCTGATTTTGCCGATAAGCATATCATATGAAAAATACTAAAAATGTCGAAATG    CC?CCCCDECEEEEC?D@FFDEECHHGEEGDEFHAGGIGJIIIHHGIJIGHCGDIJJGIIGHHCG@E?A9EAGIEGF@HHBFAJIGJHFF<FADBFF@@@    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100        YS:i:251        YT:Z:CP XS:A:-  NH:i:1

The alignment starts at position 12,814,114 with CIGAR 9M1297N91M, so the first 9 bases end at 12,814,122 and the 1,297-base gap that follows implies a splice junction at V:12,814,123-12,815,419. However, there's no such entry in the splice sites file.
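For anyone who wants to check the arithmetic, here is a minimal Python sketch that derives intron coordinates from a SAM POS and CIGAR string. The function names are hypothetical and not part of HISAT2 or TopHat. The second helper assumes that the splice site file stores the 0-based positions of the bases flanking the intron rather than the intron's own 1-based coordinates; that is an assumption about the output format and should be checked against the HISAT2 manual for the version in use.

```python
import re

# CIGAR operations that advance the position on the reference (SAM spec).
REF_CONSUMING = set("MDN=X")


def introns_from_cigar(pos, cigar):
    """Return introns as (start, end) tuples, 1-based and inclusive.

    pos is the 1-based leftmost mapping position (SAM column 4),
    cigar is the CIGAR string (SAM column 6).
    """
    introns = []
    ref = pos  # next reference position to be consumed
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            # An N operation skips `length` reference bases: the intron.
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def to_zero_based_flanks(intron):
    """Convert a 1-based inclusive intron to 0-based flanking positions.

    ASSUMPTION: the splice site file lists the 0-based position of the
    last exon base before the intron and of the first exon base after it.
    """
    start, end = intron
    return (start - 2, end)


if __name__ == "__main__":
    # POS and CIGAR taken from the read shown above.
    for intron in introns_from_cigar(12814114, "9M1297N91M"):
        print(intron, to_zero_based_flanks(intron))
```

For the read above this gives the intron 12814123-12815419, matching the junction I expect. Under the stated format assumption, the corresponding entry in the splice sites file would use 12814121 and 12815419, which may be worth searching for as well before concluding the junction is missing.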

For comparison, I aligned these reads using TopHat (version 2.1.0) and it produced the exact same alignment for this read. However, TopHat DOES report the expected splice junction at V:12,814,123-12,815,419.

HISAT2 rna-seq splicing splice sites • 2.5k views
