This is a beta test.
Question: HISAT2 MAPQ scoring scheme
0
Entering edit mode

Hi,

On checking out MAPQ scores of my BAM file, I found that there are only three unique scores-0,1 and 60. What do the scores mean? I have checked their manual and its not mentioned there. (I am using HISAT2 version 2.0.4), where the highest score has been reduced from 255 to 60.

ADD COMMENTlink 17 months ago prasundutta87 • 330
Entering edit mode
1

Thanks a lot guys..you both seem to be right about this. Reads which have a MAPQ of 60 will also have NH tag (no. of hits) set as 1. If someone is interested in only uniquely mapped reads and is using HISAT v2.0.4, MAPQ can be used to get such reads or else only use NH tag to get those reads (different aligners use different scoring scheme, hence the NH tag is useful).

Although summarising the shared articles, to be really frank, there should not be anything called as a uniquely mapped read because the genome will have repeats and introducing one or more mismatches will allow the reads to be mapped elsewhere throwing the concept of 'uniqueness' to the bin..only one thing can be true, higher the MAPQ, the aligner is sure that the read has come from the assigned position (because its a probability)..One should choose an appropriate threshold keeping the error rate in mind.

ADD REPLYlink 11 months ago
prasundutta87
• 330
Entering edit mode
0

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLYlink 17 months ago
WouterDeCoster
39k
2
Entering edit mode

Looking in the hisat2 source code, hisat_bp.cpp, line 778:

 Reporting:" << endl
        << "  (default)          look for multiple alignments, report best, with MAPQ" << endl
        << "   OR" << endl
        << "  -k <int>           report up to <int> alns per read; MAPQ not meaningful" << endl
        << "   OR" << endl
        << "  -a/--all           report all alignments; very slow, MAPQ not meaningful" << endl

The default reporting mode is "-k 5", which according to the comments in the code the mapq value is not meaningful. The comment in the code indicates that reporting the best alignment is the default reporting mode, but that doesn't match what the manual says.

ADD COMMENTlink 17 months ago jgarbe • 20
2
Entering edit mode

With a small simulation, as far as I can tell the meanings are:

60 - uniquely mapped read, regardless of number of mismatches / indels
 1 - multiply mapped, perfect match or few mismatches / indels
 0 - unmapped, or multiply mapped and with lots of mismatches / indels

Sadly, I don't know how many are "few" and "lots", but nonetheless, multiply mapped reads have either MAPQ 0 or 1, whereas uniquely mapped reads have MAPQ of 60.

Looking quickly at the code is more confusing. Apparently, the relevant bits are in unique.h and unique.cpp. There are three versions of a MAPQ algorithm. The code seems to be inherited from Bowtie2, and there are several passages with values like 44, 42, 40, 24, 23, and so on. For example, on unique.cpp:

// There is no valid second-best alignment and the best alignment has a
// perfect score.
const TMapq pair_nosec_perf = 44;

Or in unique.h, on the V2 of the MAPQ algorithm:

// End-to-end alignment
if(!hasSecbest) {
    if     (bestOver >= diff * (double)0.8f) ret = 42;
    else if(bestOver >= diff * (double)0.7f) ret = 40;
    else if(bestOver >= diff * (double)0.6f) ret = 24;
    else if(bestOver >= diff * (double)0.5f) ret = 23;
    else if(bestOver >= diff * (double)0.4f) ret = 8;
    else if(bestOver >= diff * (double)0.3f) ret = 3;
    else ret = 0;

So, at first glance, like for Bowtie2 ( How does bowtie2 assign MAPQ scores? ), mismatches would affect MAPQ. However, this is not what happens, and my C++ skills have been nearly exhausted with this exercise.

edit: I am not the first one to have been fooled by the code, see MAPQ values are really useful but their implementation is a mess.

ADD COMMENTlink 17 months ago h.mon 25k

Login before adding your answer.

Powered by the version 1.6