Question

multi-mapped reads counting methods with RNA-seq

0

6.0 years ago

guillaume.rbt ★ 1.0k

Hi all,

I'm currently doing RNA-seq data analysis to assess differential expression between genes.

If I understand correctly, standard read-counting methods do not take into account reads mapped to multiple loci, which makes it impossible to evaluate the expression of duplicated genes.

I saw that there is a new tool that deals with multiple alignments (mmquant).

I was wondering whether I could still use a "classical" counting tool such as featureCounts for the uniquely mapped reads, and then add the mmquant results for the multi-mapped reads.

Does that make any sense, or is it statistically stupid and will mess with the normalization later?

Thanks,

Guillaume

RNA-Seq expression gene count • 4.7k views

ADD COMMENT • link 6.0 years ago by guillaume.rbt ★ 1.0k

0

Don't mix and match methods for different genes/types of reads, that's a recipe for messing up the downstream analysis.

ADD REPLY • link 6.0 years ago by Devon Ryan 104k

0

Thanks for the tip, I had a bad feeling about this idea.

ADD REPLY • link 6.0 years ago by guillaume.rbt ★ 1.0k

score 1 · Answer 1 · 2018-04-25

1

6.0 years ago

h.mon 35k

Use just one tool; there is no reason to complicate matters. For genes with only uniquely mapped reads, mmquant should give the same counts as featureCounts or HTSeq - the difference is only for multi-mappers. As the paper states:

mmquant is a drop-in replacement of the widely used tools htseq-count and featureCounts that handles multi-mapping reads in an unbiased way.

ADD COMMENT • link 5.7 years ago by h.mon 35k

2

The method used by mmquant is rather odd. It's very unclear what effect including counts for non-existent fusion genes (created because the genes share reads) would have downstream when one is performing differential expression. My primary concern is that this would lead to an inflated number of tests, often with lower power. Any 3' or 5' bias in a sample could have dramatic consequences on this.
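To make the "inflated number of tests" point concrete, here is a toy sketch of merged-feature counting. It is not mmquant's code or output format: the function name, the `--` separator, and the read data are all made up for illustration, and each read is simply represented by the set of genes it maps to.

```python
from collections import Counter

def count_with_merged_features(read_hits):
    """Count reads per feature, creating a merged feature for ambiguous reads.

    read_hits: iterable of tuples, each listing the genes one read maps to.
    Hypothetical toy model; real tools also consider overlap, strand, etc.
    """
    counts = Counter()
    for genes in read_hits:
        # A read hitting several genes is counted under one merged label.
        label = "--".join(sorted(set(genes)))
        counts[label] += 1
    return counts

# Made-up data: 6 reads unique to geneA, 2 unique to geneB, 4 shared.
read_hits = [("geneA",)] * 6 + [("geneB",)] * 2 + [("geneA", "geneB")] * 4
merged = count_with_merged_features(read_hits)
print(dict(merged))
```

Two real genes turn into three features here, so a downstream differential expression test would be run on the extra merged feature as well, and the reads are spread more thinly across features.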

In general, I would personally think it's better to use RSEM/salmon/kallisto followed by tximport in R to get more accurate handling of multimappers on the gene level.

ADD REPLY • link 6.0 years ago by Devon Ryan 104k

0

Do those tools handle multi-mapping reads? Which one would be best for this task?

ADD REPLY • link 6.0 years ago by guillaume.rbt ★ 1.0k

0

All of them handle multi-mappers using an expectation-maximization algorithm. Salmon and kallisto are the fastest. However, they are not drop-in replacements for featureCounts or HTSeq, as they do not take as input a BAM file of reads aligned to a reference genome. Instead, Salmon and RSEM can count from a BAM file aligned to a reference transcriptome, and Salmon and kallisto can quantify reads directly against a reference transcriptome, without the need to align - which is the fastest option.
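For intuition, here is a minimal sketch of how expectation-maximization can share multi-mapped reads between features. It is a toy model, not the implementation used by RSEM, Salmon, or kallisto: it ignores transcript length, fragment bias, and alignment scores, and the function name and read data are invented for the example.

```python
def em_abundances(read_compat, n_iter=200, tol=1e-9):
    """Toy EM over read-to-feature compatibilities.

    read_compat: list of tuples, each listing the features one read maps to.
    Returns (relative abundances, expected read counts per feature).
    """
    features = sorted({f for hits in read_compat for f in hits})
    n_reads = len(read_compat)
    # Start from equal abundances.
    theta = {f: 1.0 / len(features) for f in features}
    for _ in range(n_iter):
        counts = dict.fromkeys(features, 0.0)
        # E-step: split each read among its candidate features in
        # proportion to the current abundance estimates.
        for hits in read_compat:
            total = sum(theta[f] for f in hits)
            for f in hits:
                counts[f] += theta[f] / total
        # M-step: re-estimate abundances from the fractional counts.
        new_theta = {f: counts[f] / n_reads for f in features}
        delta = max(abs(new_theta[f] - theta[f]) for f in features)
        theta = new_theta
        if delta < tol:
            break
    expected = {f: theta[f] * n_reads for f in features}
    return theta, expected

# Made-up data: 6 reads unique to geneA, 2 unique to geneB, 4 ambiguous.
reads = [("geneA",)] * 6 + [("geneB",)] * 2 + [("geneA", "geneB")] * 4
theta, expected = em_abundances(reads)
print({f: round(c, 3) for f, c in expected.items()})
# The ambiguous reads end up split about 3:1, giving roughly 9 and 3.
```

The ambiguous reads are assigned in proportion to the evidence from the unique reads, so no read is discarded and no extra merged feature is created.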

ADD REPLY • link 6.0 years ago by h.mon 35k