Question

Typical percentage of multimapping reads in human RNA-seq?

10 months ago

srhic ▴ 60

Hello,

I am doing some RNA-seq analysis specific to transposable elements and repeats. Since we are interested in repeat elements, we are focusing on retaining multi-mapping reads in our data. I have tried using STAR and Bowtie2 for alignment, but they give me very different percentages of multimapping reads.

With STAR I typically get around 5% of reads as multimapping. Even when I increase the parameter --outFilterMultimapNmax to a high value such as 100 or 1000, this percentage doesn't change much. With Bowtie2, on the other hand, I get 20-30% of reads labelled as aligning to multiple positions. I find this very confusing. Since the human genome is around 50% repetitive elements, shouldn't the percentage of multimapping reads be much higher than what STAR is giving me?

Does anyone know what the typical percentage of multimapping reads should be when aligning RNA-seq data to the human genome?

Thanks

RNA-seq alignment Bowtie2 STAR • 1.5k views

ADD COMMENT • link updated 10 months ago by GenoMax 141k • written 10 months ago by srhic ▴ 60

How are you judging the multi-mapping? Are you looking at the NH tag?
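For reference, a minimal sketch of counting reads by NH tag instead of reading the number off the aligner log. It assumes plain-text SAM records (for example the output of `samtools view`) and an aligner that writes the NH:i attribute, as STAR does by default. Records without the tag are reported in their own category rather than guessed. The function name `count_by_nh` and the demo records are made up for illustration.

```python
from collections import Counter

def count_by_nh(sam_lines):
    """Classify each read name as unique / multi / unmapped / no_nh_tag."""
    status = {}
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                    # read is unmapped
            status.setdefault(name, "unmapped")
            continue
        if flag & 0x900:                  # skip secondary/supplementary records
            continue
        nh = None
        for tag in fields[11:]:           # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:
            status[name] = "no_nh_tag"
        else:
            status[name] = "multi" if nh > 1 else "unique"
    return Counter(status.values())

# Hypothetical demo records: one unique read, one read with two loci, one unmapped.
demo = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r2\t256\tchr5\t900\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
counts = count_by_nh(demo)
print(dict(counts))
```

Reads are keyed by name, so both mates of a pair count once. On real data the same function could be fed line by line from `samtools view` output.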

ADD REPLY • link 10 months ago by GenoMax 141k

No, I am looking at the percentage of multimapping reads reported in the log files produced by STAR/Bowtie2. Is this not the correct approach?

ADD REPLY • link 10 months ago by srhic ▴ 60

"Since the human genome is around 50% repetitive elements"

You know you're aligning RNAseq?

ADD REPLY • link 10 months ago by Shred ★ 1.4k

Oh, I should have mentioned I also have H3K9me3 ChIP-seq data, and H3K9me3 is known to mark repeats. I see a similar percentage of multimappers in that data. And even for RNA-seq, I was expecting a higher percentage, as the literature suggests that most of the genome is transcribed.

ADD REPLY • link 10 months ago by srhic ▴ 60

I see ~8% multi-mappers with ribo-depleted, PE150 mouse RNA-seq (STAR). I do see about 50% multi-mappers with H3K9me3 (Bowtie2).

Double-checking: did you also adjust --winAnchorMultimapNmax?

To answer the difference between STAR and bowtie2, have you compared the overall mapping percentage? Is STAR assigning the same reads as unique or as unmapped?
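One way to answer that question is to cross-tabulate the per-read status from the two aligners. The sketch below assumes you have already built a dictionary of read name to status ("unique", "multi" or "unmapped") for each aligner; the dictionaries shown are invented toy data, and `cross_tabulate` is a hypothetical helper name.

```python
from collections import Counter

def cross_tabulate(status_a, status_b):
    """Count reads by (status in aligner A, status in aligner B)."""
    table = Counter()
    # Use the union of names so reads missing from one output are still counted.
    for name in status_a.keys() | status_b.keys():
        pair = (status_a.get(name, "absent"), status_b.get(name, "absent"))
        table[pair] += 1
    return table

# Toy data, not real results.
star = {"r1": "unique", "r2": "multi", "r3": "unique", "r4": "unmapped"}
bowtie2 = {"r1": "unique", "r2": "multi", "r3": "multi", "r4": "multi"}

table = cross_tabulate(star, bowtie2)
for (a, b), n in sorted(table.items()):
    print(f"STAR={a:<9} Bowtie2={b:<9} reads={n}")
```

A large ("unique", "multi") cell would suggest that STAR is calling unique the reads that Bowtie2 flags as multi-mapping, while a large ("unmapped", "multi") cell would point to STAR dropping them.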

Most of the genome is transcribed, but the level of transcription of each region or gene is not the same. Many repetitive elements will be silenced or rapidly degraded after transcription, so you would expect to see a low percentage of these reads in your RNA-seq.

ADD REPLY • link 10 months ago by rfran010 ▴ 900

Thanks. It is good to know that you are getting a similar percentage of multimappers with STAR. I analyzed some published datasets and STAR is giving me 5-15% multimappers for both ChIP-Seq and RNA-Seq data. H3K9me3 ChIP-Seq is giving me ~12% in STAR and ~40% in Bowtie2. And yes, I adjusted --winAnchorMultimapNmax as well. STAR is not giving me many unmapped reads, so it is probably assigning them as unique, but I am not sure which of the aligners is giving the "correct" percentage of multimappers.

ADD REPLY • link 10 months ago by srhic ▴ 60

STAR is splice-aware whereas Bowtie2 is not, so some of the difference may be because of that. Techniques where you are looking for specific areas of sequence enrichment will likely show multi-mapping, since the sequences being enriched may be shorter compared to RNA-seq. If you want, you could use bbmap.sh (the aligner from the BBMap suite). It is splice-aware, and using the option ambig=all will report every location that a read maps equally well to.

While imperfect, you may need to depend on MAPQ to verify multi-mapping (https://sequencing.qcfail.com/articles/mapq-values-are-really-useful-but-their-implementation-is-a-mess/ ).
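As an illustration of why MAPQ has to be read per aligner, here is a small sketch of the scheme that, as far as I recall from the STAR manual, STAR uses: 255 for uniquely mapping reads and int(-10*log10(1 - 1/Nmap)) for a read with Nmap loci. Treat the formula as an assumption to check against the manual for your STAR version. Bowtie2 computes MAPQ differently, so these values do not carry over to it. `star_mapq` is a hypothetical helper name.

```python
import math

def star_mapq(nmap):
    """MAPQ expected from STAR for a read aligned to `nmap` loci (assumed scheme)."""
    if nmap < 1:
        raise ValueError("nmap must be at least 1")
    if nmap == 1:
        return 255                        # uniquely mapping read
    return int(-10 * math.log10(1 - 1 / nmap))

# Print the MAPQ implied by the formula for a few locus counts.
for n in (1, 2, 3, 4, 5, 10):
    print(f"loci={n:<3} MAPQ={star_mapq(n)}")
```

Under this scheme a filter such as MAPQ 255 keeps only unique reads in STAR output, and any value below that marks a multi-mapper.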

ADD REPLY • link 10 months ago by GenoMax 141k