Question

paired-end, single-end, polyA tail, 500bp, and removal of RNA genes

0

6.2 years ago

moxu ▴ 510

For DEG analysis using RNA-seq, we typically remove pseudogenes, microRNA genes, and RNA genes such as LINC RNA, SCARNA, SNOR, etc. The reasoning is that single-end RNA-seq typically uses the polyA tail to capture the RNA to be sequenced, and these RNA genes are not polyadenylated, so they should not be there. This sounds fine to me.

My questions arise when paired-end RNA-sequencing is used:

1. If paired-end sequencing uses ~500bp RNA fragments, do these fragments always contain the polyA tail? If not, are the ~500bp fragments random pieces of polyA RNA only, or of any RNA?
2. Should we remove the RNA genes from the DEG analysis, as we have done for single-end data?
3. I have genes like RPPH1, RMRP, RN45S, and MALAT1 high on my DEG list using paired-end alignment, but low on the DEG list using single-end alignment. These are RNA genes, but they do NOT belong to RNA gene classes such as snoRNA or lincRNA. Why is that, and should I remove these RNA genes from the DEG analysis or not?

Thanks in advance!

next-gen rna-seq • 1.8k views

ADD COMMENT • link 6.2 years ago by moxu ▴ 510

0

Note that many lncRNAs are polyadenylated.

ADD REPLY • link 6.2 years ago by Carlo Yague 8.6k

0

Thanks for the education. However, maybe because their functions are usually unknown, lncRNAs are filtered out in our analysis pipeline. Not sure if this is a good idea.

ADD REPLY • link 6.2 years ago by moxu ▴ 510

score 2 · Accepted Answer · 2018-02-09

2

6.2 years ago

Devon Ryan 104k

1. The sequenced fragments are distributed more or less randomly throughout the transcript, though there tends to be a bit of bias toward one end or the other.
2. There's no real point in removing them in either case. You're going to get little if any signal from non-polyadenylated genes, so most of them will get removed at some filtering step anyway.
3. That things like RN45S are present suggests that something went amiss during poly-A enrichment. In fact, it sounds more like you did ribo-depletion (if you don't do that on fresh material it doesn't work well).
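To illustrate the filtering mentioned above, here is a minimal sketch of a low-count filter in plain Python. It is not the filter of any particular DE package: the function name `filter_low_counts`, the thresholds `min_count` and `min_samples`, and the count values are all made up for illustration.

```python
def filter_low_counts(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    `counts` maps gene name -> list of raw read counts, one per sample.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    kept = {}
    for gene, row in counts.items():
        # Count the samples in which this gene reaches the minimum read count.
        n_pass = sum(1 for c in row if c >= min_count)
        if n_pass >= min_samples:
            kept[gene] = row
    return kept


# Hypothetical counts for four samples; the numbers are invented.
counts = {
    "GAPDH": [1520, 1480, 1611, 1390],
    "SNORD3A": [0, 2, 1, 0],      # non-polyadenylated, near-zero signal
    "MALAT1": [830, 910, 40, 35],
}

filtered = filter_low_counts(counts)
print(sorted(filtered))  # ['GAPDH', 'MALAT1']
```

A gene with almost no reads drops out on its own, so an explicit blacklist by gene class adds little. A gene with real signal stays in regardless of its class.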

ADD COMMENT • link 6.2 years ago by Devon Ryan 104k

0

For point 1, do you refer to paired-end or single-end or both?

For point 3, the treatment was an shRNA knockdown of an RNA-binding protein, which might affect RNA levels and cause the DE of said genes. My question is why paired-end and single-end alignment gave such a big difference (in terms of DEG p-value). Does it have to do with the sequencing technology or the RSEM alignment algorithm?

ADD REPLY • link 6.2 years ago by moxu ▴ 510

1

Both. The only difference between SE and PE sequencing is that in the latter you sequence both ends of the loaded fragments.
Maybe mappability, but from your description I wonder if you did ribo-depletion rather than poly-A selection. RSEM might affect things a bit, but I'd be surprised if it's that big of an effect.
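One rough way to follow up on the poly-A versus ribo-depletion question is to check what share of each sample's reads falls on rRNA genes. The sketch below assumes you already have per-gene counts for a sample. The function name `rrna_fraction`, the gene set, and the counts are hypothetical, and the thread does not say what fraction should count as "too high", so that remains a judgment call.

```python
def rrna_fraction(counts, rrna_genes):
    """Return the fraction of a sample's reads assigned to rRNA genes.

    `counts` maps gene name -> read count for one sample.
    `rrna_genes` is a set of gene names treated as rRNA (an assumption
    you would fill in from your own annotation).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no reads at all, so no rRNA fraction to report
    rrna = sum(c for gene, c in counts.items() if gene in rrna_genes)
    return rrna / total


# Hypothetical single-sample counts; the numbers are invented.
sample = {"RN45S": 300, "GAPDH": 500, "ACTB": 200}
frac = rrna_fraction(sample, rrna_genes={"RN45S"})
print(f"rRNA fraction: {frac:.2f}")  # rRNA fraction: 0.30
```

A large rRNA share in every sample would be consistent with poly-A enrichment not having worked as intended, though it would not by itself explain the SE versus PE difference.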

ADD REPLY • link 6.2 years ago by Devon Ryan 104k