Question: The inclusion or exclusion of NR transcripts in RefSeq gene annotation in the RNA-Seq analysis
19 months ago
wangdp123 • 150

Hi there,

The human RefSeq gene annotation includes both protein-coding and non-coding genes. Even a single protein-coding gene can have mRNA (NM_) and non-coding RNA (NR_) transcripts annotated at the same time.

For the RNA-Seq analysis, is there a need to remove all the NR_ transcripts for the protein-coding genes in order to achieve a highly accurate expression estimate of the protein-coding genes?

For instance, the human gene APBB1:

8 mRNA:









1 non-coding RNA:


To estimate the gene expression of APBB1 properly, should we keep NR_047512.2 in the GTF file or remove it before mapping and quantification are performed?

Thanks a lot,


19 months ago
h.mon 25k

For the gene APBB1, if you are performing gene-level counts with STAR, it won't make any difference whether you include or exclude NR_047512.2, as all its exons are shared with protein-coding transcripts of the same gene, and I believe this holds true for most non-coding RNA transcripts from protein-coding genes.
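If you want to check the "all exons are shared" condition for a given gene yourself, a minimal sketch could look like the one below. It assumes you have already pulled the exon coordinates per transcript out of the GTF; the function names and the toy coordinates are made up for illustration and are not the real APBB1 exons.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end), as in a GTF file


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals into a sorted union."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_bases(query: List[Interval], reference: List[Interval]) -> int:
    """Count bases in `query` exons that no `reference` exon covers."""
    ref = merge_intervals(reference)
    missing = 0
    for q_start, q_end in merge_intervals(query):
        covered = 0
        for r_start, r_end in ref:
            overlap = min(q_end, r_end) - max(q_start, r_start) + 1
            if overlap > 0:
                covered += overlap
        missing += (q_end - q_start + 1) - covered
    return missing


# Toy exon coordinates (invented, NOT real APBB1 annotation).
coding_exons = [(100, 200), (300, 400), (100, 200), (350, 450)]  # pooled NM_ exons
shared_noncoding_exons = [(150, 200), (300, 450)]  # fully inside the NM_ union
private_noncoding_exons = [(150, 220)]             # extends 20 bases past it

shared_missing = uncovered_bases(shared_noncoding_exons, coding_exons)
private_missing = uncovered_bases(private_noncoding_exons, coding_exons)
```

When the result is 0, the NR_ transcript adds no exonic bases of its own to the gene, so a gene-level count over the union of exons should not change when it is dropped. A non-zero result means some reads could only be counted through the NR_ transcript.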

If your interest is to untangle expression from coding transcripts and non-coding transcripts at the gene level, I suggest you quantify all transcripts (NM_ and NR_) using kallisto / Salmon / RSEM, but when summarizing the counts to gene counts, split NM_ and NR_ counts. If you exclude NR_ before counting, reads which would have been assigned to NR_ transcripts may be assigned to NM_ transcripts instead, due to the exons in common.
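The split at the summarization step can be sketched roughly as follows. This is only an illustration: it assumes you already have per-transcript estimated counts (from kallisto, Salmon or RSEM) loaded into a dictionary, plus a transcript-to-gene mapping. The transcript IDs and the numbers are invented placeholders, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def transcript_class(transcript_id: str) -> str:
    """Classify a RefSeq transcript by its accession prefix."""
    if transcript_id.startswith("NM_"):
        return "coding"
    if transcript_id.startswith("NR_"):
        return "noncoding"
    return "other"


def split_gene_counts(
    tx_counts: Dict[str, float], tx2gene: Dict[str, str]
) -> Dict[Tuple[str, str], float]:
    """Sum transcript-level counts per (gene, transcript class) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for tx, count in tx_counts.items():
        totals[(tx2gene[tx], transcript_class(tx))] += count
    return dict(totals)


# Invented example values, standing in for quantifier output.
tx_counts = {"NM_TOY_1": 120.0, "NM_TOY_2": 30.5, "NR_TOY_1": 9.5}
tx2gene = {"NM_TOY_1": "APBB1", "NM_TOY_2": "APBB1", "NR_TOY_1": "APBB1"}

gene_counts = split_gene_counts(tx_counts, tx2gene)
```

Here `gene_counts` holds one coding and one non-coding total for APBB1, so the NR_ transcript still takes its share of the reads during quantification but does not end up in the protein-coding gene count.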


