Hi there,
The human RefSeq gene annotation includes both protein-coding and non-coding genes. Even a single protein-coding gene can have mRNA transcripts (NM_) and non-coding RNA transcripts (NR_) annotated at the same time.
For RNA-Seq analysis, do all the NR_ transcripts of protein-coding genes need to be removed in order to get an accurate expression estimate for those genes?
For instance, the human gene APBB1 has:
8 mRNAs:
NM_001164.4
NM_001257319.2
NM_001257320.2
NM_001257321.2
NM_001257323.2
NM_001257325.2
NM_001257326.2
NM_145689.2
1 non-coding RNA:
NR_047512.2
To quantify APBB1 gene expression properly, should NR_047512.2 be kept in the GTF file or removed from it before mapping and quantification are performed?
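To make the "eliminate" option concrete, here is a minimal Python sketch of the filtering I have in mind. It is only an illustration of the question, not a recommendation. The function name drop_nr_for_coding_genes is hypothetical, and the sketch assumes a RefSeq-style GTF in which each record carries gene_id and transcript_id attributes, and that a gene counts as protein-coding if it has at least one NM_ transcript.

```python
import re

# Attribute patterns for the 9th GTF column (assumed RefSeq-style quoting).
GENE_RE = re.compile(r'gene_id "([^"]+)"')
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def _attr(pattern, line):
    """Return the first captured attribute value, or None if absent/empty."""
    match = pattern.search(line)
    return match.group(1) if match else None


def drop_nr_for_coding_genes(gtf_lines):
    """Drop NR_ transcript records belonging to genes that also have NM_ records.

    Header lines, gene-level lines, and NR_ transcripts of genes without any
    NM_ transcript are kept unchanged.
    """
    lines = list(gtf_lines)

    # Pass 1: collect genes that have at least one NM_ (mRNA) transcript.
    coding_genes = set()
    for line in lines:
        if line.startswith("#"):
            continue
        gene = _attr(GENE_RE, line)
        tx = _attr(TX_RE, line)
        if gene and tx and tx.startswith("NM_"):
            coding_genes.add(gene)

    # Pass 2: keep everything except NR_ records of those genes.
    kept = []
    for line in lines:
        if not line.startswith("#"):
            gene = _attr(GENE_RE, line)
            tx = _attr(TX_RE, line)
            if tx and tx.startswith("NR_") and gene in coding_genes:
                continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Coordinates below are made-up placeholders, not the real APBB1 locus.
    demo = [
        'chrN\tRefSeq\texon\t1\t100\t.\t-\t.\tgene_id "APBB1"; transcript_id "NM_001164.4";',
        'chrN\tRefSeq\texon\t1\t100\t.\t-\t.\tgene_id "APBB1"; transcript_id "NR_047512.2";',
    ]
    for record in drop_nr_for_coding_genes(demo):
        print(record)
```

Running it on the real annotation would just mean passing the lines of the GTF file to the function and writing the returned lines to a new file that is then given to the aligner and quantifier.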
Thanks a lot,
Tom