The inclusion or exclusion of NR transcripts in RefSeq gene annotation in the RNA-Seq analysis
6.1 years ago
wangdp123 ▴ 340

Hi there,

The human RefSeq gene annotation includes both protein-coding and non-coding genes. Even for a protein-coding gene, the annotation can contain mRNA transcripts (NM_) and non-coding RNA transcripts (NR_) for the same gene at the same time.

For the RNA-Seq analysis, is there a need to remove all the NR_ transcripts for the protein-coding genes in order to achieve a highly accurate expression estimate of the protein-coding genes?

For instance, the human gene APBB1:

8 mRNA:

NM_001164.4
NM_001257319.2
NM_001257320.2
NM_001257321.2
NM_001257323.2
NM_001257325.2
NM_001257326.2
NM_145689.2

1 non-coding RNA:

NR_047512.2

In order to obtain the gene expression of APBB1 in a proper way, should we include or eliminate NR_047512.2 in the GTF file when the mapping and quantification are performed?

Thanks a lot,

Tom

6.1 years ago
h.mon 35k

For the gene APBB1, if you are performing gene-level counts with STAR, it won't make any difference whether you include or exclude NR_047512.2, as all its exons are shared with the protein-coding transcripts, and I believe this holds true for most non-coding RNA transcripts from protein-coding genes.
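One way to check the "all exons are shared" claim for any gene is to ask how many bases of the NR_ transcript's exons fall outside the union of the NM_ exons. The sketch below is only illustrative: the function names are hypothetical, the coordinates in the usage example are made up (they are not the real APBB1 exons), and it assumes you have already pulled 1-based inclusive exon intervals out of your GTF for a single chromosome and strand.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent closed intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_bases(query_exons, reference_exons):
    """Count bases of query_exons that no reference exon covers."""
    ref = merge_intervals(reference_exons)
    missing = 0
    for q_start, q_end in merge_intervals(query_exons):
        pos = q_start  # first base of the query not yet accounted for
        for r_start, r_end in ref:
            if r_end < pos:
                continue  # reference interval lies entirely before pos
            if r_start > q_end:
                break  # reference interval lies entirely after the query
            if r_start > pos:
                missing += r_start - pos  # gap before this reference interval
            pos = r_end + 1
            if pos > q_end:
                break
        if pos <= q_end:
            missing += q_end - pos + 1  # tail of the query left uncovered
    return missing


# Made-up coordinates, for illustration only.
nm_exons = [(50, 250), (300, 350), (351, 400)]
nr_exons = [(100, 200), (300, 400)]
print(uncovered_bases(nr_exons, nm_exons))
```

A result of 0 means every NR_ exon base is also NM_ exonic, so a gene-level counter sees the same gene footprint with or without the NR_ record.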

If your interest is to untangle expression from coding transcripts and non-coding transcripts at the gene level, I suggest you quantify all transcripts (NM_ and NR_) using kallisto / Salmon / RSEM, but when summarizing the transcript counts to gene counts, keep the NM_ and NR_ counts separate. If you exclude NR_ transcripts before counting, reads which would have been assigned to NR_ transcripts may be assigned to NM_ transcripts instead, due to the exons in common.
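The split summarization could look roughly like the sketch below. It assumes the transcript-level estimates have already been read into a plain dictionary keyed by RefSeq accession and that a transcript-to-gene mapping is available; the function name and the counts in the usage example are hypothetical, and it does not parse any particular quantifier's output format.

```python
from collections import defaultdict


def summarize_by_gene_and_class(transcript_counts, tx2gene):
    """Sum transcript-level counts per gene, keeping NM_ and NR_ separate."""
    totals = defaultdict(lambda: {"NM": 0.0, "NR": 0.0, "other": 0.0})
    for tx_id, count in transcript_counts.items():
        gene = tx2gene[tx_id]
        # RefSeq accession prefix: NM_ = mRNA, NR_ = non-coding RNA.
        if tx_id.startswith("NM_"):
            tx_class = "NM"
        elif tx_id.startswith("NR_"):
            tx_class = "NR"
        else:
            tx_class = "other"  # e.g. predicted XM_/XR_ models
        totals[gene][tx_class] += count
    return {gene: dict(classes) for gene, classes in totals.items()}


# Made-up estimated counts, for illustration only.
counts = {"NM_001164.4": 120.0, "NM_145689.2": 30.5, "NR_047512.2": 9.5}
tx2gene = {tx_id: "APBB1" for tx_id in counts}
print(summarize_by_gene_and_class(counts, tx2gene))
```

Because the NR_ transcript stays in the index during quantification, reads that fit it best are not forced onto the NM_ isoforms; the split only happens afterwards, at the summarization step.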
