Best annotation GTFs for transcript detection vs novel transcript discovery
9.8 years ago
trakhtenberg

In my analysis of RNA-seq data (mouse) I have two goals: one is to identify differentially expressed genes, and the other is to discover novel transcripts. I assume that for the first goal I should use the annotation database with the fewest redundant or erroneous entries. For the second goal, I assume I should use the most comprehensive database available, even if it contains redundant or erroneous entries. If my assumptions are correct, which annotation databases should I use?
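
One way to test the redundancy assumption before choosing a GTF is to count how many of its transcripts share an identical exon structure. The Python sketch below is a minimal illustration under stated assumptions: a standard tab-separated, 9-column GTF with a `transcript_id` attribute on every `exon` line. The function names `exon_chains` and `redundancy_report` are hypothetical, and "redundant" here means an exact match of chromosome, strand and all exon coordinates, which is stricter than the intron-chain matching used by tools such as gffcompare.

```python
import re
from collections import defaultdict

# Pulls the value of transcript_id "..." out of GTF column 9 (attributes).
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def exon_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, sorted exon (start, end) tuples).

    Assumes a 9-column, tab-separated GTF; only 'exon' features are used.
    """
    exons = defaultdict(list)
    meta = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TID_RE.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        meta[tid] = (fields[0], fields[6])  # chromosome, strand
    return {tid: (*meta[tid], tuple(sorted(coords))) for tid, coords in exons.items()}


def redundancy_report(gtf_lines):
    """Return (transcripts, distinct exon chains, redundant transcripts)."""
    chains = exon_chains(gtf_lines)
    groups = defaultdict(list)
    for tid, chain in chains.items():
        groups[chain].append(tid)
    n_transcripts = len(chains)
    n_unique = len(groups)
    return n_transcripts, n_unique, n_transcripts - n_unique


# Toy example: t1 and t2 have the same exons (listed in a different order), t3 differs.
toy = [
    'chr1\tsrcA\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrcA\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrcB\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tsrcB\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tsrcA\texon\t100\t250\t.\t+\t.\tgene_id "g1"; transcript_id "t3";',
]
print(redundancy_report(toy))  # (3, 2, 1)
```

On a real annotation the function can be fed an open file handle, e.g. `redundancy_report(open(path))` where `path` is a hypothetical local (uncompressed) GTF, and the three counts compared between candidate databases.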

In terms of accuracy, GENCODE release M3 seems like it may be the best choice. If I understand correctly, it includes non-redundant transcripts from all the main sources: (a) all RefSeq RNAs, (b) everything that UCSC Genes adds from GenBank, (c) Ensembl models, both those manually curated by HAVANA and those that are purely computational predictions, and (d) other databases.

Is there a reason to use a different database (e.g., UCSC Genes) for the first goal? For the second goal (discovering novel transcripts), should I also use all of GenBank? And if GENCODE filters Ensembl, should I also use the original Ensembl annotation?
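
To judge whether adding the original Ensembl (or a GenBank-derived set) on top of GENCODE would actually contribute anything for discovery, one could list the transcripts in the extra source whose exon structure is absent from the base annotation. This is again a minimal Python sketch, assuming 9-column GTFs with `transcript_id` on `exon` lines; `extra_isoforms` is a hypothetical helper, and exact exon-coordinate identity is a simplifying assumption rather than a standard matching rule.

```python
import re
from collections import defaultdict

# Pulls the value of transcript_id "..." out of GTF column 9 (attributes).
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def exon_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, sorted exon (start, end) tuples)."""
    exons = defaultdict(list)
    meta = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TID_RE.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        meta[tid] = (fields[0], fields[6])  # chromosome, strand
    return {tid: (*meta[tid], tuple(sorted(coords))) for tid, coords in exons.items()}


def extra_isoforms(base_lines, extra_lines):
    """Transcript IDs in extra_lines whose exon chain does not occur in base_lines.

    Structures are compared rather than IDs, because two sources can share
    transcript IDs while differing in structure, or vice versa.
    """
    known = set(exon_chains(base_lines).values())
    return sorted(tid for tid, chain in exon_chains(extra_lines).items() if chain not in known)


base = [
    'chr1\tGENCODE\texon\t100\t200\t.\t+\t.\ttranscript_id "a1";',
    'chr1\tGENCODE\texon\t300\t400\t.\t+\t.\ttranscript_id "a1";',
]
extra = [
    'chr1\tEnsembl\texon\t100\t200\t.\t+\t.\ttranscript_id "b1";',
    'chr1\tEnsembl\texon\t300\t400\t.\t+\t.\ttranscript_id "b1";',
    'chr1\tEnsembl\texon\t100\t200\t.\t+\t.\ttranscript_id "b2";',
    'chr1\tEnsembl\texon\t350\t400\t.\t+\t.\ttranscript_id "b2";',
]
print(extra_isoforms(base, extra))  # ['b2']
```

If the list comes back short for a real pair of annotations, the extra source adds little beyond the base one; a long list mostly of single-exon or unsupported models would point the other way for the first goal.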

RNA-Seq