For my project I want to analyze a set of long non-coding RNAs (lncRNAs) using TCGA expression data for specific cancers (filters: transcriptome profiling, gene expression quantification, RNA-seq, FPKM).
The resulting data identifies each gene by a versioned Ensembl ID, and I need to convert these to gene symbols (SYMBOL), as that is how most of the lncRNAs I use are listed.
However, when converting, versioned Ensembl IDs match far fewer symbols than unversioned IDs do. Not only that, but when I match on versioned IDs, practically none of my lncRNAs are found in the dataset by symbol.
Now for my question: will my results be compromised or incorrect if I just ignore the Ensembl version numbers and translate to symbols?
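To make concrete what I mean by ignoring version numbers, here is a minimal Python sketch. The function name and the tiny lookup table are made up for illustration; in practice the table would come from an annotation source, and whether dropping the version is safe is exactly what I am asking.

```python
def strip_ensembl_version(ensembl_id: str) -> str:
    """Drop everything from the first '.' onward, e.g. 'ENSG00000228630.5' -> 'ENSG00000228630'."""
    return ensembl_id.split(".", 1)[0]


# Hypothetical unversioned ID -> symbol table (illustrative entries only).
symbol_by_gene_id = {
    "ENSG00000228630": "HOTAIR",
    "ENSG00000251562": "MALAT1",
}

# Versioned IDs as they might appear in an expression file (version numbers are examples).
versioned_ids = ["ENSG00000228630.5", "ENSG00000251562.6", "ENSG00000000003.13"]

# Map each versioned ID to a symbol, or None when the unversioned ID is not in the table.
symbols = {
    vid: symbol_by_gene_id.get(strip_ensembl_version(vid))
    for vid in versioned_ids
}

for vid, symbol in symbols.items():
    print(vid, "->", symbol)
```

IDs that come back as None here are the ones I would have to look up some other way.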
As Ensembl describes it, the transcripts that are used change with each version number. What exactly does this mean for the TCGA expression data?
Another question: some of my lncRNAs were referenced by ENST IDs (transcripts), and I was able to convert some of them to symbols. Are incorrect assumptions being made when converting IDs like this as well?
I know this question might be nitpicky, but I don't want to get faulty results or make incorrect assumptions.
Thanks in advance for any answers :)