Ensembl ID and counts to FPKM/TPM
4.9 years ago

I have an R data frame in which each row is an Ensembl gene ID, each column is a sample, and each value is the number of counts (reads) matched to that gene, genome-wide, for human. My question is: how do I get the appropriate gene length to normalize these counts to FPKM or TPM? This answer, Get gene length with R, gives the total gene length, which includes introns and is therefore not appropriate. This one: https://bioinformatics.stackexchange.com/questions/2567/how-can-i-calculate-gene-length-for-rpkm-calculation-from-counts-data gives a valid answer, but I don't know what a GTF file is and might not be able to construct one. I just want to go straight from Ensembl ID to transcript length.
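For reference, once each gene has a length, both normalizations are simple arithmetic. The sketch below is in Python rather than R and assumes one sample's counts and per-gene lengths (in bases) are held in plain dicts keyed by gene ID; the function names and the toy IDs and numbers are hypothetical. It uses plain lengths and does not apply the effective-length (fragment-size) correction that some quantification tools use.

```python
def fpkm(counts: dict[str, float], lengths: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of gene per million counted reads, for one sample."""
    total = sum(counts.values())  # library size: all counted reads in this sample
    return {g: c * 1e9 / (lengths[g] * total) for g, c in counts.items()}


def tpm(counts: dict[str, float], lengths: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: divide by length first, then rescale to sum to 1e6."""
    rates = {g: c / lengths[g] for g, c in counts.items()}  # reads per base of gene
    denom = sum(rates.values())
    return {g: r / denom * 1e6 for g, r in rates.items()}


if __name__ == "__main__":
    # Hypothetical toy sample: two genes, lengths in bases
    counts = {"GENE_A": 100, "GENE_B": 300}
    lengths = {"GENE_A": 1000, "GENE_B": 2000}
    print(fpkm(counts, lengths))
    print(tpm(counts, lengths))
```

The only difference is the order of operations: FPKM divides by library size and then by length, while TPM divides by length first and then rescales, so the TPM values of every sample sum to one million.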

I know that there are many details (how the transcripts were mapped, which reference genome was used, etc.) that could make a simple Ensembl ID to transcript length calculation slightly inaccurate, but this is an early stage of exploratory data analysis, and I can confirm the results later once the analysis pipeline is set up.

Thanks for tolerating an amateur!

Tags: RNA-Seq, R, tpm, fpkm, exon
4.9 years ago
h.mon 35k

I know that there are many details (how the transcripts were mapped, which reference genome was used, etc.) that could make a simple Ensembl ID to transcript length calculation slightly inaccurate

It works better the other way around: the more information you provide, the better the answers you get.

I don't know what a GTF file is and might not be able to construct one.

As you have Ensembl identifiers, you probably want an Ensembl annotation. You can download a GTF file from the corresponding organism's genome page on Ensembl. Make sure it is the same annotation version that was used for quantification.

GTF file description:

GFF/GTF File Format - Definition and supported options
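One common way to get an intron-free gene length from a GTF is to take the union of all exons of the gene, so that exons shared or overlapping between transcripts are counted only once. Other definitions (for example, the longest transcript) are also used. A minimal Python sketch, assuming an uncompressed, tab-separated Ensembl-style GTF in which column 3 is the feature type, columns 4 and 5 are 1-based inclusive start and end, and column 9 contains `gene_id "…";`. The function name and the toy records are hypothetical.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def exonic_gene_lengths(gtf_lines):
    """Return {gene_id: number of bases covered by at least one exon of that gene}."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features contribute to the length
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])  # 1-based, inclusive
        exons[match.group(1)].append((start, end))

    lengths = {}
    for gene, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent exon: extend the block
                cur_end = max(cur_end, end)
            else:  # gap (intron): close the current block and start a new one
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene] = total
    return lengths


if __name__ == "__main__":
    # Hypothetical toy records: two transcripts of one gene with overlapping exons
    toy_gtf = [
        '1\ttoy\texon\t100\t199\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";',
        '1\ttoy\texon\t150\t249\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_2";',
        '1\ttoy\texon\t400\t499\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";',
        '1\ttoy\tgene\t100\t499\t.\t+\t.\tgene_id "GENE_A";',
    ]
    print(exonic_gene_lengths(toy_gtf))  # {'GENE_A': 250}
```

For a real download, pass an open file handle instead of the list. A `.gtf.gz` file can be read as text lines with `gzip.open(path, "rt")`. The resulting lengths can then be matched to the rows of the count table by gene ID.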
