I have an R data frame in which each row is an Ensembl gene ID, each column is a sample, and each value is the number of transcript counts matched to that gene (human, whole genome). My question is: how do I get the appropriate gene length for normalizing to FPKM or TPM? This answer: Get gene length with R gives the total gene length including introns, which is not appropriate. This one: https://bioinformatics.stackexchange.com/questions/2567/how-can-i-calculate-gene-length-for-rpkm-calculation-from-counts-data gives a valid answer, but I don't know what a GTF file is and might not be able to construct one. I just want to go straight from Ensembl ID -> transcript length.
I know that there are a lot of details (how the transcripts were mapped, which reference genome was used, etc.) that could make a simple Ensembl ID -> transcript length lookup slightly inaccurate, but this is an early stage of exploratory data analysis, and I can confirm the results later once the analysis pipeline is set up.
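For context, this is where the length enters the two normalizations, as I understand the standard definitions. The notation is my own assumption: $c_i$ is the count for gene $i$ in one sample, $\ell_i$ is the length in base pairs (ideally exonic or transcript length, not the full genomic span), and $N = \sum_j c_j$ is the total count for that sample.

$$
\mathrm{FPKM}_i = \frac{10^{9}\, c_i}{\ell_i \, N},
\qquad
\mathrm{TPM}_i = 10^{6} \cdot \frac{c_i / \ell_i}{\sum_j c_j / \ell_j}
$$

Because $\ell_i$ divides the count in both cases, using a length that includes introns would deflate the values for genes with long introns, which is why I am after a transcript-level length.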
Thanks for tolerating an amateur!