Question

How to convert transcript-level TPM to gene-level TPM?

5.7 years ago

k.kathirvel93 ▴ 300

Hi everyone,

I am using various quantification tools for RNA-seq analysis. HTSeq-count and featureCounts produce gene-level counts, whereas StringTie and eXpress produce transcript-level TPM abundances. I want to compare these two kinds of output, so I am thinking of converting both first:

gene counts (HTSeq-count and featureCounts) to gene-level TPM, and
transcript TPM (StringTie) to gene-level TPM.

How can I do this? Thanks.

RNA-Seq next-gen gene rna-seq sequencing • 11k views

ADD COMMENT • link updated 5.7 years ago by vj ▴ 520 • written 5.7 years ago by k.kathirvel93 ▴ 300

Can you please tell me how you managed to get transcript-level TPM using StringTie?

I need transcript-level TPM, but the StringTie output I get has TPM at the gene level only.

ADD REPLY • link 4.9 years ago by kousi31 ▴ 100

score 5 · Answer 1 · 2018-07-30

5.7 years ago

harish ▴ 450

For any such conversion, i.e. summing up from transcript level to gene level, you can always use tximport.

However, since you have TPMs, you will definitely need to go to the counts level and then rescale back to the gene level.

In any case please do let us know how different they are!

score 5 · Answer 2 · 2018-07-30

5.7 years ago

i.sudbery 19k

@harish is right that tximport should be able to do the transcript-to-gene-level conversion, although I'm not sure whether StringTie or eXpress are sources it can import from automatically. I've not heard of tximport calculating TPMs from counts, but I could be wrong. Still, these things aren't hard to calculate yourself.

TPM from counts

To calculate TPM, first calculate the RPKM/FPKM for each gene. Actually, you only need the RPK/FPK (reads or fragments per kilobase), because the per-million scaling cancels out in the next step. It also doesn't matter whether you count pairs (F) or reads (R). RPK/FPK is the number of reads/pairs mapping to a gene divided by the total exonic length of the gene.

Some more sophisticated algorithms will use an effective length rather than real length.

To convert this to TPM, divide the FPKM (or FPK) of each gene by the sum of FPKM (or FPK) over all genes and multiply by 1 million.

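If I have a dataframe df with three columns, gene_id, counts and length, then TPM is calculated as:

# counts per base of exonic length; the per-kilobase factor cancels in the next line
df$RPK <- df$counts / df$length
# rescale so that the values sum to one million
df$TPM <- df$RPK * 1e6 / sum(df$RPK)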
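
As a quick sanity check of the calculation above, here is a small worked example on made-up numbers. The gene names, counts and lengths are invented for illustration, and `counts_to_tpm` is only a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: TPM from raw counts and gene lengths (in bases).
# Either real exonic lengths or effective lengths could be passed in.
counts_to_tpm <- function(counts, lengths) {
  rpk <- counts / (lengths / 1000)   # reads per kilobase
  rpk * 1e6 / sum(rpk)               # rescale so the values sum to one million
}

# Toy data, invented for illustration only
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  counts  = c(100, 300, 600),
  length  = c(1000, 3000, 2000)
)

df$TPM <- counts_to_tpm(df$counts, df$length)
print(df)
print(sum(df$TPM))  # TPM values sum to one million by construction
```

In this toy table geneA and geneB end up with the same TPM (200,000 each) even though geneB has three times the counts, because it is also three times as long.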

Gene TPM from transcript TPM

As TPM is transcripts per million, the gene TPM is simply the sum of the transcript TPMs for all transcripts belonging to that gene.

If our dataframe transcript_tpm has gene_id, transcript_id and TPM, then we calculate gene TPM using dplyr thus:

library(dplyr)
gene_tpm <- group_by(transcript_tpm, gene_id) %>% summarize(TPM = sum(TPM))
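
If you would rather not depend on dplyr, the same per-gene sum can be sketched in base R with `aggregate()`. The transcript IDs, gene IDs and TPM values below are invented for illustration.

```r
# Toy transcript-level table, invented for illustration only
transcript_tpm <- data.frame(
  gene_id       = c("geneA", "geneA", "geneB", "geneC", "geneC", "geneC"),
  transcript_id = c("tA.1", "tA.2", "tB.1", "tC.1", "tC.2", "tC.3"),
  TPM           = c(150000, 50000, 200000, 100000, 200000, 300000)
)

# Sum the TPM of all transcripts belonging to each gene
gene_tpm <- aggregate(TPM ~ gene_id, data = transcript_tpm, FUN = sum)
print(gene_tpm)

# Summing within genes does not change the grand total
print(sum(gene_tpm$TPM))
```

Because the transcripts are only being regrouped, the gene-level values still sum to the same total as the transcript-level values, which is a handy check that no transcript was dropped or counted twice.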

Shouldn't you only group_by gene_id? All the TPM rows with the same gene_id will then be summed.

ADD REPLY • link 5.5 years ago by jperez9315 • 0

transcript_tpm is the name of the dataframe, not a column in it.

ADD REPLY • link 5.5 years ago by i.sudbery 19k

In my data I have transcript ID, gene ID and 6 sample columns (3 control and 3 experimental). How would you write the code for summarize? When I tried it, I ended up with only one column of total TPM. Thanks.

ADD REPLY • link 4.2 years ago by fabucklain1 • 0

Yes. TPM = sum(TPM) calculates the per-group sums of the TPM column only. If you want the sums of more columns, you will need to sum them as well.

ADD REPLY • link 4.2 years ago by i.sudbery 19k
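
For the multi-sample case raised in the comments above, one way to sum every sample column at once is dplyr's `across()`. This is a minimal sketch that assumes dplyr 1.0.0 or later and assumes the sample columns are the only numeric columns in the table; the column names and values are invented.

```r
library(dplyr)

# Toy table with one TPM column per sample; all names and values are invented
transcript_tpm <- data.frame(
  transcript_id = c("tA.1", "tA.2", "tB.1"),
  gene_id       = c("geneA", "geneA", "geneB"),
  control_1     = c(10, 20, 70),
  control_2     = c(15, 15, 70),
  treated_1     = c(30, 30, 40)
)

# Sum every numeric (sample) column within each gene;
# the character column transcript_id is dropped by summarize()
gene_tpm <- transcript_tpm %>%
  group_by(gene_id) %>%
  summarize(across(where(is.numeric), sum), .groups = "drop")

print(gene_tpm)
```

The result has one row per gene and one column per sample. If the table held other numeric columns (a transcript length, say), they would be summed too, so in that situation it would be safer to name the sample columns explicitly inside `across()`.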

score 3 · Answer 3 · 2018-07-30

If I am not wrong, TPM (Transcripts Per Million) is the normalised number of transcripts (mRNA molecules) for a gene or an isoform (see Read Mapping and abundance estimation section). So in theory your option 2 is not necessary if you can get gene-level TPMs directly. StringTie should give you gene-level abundances in TPM (using the -A flag), so you can directly compare them to the TPMs calculated from HTSeq-count (using @i.sudbery's equation).
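
Once both tools are expressed as gene-level TPM, one simple way to compare them is to join the two tables on gene ID and look at the correlation. This is only a sketch: the two data frames, their column names and the values are invented, and using Spearman correlation on log2(TPM + 1) is an assumption about what a reasonable comparison looks like, not something prescribed in this thread.

```r
# Toy gene-level TPM tables from two tools; names and values are invented
htseq_tpm <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "geneD"),
  TPM     = c(210000, 190000, 590000, 10000)
)
stringtie_tpm <- data.frame(
  gene_id = c("geneB", "geneA", "geneC", "geneE"),
  TPM     = c(180000, 220000, 595000, 5000)
)

# Keep only genes reported by both tools
both <- merge(htseq_tpm, stringtie_tpm, by = "gene_id",
              suffixes = c("_htseq", "_stringtie"))

# Rank-based correlation on the log scale; the +1 avoids log2(0)
rho <- cor(log2(both$TPM_htseq + 1), log2(both$TPM_stringtie + 1),
           method = "spearman")
print(both)
print(rho)
```

Note that `merge()` silently drops genes present in only one table (geneD and geneE here), so it is worth checking how many genes survive the join before reading much into the correlation.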

score 0 · Answer 4 · 2018-07-30

5.7 years ago

Prakash ★ 2.2k

To convert gene counts to TPM, you can use this R script, and also to get gene-level TPM from transcript-level TPM.
