TPM from StringTie
6.0 years ago
gozrom ▴ 80

I have extracted the TPM values from the GTF files generated by StringTie for all replicates. However, those TPM values are per transcript, not per gene.

I now have one huge CSV file with 12 replicates and their corresponding TPM values, and I want to compute TPM values per gene to use in a subsequent visualization.

The file looks like this:

 X1    TPM transcript_id ref_gene_name  TPM.1 transcript_id.1 ref_gene_name.1  TPM.2 transcript_id.2
   <int>  <dbl> <chr>         <chr>          <dbl> <chr>           <chr>            <dbl> <chr>          
 1     1  2.60  MSTRG.1.1     <NA>           3.78  MSTRG.1.1       <NA>             4.22  MSTRG.1.1      
 2     2 NA     MSTRG.1.1     <NA>          NA     MSTRG.1.1       <NA>            NA     MSTRG.1.1      
 3     3  2.01  MSTRG.2.1     <NA>           1.17  MSTRG.2.1       <NA>             1.48  MSTRG.2.1      
 4     4 NA     MSTRG.2.1     <NA>          NA     MSTRG.2.1       <NA>            NA     MSTRG.2.1      
 5     5  0.402 ENSMUST00000~ Gm10568        0.316 ENSMUST0000019~ Gm10568          0.183 ENSMUST0000019~
 6     6 NA     ENSMUST00000~ Gm10568       NA     ENSMUST0000019~ Gm10568         NA     ENSMUST0000019~
 7     7  0.253 ENSMUST00000~ Gm7357         0.    ENSMUST0000020~ Rp1              2.66  ENSMUST0000018~
 8     8 NA     ENSMUST00000~ Gm7357        NA     ENSMUST0000020~ Rp1             NA     ENSMUST0000018~
 9     9 NA     ENSMUST00000~ Gm7357        NA     ENSMUST0000020~ Rp1              0.    ENSMUST0000019~
10    10  0.182 ENSMUST00000~ Gm6119        NA     ENSMUST0000020~ Rp1             NA     ENSMUST0000019~
 ... with 1,135,291 more rows,

I'm not sure how to do that, or whether it's possible at all...

I guess it could be a for loop that goes through each ref_gene_name and sums all the TPM values from the preceding TPM column. However, I need it to run over all the ref_gene_name columns, create the corresponding columns in a new data frame, and then export that data frame to a CSV file. The code below is only meant to illustrate the idea; it is not necessarily correct.

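# Idea: walk the rows and sum TPM over consecutive rows that share a gene name
file <- as.data.frame(file)
genes <- character(0)
tpm <- numeric(0)
for (i in seq_len(nrow(file))) {
  g <- as.character(file$ref_gene_name[i])
  if (is.na(g) || is.na(file$TPM[i])) next  # skip rows without a gene name or a TPM
  n <- length(genes)
  if (n > 0 && genes[n] == g) {
    tpm[n] <- tpm[n] + file$TPM[i]          # same gene as the previous row: add to its sum
  } else {
    genes <- c(genes, g)                    # new gene: start a new entry
    tpm <- c(tpm, file$TPM[i])
  }
}
df <- data.frame(gene = genes, condition1.TPM = tpm)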

Any help is appreciated, thank you.

RNA-Seq • 5.1k views

It is hard for me to fully understand the data format and what you tried, but I can give some generic advice. Take a look at the R function aggregate. If you have a simple structure with all the transcripts, genes and TPM values in a single data frame and want to sum the TPM of all transcripts of the same gene, try something like this (note that aggregate requires a FUN argument, here sum):

aggregate(df$TPM, by = list(df$gene_name), FUN = sum, na.rm = TRUE)
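
As a minimal sketch of this suggestion, assuming a long-format data frame with one row per transcript: the toy values and the column names gene_name and TPM below are made up for illustration and are not taken from the poster's file.

```r
# Toy data: one row per transcript (values invented for illustration)
df <- data.frame(
  transcript_id = c("t1", "t2", "t3", "t4"),
  gene_name     = c("GeneA", "GeneA", "GeneB", NA),
  TPM           = c(1.5, 2.5, 4.0, 3.0),
  stringsAsFactors = FALSE
)

# Sum TPM over all transcripts of each gene.
# FUN must be a function (here sum); na.rm = TRUE is passed on to sum().
# Rows whose gene name is NA are omitted by aggregate().
gene_tpm <- aggregate(df$TPM, by = list(gene_name = df$gene_name),
                      FUN = sum, na.rm = TRUE)
names(gene_tpm)[2] <- "TPM"  # the summed column is called "x" by default
print(gene_tpm)
```

The result has one row per gene, with the gene name in the first column and the summed TPM in the second.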

Thanks, that seems simpler than what I wrote.

I tried aggregate like this:

TEST <- aggregate(gene_list$TPM,by=list(gene_list$ref_gene_name), FUN = length(gene_list$ref_gene_name))

but got an error:

Error in match.fun(FUN) : 'length(genes_list$ref_gene_name)' is not a function, character or symbol

If I run length(genes_list$ref_gene_name) on its own, it gives me the length of that column.


I figured out the error.

When I replace the FUN argument with an actual function, it works, but the result only lists the gene names without the TPM values...

I need both: the sum of the TPM values of all transcripts belonging to each gene, and the gene list.
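
One possible way to get both, sketched under the assumption that the wide table follows the column pattern shown in the question (TPM, ref_gene_name, TPM.1, ref_gene_name.1, ...). Because the gene in a given row can differ between replicates, each replicate is summed separately and the results are then merged by gene. The toy values, the helper name sum_tpm_per_gene and the output file name are hypothetical.

```r
# Toy stand-in for the wide table (invented values; two replicates only)
wide <- data.frame(
  TPM             = c(2.0, NA, 0.4, 0.1),
  ref_gene_name   = c("GeneA", "GeneA", "GeneB", "GeneB"),
  TPM.1           = c(3.0, 1.0, NA, 0.5),
  ref_gene_name.1 = c("GeneA", "GeneB", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-gene TPM sums for one replicate.
# suffix is "" for the first replicate, ".1" for the second, and so on.
sum_tpm_per_gene <- function(data, suffix) {
  tpm  <- data[[paste0("TPM", suffix)]]
  gene <- data[[paste0("ref_gene_name", suffix)]]
  keep <- !is.na(tpm) & !is.na(gene)   # drop rows without a TPM or a gene name
  out  <- aggregate(tpm[keep], by = list(gene = gene[keep]), FUN = sum)
  names(out)[2] <- paste0("TPM", suffix)
  out
}

suffixes <- c("", ".1")  # extend to ".2", ".3", ... for more replicates
per_rep  <- lapply(suffixes, function(s) sum_tpm_per_gene(wide, s))

# Full outer join on gene, so a gene missing from one replicate shows NA there
gene_tpm <- Reduce(function(a, b) merge(a, b, by = "gene", all = TRUE), per_rep)
print(gene_tpm)

# write.csv(gene_tpm, "gene_tpm.csv", row.names = FALSE)  # example file name
```

The result has one row per gene and one summed TPM column per replicate, which can then be exported with write.csv.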


Can you please tell me how you filtered out the TPM values from the StringTie output?


Hi, I would also be interested in the same thing, but at the transcript level. Is there a convenient way to extract the TPM values of all transcripts for all samples to feed into a Ballgown DE analysis? Thank you very much.


I think you can use the -A flag when you do the counting.
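
StringTie's -A option writes a tab-delimited gene abundance file alongside the GTF. A rough sketch of reading such a file in R follows; the header layout used in the mock file (including the "Gene ID", "Gene Name" and "TPM" columns) is an assumption to check against your own output, and all values are invented.

```r
# Mock gene abundance table written to a temporary file (invented values).
# Assumption: the real file is tab-delimited with "Gene ID", "Gene Name" and "TPM" columns.
tab_file <- tempfile(fileext = ".tab")
writeLines(c(
  "Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM",
  "MSTRG.1\t-\tchr1\t+\t100\t900\t5.2\t1.9\t2.6",
  "ENSMUSG00000000001\tGeneA\tchr1\t-\t2000\t4000\t1.1\t0.3\t0.4"
), tab_file)

# check.names = FALSE keeps the header names with spaces intact
abund <- read.delim(tab_file, check.names = FALSE, stringsAsFactors = FALSE)

# Keep only the gene identifiers and the per-gene TPM
gene_tpm <- abund[, c("Gene ID", "Gene Name", "TPM")]
print(gene_tpm)
```

Reading one such file per sample gives per-gene TPM values directly, without summing transcripts by hand.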
