Biostar Beta. Not for public use.
TPM from StringTie
2.5 years ago
gozrom • 40

I have extracted the TPM values from the GTF files generated by StringTie for all replicates; however, those TPM values are per transcript, not per gene.

I now have one large CSV file covering 12 replicates with their corresponding TPM values, and I want to convert them to per-gene TPM values for use in a subsequent visualization.

File looks like this:

 X1    TPM transcript_id ref_gene_name  TPM.1 transcript_id.1 ref_gene_name.1  TPM.2 transcript_id.2
   <int>  <dbl> <chr>         <chr>          <dbl> <chr>           <chr>            <dbl> <chr>          
 1     1  2.60  MSTRG.1.1     <NA>           3.78  MSTRG.1.1       <NA>             4.22  MSTRG.1.1      
 2     2 NA     MSTRG.1.1     <NA>          NA     MSTRG.1.1       <NA>            NA     MSTRG.1.1      
 3     3  2.01  MSTRG.2.1     <NA>           1.17  MSTRG.2.1       <NA>             1.48  MSTRG.2.1      
 4     4 NA     MSTRG.2.1     <NA>          NA     MSTRG.2.1       <NA>            NA     MSTRG.2.1      
 5     5  0.402 ENSMUST00000~ Gm10568        0.316 ENSMUST0000019~ Gm10568          0.183 ENSMUST0000019~
 6     6 NA     ENSMUST00000~ Gm10568       NA     ENSMUST0000019~ Gm10568         NA     ENSMUST0000019~
 7     7  0.253 ENSMUST00000~ Gm7357         0.    ENSMUST0000020~ Rp1              2.66  ENSMUST0000018~
 8     8 NA     ENSMUST00000~ Gm7357        NA     ENSMUST0000020~ Rp1             NA     ENSMUST0000018~
 9     9 NA     ENSMUST00000~ Gm7357        NA     ENSMUST0000020~ Rp1              0.    ENSMUST0000019~
10    10  0.182 ENSMUST00000~ Gm6119        NA     ENSMUST0000020~ Rp1             NA     ENSMUST0000019~
 ... with 1,135,291 more rows,

I'm not sure how to do that, or whether it's possible at all...

I guess it could be a for loop that goes over each ref_gene_name and sums all the TPM values from the matching TPM column, but I need it to run over all the ref_gene_name columns, create the corresponding columns in a new data frame, and then export the new data frame to a CSV file. The code is only meant to illustrate the idea; I'm not claiming it is correct...

# Sum TPM over all rows that share a ref_gene_name (one replicate shown)
genes <- unique(file$ref_gene_name[!is.na(file$ref_gene_name)])
df <- data.frame(gene = genes, condition1.TPM = numeric(length(genes)))
for (i in seq_len(nrow(file))) {
  g <- file$ref_gene_name[i]
  # Skip rows without a gene name or a TPM value
  if (is.na(g) || is.na(file$TPM[i])) next
  j <- match(g, df$gene)
  df$condition1.TPM[j] <- df$condition1.TPM[j] + file$TPM[i]
}

Any help is appreciated, thank you.

RNA-Seq • 1.3k views

It is hard for me to fully understand the data format and what you tried, but I can give some general advice. Have a look at the R function aggregate. If you have a simple structure with all the transcripts, genes and TPM values in a single data frame and want to sum the TPM of each gene, try something like this:

aggregate(df$TPM, by = list(df$gene_name), FUN = sum)
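
As a minimal sketch of that suggestion on toy data (the gene names, TPM values and the column names `gene_name` and `TPM` are made up for illustration), `aggregate` takes the summary function through its `FUN` argument and returns one row per gene with both the gene name and the summed TPM:

```r
# Toy transcript-level table; gene names and TPM values are made up.
df <- data.frame(
  gene_name = c("GeneA", "GeneA", "GeneB", NA),
  TPM       = c(1.5, 2.5, 4.0, 3.0)
)

# Sum TPM over all transcripts of the same gene.
# Rows whose grouping value is NA are omitted by aggregate().
gene_tpm <- aggregate(df$TPM, by = list(gene_name = df$gene_name), FUN = sum)

# When a bare vector is aggregated, the value column is called "x"; rename it.
names(gene_tpm)[2] <- "TPM"
print(gene_tpm)
```

Note that transcripts without a gene name (the `NA` row here) do not appear in the result, so novel transcripts would need some other grouping key if they should be kept.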

Thanks, that seems simpler than what I wrote.

I tried aggregate but got an error:

Error in match.fun(FUN) : 'length(genes_list$ref_gene_name)' is not a function, character or symbol

If I run length(genes_list$ref_gene_name) on its own, it gives me the length of that column,

but when I pass it through aggregate

TEST <- aggregate(gene_list$TPM,by=list(gene_list$ref_gene_name), FUN = length(gene_list$ref_gene_name))

I get an error.
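
The error most likely comes from passing the result of a call (a number) as `FUN`, where `aggregate` expects the function itself. A small sketch on made-up data, with `gene_list` standing in for the real table:

```r
# Toy stand-in for the real table; values are made up.
gene_list <- data.frame(
  ref_gene_name = c("GeneA", "GeneA", "GeneB"),
  TPM           = c(1.5, 2.5, 4.0)
)

# Passing length(...) hands aggregate() a number, not a function,
# so match.fun() fails. The message is captured instead of stopping.
err <- tryCatch(
  aggregate(gene_list$TPM, by = list(gene_list$ref_gene_name),
            FUN = length(gene_list$ref_gene_name)),
  error = function(e) conditionMessage(e)
)
print(err)

# Passing the function object works: length counts transcripts per gene ...
n_tx <- aggregate(gene_list$TPM, by = list(gene = gene_list$ref_gene_name),
                  FUN = length)
# ... and sum adds up their TPM values.
tpm_sum <- aggregate(gene_list$TPM, by = list(gene = gene_list$ref_gene_name),
                     FUN = sum)
```

In both working calls the result has a `gene` column and a value column named `x`.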


I figured out the error.

When I set the FUN argument to an actual function it works, but the result only lists the gene names without showing the TPM values...

I need both: the sum of the TPM values from all the transcripts of each gene, and the gene list.
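
One possible way to get both for every replicate is sketched below. It assumes the wide layout shown in the question, where each `TPM`, `TPM.1`, ... column has a matching `ref_gene_name`, `ref_gene_name.1`, ... column; the toy values and the output file name `gene_tpm.csv` are made up. Because the same row can describe different genes in different replicates, each replicate is summed separately and the results are joined on the gene name afterwards:

```r
# Toy wide table mimicking the layout in the question (values made up).
wide <- data.frame(
  TPM             = c(2.0, NA, 0.4, 0.3),
  ref_gene_name   = c(NA, NA, "Gm10568", "Gm7357"),
  TPM.1           = c(3.0, NA, 0.5, 1.0),
  ref_gene_name.1 = c(NA, NA, "Gm10568", "Rp1")
)

# Find every TPM column and derive the matching gene-name column from its suffix.
tpm_cols  <- grep("^TPM(\\.[0-9]+)?$", names(wide), value = TRUE)
gene_cols <- sub("^TPM", "ref_gene_name", tpm_cols)

# Sum TPM per gene within each replicate separately.
per_rep <- lapply(seq_along(tpm_cols), function(k) {
  rep_df <- data.frame(gene = wide[[gene_cols[k]]], tpm = wide[[tpm_cols[k]]])
  # Drop rows without a gene name or without a TPM value.
  rep_df <- rep_df[!is.na(rep_df$gene) & !is.na(rep_df$tpm), ]
  out <- aggregate(tpm ~ gene, data = rep_df, FUN = sum)
  names(out)[2] <- tpm_cols[k]
  out
})

# Join the replicates on gene name; genes missing from a replicate get NA.
gene_tpm <- Reduce(function(a, b) merge(a, b, by = "gene", all = TRUE), per_rep)
print(gene_tpm)

# Export the per-gene table (written to a temporary directory here).
write.csv(gene_tpm, file.path(tempdir(), "gene_tpm.csv"), row.names = FALSE)
```

Transcripts with no `ref_gene_name` (the MSTRG rows) are dropped in this sketch; keeping them would require another grouping key, such as the StringTie gene ID, which is not part of the table shown.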


Can you please tell me how you extracted the TPM values from the StringTie output?
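
The original poster did not say how this was done, so the following is only one possible approach in base R, not their method. It assumes the usual StringTie GTF layout in which the `TPM` attribute sits in the ninth column of `transcript` rows and is absent from `exon` rows (which would presumably also explain the alternating NA rows in the table above). The GTF lines, the ID `TX_EXAMPLE` and the helper `get_attr` are made up for illustration:

```r
# Made-up lines in StringTie GTF style: two transcripts and one exon.
gtf_lines <- c(
  'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; cov "5.1"; FPKM "1.20"; TPM "2.60";',
  'chr1\tStringTie\texon\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "1"; cov "5.1";',
  'chr1\tStringTie\ttranscript\t2000\t3000\t1000\t-\t.\tgene_id "MSTRG.2"; transcript_id "TX_EXAMPLE"; ref_gene_name "Gm10568"; cov "0.8"; FPKM "0.19"; TPM "0.40";'
)
gtf_file <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, gtf_file)

# quote = "" keeps the double quotes inside the attribute column intact.
gtf <- read.delim(gtf_file, header = FALSE, comment.char = "#", quote = "",
                  stringsAsFactors = FALSE)

# Keep transcript rows only; exon rows carry no TPM attribute.
tx <- gtf[gtf$V3 == "transcript", ]

# Hypothetical helper: pull one attribute value out of the ninth column,
# returning NA when the attribute is absent.
get_attr <- function(attrs, key) {
  has_key <- grepl(paste0('\\b', key, ' "'), attrs)
  value   <- sub(paste0('.*\\b', key, ' "([^"]*)".*'), "\\1", attrs)
  ifelse(has_key, value, NA_character_)
}

tpm_table <- data.frame(
  transcript_id = get_attr(tx$V9, "transcript_id"),
  ref_gene_name = get_attr(tx$V9, "ref_gene_name"),
  TPM           = as.numeric(get_attr(tx$V9, "TPM"))
)
print(tpm_table)
```

Running this once per sample and combining the resulting tables would give a transcript-level TPM table for all replicates.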


Hi, I would also be interested in the same thing (but at the transcript level). Is there a convenient way to extract the TPM values for all transcripts across all samples to feed into a Ballgown DE analysis? Thank you very much.


I think you can use the -A flag when you do the counting.
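
If StringTie was run with -A, the resulting gene abundance file could be read into R roughly as follows. This is a sketch only: the file here is a mock written to a temporary location, and its header (`Gene ID`, `Gene Name`, ..., `TPM`) and values are assumptions about the tab-delimited layout that should be checked against a real output file.

```r
# Mock gene abundance table; the header and all values are assumptions.
abund_file <- tempfile(fileext = ".tab")
writeLines(c(
  "Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM",
  "MSTRG.1\t-\tchr1\t+\t100\t900\t5.1\t1.20\t2.60",
  "GENE_EXAMPLE\tGm10568\tchr1\t-\t2000\t3000\t0.8\t0.19\t0.40"
), abund_file)

# check.names = FALSE keeps the header names with spaces as they are.
abund <- read.delim(abund_file, check.names = FALSE, stringsAsFactors = FALSE)

# Keep only the gene identifiers and the per-gene TPM.
gene_tpm <- abund[, c("Gene ID", "Gene Name", "TPM")]
print(gene_tpm)
```

With one such file per sample, the per-gene TPM values are already summarized by StringTie and no aggregation over transcripts is needed.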
