[solved] Convert fold changes of multiple transcripts of a given gene to a single value characterising this gene
biostart, 7.6 years ago

Hello,

After a typical RNA-seq workflow performed by a collaborator, I have a table with expression fold changes for all transcripts. I also have a subset of promoters of interest based on some other data, and for this subset of promoters I want to report the fold changes of the corresponding genes. Question: how do I integrate the fold changes of multiple transcripts of a given gene into a single value characterising this gene? I am looking for a typical solution (well accepted in the RNA-seq community) for this situation.

Thanks!


Dear biostart, hi.

I think some of these problems do not have a typical, universally accepted solution yet.

Some programs, e.g. Trinity, test for differential expression at two levels: isoforms (transcripts) and genes. But the problem still remains, because when you need to annotate the so-called "genes" you have to annotate the transcripts of that gene! By the way, I think the more important criterion here is FDR, not FC. So if you have performed a DE analysis and some genes (transcripts) pass your FDR threshold, then you can discuss their FC.
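A minimal sketch of "filter on FDR first, then look at FC", assuming a hypothetical transcript-level table held as plain tuples `(transcript_id, gene_id, log2_fold_change, fdr)`; the 0.05 cutoff is an assumption, use whatever threshold your analysis specified.

```python
# Hypothetical transcript-level DE table: (transcript_id, gene_id, log2_fold_change, fdr)
rows = [
    ("tx1", "geneA", 1.8, 0.001),
    ("tx2", "geneA", 0.2, 0.60),
    ("tx3", "geneB", -1.1, 0.03),
]

FDR_THRESHOLD = 0.05  # assumed cutoff, not taken from the original analysis


def significant_fold_changes(rows, fdr_threshold=FDR_THRESHOLD):
    """Return {gene_id: [(transcript_id, log2FC), ...]} for transcripts below the FDR cutoff."""
    by_gene = {}
    for tx_id, gene_id, log2fc, fdr in rows:
        if fdr < fdr_threshold:
            by_gene.setdefault(gene_id, []).append((tx_id, log2fc))
    return by_gene


print(significant_fold_changes(rows))  # {'geneA': [('tx1', 1.8)], 'geneB': [('tx3', -1.1)]}
```

Transcripts that fail the cutoff simply drop out, so a gene may end up with zero, one or several significant transcripts to discuss.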

Another way is to focus on the genes that have only one isoform, which (depending on the organism) would be rare among your huge number of assembled transcripts.
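A minimal sketch of the single-isoform approach, assuming hypothetical `(transcript_id, gene_id, log2_fold_change)` records: count transcripts per gene and keep only genes with exactly one, whose transcript FC can then stand in for the gene FC.

```python
from collections import Counter

# Hypothetical (transcript_id, gene_id, log2_fold_change) records
rows = [
    ("tx1", "geneA", 1.8),
    ("tx2", "geneA", 0.2),
    ("tx3", "geneB", -1.1),
]


def single_isoform_fold_changes(rows):
    """Keep only genes with exactly one transcript; that transcript's FC is the gene FC."""
    n_isoforms = Counter(gene_id for _, gene_id, _ in rows)
    return {gene_id: log2fc for _, gene_id, log2fc in rows if n_isoforms[gene_id] == 1}


print(single_isoform_fold_changes(rows))  # {'geneB': -1.1}
```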

Some other programs, such as CD-HIT-EST, Corset and RapClust, cluster the isoforms and report one isoform as representative of that gene. Most of the time this is the longest isoform, or the isoform with the longest ORF among the isoforms of the gene, although the longest transcript isn't always the 'best' transcript.
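A minimal sketch of the simplest "representative isoform" rule: pick the longest transcript per gene and report its FC. This does not reproduce what CD-HIT-EST, Corset or RapClust actually do; the `(transcript_id, gene_id, transcript_length, log2_fold_change)` layout and the tie-breaking rule (first transcript seen wins) are assumptions.

```python
# Hypothetical records: (transcript_id, gene_id, transcript_length, log2_fold_change)
rows = [
    ("tx1", "geneA", 2400, 1.8),
    ("tx2", "geneA", 3100, 0.2),
    ("tx3", "geneB", 900, -1.1),
]


def representative_fold_changes(rows):
    """For each gene, report (transcript_id, log2FC) of its longest transcript."""
    best = {}  # gene_id -> (length, transcript_id, log2fc)
    for tx_id, gene_id, length, log2fc in rows:
        # Strict ">" means the first transcript seen wins on equal lengths
        if gene_id not in best or length > best[gene_id][0]:
            best[gene_id] = (length, tx_id, log2fc)
    return {gene: (tx_id, log2fc) for gene, (_, tx_id, log2fc) in best.items()}


print(representative_fold_changes(rows))  # {'geneA': ('tx2', 0.2), 'geneB': ('tx3', -1.1)}
```

Swapping the length column for the ORF length gives the "longest ORF" variant; either way, the caveat above applies.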

In addition, you can filter out transcripts that have little read support (e.g. keep only those with FPKM >= ~1).
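A minimal sketch of that filter, assuming each hypothetical record carries a mean FPKM value as `(transcript_id, gene_id, mean_fpkm, log2_fold_change)`; the cutoff of 1 is the rule of thumb from this answer, not a universal threshold.

```python
# Hypothetical records: (transcript_id, gene_id, mean_fpkm, log2_fold_change)
rows = [
    ("tx1", "geneA", 35.0, 1.8),
    ("tx2", "geneA", 0.4, 3.5),   # large FC but barely expressed
    ("tx3", "geneB", 1.2, -1.1),
]

MIN_FPKM = 1.0  # "~1 FPKM" rule of thumb; adjust to your data


def well_supported(rows, min_fpkm=MIN_FPKM):
    """Drop transcripts whose mean FPKM is below the cutoff."""
    return [row for row in rows if row[2] >= min_fpkm]


print([row[0] for row in well_supported(rows)])  # ['tx1', 'tx3']
```

Low-expression transcripts often have noisy, extreme fold changes (like `tx2` here), which is one reason to drop them before summarising per gene.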

I hope I understood your question correctly and that this helps.


I don't think there is a perfect solution for this; you probably need more information. For example, if you had the (average) expression levels of the transcripts, you could calculate a weighted average.
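A minimal sketch of such a weighted average, assuming hypothetical per-transcript records `(gene_id, mean_expression, log2_fold_change)`. Here the log2 fold changes are averaged with the mean expression as weights, so highly expressed transcripts dominate the gene value; other weighting schemes are equally defensible.

```python
# Hypothetical records, one per transcript: (gene_id, mean_expression, log2_fold_change)
rows = [
    ("geneA", 90.0, 2.0),
    ("geneA", 10.0, 0.0),
    ("geneB", 5.0, -1.0),
]


def weighted_gene_log2fc(rows):
    """Expression-weighted average of transcript log2 fold changes, per gene."""
    num, den = {}, {}
    for gene_id, expr, log2fc in rows:
        num[gene_id] = num.get(gene_id, 0.0) + expr * log2fc
        den[gene_id] = den.get(gene_id, 0.0) + expr
    # Skip genes whose transcripts all have zero expression (weights sum to 0)
    return {g: num[g] / den[g] for g in num if den[g] > 0}


print(weighted_gene_log2fc(rows))  # {'geneA': 1.8, 'geneB': -1.0}
```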

But perhaps the best approach would be to get access to the raw data and perform a "standard differential expression analysis" on the gene level, rather than on the transcript level.


Dear WouterDeCoster, hi. I am curious whether there is some statistical way to sum up the fold changes of the isoforms of a gene and then calculate a mean value for it, although I think this kind of calculation would undermine the biological interpretation of the whole project.
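If you do average isoform fold changes, the scale matters. A minimal sketch with two hypothetical isoforms, one doubling and one halving: the arithmetic mean of the linear fold changes suggests a net increase, while the mean of the log2 fold changes (equivalently, the geometric mean of the fold changes) correctly reports no net change. Note that this unweighted mean still ignores how much each isoform is expressed.

```python
import math

# Hypothetical linear fold changes for the isoforms of one gene
fold_changes = [2.0, 0.5]  # one isoform doubles, the other halves

arithmetic_mean = sum(fold_changes) / len(fold_changes)
log2_mean = sum(math.log2(fc) for fc in fold_changes) / len(fold_changes)

print(arithmetic_mean)  # 1.25 (misleadingly suggests an increase)
print(log2_mean)        # 0.0  (no net change on the log scale)
print(2 ** log2_mean)   # 1.0  (geometric mean of the fold changes)
```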


You can try tximport followed by edgeR, DESeq2 or limma-voom to get gene-level fold changes if you have raw transcript-level input.
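tximport itself is an R/Bioconductor package, so the following is only a conceptual sketch of the core idea behind gene-level summarisation: adding up transcript-level estimated counts per gene, sample by sample, using a transcript-to-gene map. The `tx2gene` and `tx_counts` data are hypothetical, and the sketch does not compute the average transcript length offsets that tximport also provides for the downstream DE tools.

```python
# Hypothetical transcript-to-gene map and per-sample estimated counts
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
tx_counts = {
    "tx1": [100.0, 250.0],  # [sample1, sample2]
    "tx2": [20.0, 15.0],
    "tx3": [40.0, 38.0],
}


def summarize_to_genes(tx_counts, tx2gene):
    """Sum transcript-level counts into gene-level counts, sample by sample."""
    gene_counts = {}
    for tx_id, counts in tx_counts.items():
        gene_id = tx2gene[tx_id]
        if gene_id not in gene_counts:
            gene_counts[gene_id] = [0.0] * len(counts)
        for i, c in enumerate(counts):
            gene_counts[gene_id][i] += c
    return gene_counts


print(summarize_to_genes(tx_counts, tx2gene))
# {'geneA': [120.0, 265.0], 'geneB': [40.0, 38.0]}
```

The resulting gene-by-sample count matrix is the kind of input a gene-level DE analysis (edgeR, DESeq2, limma-voom) works from, which is why raw transcript-level data sidesteps the aggregation problem.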


If I understood correctly, for now he only has fold changes...


The OP's question involves aggregating the fold changes of multiple transcripts, so I propose using the average transcript expression level to calculate a weighted average.
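One way to see why a weighted average is reasonable: if the weights are each transcript's expression in the baseline condition, the weighted average of the linear transcript fold changes is exactly the fold change of the summed gene expression, since sum(B_i) / sum(A_i) = sum(A_i * (B_i / A_i)) / sum(A_i). A minimal check with hypothetical control/treatment values (this assumes nonzero control expression, and is one particular choice of weights, not necessarily the averaged expression proposed above):

```python
# Hypothetical mean expression per transcript in control (A) and treatment (B)
control = {"tx1": 90.0, "tx2": 10.0}
treatment = {"tx1": 180.0, "tx2": 10.0}

# Linear transcript fold changes (requires control expression > 0)
fc = {tx: treatment[tx] / control[tx] for tx in control}

# Weighted average of linear FCs, weights = control expression
weighted = sum(control[tx] * fc[tx] for tx in control) / sum(control.values())

# Gene FC computed directly from summed expression
summed = sum(treatment.values()) / sum(control.values())

print(weighted, summed)  # 1.9 1.9
```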


Would it be considered a "commonly accepted" thing to do?


The commonly acceptable way of doing this would be to get access to the raw data and perform analysis on the gene level rather than on the transcript level, completely avoiding the problem you have now.


Of course, the characteristics of the assembler and the algorithm it uses have some effect on this situation. For example, with Trinity de novo assembly you can get one FC for each gene (if you perform the DE analysis on the gene level), but for annotation you will need to work with isoforms!


Yeah... I finally had to request the raw data. Problem solved :)
