What Is An Ideal Feature (Gene, Exon Or Transcript) To Summarize RNA-seq Data?
11.3 years ago

Do you prefer genes, exons, or transcripts, or all three? Which tool do you prefer, and why?

I think the transcript would be the ideal unit for summarizing count data. But I have seen studies that summarize count data at the gene level, and I would like to know whether there is any relative benefit in using one or the other.

PS. I wanted to post this as a poll, but unfortunately we don’t have that feature yet on Biostar. Until then, let's discuss it in the normal way.

rna-seq rna-seq rna expression differential • 4.7k views

There isn't any single answer to this question. It all depends on what kind of biological question you're trying to answer with the RNAseq data. If I'm looking for differential exon usage due to spliceosome mutations, gene-level data is useless to me. If I'm trying to work with a huge network of genes, I may need to simplify my inputs and use gene-level metrics to make the problem tractable.


I agree that the answer depends on the context of the question. But I was surprised by the trend of interpreting/summarizing RNA-seq data at the gene level and not at the transcript level. Given that human genes have an average of ~3 transcripts/gene (Ensembl), and that the function/biotype of each transcript varies (for example coding, non-coding NMD, intron retention, pseudogene, etc.; http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html), summarizing at the gene level seems to be a backward step. I was wondering if there is any method/study that compared the results of summarizing data at different levels. IMHO, summarizing at the gene level without considering the transcript/biotype information may bias the results. I am still working on our RNA-seq data to show this limitation in a systematic way. I wanted to know if anyone had looked at this more closely in a methodology or large-scale analysis paper.
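To make the concern concrete, here is a toy sketch with made-up counts (not real data). The transcript names `tx_coding` and `tx_nmd` and all numbers are hypothetical. It only illustrates how summing transcript counts per gene can hide a switch between transcripts of different biotypes.

```python
# Hypothetical counts for ONE gene with two transcripts (invented numbers).
transcript_counts = {
    "control":   {"tx_coding": 90, "tx_nmd": 10},
    "treatment": {"tx_coding": 10, "tx_nmd": 90},
}

def gene_level(counts):
    """Collapse transcript counts to a single gene-level count per condition."""
    return {cond: sum(tx.values()) for cond, tx in counts.items()}

def transcript_fractions(counts):
    """Fraction of the gene's counts contributed by each transcript."""
    fractions = {}
    for cond, tx in counts.items():
        total = sum(tx.values())
        fractions[cond] = {name: n / total for name, n in tx.items()}
    return fractions

gene_totals = gene_level(transcript_counts)
fractions = transcript_fractions(transcript_counts)

print("gene level:", gene_totals)
print("transcript fractions:", fractions)
```

In this toy case the gene-level totals are identical in both conditions, while the dominant transcript flips from the coding one to the NMD one. Real data would of course need proper transcript quantification and statistics.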

11.3 years ago

I also think that the question needs to be formulated with respect to what is actually being measured.

A typical RNA-seq experiment measures counts over exons and exon junctions (if there are any). From that we may expand to transcript/gene models, but that process will involve approximations and potential inconsistencies with respect to the real phenomena.

Measuring transcripts would be ideal, but that's not what the data is most of the time.

I personally think the ideal situation is to present and interpret data in the form that is closest to the form that was actually measured and thus minimize the assumptions. This is just an opinion.


I agree with this part: "A typical RNA-seq experiment measures counts over exons and exon junctions (if there are any). From that we may expand to transcript/gene models but that process will involve approximations and potential inconsistencies with respect to the real phenomena." I am looking specifically at that approach. Do you have any references?

"Measuring transcripts would be ideal, but that's not what the data is most of the time." My assumption is that mature RNAs with a polyA tail should correspond to a transcript, and that there could be similarities between different transcripts, which is why we are capturing exon/junction-level information. I agree with the measuring-transcripts part here, but could you clarify why the data is different?


(Looks like I have never answered this one - found the post when searching something else.)

What I meant is that the transcript is fragmented into small pieces and these align to exons or junctions - so that is what we can quantify and not the entire transcript.
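A minimal sketch of that idea, assuming a hypothetical gene with two transcripts that share exons. The transcript names, exon names, and reads are all invented for illustration. Each read is labelled only by the exon or junction it aligns to, and the code asks which transcripts it could have come from.

```python
from collections import Counter

# Hypothetical gene model: tx2 skips exon2 (names are invented).
transcripts = {
    "tx1": ["exon1", "exon2", "exon3"],
    "tx2": ["exon1", "exon3"],
}

# Hypothetical aligned fragments, labelled by the exon or junction they hit.
reads = ["exon1", "exon1", "exon2", "exon3", "exon1-exon3", "exon1-exon2"]

def junctions(exons):
    """Junctions between consecutive exons of one transcript."""
    return {f"{a}-{b}" for a, b in zip(exons, exons[1:])}

def compatible_transcripts(feature):
    """Transcripts that contain the given exon or junction."""
    return sorted(
        tx for tx, exons in transcripts.items()
        if feature in exons or feature in junctions(exons)
    )

# What is directly countable: reads per exon/junction.
feature_counts = Counter(reads)

# Reads whose exon or junction belongs to more than one transcript.
ambiguous = sum(
    n for feature, n in feature_counts.items()
    if len(compatible_transcripts(feature)) > 1
)

for feature, n in feature_counts.items():
    print(feature, n, compatible_transcripts(feature))
print("ambiguous reads:", ambiguous, "of", len(reads))
```

Reads on shared exons cannot be assigned to one transcript directly, so any transcript-level number has to be inferred from the exon/junction counts under some model.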
