Question

What Is An Ideal Feature (Gene, Exons Or Transcripts) To Summarize RNA-seq Data?

1

11.3 years ago

Khader Shameer 18k

Do you prefer genes, exons, transcripts, or all three? Which tool do you prefer, and why?

I think the transcript would be the ideal unit for summarizing count data. But I have seen studies that summarize count data at the gene level, and I would like to know whether there is any relative benefit in using one or the other.

PS. I wanted to post this as a poll, but unfortunately we don't have that feature on Biostar yet; until then, let's discuss it in the normal way.

rna-seq rna-seq rna expression differential • 4.7k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 11.3 years ago by Khader Shameer 18k

1

There isn't any single answer to this question. It all depends on what kind of biological question you're trying to answer with the RNAseq data. If I'm looking for differential exon usage due to spliceosome mutations, gene-level data is useless to me. If I'm trying to work with a huge network of genes, I may need to simplify my inputs and use gene-level metrics to make the problem tractable.
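To make the exon-usage point concrete, here is a minimal sketch with made-up counts for a single hypothetical gene with three exons (the condition names and numbers are assumptions for illustration, not data from any study). The gene-level totals are identical in both conditions, while the share of reads on the middle exon changes sharply:

```python
# Toy exon-level read counts for one hypothetical gene (made-up numbers).
exon_counts = {
    "control": {"exon1": 100, "exon2": 100, "exon3": 100},
    "mutant": {"exon1": 120, "exon2": 20, "exon3": 160},
}


def gene_level(counts):
    """Collapse exon counts into a single gene-level count."""
    return sum(counts.values())


def exon_fractions(counts):
    """Fraction of the gene's reads that fall on each exon."""
    total = sum(counts.values())
    return {exon: n / total for exon, n in counts.items()}


gene_totals = {cond: gene_level(c) for cond, c in exon_counts.items()}
usage = {cond: exon_fractions(c) for cond, c in exon_counts.items()}

print("gene-level totals:", gene_totals)
for cond, fractions in usage.items():
    print(cond, {exon: round(f, 2) for exon, f in fractions.items()})
```

In this toy case a gene-level comparison would report no change, whereas an exon-level comparison would flag exon2. A real analysis would need replicates and a proper statistical model rather than raw fractions.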

ADD REPLY • link 11.3 years ago by Chris Miller 22k

0

I agree that it depends on the context of the question. But I was amused by the trend of interpreting/summarizing RNA-seq data at the gene level and not at the transcript level. Given that human genes have an average of ~3 transcripts per gene (Ensembl), and that the function/biotype of each transcript varies (for example coding, non-coding NMD, intron retention, pseudogene, etc.; see http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html), summarizing at the gene level seems to be a backward step. I was wondering whether any method or study has compared summarizing the data at different levels. IMHO, summarizing at the gene level without considering the transcript/biotype information may bias the results. I am still working on our RNA-seq data to show this limitation in a systematic way. I wanted to know if anyone has looked at this more closely in a methodology paper or a large-scale analysis paper.

ADD REPLY • link 10.9 years ago by Khader Shameer 18k

Ram · Answer 1 · 2013-01-17

2

11.3 years ago

Istvan Albert 100k

I also think that the question needs to be formulated with respect to what is actually being measured.

A typical RNA-seq experiment measures counts over exons and exon junctions (if there are any). From that we may expand to transcript/gene models, but that process will involve approximations and potential inconsistencies with respect to the real phenomena.

Measuring transcripts would be ideal, but that's not what the data is most of the time.

I personally think the ideal situation is to present and interpret data in the form that is closest to the form that was actually measured and thus minimize the assumptions. This is just an opinion.

ADD COMMENT • link 11.3 years ago by Istvan Albert 100k

0

Agree with this part: "A typical RNA-seq experiment measures counts over exons and exon junctions (if there are any). From that we may expand to transcript/gene models, but that process will involve approximations and potential inconsistencies with respect to the real phenomena." I am looking specifically at that approach. Do you have any reference?

"Measuring transcripts would be ideal, but that's not what the data is most of the time." My assumption is that a mature RNA with a polyA tail should correspond to a transcript, and that different transcripts can be similar to each other, which is why we are capturing exon/junction-level information. I agree with the point about measuring transcripts, but could you clarify why the data is different?

ADD REPLY • link 11.3 years ago by Khader Shameer 18k

0

(Looks like I have never answered this one - found the post when searching something else.)

What I meant is that the transcript is fragmented into small pieces and these align to exons or junctions - so that is what we can quantify and not the entire transcript.
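A small sketch of why this matters, using a hypothetical gene with two transcripts (the names T1, T2 and exons A, B, C are invented for illustration). Each short fragment lands on an exon or a junction, and only some of those features identify a single transcript:

```python
# Hypothetical gene: T1 includes exon B, T2 skips it.
transcripts = {
    "T1": ["A", "B", "C"],
    "T2": ["A", "C"],
}


def features(exons):
    """Exons plus the junctions between consecutive exons of a transcript."""
    junctions = [f"{a}-{b}" for a, b in zip(exons, exons[1:])]
    return set(exons) | set(junctions)


# Map each observable feature to the transcripts it is compatible with.
compatible = {}
for name, exons in transcripts.items():
    for feat in features(exons):
        compatible.setdefault(feat, set()).add(name)

for feat in sorted(compatible):
    names = sorted(compatible[feat])
    label = "unique" if len(names) == 1 else "ambiguous"
    print(feat, names, label)
```

Fragments on exons A or C are compatible with both transcripts, so transcript-level numbers have to be inferred from the few informative features (exon B and the junctions) under some model; that inference step is where the approximations come in.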

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by Istvan Albert 100k