Gene-level vs. Transcript-level quantification
6.2 years ago
pedrodcb ▴ 90

What is the difference between gene-level and transcript-level quantification?

Is gene-level quantification performed on genomic ORF sequences, and transcript-level quantification on mRNA sequences with introns removed and isoforms taken into account, or am I getting this all wrong? If that's right, where do I get transcript reference sequences, since usually only genome annotations exist?

RNA-Seq alignment next-gen sequencing • 17k views
6.2 years ago
h.mon 35k

The difference between gene-level and transcript-level quantification is, well, that gene-level summarizes counts over genes, and transcript-level summarizes counts over transcripts.

Both gene-level and transcript-level may be calculated in several ways:

1) mapping to the genome and using an annotation to count reads overlapping the features of interest. The difference is how multi-mapping reads are treated: in general they are discarded when summarizing genes directly, and apportioned using an expectation-maximization algorithm when summarizing over transcripts.

2) mapping to the transcriptome (with all isoforms from each gene represented as sequences). Counts are apportioned using an expectation-maximization algorithm, and counts from all isoforms of each gene are summed if summarizing at the gene level.
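To make the "apportion, then sum" idea concrete, here is a toy sketch in Python. It is not how RSEM, salmon, or kallisto are actually implemented: it ignores transcript length, fragment bias, and alignment quality, and the function names (`em_apportion`, `sum_to_genes`) and transcript IDs are made up for illustration. Each read is represented only by the list of transcripts it is compatible with.

```python
from collections import defaultdict


def em_apportion(read_compat, n_iter=50):
    """Toy EM: split each read among its compatible transcripts.

    read_compat: list of lists of transcript IDs, one list per read.
    Returns a dict of (fractional) estimated counts per transcript.
    Assumption: transcript length is ignored for simplicity.
    """
    transcripts = sorted({t for compat in read_compat for t in compat})
    # Start from uniform relative abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = dict.fromkeys(transcripts, 0.0)
    for _ in range(n_iter):
        counts = dict.fromkeys(transcripts, 0.0)
        # E-step: assign each read fractionally, in proportion to theta.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        n_reads = sum(counts.values())
        theta = {t: c / n_reads for t, c in counts.items()}
    return counts


def sum_to_genes(tx_counts, tx2gene):
    """Sum transcript-level counts into gene-level counts."""
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[tx2gene[tx]] += count
    return dict(gene_counts)


# Hypothetical data: three reads unique to g1.t1, two shared by both
# isoforms of g1, and one read unique to g2.t1.
reads = [["g1.t1"]] * 3 + [["g1.t1", "g1.t2"]] * 2 + [["g2.t1"]]
tx2gene = {"g1.t1": "g1", "g1.t2": "g1", "g2.t1": "g2"}
tx_counts = em_apportion(reads)
gene_counts = sum_to_genes(tx_counts, tx2gene)
```

Note that reads shared between isoforms of the same gene are ambiguous at the transcript level but not at the gene level: however the EM splits them, they all end up in the same gene total.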

If you want to use the transcriptome to do the quantification, Ensembl provides fasta downloads for (coding and non-coding) transcript sequences, or you can extract transcript sequences from a genome and its annotation: gffread (part of the GFF utilities from the StringTie developers) and rsem-prepare-reference from RSEM are two programs that perform this task.
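For intuition about what "extracting transcript sequences from a genome and its annotation" means, here is a minimal Python sketch. It is not the gffread or RSEM implementation; the function names and the input layout (exon tuples with 1-based inclusive coordinates, as in GTF) are assumptions made for the example. Exons of each transcript are concatenated in genomic order, and minus-strand transcripts are reverse-complemented.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def transcript_sequences(genome, exons):
    """Build spliced transcript sequences from a genome and exon records.

    genome: dict mapping chromosome name -> sequence string.
    exons:  list of (transcript_id, chrom, start, end, strand) tuples,
            with 1-based inclusive coordinates as in a GTF file.
    """
    by_tx = {}
    for tx, chrom, start, end, strand in exons:
        by_tx.setdefault(tx, []).append((chrom, start, end, strand))

    seqs = {}
    for tx, parts in by_tx.items():
        parts.sort(key=lambda p: p[1])  # genomic order by exon start
        chrom, strand = parts[0][0], parts[0][3]
        # Introns drop out because only exon slices are joined.
        spliced = "".join(genome[chrom][s - 1:e] for _, s, e, _ in parts)
        seqs[tx] = revcomp(spliced) if strand == "-" else spliced
    return seqs


# Hypothetical toy genome and annotation.
genome = {"chr1": "AAACCCGGGTTTACGT"}
exons = [
    ("t1", "chr1", 1, 3, "+"),
    ("t1", "chr1", 7, 9, "+"),
    ("t2", "chr1", 4, 6, "-"),
    ("t2", "chr1", 13, 16, "-"),
]
tx_seqs = transcript_sequences(genome, exons)
```

For real data you would use one of the tools mentioned above rather than hand-rolled code, since they also handle GTF/GFF parsing and edge cases.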

6.2 years ago

There's a difference between the read alignment step (which needs the actual sequence) and the quantification (which basically just counts the numbers of reads overlapping with defined loci, i.e., genes or transcripts). The standard workflow for model organisms with well established genome sequences and annotation is to:

  1. align to the genome (sequence in a fasta file), perhaps using transcriptome information (usually a gtf file), using a read alignment tool such as STAR
  2. count reads overlapping with genes, where genes are often defined as the sum of all exons for all transcripts of a given gene (introns are usually excluded); typical tools for this step are featureCounts or htseq-count.
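The counting step above can be sketched in a few lines of Python. This is only a rough illustration of the idea, not the featureCounts or htseq-count algorithm: it assumes a single chromosome, unstranded reads, simple (start, end) intervals with 1-based inclusive coordinates, and it drops any read that overlaps more than one gene. The function names and the toy data are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def count_reads(gene_exons, reads):
    """Count reads per gene using the union of each gene's exons.

    gene_exons: dict gene -> list of (start, end) exons pooled from all
                of that gene's transcripts.
    reads:      list of (start, end) aligned read intervals.
    Reads overlapping no gene, or more than one gene, are not counted.
    """
    models = {g: merge_intervals(ex) for g, ex in gene_exons.items()}
    counts = dict.fromkeys(models, 0)
    for r_start, r_end in reads:
        hits = [
            g for g, ivs in models.items()
            if any(r_start <= e and r_end >= s for s, e in ivs)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Hypothetical annotation: geneA has two isoforms with overlapping exons.
gene_exons = {
    "geneA": [(100, 200), (150, 250), (400, 500)],
    "geneB": [(480, 600)],
}
reads = [(120, 170), (300, 350), (490, 520), (550, 580), (240, 260)]
gene_counts = count_reads(gene_exons, reads)
```

In this toy data the read at 300-350 falls in an intron of geneA and the read at 490-520 overlaps both genes, so neither is counted.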

I have the feeling that your original confusion may stem from the rise of kallisto and salmon, which are being sold as tools for transcript quantification. These tools tend not to do traditional read alignment; instead, they work only with the sequences representing the transcriptome and perform "pseudoalignment" and quantification. How to obtain the transcriptome sequence is well described here.
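As a very rough picture of what "pseudoalignment" is getting at, here is a toy Python sketch. It is not how kallisto or salmon work internally (they use far more efficient index structures, handle both strands, and model much more); the function names and sequences are invented for the example. The idea shown is only this: index the k-mers of each reference sequence, then find which references are compatible with all of a read's known k-mers, without computing a base-by-base alignment.

```python
def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcripts that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of all indexed k-mers in the read.

    k-mers absent from the index (e.g. due to sequencing errors) are
    skipped. Returns an empty set if nothing in the read matches.
    """
    compat = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue
        compat = set(hits) if compat is None else compat & hits
    return compat or set()


# Hypothetical isoforms sharing their first 8 bases.
transcripts = {"t1": "ACGTACGGTTCA", "t2": "ACGTACGGAAGT"}
index = build_kmer_index(transcripts, k=5)
shared = compatible_transcripts("ACGTACGG", index, k=5)
unique = compatible_transcripts("ACGGTTCA", index, k=5)
```

The compatibility sets produced this way are what then get apportioned among transcripts during quantification. The index is built from whatever sequences you supply in the fasta file, which is why the choice of reference matters so much for these tools.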


Hi Friederike! I think that's exactly why I got confused. If I understand correctly, kallisto and salmon replace the genome alignment step (for instance with STAR) and instead just return quantification for whatever sequences are provided (these could be entire chromosomes or, say, gene ORFs). Is that correct?


I believe they will take whatever sequence file you provide them as is, yes. But I haven't explored that in detail.
