Question

Why are my features different between featureCounts and Cuffnorm?

5.4 years ago

obizx002 • 0

So I'm very new to this whole deal, and very new to computer science stuff in general. I'm trying to do RNA-seq analysis and seem to be running into an unusual problem (I think). I am running Slurm jobs from the terminal and my end results are weird.

The job is a whole pipeline using Bowtie2, then TopHat, then Cufflinks, Cuffquant, and then featureCounts and Cuffnorm. The idea is to take the raw counts from featureCounts and use them in edgeR. I run Cuffnorm at the end to get FPKM values, just to get an idea before starting edgeR.

I noticed that featureCounts is outputting counts for about 25,000 genes or features, yet Cuffnorm is outputting 57,000 genes or features. The whole pipeline is using the same .gff3 and .fa files from Ensembl (mouse). Does anyone know why this is happening?

Tags: RNA-Seq, alignment, gene, genome

Written 5.4 years ago by obizx002; last updated 5.4 years ago by igor.

score 3 · Accepted Answer · 2018-11-30

5.4 years ago

igor 13k

In general, some of the tools you are using, such as TopHat and Cufflinks, have been superseded by newer alternatives. I would suggest you look at some previous discussions here, such as:

Additionally:

> The job is a whole pipeline using Bowtie2, then TopHat, then Cufflinks, Cuffquant, and then featureCounts and Cuffnorm.

Some of those steps are actually redundant. You only need TopHat (which runs Bowtie2 internally) and featureCounts to get the raw counts you need for edgeR.
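As a rough sketch of that reduced pipeline, the Python below assembles the two commands for one paired-end sample, assuming `tophat2` and `featureCounts` are on the PATH. The index prefix, annotation and read file names are hypothetical placeholders, `genes.gtf` stands in for your Ensembl annotation (`-g gene_id` assumes a GTF-style `gene_id` attribute), and the `DRY_RUN` switch is hypothetical too: it only prints the commands unless you turn it off.

```python
import shlex
import subprocess

# Hypothetical switch: print the commands only; set to False to actually run them.
DRY_RUN = True


def tophat_cmd(index_prefix, annotation, reads, out_dir, threads=8):
    """TopHat2 aligns with Bowtie2 itself, so no separate bowtie2 step is needed."""
    return ["tophat2", "-p", str(threads), "-G", annotation, "-o", out_dir,
            index_prefix, *reads]


def featurecounts_cmd(annotation, bams, out_file, threads=8):
    """Count reads overlapping exons and sum them per gene_id (raw counts for edgeR)."""
    return ["featureCounts", "-T", str(threads), "-a", annotation,
            "-t", "exon", "-g", "gene_id", "-o", out_file, *bams]


if __name__ == "__main__":
    # Hypothetical file names: Bowtie2 index prefix, GTF annotation, paired reads.
    steps = [
        tophat_cmd("mm_index", "genes.gtf",
                   ["sample1_1.fq.gz", "sample1_2.fq.gz"], "tophat_sample1"),
        # TopHat writes its alignments to <out_dir>/accepted_hits.bam
        featurecounts_cmd("genes.gtf", ["tophat_sample1/accepted_hits.bam"],
                          "counts.txt"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
        if not DRY_RUN:
            subprocess.run(cmd, check=True)
```

For paired-end data, featureCounts also has a `-p` option for counting fragments rather than individual reads; check the manual for the version you have installed.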

To answer your actual question:

> I noticed that featureCounts is outputting counts for about 25,000 genes or features, yet Cuffnorm is outputting 57,000 genes or features. The whole pipeline is using the same .gff3 and .fa files from Ensembl (mouse). Does anyone know why this is happening?

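What is probably happening is that Cufflinks adds unannotated (novel) transcripts that it assembles from your alignments, whereas featureCounts only counts features that are already in your annotation file.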
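One way to check this explanation is to compare the gene IDs in the two output tables. The sketch below assumes the usual layouts: a featureCounts table with `#` comment lines followed by a header whose first column is `Geneid`, and a Cuffnorm `genes.fpkm_table` whose first column is `tracking_id`. The toy IDs in the example are made up.

```python
import csv
import io


def featurecounts_ids(handle):
    """Gene IDs from a featureCounts table: skip '#' comment lines, then read Geneid."""
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["Geneid"] for row in reader}


def cuffnorm_ids(handle):
    """Gene IDs from a Cuffnorm genes.fpkm_table: read the tracking_id column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tracking_id"] for row in reader}


def compare(fc_ids, cn_ids):
    """Summarise how the two ID sets overlap and list the Cuffnorm-only IDs."""
    return {
        "featurecounts": len(fc_ids),
        "cuffnorm": len(cn_ids),
        "shared": len(fc_ids & cn_ids),
        "cuffnorm_only": sorted(cn_ids - fc_ids),
    }


if __name__ == "__main__":
    # Toy stand-ins for counts.txt and genes.fpkm_table (made-up IDs and values).
    fc = io.StringIO(
        "# Program:featureCounts\n"
        "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\n"
        "ENSMUSG00000000001\t3\t100\t200\t-\t101\t5\n"
    )
    cn = io.StringIO(
        "tracking_id\tq1_0\n"
        "ENSMUSG00000000001\t1.2\n"
        "XLOC_000042\t0.7\n"
    )
    print(compare(featurecounts_ids(fc), cuffnorm_ids(cn)))
```

If your Cuffnorm run used an assembled (for example Cuffmerge) GTF, the tracking IDs may be `XLOC_*` for every locus, known or novel; in that case a reference-ID column in `genes.attr_table`, such as `nearest_ref_id`, is probably the better thing to compare against.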

Predicting novel transcripts is the whole point of using Cufflinks.

Reply by Kristoffer Vitting-Seerup ★ 4.0k, 5.4 years ago

Yes, but many people run it even if they are not interested in them. For example, the original poster expects the output to match the original GFF file (known genes only).

Reply by igor 13k, 5.4 years ago

If that is the case, I strongly recommend using Salmon or Kallisto. Kallisto can be downloaded from here and the manual for running Kallisto can be found here. Salmon can be downloaded from here and a manual for running Salmon can be found here. I actually wrote an entire section about the considerations for using different quantification tools recently.

Reply by Kristoffer Vitting-Seerup ★ 4.0k, 5.4 years ago
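Salmon and Kallisto both quantify reads directly against a transcriptome (cDNA) FASTA rather than the genome, so there is no TopHat or Cufflinks step. The sketch below only assembles the index and quantification commands for one paired-end sample; the FASTA, index, read and output names are hypothetical placeholders.

```python
import shlex


def kallisto_cmds(transcripts_fa, index, reads, out_dir, threads=8):
    """kallisto: build a transcriptome index once, then quantify each sample."""
    return [
        ["kallisto", "index", "-i", index, transcripts_fa],
        ["kallisto", "quant", "-i", index, "-o", out_dir, "-t", str(threads), *reads],
    ]


def salmon_cmds(transcripts_fa, index, read1, read2, out_dir, threads=8):
    """Salmon: same two steps; '-l A' asks Salmon to infer the library type."""
    return [
        ["salmon", "index", "-t", transcripts_fa, "-i", index],
        ["salmon", "quant", "-i", index, "-l", "A", "-1", read1, "-2", read2,
         "-p", str(threads), "-o", out_dir],
    ]


if __name__ == "__main__":
    # Hypothetical names: a mouse cDNA FASTA and one pair of read files.
    reads = ["sample1_1.fq.gz", "sample1_2.fq.gz"]
    for cmd in kallisto_cmds("transcripts.fa", "mm.kidx", reads, "kallisto_sample1"):
        print(shlex.join(cmd))
    for cmd in salmon_cmds("transcripts.fa", "mm_salmon_idx", *reads, "salmon_sample1"):
        print(shlex.join(cmd))
```

Both tools report transcript-level estimates; to get gene-level counts for edgeR they are commonly summarised first, for example with the Bioconductor package tximport.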