Cufflinks output file isoforms.fpkm_tracking has mixed ENSGxxx and ENSTxxx IDs in the tracking_id column
9.8 years ago
Xianjun ▴ 310

I am not sure whether anyone has seen something similar. When I ran Cufflinks with standard options (e.g. -G $Gencode_GTF -M $Mask_GTF --compatible-hits-norm --multi-read-correct), I expected the tracking_id column of the output file isoforms.fpkm_tracking to contain only transcript IDs (i.e. ENSTxxxxxx for the human Gencode GTF). Instead, every gene also has a row with an ENSGxxxxx ID there (see the first data row after the header in the grep output below), and its length seems abnormally long. Does anyone have a clue? I am still waiting for a reply from the Cufflinks group.

$ grep -w -P "tracking_id|XPR1" isoforms.fpkm_tracking
tracking_id    class_code    nearest_ref_id    gene_id    gene_short_name    tss_id    locus    length    coverage    FPKM    FPKM_conf_lo    FPKM_conf_hi    FPKM_status
ENSG00000143324.9    -    -    ENSG00000143324.9    XPR1    -    chr1:180601139-180859387    258248    0.209926    0.20419    0.188323    0.220131    OK
ENST00000367590.4    -    -    ENSG00000143324.9    XPR1    -    chr1:180601139-180859387    8474    3.30152    3.2113    2.81811    3.61095    OK
ENST00000367589.3    -    -    ENSG00000143324.9    XPR1    -    chr1:180601167-180855262    4126    2.83119    2.75383    2.21059    3.2843    OK
ENST00000498177.1    -    -    ENSG00000143324.9    XPR1    -    chr1:180805787-180843191    485    2.0379e-07    1.98221e-07    0    0.268656    OK
ENST00000464817.1    -    -    ENSG00000143324.9    XPR1    -    chr1:180832899-180847426    567    0.54371    0.528852    0    1.03411    OK
ENST00000467345.1    -    -    ENSG00000143324.9    XPR1    -    chr1:180856707-180857584    736    3.28404e-06    3.1943e-06    0    0.23211    OK
Cufflinks isoforms.fpkm_tracking RNA-Seq • 3.9k views

What does your Gencode_GTF file contain?


It's a typical GTF file downloaded from gencode.org.

9.8 years ago
Bert Overduin ★ 3.7k

There are many different GTF files on gencode.org, so I don't know which one you're referring to. I have no experience with Cufflinks, but I would assume that if your GTF file contains not only transcripts but also genes, you would get output for both (but maybe I'm wrong...). I would suggest you check whether your GTF file contains any gene records and, if so, remove them, run Cufflinks again and see what result that gives you.
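The check-and-remove step suggested above could be sketched in Python as follows. This is only a minimal sketch: it assumes a tab-delimited GTF whose third column holds the feature type (with the value "gene" for gene-level records), and the function name drop_gene_features is made up for illustration. Whether this actually changes the Cufflinks output is untested here.

```python
def drop_gene_features(gtf_lines):
    """Yield GTF lines, skipping records whose feature type (column 3) is 'gene'.

    Assumption: tab-delimited GTF; header/comment lines start with '#'.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            # Keep header/comment lines unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] == "gene":
            # Gene-level record: drop it.
            continue
        yield line


# Small in-memory example with one gene, one transcript and one exon record.
example = [
    "##description: toy example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tHAVANA\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
filtered = list(drop_gene_features(example))
print(len(filtered))  # 3: the comment, the transcript and the exon remain
```

For a real file, the same generator can be fed an open file handle and its output written to a new GTF, which is then passed to Cufflinks with -G.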

By the way, 258248 is the length of the whole (unspliced) ENSG00000143324 gene.
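The locus column of the ENSG row in the question is consistent with this: the end coordinate minus the start coordinate equals the reported length. A quick check in Python (the helper name locus_span is hypothetical, and whether Cufflinks computes the length this way internally is an assumption):

```python
def locus_span(locus):
    """Return end - start for a locus string such as 'chr1:180601139-180859387'."""
    _chrom, coords = locus.rsplit(":", 1)
    start, end = coords.split("-")
    return int(end) - int(start)


# Locus taken from the ENSG00000143324.9 row in the question.
span = locus_span("chr1:180601139-180859387")
print(span)  # 258248, the value in the length column of that row
```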


Thank you for your reply, Bert.

You are right: the GTF file contains both gene and transcript records, organized hierarchically (e.g. gene --> transcripts --> exons).

But given that Cufflinks outputs transcript-level and gene-level expression separately, the transcript-level file (which they call isoforms.fpkm_tracking) should not contain any gene-level expression. Thanks for the clue about the length, though: they may also compute the expression level of the unspliced form and put it in the isoforms.fpkm_tracking file. If so, it would be good if the Cufflinks team stated this somewhere in the documentation.
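In the meantime, the gene-level rows can be dropped after the fact. A minimal Python sketch, assuming the real isoforms.fpkm_tracking file is tab-delimited and that the extra rows are exactly those whose tracking_id equals their gene_id (as in the XPR1 example above); the function name transcript_rows is made up for illustration:

```python
import csv


def transcript_rows(tracking_lines):
    """Return rows (as dicts) whose tracking_id differs from their gene_id.

    Assumption: tab-delimited input with a header line containing
    'tracking_id' and 'gene_id' columns.
    """
    reader = csv.DictReader(tracking_lines, delimiter="\t")
    return [row for row in reader if row["tracking_id"] != row["gene_id"]]


# Reduced example using two rows from the question (only a few columns kept).
example = [
    "tracking_id\tgene_id\tgene_short_name\tlength\tFPKM\n",
    "ENSG00000143324.9\tENSG00000143324.9\tXPR1\t258248\t0.20419\n",
    "ENST00000367590.4\tENSG00000143324.9\tXPR1\t8474\t3.2113\n",
]
for row in transcript_rows(example):
    print(row["tracking_id"], row["length"], row["FPKM"])
```

Comparing tracking_id with gene_id avoids hard-coding the ENSG/ENST prefixes, so the same filter should also work for annotations that use other ID schemes.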
