8.1 years ago
Yuka Takemon
Hello!
I am analysing RNA-seq data with cuffdiff (cufflinks/2.2.1), comparing two groups of samples, WT vs KO. I would like to know which isoforms of a given gene are differentially expressed. Example output for Myo1c:
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
TCONS_00005883 XLOC_003696 Myo1c chr11:75651508-75674634 WT KO OK 0.695384 0.813635 0.226571 0.134985 0.7983 0.999954 no
TCONS_00005884 XLOC_003696 Myo1c chr11:75651508-75674634 WT KO OK 2.70472 5.00692 0.888442 2.3638 0.0006 0.327444 no
TCONS_00005885 XLOC_003696 Myo1c chr11:75651508-75674634 WT KO OK 65.5926 118.552 0.853912 3.30047 5.00E-05 0.0508103 no
I know that I should be looking at the TCONS ID, but which transcript does each one actually refer to in databases such as Ensembl, and how can I look this up? In the above example for Myo1c, Ensembl lists 5 protein-coding transcripts for this gene, but the cuffdiff output contains only 3, and they use a different ID system.
I would appreciate your input.
Thank you,
Y
If you followed the Tuxedo pipeline, then I'd guess the TCONS IDs are defined in the output from cuffmerge (e.g. merged.gtf). Some sample lines below: