Is there any difference between TopHat and Cufflinks commands run with and without a GTF file?
16 months ago
United States

Dear All,

I have a query regarding the gene annotation file (GTF).

1) Tophat command without GTF:

$ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq

2) Tophat command with GTF:

$ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq


What is the difference between the two tophat commands?

3) Cufflinks command without GTF:

$ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam

4) Cufflinks command with GTF:

$ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam


What is the difference between the two cufflinks commands?

Scenario 1: (Tophat command without GTF and Cufflinks command with GTF)

$ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
$ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam


Scenario 2: (Tophat command with GTF and Cufflinks command without GTF)

$ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
$ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam


Scenario 3: (Tophat command with GTF and Cufflinks command with GTF)

$ tophat -p 8 --library-type fr-firststrand -G genes.gtf -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
$ cufflinks -p 8 -G genes.gtf -o cufflinks_out tophat_out/accepted_hits.bam


Scenario 4: (Tophat command without GTF and Cufflinks command without GTF)

$ tophat -p 8 --library-type fr-firststrand -o tophat_out reference_genome sample1_r1.fq sample1_r2.fq
$ cufflinks -p 8 -o cufflinks_out tophat_out/accepted_hits.bam


What is the difference between scenario1, scenario2, scenario3 and scenario4?

Are the outputs of scenario1, scenario2, scenario3 and scenario4 the same or different?

Have you read the manual?

Hi Devon,

I read the manual, but it was still not clear to me.

Did Chirag's reply clarify things?

Hi Devon,

I have a better understanding now.

The reason I have 4 different scenarios is that I have seen people use these different combinations in different posts.

I am currently running all 4 combinations on my system. I have not seen the results yet.

So I would like to know what I should expect from the output files of the 4 scenarios above.
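One practical point when running all four combinations: the commands in the question all write to tophat_out and cufflinks_out, so each run would overwrite the previous one. A minimal sketch of a dry run that prints each scenario with its own output directories is below. The directory names (s1_tophat, s1_cufflinks, and so on) are made up for illustration; everything else is taken from the commands in the question.

```shell
#!/bin/sh
# Dry run: print the TopHat and Cufflinks commands for each scenario.
# The output directory names (s1_tophat, s1_cufflinks, ...) are hypothetical.

GTF=genes.gtf

print_scenario() {
    # $1 = scenario label
    # $2 = yes/no: give the GTF to TopHat
    # $3 = yes/no: give the GTF to Cufflinks
    tophat_gtf=""
    cufflinks_gtf=""
    if [ "$2" = "yes" ]; then tophat_gtf="-G $GTF "; fi
    if [ "$3" = "yes" ]; then cufflinks_gtf="-G $GTF "; fi
    echo "tophat -p 8 --library-type fr-firststrand ${tophat_gtf}-o ${1}_tophat reference_genome sample1_r1.fq sample1_r2.fq"
    echo "cufflinks -p 8 ${cufflinks_gtf}-o ${1}_cufflinks ${1}_tophat/accepted_hits.bam"
}

print_scenario s1 no  yes   # Scenario 1: TopHat without GTF, Cufflinks with GTF
print_scenario s2 yes no    # Scenario 2: TopHat with GTF, Cufflinks without GTF
print_scenario s3 yes yes   # Scenario 3: both with GTF
print_scenario s4 no  no    # Scenario 4: both without GTF
```

The script only prints the commands, so they can be checked before anything is run.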

In general, if your organism has a decent annotation then you'll get better results if you use it.

16 months ago
Chirag Parsania ♦ 1.4k
University of Macau

Hi,

Please find answers to some of your questions below.

Que. 1 What is the difference between the two tophat commands?

Ans. When you run TopHat with a GTF file, it first builds a transcriptome from the information in the GTF file and aligns the reads to that transcriptome rather than to the whole genome. Reads that do not align to the transcriptome are then aligned to the genome. This makes the alignment faster; it is a kind of guided alignment.
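Since the transcriptome is built from the GTF file, it can be built once and reused across samples with TopHat's --transcriptome-index option. A minimal sketch follows, assuming TopHat 2; the index prefix transcriptome_data/known is a hypothetical name, and the run helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: build the TopHat transcriptome index once, then reuse it per sample.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
# "transcriptome_data/known" is a hypothetical index prefix.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_transcriptome_index() {
    # $1 = GTF file, $2 = index prefix, $3 = Bowtie index base name of the genome
    run tophat -G "$1" --transcriptome-index="$2" "$3"
}

align_with_index() {
    # $1 = index prefix, $2 = output dir, $3 = genome index, $4 and $5 = paired FASTQ files
    run tophat -p 8 --library-type fr-firststrand --transcriptome-index="$1" -o "$2" "$3" "$4" "$5"
}

build_transcriptome_index genes.gtf transcriptome_data/known reference_genome
align_with_index transcriptome_data/known tophat_out reference_genome sample1_r1.fq sample1_r2.fq
```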

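Que. 2 What is the difference between the two cufflinks commands?

Ans. The idea is similar to what I described above: the GTF file guides Cufflinks. Note that the two GTF options behave differently. With -g, Cufflinks uses the annotation to guide the assembly, and the final output contains both the known transcripts and novel transcripts built from your data. With -G, as in your command, Cufflinks only estimates the expression of the transcripts in the GTF file and does not assemble novel ones.

Please refer to this: http://cole-trapnell-lab.github.io/cufflinks/cufflinks/index.html

I hope you can work out the other two yourself.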

Cheers,

Chirag

Thanks Chirag for your explanation.

1) Tophat command without GTF: The reads are aligned directly to the reference genome. In the resulting accepted_hits.bam file, every spliced alignment is treated as a novel exon-exon junction.

2) Tophat command with GTF: A junction database is created from the GTF file. TopHat then aligns reads that do not map within an exon against the junction database to identify spliced alignments. If no alignment is found in the junction database, the read is treated as spanning a novel exon-exon junction. The resulting accepted_hits.bam file will contain two kinds of spliced alignments: those based on the GTF and those at novel exon-exon junctions.

I am clear about tophat now. But I still have a question: what is the difference between cufflinks -G GTF and -g GTF?

Cufflinks has both options. With -G it only estimates the expression of the transcripts in the GTF file and does not assemble novel ones; with -g it uses the GTF file to guide the assembly. Basically, what Cufflinks tries to do is build a transcript GTF file based on your data. Without either option, Cufflinks will assemble transcripts based only on your reads. With -g and the GTF file, it will perform a guided assembly, kind of like performing a de novo assembly with the help of a reference.
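For reference, the Cufflinks manual documents both -G/--GTF (quantify against the annotation only) and -g/--GTF-guide (annotation-guided assembly). A small sketch that prints the command for each mode is below; the mode names denovo, guided and quant are labels invented here, not Cufflinks terms.

```shell
#!/bin/sh
# Print the Cufflinks command for one of three modes.
# The mode names (denovo, guided, quant) are hypothetical labels for this sketch.

cufflinks_cmd() {
    # $1 = mode, $2 = GTF file, $3 = output dir, $4 = BAM file
    case "$1" in
        denovo) echo "cufflinks -p 8 -o $3 $4" ;;          # assemble from reads only
        guided) echo "cufflinks -p 8 -g $2 -o $3 $4" ;;    # guided assembly: known + novel
        quant)  echo "cufflinks -p 8 -G $2 -o $3 $4" ;;    # quantify known transcripts only
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

for mode in denovo guided quant; do
    cufflinks_cmd "$mode" genes.gtf "cufflinks_out_$mode" tophat_out/accepted_hits.bam
done
```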

Hi Sam,

Thanks for your explanation. I understand it now.

Does the output from cufflinks with GTF and without GTF differ?

I have a GTF file for mouse. Which of the above scenarios should I use for my analysis?

I can assure you that you will get a completely different output. I have tested it.

Yes, most likely a different output will be generated. If you are working on mouse, use the mouse GTF so that you can perform the guided assembly.
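One rough way to see how the outputs differ is to count the transcript records in each run's transcripts.gtf. The sketch below assumes the usual Cufflinks output layout (a transcripts.gtf file with "transcript" and "exon" records) and assumes that newly assembled transcripts carry IDs starting with "CUFF." while annotated ones keep their reference IDs; check this against your own files.

```shell
#!/bin/sh
# Count transcript records in a Cufflinks transcripts.gtf file.

count_transcripts() {
    # Records whose feature column (3rd, tab-separated) is "transcript".
    awk -F '\t' '$3 == "transcript" { n++ } END { print n + 0 }' "$1"
}

count_cuff_transcripts() {
    # Transcript records whose transcript_id starts with "CUFF."
    # (assumed here to mark transcripts assembled from the reads).
    awk -F '\t' '$3 == "transcript" && $9 ~ /transcript_id "CUFF\./ { n++ } END { print n + 0 }' "$1"
}

summarize() {
    # $1 = Cufflinks output directory
    f="$1/transcripts.gtf"
    if [ -f "$f" ]; then
        printf '%s\ttotal=%s\tCUFF=%s\n' "$1" "$(count_transcripts "$f")" "$(count_cuff_transcripts "$f")"
    else
        echo "no transcripts.gtf in $1" >&2
    fi
}
```

Calling `summarize cufflinks_out` after each run (or once per output directory, if the runs are kept separate) gives a quick side-by-side comparison.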
