Question

comparing an assembly against close relatives

2

5.0 years ago

Assa Yeroslaviz ★ 1.8k

I would like to ask for recommendations/workflows on how to assess the results of our de novo assembly from a Trinity run.

We ran Trinity and got a transcripts.fasta file with the assembled transcripts.

Now we would like to compare these transcripts to the annotated genome of a close relative (which has a GTF file) to address two goals:

To assess the quality of our Trinity output for completeness and correctness.
To identify the function of our newly assembled transcripts via sequence homology.

I would appreciate suggestions on how to approach both tasks.

Thanks

Assa

de-novo assembly blast trinity • 1.6k views

Updated 5.0 years ago by colindaven 6.3k • written 5.0 years ago by Assa Yeroslaviz ★ 1.8k

If you have a reference genome from a close relative, why do you consider it necessary to do a de novo assembly? What is the purpose: to obtain the transcriptome, or to evaluate the assembler's performance?

5.0 years ago by Buffo ★ 2.4k

The first one: I would like to obtain an annotated transcriptome of this organism.

Would BLASTing my Trinity transcripts (Trinity.fasta) against the genome of the close relative (genome.fa) give me good enough results?

5.0 years ago by Assa Yeroslaviz ★ 1.8k

If you need the transcriptome and the reference genome is closely related and well annotated, you can simply quantify transcript expression with StringTie and look for new isoforms, rather than comparing assembled transcripts.

5.0 years ago by Buffo ★ 2.4k

I have already done two things:

Mapped the FASTQ files against the genome of the close relative (this gave good results).
Ran a de novo assembly with Trinity to create a transcript.fasta file of assembled transcripts, then aligned the FASTQ files with bowtie against the indexed transcript.fasta file.

Both runs gave me a BAM file, which StringTie needs. If I run StringTie on the first BAM file, I won't get the annotations I need. For the second BAM file I don't have an annotation file (GTF), so I am not sure it makes sense. Can I use the BAM file from the de novo assembly as input, but take the annotation file from the close relative?

5.0 years ago by Assa Yeroslaviz ★ 1.8k

Run StringTie with the BAM from step 1 and the genome annotation of the close relative (GTF, GFF or GFF3); you do not need an extra annotation. The BAM file contains the coordinates where each read aligns to the genome, so StringTie just counts how many reads align to each annotated gene and calculates relative expression (TPM and FPKM). The BAM from step 2 is in Trinity contig coordinates, so the relative's annotation cannot be applied to it. You can also count the reads mapped to each annotated gene in the first BAM with htseq-count and then calculate relative expression yourself.

5.0 years ago by Buffo ★ 2.4k
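
A minimal sketch of the quantification route described in this reply, under stated assumptions. The file names (step1.bam, relative.gtf, gene_lengths_counts.tsv) and the helper name counts_to_tpm are placeholders and do not come from the thread. The htseq-count strandedness setting (-s) depends on the library prep, and the per-gene length table is assumed to be prepared separately (for example from summed exon lengths).

```shell
# Sketch only: file names and the helper name are hypothetical.
#
# Quantify against the relative's annotation with StringTie (-e restricts
# the output to the annotated transcripts):
#   stringtie step1.bam -G relative.gtf -e -o step1_quant.gtf -A step1_gene_abund.tab
#
# Or count reads per gene with htseq-count (coordinate-sorted BAM assumed):
#   htseq-count -f bam -r pos -s no -t exon -i gene_id step1.bam relative.gtf > counts.txt

# Convert a 3-column TSV (gene, gene length in bp, read count) to TPM.
# TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6
counts_to_tpm() {
  awk -F'\t' '
    $2 > 0 { n++; name[n] = $1; rate[n] = $3 / $2; total += rate[n] }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%.1f\n", name[i], (total > 0 ? rate[i] / total * 1e6 : 0)
    }' "$1"
}
```

Usage would look like `counts_to_tpm gene_lengths_counts.tsv > tpm.tsv`. Rows with a length of zero are skipped rather than reported.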

But I'm not primarily interested in quantifying expression. That is only a secondary result of the analysis.

I am more interested in creating an annotation file for my genome, which has no annotation. I would like to try a comparative approach that can assign or predict function for my transcripts (I was thinking of something like BLAST or other ORF-based comparison tools).

5.0 years ago by Assa Yeroslaviz ★ 1.8k

You can also use StringTie or Cufflinks to annotate your transcripts on the genome. To assign putative functions to your transcripts you can use BLAST, but I recommend Blast2GO: it is easier to handle and does exactly what you are looking for.

5.0 years ago by Buffo ★ 2.4k
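
For the plain BLAST route mentioned in this reply (not Blast2GO), a hedged sketch could look like the following. It assumes a protein FASTA for the close relative is available; relative_proteins.fa, trinity_vs_relative.tsv, the e-value cutoff and the helper name best_hits are all placeholders chosen for illustration. The helper keeps the highest-bitscore hit per transcript from BLAST tabular output (-outfmt 6, where column 12 is the bit score).

```shell
# Sketch only: file names, the e-value cutoff and the helper name are hypothetical.
#
# Build a protein database from the relative and search the Trinity contigs:
#   makeblastdb -in relative_proteins.fa -dbtype prot -out relative_prot
#   blastx -query Trinity.fasta -db relative_prot -evalue 1e-5 \
#          -outfmt 6 -num_threads 8 -out trinity_vs_relative.tsv

# Keep the single best hit (highest bit score, column 12) for each query.
best_hits() {
  awk -F'\t' '
    !($1 in best) || ($12 + 0) > best[$1] { best[$1] = $12 + 0; line[$1] = $0 }
    END { for (q in line) print line[q] }
  ' "$1" | LC_ALL=C sort
}
```

Running `best_hits trinity_vs_relative.tsv | cut -f1,2` would then give a simple transcript-to-protein table whose protein IDs can be looked up in the relative's annotation.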

score 1 · Answer 1 · 2019-04-14

5.0 years ago

colindaven 6.3k

What I used to do when I did a lot of this type of transcriptome assembly was the following:

Map the Trinity results (many different parameters, or clustered, or various organs) against the genome with GMAP, using the very nice GFF3 output option.
Manually compare the different assemblies (or, even better, get biologists to compare transcripts and regions of interest). You can very quickly get an impression of which result sets look best.
Use TransDecoder to get sets of CDS, amino acid sequences, etc. from the Trinity assembly.

This worked pretty well. Functionally annotating the FASTA outputs of TransDecoder was always highly compute-intensive.

You might also (re)annotate the genome using MAKER, with the Trinity assemblies and TransDecoder output as evidence.

Also, providing results iteratively to your collaborators via, e.g., a local JBrowse instance will allow you to improve the transcripts and provide versioning.

5.0 years ago by colindaven 6.3k
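
The TransDecoder step from this answer could be sketched roughly as below. The two TransDecoder commands are the standard two-step invocation; the helper orf_type_counts is a hypothetical addition that tallies ORF classes from the peptide FASTA headers. It assumes the headers carry a `type:` field (such as `type:complete` or `type:5prime_partial`), which is worth checking against your own TransDecoder output before relying on it.

```shell
# Sketch only: the helper name is hypothetical and the header format is an assumption.
#
# Extract long ORFs, then predict likely coding regions:
#   TransDecoder.LongOrfs -t Trinity.fasta
#   TransDecoder.Predict -t Trinity.fasta
# This should leave Trinity.fasta.transdecoder.pep / .cds / .gff3 in the working directory.

# Tally ORF classes (complete, 5prime_partial, ...) from the FASTA headers.
orf_type_counts() {
  grep '^>' "$1" \
    | grep -o 'type:[A-Za-z0-9_]*' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}
```

For example, `orf_type_counts Trinity.fasta.transdecoder.pep` gives a quick view of how many predicted ORFs are complete versus partial, which is one rough indicator of how fragmented the assembly is.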

Thanks for the suggestion. I was already planning on running either StringTie or GMAP. Just for clarification: do you mean mapping the results of the Trinity run (e.g. Trinity.fasta) against the indexed genome of the close relative?

Something like this:

gmap_build -d Genome -D ./indexedFolder -k 13 closeRelative.fa
gmap -n 0 -D ./indexedFolder -d Genome ../trinity/trinity_output/Trinity.fasta -f gff3_gene > trinity_gmap.gff

How can I then make sense of the GFF file? Did you use TransDecoder on the GFF file as well?

5.0 years ago by Assa Yeroslaviz ★ 1.8k

From memory that looks reasonable. You might play around with the -n parameter to exclude junk too.

Making sense of it requires eyeballing the results after importing them into a genome browser. That's why I mentioned JBrowse, which is excellent for comparing multiple tracks. You can import the GFF3 and use the server or standalone version.

Of course, you'll also need to import the GTF of your close relative for comparison.

Hopefully that will allow you to see whether your assembly is overly fragmented or reasonable.
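
Alongside the visual check in a genome browser, a quick numeric summary of the GMAP output can help compare assemblies. The sketch below is an illustration only: the helper name gmap_alignment_summary is hypothetical, and it assumes the mRNA lines of the gff3_gene output carry `coverage=` and `identity=` attributes in column 9. Confirm that against your own trinity_gmap.gff first.

```shell
# Sketch only: the helper name is hypothetical, and the coverage/identity
# attributes on mRNA lines are an assumption about the GMAP GFF3 output.

# Report the number of mRNA alignments plus their mean coverage and identity.
gmap_alignment_summary() {
  awk -F'\t' '
    $3 == "mRNA" {
      n++
      k = split($9, attrs, ";")
      for (i = 1; i <= k; i++) {
        split(attrs[i], kv, "=")
        if (kv[1] == "coverage") cov += kv[2]
        if (kv[1] == "identity") idt += kv[2]
      }
    }
    END {
      if (n == 0) { print "no mRNA features found"; exit }
      printf "alignments\t%d\nmean_coverage\t%.1f\nmean_identity\t%.1f\n", n, cov / n, idt / n
    }' "$1"
}
```

Calling `gmap_alignment_summary trinity_gmap.gff` for each assembly variant gives three numbers per run that can be compared side by side. Low mean coverage across many alignments would be one hint that contigs are fragmented or only partially mappable to the relative.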