Biostar Beta. Not for public use.
Question: IGV interpretation of assembled and annotated transcriptome reads in an allotetraploid

Hello all,

I am currently working on an allotetraploid suckerfish (Catostomidae) transcriptome. After sequencing (Ion Torrent), cleaning (FastQC and PrinSeq), assembling (Trinity), and annotating (Trinotate) everything against zebrafish files from Ensembl for comparison, my results in IGV look surprisingly sparse, given the Phred scores above 20 that I consistently saw after sequencing and during initial quality control. Rather than complete transcripts, IGV appears to show only exons, and not always whole exons. Is there a setting to condense everything into a single continuous transcript that covers the complete gene? My lab would also like to identify possible neo/subfunctionalization in duplicate genes, so if these reads represent duplicates, how might I go about identifying them as such?

My biggest concern is that the data are sparse because most of the reads were short and may have been filtered out during the initial assembly, since we ran up against the processing limits of our institution's supercomputer with this volume of data (>60 GB from the raw sequences alone). Almost everything was done following the instructions on Trinity's and Trinotate's GitHub tutorial pages. Did we 'throw the baby out with the bathwater,' so to speak?

https://ibb.co/TWQtqCy Creatine kinase muscle type A (ckma) exon fragments

https://ibb.co/w613XWY zoomed in view, duplicate assembled reads of an exon

Much thanks,

Kate (a somewhat scared PhD student)

Asked 10 months ago by ksgouros (updated 10 months ago)

It is not clear whether you did a genome-guided or a de novo assembly (edit: the first IGV screenshot indicates you performed a genome-guided assembly).

If you performed a genome-guided assembly with the zebrafish genome as reference, I think this could explain the pattern you are observing, as I guess the two genomes are not very closely related. In that case, the 3'- and 5'-UTRs have probably diverged more than the exons, which would explain their apparent absence from the final assembled transcripts.

I would suggest you do a de novo assembly and map the resulting transcripts to the zebrafish genome.
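On the duplicate-gene part of the question, one possible starting point once de novo transcripts have been matched to zebrafish genes is to look for zebrafish genes that are hit by more than one distinct Trinity gene. The sketch below is only an illustration under stated assumptions: the input is a hypothetical list of (Trinity transcript ID, zebrafish gene) pairs that you would have to build from your own mapping or annotation output, and the example rows are made up. The one thing it relies on from Trinity itself is the ID convention in which the isoform is the trailing `_i<N>` suffix, so stripping it gives the Trinity gene ID.

```python
import re
from collections import defaultdict


def trinity_gene_id(transcript_id):
    """Strip the trailing isoform suffix (_i<N>) from a Trinity transcript ID."""
    return re.sub(r"_i\d+$", "", transcript_id)


def candidate_duplicates(hits):
    """Group Trinity genes by the zebrafish gene they were matched to.

    hits: iterable of (trinity_transcript_id, zebrafish_gene) pairs.
          This two-column layout is an assumption for illustration,
          not a format produced by any specific tool.
    Returns a dict mapping each zebrafish gene hit by two or more
    distinct Trinity genes to the sorted list of those Trinity gene IDs.
    """
    by_ref = defaultdict(set)
    for transcript_id, ref_gene in hits:
        by_ref[ref_gene].add(trinity_gene_id(transcript_id))
    return {ref: sorted(genes) for ref, genes in by_ref.items() if len(genes) >= 2}


# Made-up example rows (hypothetical IDs, not real results)
hits = [
    ("TRINITY_DN100_c0_g1_i1", "ckma"),
    ("TRINITY_DN100_c0_g1_i2", "ckma"),   # second isoform of the same Trinity gene
    ("TRINITY_DN2057_c0_g1_i1", "ckma"),  # a different Trinity gene, same zebrafish hit
    ("TRINITY_DN31_c2_g1_i1", "actb1"),
]

print(candidate_duplicates(hits))
# {'ckma': ['TRINITY_DN100_c0_g1', 'TRINITY_DN2057_c0_g1']}
```

Anything this reports is only a candidate: two Trinity genes matching the same zebrafish gene could also be fragments of a single gene, so the sequences would still need to be aligned and inspected before calling them duplicates.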

Answered 10 months ago by h.mon
