I am currently working on the transcriptome of an allotetraploid suckerfish (Catostomidae). After sequencing (Ion Torrent), quality-checking and cleaning (FastQC and PrinSeq), assembling (Trinity), and annotating (Trinotate) everything against zebrafish files from Ensembl for comparison, my results in IGV look surprisingly sparse, given that Phred scores were consistently above 20 after sequencing and during initial quality control. Rather than showing complete transcripts, IGV appears to show only exons, and not necessarily entire exons. Is there a setting to condense everything into a single continuous transcript and complete the gene?
My lab is also interested in identifying possible neo- or subfunctionalization in duplicated genes. If these reads represent duplicates, how might I go about identifying them as such?
My biggest concern is that the data are sparse because most of the reads were short and may have been filtered out during the initial assembly, because we ran up against the processing limits of our institution's supercomputer when handling such a large volume of data (>60 GB from the raw sequences alone). Nearly everything was done following the instructions on the Trinity and Trinotate GitHub tutorial pages. Did we 'throw the baby out with the bathwater,' so to speak?
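A minimal sketch of how the short-read concern could be checked before re-running anything, assuming the cleaned reads are in standard four-line FASTQ records (optionally gzipped). The file name `cleaned_reads.fastq` and the `min_len` cutoff of 25 are placeholders for illustration, not values taken from our pipeline; the cutoff should be set to whatever minimum length the cleaning or assembly step actually enforced.

```python
import gzip
import sys
from collections import Counter


def read_lengths(path):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                yield len(line.rstrip("\r\n"))


def summarize(lengths, min_len=25):
    """Count all reads and the reads shorter than min_len (a hypothetical cutoff)."""
    counts = Counter(lengths)
    total = sum(counts.values())
    short = sum(n for length, n in counts.items() if length < min_len)
    return {
        "total": total,
        "short": short,
        "fraction_short": short / total if total else 0.0,
    }


if __name__ == "__main__":
    # Placeholder file name; pass the real path as the first argument.
    fastq_path = sys.argv[1] if len(sys.argv) > 1 else "cleaned_reads.fastq"
    print(summarize(read_lengths(fastq_path)))
```

If `fraction_short` is large at the cutoff that was actually applied, that would support the idea that many reads never reached the assembler; if it is small, the sparse display probably has another cause.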
https://ibb.co/TWQtqCy Creatine kinase muscle type A (ckma) exon fragments
https://ibb.co/w613XWY Zoomed-in view, duplicate assembled reads of an exon
Kate (a somewhat scared PhD student)