IGV interpretation of assembled and annotated transcriptome reads in an allotetraploid
5.1 years ago
ksgouros • 0

Hello all,

I am currently working on an allotetraploid suckerfish (Catostomidae) transcriptome. After sequencing (Ion Torrent), cleaning (FastQC and PrinSeq), assembling (Trinity), and annotating (Trinotate) everything against zebrafish files from Ensembl for comparison, my results in IGV look surprisingly sparse, given that Phred scores were consistently above 20 after sequencing and during initial quality control. Rather than showing complete transcripts, IGV appears to show only exons, and not necessarily entire exons. Is there a setting to condense everything into a single continuous transcript and complete the gene? My lab is also interested in identifying possible neo- or subfunctionalization in duplicated genes, so if these reads represent duplicates, how might I go about identifying them as such?

My biggest concern is that the data are sparse because most of the reads were short and may have been filtered out during the initial assembly. We ran up against the processing limits of our institution's supercomputer when handling this volume of data (>60 GB from the raw sequences alone). Almost everything was done following the instructions in the Trinity and Trinotate tutorial pages on GitHub. Did we 'throw the baby out with the bathwater,' so to speak?

https://ibb.co/TWQtqCy Creatine kinase muscle type A (ckma) exon fragments

https://ibb.co/w613XWY zoomed in view, duplicate assembled reads of an exon

Much thanks,

Kate (a somewhat scared PhD student)

RNA-Seq IGV transcriptome duplicate allotetraploid • 1.1k views

It is not clear whether you did a genome-guided assembly or a de novo assembly (edit: the first IGV screenshot indicates you performed a genome-guided assembly).

If you performed a genome-guided assembly with the zebrafish genome as reference, I think this could explain the pattern you are observing, as I guess the two genomes are not very closely related. In this case, the 3'- and 5'-UTR regions have probably diverged more than the exons, which would explain their apparent absence from the final assembled transcripts.

I would suggest you do a de novo assembly and map the resulting transcripts to the zebrafish genome.
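
Once the de novo transcripts are mapped, one rough way to start looking for duplicates is to group transcripts by the zebrafish locus they land on and flag loci hit by more than one Trinity "gene". The snippet below is only a minimal sketch of that idea, not a validated method. It assumes you have already extracted `(transcript_id, chrom, start, end)` tuples from your aligner's output, and that transcript IDs follow the usual Trinity pattern (`TRINITY_DN..._c..._g..._i...`), where everything before the final `_i` identifies the gene. The function names and the example coordinates are hypothetical.

```python
from collections import defaultdict


def trinity_gene(transcript_id):
    # Assumes Trinity-style IDs; the part before the last "_i" is the gene.
    return transcript_id.rsplit("_i", 1)[0]


def group_by_locus(alignments):
    """Merge overlapping alignments on each chromosome into loci.

    alignments: iterable of (transcript_id, chrom, start, end).
    Returns a list of (chrom, start, end, [transcript_ids]).
    """
    by_chrom = defaultdict(list)
    for tid, chrom, start, end in alignments:
        by_chrom[chrom].append((start, end, tid))

    loci = []
    for chrom in sorted(by_chrom):
        current = None
        for start, end, tid in sorted(by_chrom[chrom]):
            if current is not None and start <= current[2]:
                # Overlaps the open locus: extend it and record the transcript.
                current[2] = max(current[2], end)
                current[3].append(tid)
            else:
                if current is not None:
                    loci.append(tuple(current))
                current = [chrom, start, end, [tid]]
        if current is not None:
            loci.append(tuple(current))
    return loci


def candidate_duplicates(alignments, min_genes=2):
    """Return loci covered by at least min_genes distinct Trinity genes."""
    hits = []
    for chrom, start, end, tids in group_by_locus(alignments):
        genes = {trinity_gene(t) for t in tids}
        if len(genes) >= min_genes:
            hits.append((chrom, start, end, sorted(genes)))
    return hits


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    example = [
        ("TRINITY_DN10_c0_g1_i1", "chr1", 100, 500),
        ("TRINITY_DN10_c0_g1_i2", "chr1", 120, 480),
        ("TRINITY_DN22_c1_g1_i1", "chr1", 450, 900),
        ("TRINITY_DN31_c0_g2_i1", "chr2", 50, 300),
    ]
    for locus in candidate_duplicates(example):
        print(locus)
```

Isoforms of the same Trinity gene are collapsed on purpose, so a locus is only flagged when distinct genes overlap it. Any locus flagged this way is just a candidate: you would still need to check sequence divergence between the copies before calling them duplicates.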
