Dear Friends,
I have assembled contigs for a phage (sequenced as a single FASTQ file); the reference genome is unknown. I split the FASTQ file into separate files of 25,000 reads each and assembled the reads with MIRA and SPAdes. SPAdes gave good results, with the largest contig being 88,300 bases (I selected the best 20 contigs, each over 80,000 bases). I then ran BLAST on the contigs and got a best hit with over 95% identity, so I know which phage the contigs likely belong to. Next, I want to build a consensus of the best 20 contigs obtained from the assembler. Could you please tell me how to do that? If anything is unclear, please let me know.
After that, I will align the contigs against the best BLAST hit using Mauve and then use DNA Master to annotate the genes in the contigs. I would also welcome your suggestions on these steps.
Thanks, DK
Why would you separate the reads out like that? The assembler will intrinsically give you back the best “consensus” contigs; that’s its job.
Thanks for your comments! Splitting the reads gives better contigs, apparently because the assembly algorithms perform better with smaller chunks of reads.
That’s usually only true if the depth of coverage is astronomically high, and even then it’s not guaranteed to cause issues. It usually matters a lot less for smaller genomes such as phage. I would assemble all your reads together first and see what you get before you start arbitrarily breaking them up.
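If you want a rough idea of whether your coverage really is that high, here is a minimal Python sketch that counts reads and bases in a FASTQ file and divides by an expected genome size. It assumes a standard four-lines-per-record FASTQ; the function names, the file name `reads.fastq`, and the genome size of 88,300 bp (taken from the largest contig mentioned above) are illustrative assumptions, not anything the assembler reports.

```python
import gzip
from itertools import islice


def fastq_stats(path):
    """Return (read_count, total_bases) for a 4-line-per-record FASTQ file."""
    # Assumption: plain text or gzip-compressed, chosen by file extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        # The sequence is the second line of every 4-line record.
        for seq in islice(handle, 1, None, 4):
            reads += 1
            bases += len(seq.strip())
    return reads, bases


def estimated_coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


# Hypothetical usage (file name and genome size are placeholders):
#   reads, bases = fastq_stats("reads.fastq")
#   print(reads, bases, estimated_coverage(bases, 88300))
```

This only gives an average, and the genome size is a guess until you have a finished assembly, so treat the number as a sanity check rather than a threshold.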