Hello, I was given all the output files from the whole-genome sequencing of a bacterium. There are a lot of them, and I don't understand which ones I should use for a de novo assembly:
3 subreads.fastq
3 subreads.fasta
3 bax.h5
1 bas.h5
1 mcd.h5
1 metadata.xml
I was thinking of using SMRT Link for the analysis, but I don't know which of these files I should use.
Thanks in advance.
Most (if not all) long-read assemblers you will come across need the files called subreads. Depending on the tool, it needs either the .fasta or the .fastq version.
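Here is a minimal Python sketch that picks the subreads files out of a delivery folder listing. It assumes the file names end in "subreads.fastq" or "subreads.fasta" (the names in the question are abbreviated, so adjust the suffixes if yours differ). pick_subreads and its prefer parameter are hypothetical names. When both versions exist, the sketch keeps only one so that the same reads are not handed to the assembler twice.

```python
# Assumption: assembler input files end in "subreads.fastq" / "subreads.fasta".
ASSEMBLER_SUFFIXES = ("subreads.fastq", "subreads.fasta")


def pick_subreads(filenames, prefer="subreads.fastq"):
    """Return the subreads files to feed a long-read assembler (hypothetical helper).

    FASTQ and FASTA versions hold the same reads, so if the preferred
    format is present only that one is returned; otherwise fall back to
    whatever subreads files exist.
    """
    subreads = [f for f in filenames if f.endswith(ASSEMBLER_SUFFIXES)]
    preferred = [f for f in subreads if f.endswith(prefer)]
    return sorted(preferred or subreads)
```

With the listing from the question, this returns the three subreads.fastq files. The h5 and xml files are the raw instrument output and are not read directly by assemblers like Flye or Canu.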
You mean the HGAP assembler in SMRT Link?
That apparently needs BAM files as input (see also here). You should have those as well, by the way: they are part of the default output of the PacBio machines/protocol. However, you can easily convert the FASTQ to BAM.
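A minimal, standard-library-only sketch of what such a conversion involves: each FASTQ record becomes an unmapped SAM line (FLAG 4), and samtools view -b can then turn that into BAM. The function names are hypothetical. One caveat: SMRT Link's HGAP expects PacBio's own BAM flavour (PacBio-specific read tags plus a dataset XML), so a generic record like this may well be rejected there. Treat it as an illustration of the format, not as HGAP-ready input.

```python
import io


def fastq_records(handle):
    """Yield (name, sequence, quality) from a plain 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = handle.readline().strip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().strip()
        name = header[1:].strip().split()[0]  # drop '@' and any description
        yield name, seq, qual


def fastq_to_unaligned_sam(fastq_handle, sam_handle):
    """Write each FASTQ record as an unmapped SAM line (no PacBio-specific tags)."""
    sam_handle.write("@HD\tVN:1.6\tSO:unknown\n")
    for name, seq, qual in fastq_records(fastq_handle):
        # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
        fields = [name, "4", "*", "0", "0", "*", "*", "0", "0", seq, qual]
        sam_handle.write("\t".join(fields) + "\n")


def convert_text(fastq_text):
    """Convenience wrapper: FASTQ text in, SAM text out."""
    out = io.StringIO()
    fastq_to_unaligned_sam(io.StringIO(fastq_text), out)
    return out.getvalue()
```

For example, write the output to reads.sam and run samtools view -b -o reads.bam reads.sam. FASTQ and SAM use the same Phred+33 quality encoding, so the quality string is copied unchanged.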
Keep in mind, though, that this is merely a technical issue: at least for recent PacBio data, the quality values in the FASTQ files no longer carry any meaning, because PacBio no longer uses them.
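To see what the quality values in your own FASTQ look like, a quick Python sketch like this tallies the Phred scores. It assumes the usual Phred+33 encoding and plain 4-line records with no empty reads. If every base carries the same value, the QVs are clearly placeholders. Varied values, however, do not by themselves prove that the scores are calibrated. The function names are hypothetical.

```python
from collections import Counter


def quality_histogram(fastq_text, offset=33):
    """Count Phred scores over all quality lines (every 4th line) of a FASTQ."""
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    counts = Counter()
    for qual in lines[3::4]:  # records are header, sequence, '+', quality
        counts.update(ord(ch) - offset for ch in qual)
    return counts


def looks_like_placeholder(counts):
    """True if every base has the same quality value (e.g. all '!', i.e. Q0)."""
    return len(counts) == 1
```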
Yes, it's HGAP. The thing is that I don't understand which file to give it in the file manager.
That doesn't correspond to what I have (or I didn't understand something --')
I no longer have access to a SMRT Link install, but looking at the manual (page 30), it seems you will need to get the XML files from your provider, since your install is not linked to the instrument. This sounds like RS II data (not Sequel)?
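The RS II guess comes from the file types. As far as I know, an RS II SMRT cell produces one bas.h5 plus three bax.h5 files (with metadata.xml alongside), which matches the listing in the question. Sequel, by contrast, delivers subreads.bam with a subreadset.xml. Here is a rough Python sketch of that check. It is keyed only on file extensions, which is an assumption, and guess_platform is a hypothetical helper.

```python
# Assumed extension conventions; extend these tuples if your delivery differs.
RS_II_MARKERS = (".bax.h5", ".bas.h5")
SEQUEL_MARKERS = (".subreads.bam", ".subreadset.xml")


def guess_platform(filenames):
    """Guess which PacBio instrument produced a set of files, from extensions only."""
    names = [f.lower() for f in filenames]
    if any(n.endswith(SEQUEL_MARKERS) for n in names):
        return "Sequel"
    if any(n.endswith(RS_II_MARKERS) for n in names):
        return "RS II"
    return "unknown"
```

For the file list in the question, this returns "RS II".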
Given that, you may want to move ahead with Flye or Canu for now, while you wait to get the right data from the sequencing provider.
I have no hands-on experience with the SMRT Link toolbox, but I would already suggest going for the local file system option (instead of the SMRT Link server one).
The SMRT Link toolbox is a huge box with many different types of analysis in it.
I answered your question on SeqAnswers. I'll post part of my answer here.
There are other options, like Flye (https://github.com/fenderglass/Flye) and Canu (https://github.com/marbl/canu), that may be better than SMRT Link, and they can use the FASTQ/FASTA files directly.
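Here is a minimal Python sketch that builds the command lines for both assemblers from the subreads FASTQ files. The flags are as I understand them from the two projects' documentation and may differ between versions: older Canu 1.x releases use -pacbio-raw instead of -pacbio, and Flye before 2.8 also required --genome-size. The helper names and the default thread count are hypothetical. Check each tool's --help before running.

```python
import shlex


def flye_command(reads, out_dir, threads=8):
    """Flye on raw (non-HiFi) PacBio reads; several files may follow --pacbio-raw."""
    return ["flye", "--pacbio-raw", *reads,
            "--out-dir", out_dir, "--threads", str(threads)]


def canu_command(reads, out_dir, prefix, genome_size):
    """Canu on raw PacBio reads; genome_size is e.g. '5m' for a ~5 Mb genome.

    Assumption: Canu 2.x syntax ('-pacbio'); Canu 1.x used '-pacbio-raw'.
    """
    return ["canu", "-p", prefix, "-d", out_dir,
            f"genomeSize={genome_size}", "-pacbio", *reads]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable, safely quoted shell line."""
    return shlex.join(cmd)
```

Pass the resulting list to subprocess.run(cmd, check=True) to launch the assembly, or print as_shell(cmd) and paste it into a terminal. Keeping the command as a list avoids quoting problems with unusual file names.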