How to decide how many Ion Torrent reads to use for contig assembly with the MIRA assembler?
5.4 years ago
DanielC ▴ 170

Dear Friends,

I am running the MIRA assembler on a FASTQ file from an Ion Torrent-sequenced bacteriophage. The file contains about 1800000 reads with an average read length of 300 bases, and the reference genome is unknown. To run the program efficiently, I have split the main FASTQ file into smaller files with fixed numbers of reads (for example, fastq1.fastq has 10000 reads). I am now testing these subset files to see which read count produces the best contigs. Ideally, the result should be a single contig. Given the information above, could you please tell me how many reads I should use to get the best assembly? Thanks much!


Since you are working with a phage (assuming your DNA is pure phage), you have far more data than the genome needs, so the DNA is heavily oversampled. Excessive coverage tends to hurt assembly quality rather than help it. You can either keep adding reads incrementally until the assembly stops improving, or use a normalization method that looks at the entire dataset at once.
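To put rough numbers on the oversampling, here is a minimal sketch of the usual coverage estimate (reads × read length ÷ genome size). The genome size of 50 kb and the 100x target are assumptions chosen only for illustration, since the real genome size is unknown; the function names are hypothetical.

```python
import math

def estimated_coverage(n_reads, mean_read_len, genome_size):
    """Average depth of coverage: total sequenced bases / genome size."""
    return n_reads * mean_read_len / genome_size

def reads_for_coverage(target_cov, mean_read_len, genome_size):
    """Number of reads needed to reach a target average coverage."""
    return math.ceil(target_cov * genome_size / mean_read_len)

# ASSUMPTION: a 50 kb phage genome; the true size is not known here.
ASSUMED_GENOME_SIZE = 50_000

# Read count and read length are taken from the question.
print(estimated_coverage(1_800_000, 300, ASSUMED_GENOME_SIZE))  # 10800.0
# ASSUMPTION: 100x is an illustrative target, not a recommendation.
print(reads_for_coverage(100, 300, ASSUMED_GENOME_SIZE))        # 16667
```

Under those assumptions the full file would be on the order of ten thousand-fold coverage, while a subset in the tens of thousands of reads would already give about 100x. Once a first assembly gives you an approximate genome length, you can plug that in instead of the assumed value.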


Thanks genomax! If I do the normalization, should I run it on the main FASTQ file with 180000 reads, or on the subset files generated from it (10000 reads, 20000 reads, etc.)? I would really appreciate your suggestion on this and the rationale behind the choice. Thanks much!


Do the normalization on the entire dataset.
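To illustrate why normalization works on the whole dataset at once, here is a toy sketch of one common approach, median k-mer abundance normalization: a read is kept only if the k-mers it contains have not yet been seen often. This is a simplified illustration, not the implementation of any particular tool; the function names and the default values of `k` and `target` are assumptions, and it ignores reverse complements and sequencing errors. For real data, a dedicated tool such as BBNorm or khmer would be the practical choice.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def normalize(reads, k=20, target=20):
    """Keep a read only while the median count of its k-mers is below target.

    ASSUMPTION: k and target are illustrative defaults, not tuned values.
    """
    counts = Counter()
    kept = []
    for seq in reads:
        ks = kmers(seq, k)
        if not ks:
            continue  # read is shorter than k, skip it
        # Median abundance of this read's k-mers among reads kept so far
        if median(counts[x] for x in ks) < target:
            kept.append(seq)
            counts.update(ks)
    return kept

# Ten identical reads: only the first few are kept, the rest are redundant.
reads = ["ACGTTGCAAG"] * 10
print(len(normalize(reads, k=4, target=3)))  # 3
```

Because the decision for each read depends on everything kept before it, running this on small pre-split chunks would only even out coverage within each chunk, which is why the full file is the right input.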
