Using simNGS to Generate Reads
18 months ago
Travis ♦ 2.8k
USA

Hi,

Does anyone have experience of using simNGS (http://www.ebi.ac.uk/goldman-srv/simNGS/) to generate simulated Illumina reads?

I am attempting to generate a set of 5 million test reads using the human genome as input. The program seems to generate only one read (or read pair) per input sequence, i.e. per chromosome. I have tried altering the parameters to get more reads, but without success.

Can anyone advise?

13 months ago
Botond Sipos ♦ 1.7k
United Kingdom

The simNGS package comes with two binaries:

  • simLibrary, which simulates library construction. It takes as input the reference genome and outputs fragments with size distribution specified by the command line parameters (or defaults).

  • simNGS, which simulates sequencing and basecalling. It takes the fragments as input and, in paired-end mode, generates two reads from the ends of each fragment.

If you give simNGS a genome as a FASTA file, then it is expected to give you one read pair per chromosome, as it will interpret the chromosomes as fragments.

What you need is something like this:

simLibrary -n [number_of_fragments] reference.fas | simNGS -o fastq -p paired [runfile] > simulated_reads.fq
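
For the 5 million reads asked about above, the fragment count follows from the paired-end behaviour described earlier: two reads per fragment, so half as many fragments as reads. The Python sketch below only assembles the command line shown above. The helper names and the runfile name `runfile.txt` are made up for illustration, and it assumes `-n` requests exactly that many fragments, which is worth checking against the simLibrary documentation.

```python
import shlex


def fragments_needed(n_reads: int, paired: bool = True) -> int:
    """Fragments to request so that at least n_reads reads are produced."""
    reads_per_fragment = 2 if paired else 1
    # Ceiling division: never request fewer fragments than needed.
    return -(-n_reads // reads_per_fragment)


def build_pipeline(n_reads: int, reference: str, runfile: str, output: str) -> str:
    """Return the simLibrary | simNGS shell command as a string (paired-end)."""
    n_fragments = fragments_needed(n_reads, paired=True)
    return (
        f"simLibrary -n {n_fragments} {shlex.quote(reference)}"
        f" | simNGS -o fastq -p paired {shlex.quote(runfile)}"
        f" > {shlex.quote(output)}"
    )


# Hypothetical file names; substitute your own reference and runfile.
print(build_pipeline(5_000_000, "reference.fas", "runfile.txt", "simulated_reads.fq"))
```

The printed string can be pasted into a shell; nothing is executed by the script itself.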

I would recommend reading the full documentation of simLibrary and simNGS before using them just to make sure that you get the output you expect.

Thanks a lot. I will give the documentation a closer look - I've just been very short on time.

Is there any means of ensuring that the read names output are unique? I'm running into problems downstream with non-unique read names...

I guess your problem is that the reads having identical end points have the same ID. Unfortunately with the current release there is no way to guarantee the uniqueness of the IDs. But it is easy to post-process the output to make them unique, as the reads from the same pair are printed out consecutively.
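
One way to do that post-processing is sketched below in Python. It assumes plain four-line FASTQ records with the two mates of a pair written one after the other, as described above, and it prefixes a running pair counter to each read name. The function name and the `p<N>_` prefix are arbitrary choices for illustration, not anything simNGS produces. A prefix is used rather than a suffix so that any mate marker at the end of the name is left untouched.

```python
from itertools import islice


def uniquify_fastq(lines, reads_per_pair=2):
    """Yield FASTQ lines with a pair counter prefixed to each read name.

    Assumes four lines per record and mates of a pair on consecutive records.
    """
    it = iter(lines)
    record_index = 0
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record at read {record_index}")
        # Both mates of a pair share the same counter value.
        pair_index = record_index // reads_per_pair
        header = record[0].rstrip("\n")
        yield f"@p{pair_index}_{header[1:]}\n"
        # Sequence, separator and quality lines pass through unchanged.
        yield from record[1:]
        record_index += 1


# Tiny in-memory demo with two pairs that share the same (made-up) read name.
demo = [
    "@frag/1\n", "ACGT\n", "+\n", "IIII\n",
    "@frag/2\n", "TTTT\n", "+\n", "IIII\n",
    "@frag/1\n", "GGGG\n", "+\n", "IIII\n",
    "@frag/2\n", "CCCC\n", "+\n", "IIII\n",
]
print("".join(uniquify_fastq(demo)), end="")
```

For a real file, the same generator can be fed an open file handle and its output written with `writelines`. If the `+` separator line repeats the read name in your output, it would need the same prefix; this sketch leaves that line as it is.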

No problem - I can fix it manually!

6.2 years ago
Benm • 710

You can try this one, https://sourceforge.net/projects/simulateseq/files/0.2.2/

5.8 years ago
guillemch • 140
Stockholm

Hi!

It has been a long time since this thread was opened, but I arrived here because I was facing the same problem: non-unique read IDs when the organism from which I was generating the reads had more than one chromosome. I've made a patch to fix this (here); you can apply it to the latest version of simNGS.

Hope it helps!
