What's a good PacBio CLR read simulator?
7.2 years ago
pmarijon ▴ 140

What's a good PacBio CLR read simulator?

We could not use:

  • SiLiCO (source): doesn't generate quality values
  • FastqSim (source): has a bug where it outputs spurious A's and C's after 6 kbp
  • SimLord (source): outputs only CCS reads
  • pbsim (source): does not compile
  • readsim (source): gives no control over quality values; we tried to assemble 60x E. coli simulated reads and obtained very fragmented assemblies with Canu 1.4

We were able to use:

  • LongISLND (source): works, but is quite heavyweight. It requires installing SMRTAnalysis (!) to generate model files. Model files aren't provided (this wasn't clear in the documentation), and it took over an hour to generate them myself. Helpful automated scripts were provided, though.
  • BBMap's RandomReads (source): seems to work and is easy to install (thanks to Brian Bushnell)
  • NPBSS (source; MATLAB/Octave): seems to work, but may not support multi-line FASTA.
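If a simulator really does choke on multi-line FASTA, one workaround is to unwrap the reference before feeding it in. The following is a minimal sketch, not part of any of the tools above; the function name `unwrap_fasta` is made up for illustration, and it assumes a plain-text FASTA where header lines start with `>`.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line.

    Hypothetical helper for illustration. Sequence lines that appear
    before the first header are ignored, as are blank lines.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: flush the previous one, if any.
            if header is not None:
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line.strip())
    # Flush the last record.
    if header is not None:
        yield header
        yield "".join(chunks)


example = [">chr1", "ACGT", "TTGA", ">chr2", "GG", "CC"]
print("\n".join(unwrap_fasta(example)))
```

For a real file, the same generator can be fed an open file handle and the output written to a new FASTA, which is then given to the simulator.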

Do you know of any other PacBio CLR read simulators?

Edit: added BBMap's RandomReads and NPBSS.

read simulator • 3.5k views

I think you have a good list. In my case I go for pbsim; maybe you should post the error here so someone can help you.


As you can see in this compile log, it's a linking problem. I think the build system is missing some files.


Did you read this or try the suggestions?

Some systems require unusual options for compilation or linking that the 'configure' script does not know about. Run './configure --help' for details on some of the pertinent environment variables.

You can give configure initial values for configuration parameters by setting variables in the command line or in the environment. Here is an example:

./configure CC=c99 CFLAGS=-g LIBS=-lposix


This issue explains why the build system is broken and gives an alternative way to build pbsim. Thanks.


http://www.nature.com/nrg/journal/v17/n8/full/nrg.2016.57.html

If you cannot access the paper, use sci-hub or gen-lib.


Thanks,

I read this publication. I didn't test EAGLE, but it has trouble with Boost versions newer than 1.56.


I'm not sure what the intended use case is, but do you want to simulate reads from a specific genome? Otherwise, enough original PacBio data is available now. PacBio makes several sets available here.


It is useful for machine learning. I am currently trying to find a single working CLR simulator so that I can control the variants introduced when training a deep-learning-based variant caller. There is plenty of real CLR data available, but few places offer quality variant calls to train on for public download. Until there is a "truth" variant set like https://jimb.stanford.edu/giab for prokaryotic genomes, simulators are the next best thing.
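One common way to get such a truth set from any simulator is to mutate the reference first, record what was changed, and then simulate reads from the mutated sequence. The sketch below is only an illustration under simple assumptions: a haploid sequence, uppercase bases, and SNVs only. The function name `plant_snvs` and its parameters are hypothetical and do not come from any tool in this thread.

```python
import random


def plant_snvs(reference, n_variants, seed=0):
    """Return (mutated_sequence, truth) with n_variants random SNVs planted.

    truth is a list of (0-based position, ref_base, alt_base) tuples,
    sorted by position. Hypothetical helper; SNVs only, no indels.
    """
    rng = random.Random(seed)  # fixed seed so the truth set is reproducible
    positions = sorted(rng.sample(range(len(reference)), n_variants))
    mutated = list(reference)
    truth = []
    for pos in positions:
        ref_base = reference[pos]
        # Pick any base that differs from the reference base.
        alt_base = rng.choice([b for b in "ACGT" if b != ref_base])
        mutated[pos] = alt_base
        truth.append((pos, ref_base, alt_base))
    return "".join(mutated), truth


reference = "ACGT" * 25
mutated, truth = plant_snvs(reference, n_variants=5, seed=1)
for pos, ref_base, alt_base in truth:
    print(pos, ref_base, alt_base)
```

The mutated sequence would then be written out as FASTA and used as the simulator's reference, while the recorded tuples serve as labels for the variant caller.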


troysincomb: The Genome in a Bottle (LINK) project has several well-characterized datasets available. Some are PacBio, so check them out.

7.2 years ago

BBMap's RandomReads tool has a PacBio mode. BBMap is already compiled, so just unzip it and it will run if you have Java installed. Usage:

randomreads.sh ref=reference.fa out=reads.fq.gz reads=10000 minlength=500 maxlength=15000 pacbio=t pbmin=0.13 pbmax=0.17

That will generate 10000 reads with lengths ranging from 500 bp to 15 kbp and average error rates from 13% to 17%, following PacBio's typical pattern of relative substitution, deletion, and insertion frequencies and lengths.
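To pick a value for reads= that hits a target depth (for example the 60x E. coli set mentioned in the question), a rough estimate is genome size times coverage divided by mean read length. The sketch below assumes the mean read length is about the midpoint of minlength and maxlength, which may not match the length distribution RandomReads actually uses; the function name is hypothetical, and 4.6 Mbp is used as an approximate E. coli genome size.

```python
import math


def reads_for_coverage(genome_size, coverage, min_length, max_length):
    """Estimate how many reads are needed to reach a target coverage.

    Assumption (not confirmed for RandomReads): the mean read length is
    roughly the midpoint of min_length and max_length.
    """
    mean_length = (min_length + max_length) / 2
    return math.ceil(genome_size * coverage / mean_length)


# ~4.6 Mbp genome, 60x, read lengths 500 bp to 15 kbp
print(reads_for_coverage(4_600_000, 60, 500, 15_000))  # 35613
```

Under the same assumption, the 10000 reads in the command above would give roughly 17x on a 4.6 Mbp genome.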


This worked well for me and was easy to install, thanks.
