falcon-unzip installation and testing using PacBio data
6.3 years ago
rob234king ▴ 610

I have installed falcon-unzip from the pre-compiled binaries as shown below, and the installation seems fine:

cd /home/data/bioinf_resources/programming_tools/
#use virtualenv-2.7
source falcontest/bin/activate
tar xvzf falcon-2017.11.02-16.04-py2.7-ucs4.tar.gz -C falcontest
export LD_LIBRARY_PATH=falcontest/lib:${LD_LIBRARY_PATH}
export PYTHONPATH=/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7
pip install pandas
easy_install --upgrade numpy

#set PATH for MUMmer (nucmer and show-coords)
export PATH=/home/data/bioinf_resources/programming_tools/mummer-3.9.4alpha:$PATH
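Before launching anything, it may help to confirm that the shell can actually see the tools after the exports above. This is only a minimal sketch: `check_tools` is a hypothetical helper name, and the tool names passed to it are the ones mentioned in this post, so adjust them to whatever your install provides.

```shell
# check_tools TOOL...: report which commands can be found on PATH.
# Returns 0 if every tool is present, 1 if at least one is missing.
check_tools() {
    _missing=0
    for _tool in "$@"; do
        if command -v "$_tool" >/dev/null 2>&1; then
            echo "found:   $_tool -> $(command -v "$_tool")"
        else
            echo "MISSING: $_tool"
            _missing=1
        fi
    done
    return "$_missing"
}

# Tool names taken from the post (assumption: they are installed under these names).
check_tools fc_run.py nucmer show-coords || echo "fix PATH before running FALCON"
```

Note that the `LD_LIBRARY_PATH` entry above is a relative path, so it only resolves while the shell is in the directory that contains `falcontest`.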

How do I run the example data? Which commands and config files should I use? I can download the raw data from ENA using the project codes below.

Arabidopsis data: PRJNA314706

V. vinifera cv. Cabernet Sauvignon: PRJNA316730

I've downloaded the config files from the link below for a test run:

https://github.com/PacificBiosciences/FALCON_unzip/tree/master/examples

And changed the smrt_bin line in fc_unzip.cfg to:

smrt_bin=/home/data/bioinf_resources/programming_tools/falcontest/bin/
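The same config change can be scripted rather than made by hand. The sketch below is illustrative only: `set_cfg_value` is a hypothetical helper, and the demo file contents are made up (only the `smrt_bin=` key and its new value come from this post). It assumes the key starts at the beginning of a line and that the value contains no `|` or `&` characters.

```shell
# set_cfg_value FILE KEY VALUE: rewrite the "KEY=..." line in FILE.
# Writes to a temporary file first, so it works with any POSIX sed.
set_cfg_value() {
    sed "s|^$2[[:space:]]*=.*|$2=$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo on a throwaway file (placeholder contents, not the real fc_unzip.cfg).
demo_cfg=$(mktemp)
printf '[Unzip]\ninput_fofn=input.fofn\nsmrt_bin=/old/path/bin/\n' > "$demo_cfg"
set_cfg_value "$demo_cfg" smrt_bin /home/data/bioinf_resources/programming_tools/falcontest/bin/
grep '^smrt_bin=' "$demo_cfg"
```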

I've downloaded the example assembly files that come with the config files: fc_run.cfg, input.fofn, fc_unzip.cfg and unzip.sh. I assume I download the raw data and put the file paths in "input.fofn", but how do I then start the run?
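For the input.fofn step, a file of file names is just one path per line. A minimal sketch for generating it, assuming the downloaded reads are `*.fasta` files under a single directory (`make_fofn` and the demo file names are hypothetical):

```shell
# make_fofn DIR OUT: list every *.fasta file under DIR, one absolute path per line.
make_fofn() {
    find "$(cd "$1" && pwd)" -type f -name '*.fasta' | sort > "$2"
}

# Demo with empty placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/b.fasta" "$demo_dir/a.fasta" "$demo_dir/notes.txt"
make_fofn "$demo_dir" "$demo_dir/input.fofn"
cat "$demo_dir/input.fofn"
```

Absolute paths are used so the list stays valid when a task runs from a different working directory. The assembly itself is then started with `fc_run.py fc_run.cfg`, the command used later in this thread.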

I have downloaded the raw data for the Arabidopsis assembly as a first test. I have updated input.fofn with the file locations and set the smrtanalysis/bin location in fc_unzip.cfg.

The first command in unzip.sh has changed in the pre-built binaries. This is the script used in the paper to run everything from start to finish, but I get an error on the very first command.

"fc_track_reads.py" has become "fc_track_reads_htigs0.py".

I made that change, but running the first command gives the error below:

No handlers could be found for logger "pypeflow.simple_pwatcher_bridge"
Traceback (most recent call last):
  File "/home/data/bioinf_resources/programming_tools/falcontest/bin/fc_track_reads_htigs0.py", line 11, in <module>
    load_entry_point('falcon-unzip==0.4.0', 'console_scripts', 'fc_track_reads_htigs0.py')()
  File "/scratch/cdunn/fork/.git/LOCAL4/lib/python2.7/site-packages/falcon_unzip/mains/track_reads_htigs0.py", line 338, in main
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 273, in refreshTargets
    self._refreshTargets(updateFreq, exitOnFailure)
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 339, in _refreshTargets
    raise Exception(msg)
Exception: Some tasks are recently_done but not satisfied: set([Node(0-rawreads), Node(1-preads_ovl)])


I did not have pypeflow installed, which I now do (although I didn't see anywhere that said I needed it). I am now just trying a small dataset, E. coli, and just running FALCON with the command below:

fc_run.py fc_run.cfg

but I get the following error in local mode:

> 2018-02-26 12:45:34,727 - fc_run - INFO - Setup logging from file
> "None". 2018-02-26 12:45:34,822 - fc_run - INFO - fc_run started with
> configuration fc_run.cfg 2018-02-26 12:45:34,832 - fc_run - INFO -  No
> target specified, assuming "assembly" as target  2018-02-26
> 12:45:34,833 - pypeflow.simple_pwatcher_bridge - WARNING - In
> simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.fs_based' from
> '/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.pyc'>
> 2018-02-26 12:45:34,834 - pypeflow.simple_pwatcher_bridge - INFO - In
> simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.fs_based' from
> '/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.pyc'>
> 2018-02-26 12:45:34,855 - pypeflow.simple_pwatcher_bridge - INFO -
> job_type='local', job_queue='', sge_option='-pe smp 8 -q your_queue',
> use_tmpdir=False, squash=False, job_name_style=0 2018-02-26
> 12:45:34,867 - pypeflow.simple_pwatcher_bridge - DEBUG - Created
> PypeTask('0-rawreads/raw-fofn-abs',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs',
> "{'o_fofn': PLF('0-rawreads/raw-fofn-abs/input.fofn', None)}",
> "{'i_fofn': PLF('input.fofn', None)}") 2018-02-26 12:45:34,868 -
> pypeflow.simple_pwatcher_bridge - DEBUG - Added
> PRODUCERS['0-rawreads/raw-fofn-abs'] =
> PypeTask('0-rawreads/raw-fofn-abs',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs',
> "{'o_fofn': PLF('0-rawreads/raw-fofn-abs/input.fofn', None)}",
> "{'i_fofn': PLF('input.fofn', None)}") 2018-02-26 12:45:34,869 -
> pypeflow.simple_pwatcher_bridge - DEBUG - Built
> PypeTask('0-rawreads/raw-fofn-abs',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs',
> "{'o_fofn': PLF('input.fofn', '0-rawreads/raw-fofn-abs')}",
> "{'i_fofn': PLF('input.fofn', None)}") 2018-02-26 12:45:34,869 -
> pypeflow.simple_pwatcher_bridge - DEBUG - New
> Node(0-rawreads/raw-fofn-abs) needs set([]) 2018-02-26 12:45:34,891 -
> pypeflow.simple_pwatcher_bridge - INFO - Num unsatisfied: 0, graph: 1
> 2018-02-26 12:45:34,893 - pypeflow.simple_pwatcher_bridge - DEBUG -
> Created PypeTask('0-rawreads',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> "{'length_cutoff': PLF('0-rawreads/length_cutoff', None),\n
> 'raw_reads_db': PLF('0-rawreads/raw_reads.db', None),\n
> 'rdb_build_done': PLF('0-rawreads/rdb_build_done', None),\n
> 'run_jobs': PLF('0-rawreads/run_jobs.sh', None)}", "{'input_fofn':
> PLF('input.fofn', '0-rawreads/raw-fofn-abs')}") 2018-02-26
> 12:45:34,895 - pypeflow.simple_pwatcher_bridge - DEBUG - Added
> PRODUCERS['0-rawreads'] = PypeTask('0-rawreads',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> "{'length_cutoff': PLF('0-rawreads/length_cutoff', None),\n
> 'raw_reads_db': PLF('0-rawreads/raw_reads.db', None),\n
> 'rdb_build_done': PLF('0-rawreads/rdb_build_done', None),\n
> 'run_jobs': PLF('0-rawreads/run_jobs.sh', None)}", "{'input_fofn':
> PLF('input.fofn', '0-rawreads/raw-fofn-abs')}") 2018-02-26
> 12:45:34,898 - pypeflow.simple_pwatcher_bridge - DEBUG - Built
> PypeTask('0-rawreads',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> "{'length_cutoff': PLF('length_cutoff', '0-rawreads'),\n
> 'raw_reads_db': PLF('raw_reads.db', '0-rawreads'),\n 'rdb_build_done':
> PLF('rdb_build_done', '0-rawreads'),\n 'run_jobs': PLF('run_jobs.sh',
> '0-rawreads')}", "{'input_fofn': PLF('input.fofn',
> '0-rawreads/raw-fofn-abs')}") 2018-02-26 12:45:34,898 -
> pypeflow.simple_pwatcher_bridge - DEBUG - New Node(0-rawreads) needs
> set([Node(0-rawreads/raw-fofn-abs)]) 2018-02-26 12:45:34,901 -
> pypeflow.simple_pwatcher_bridge - INFO - Num unsatisfied: 1, graph: 2
> 2018-02-26 12:45:34,901 - pypeflow.simple_pwatcher_bridge - INFO -
> About to submit: Node(0-rawreads) 2018-02-26 12:45:34,901 -
> pypeflow.simple_pwatcher_bridge - DEBUG - enque nodes:
> set([Node(0-rawreads)]) 2018-02-26 12:45:34,967 -
> pypeflow.simple_pwatcher_bridge - DEBUG - In
> rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> sge_option='-pe smp 8 -q your_queue', __sge_option='-pe smp 8 -q
> your_queue' 2018-02-26 12:45:34,967 - pwatcher.fs_based - DEBUG -
> run(jobids=<1>, job_type=local, job_queue=) 2018-02-26 12:45:34,968 -
> pwatcher.fs_based - DEBUG - jobs: {'P76645cb57cfd20':
> Job(jobid='P76645cb57cfd20', cmd='/bin/bash run.sh',
> rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue',
> 'job_type': 'local'})} 2018-02-26 12:45:34,968 - pwatcher.fs_based -
> INFO - starting job Job(jobid='P76645cb57cfd20', cmd='/bin/bash
> run.sh',
> rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue',
> 'job_type': 'local'}) 2018-02-26 12:45:34,969 - pwatcher.fs_based -
> DEBUG - Wrapped "python2.7 -m pwatcher.mains.fs_heartbeat
> --directory=/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads
> --heartbeat-file=/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20
> --exit-file=/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/exits/exit-P76645cb57cfd20
> --rate=10.0 /bin/bash run.sh || echo 99 >| /home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/exits/exit-P76645cb57cfd20"
> 2018-02-26 12:45:34,969 - pwatcher.fs_based - DEBUG - Writing wrapper
> "/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/wrappers/run-P76645cb57cfd20.bash"
> 2018-02-26 12:45:35,002 - pwatcher.fs_based - DEBUG - CD:
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20'
> <- '/home/data/bioinf_resources/programming_tools/falcontest/raw'
> 2018-02-26 12:45:35,012 - pwatcher.fs_based - DEBUG - dir:
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20'
> call: '/bin/bash
> /home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/wrappers/run-P76645cb57cfd20.bash
> 1>|stdout 2>|stderr & ' 2018-02-26 12:45:35,019 - pwatcher.fs_based -
> DEBUG - pid=40352 pgid=40352 sub-pid=40573 2018-02-26 12:45:35,020 -
> pwatcher.fs_based - DEBUG - CD:
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20'
> -> '/home/data/bioinf_resources/programming_tools/falcontest/raw' 2018-02-26 12:45:35,022 - pwatcher.fs_based - INFO - Submitted
> backgroundjob=MetaJobLocal(MetaJob(job=Job(jobid='P76645cb57cfd20',
> cmd='/bin/bash run.sh',
> rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue',
> 'job_type': 'local'}), lang_exe='/bin/bash')) 2018-02-26 12:45:35,023
> - pypeflow.simple_pwatcher_bridge - DEBUG - Result of watcher.run()={'submitted': ['P76645cb57cfd20']} 2018-02-26
> 12:45:35,023 - pypeflow.simple_pwatcher_bridge - DEBUG - N in queue: 1
> (max_jobs=8) 2018-02-26 12:45:35,024 - pwatcher.fs_based - DEBUG -
> query(which='list', jobids=<1>) 2018-02-26 12:45:35,041 -
> pwatcher.fs_based - DEBUG - Unable to remove heartbeat
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20' when sentinal was found in exit-sentinels listdir. Traceback (most
> recent call last):   File
> "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.py",
> line 565, in get_status
>     os.remove(heartbeat_path) OSError: [Errno 2] No such file or directory:
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20'
> 2018-02-26 12:45:35,045 - pwatcher.fs_based - DEBUG - Status EXIT 256
> for heartbeat:heartbeat-P76645cb57cfd20 2018-02-26 12:45:35,045 -
> pypeflow.simple_pwatcher_bridge - ERROR - Task Node(0-rawreads) failed
> with exit-code=256 2018-02-26 12:45:35,046 -
> pypeflow.simple_pwatcher_bridge - DEBUG - recently_done:
> [(Node(0-rawreads), False)] 2018-02-26 12:45:35,046 -
> pypeflow.simple_pwatcher_bridge - DEBUG - Num done in this iteration:
> 1 2018-02-26 12:45:35,047 - pypeflow.simple_pwatcher_bridge - INFO -
> recently_satisfied: set([]) 2018-02-26 12:45:35,047 -
> pypeflow.simple_pwatcher_bridge - INFO - Num satisfied in this
> iteration: 0 2018-02-26 12:45:35,047 - pypeflow.simple_pwatcher_bridge
> - INFO - Num still unsatisfied: 1 2018-02-26 12:45:35,048 - pypeflow.simple_pwatcher_bridge - ERROR - Some tasks are recently_done
> but not satisfied: set([Node(0-rawreads)]) 2018-02-26 12:45:35,048 -
> pypeflow.simple_pwatcher_bridge - ERROR - ready: set([])  submitted:
> set([]) 2018-02-26 12:45:35,049 - pwatcher.fs_based - DEBUG -
> delete(which='known', jobids=<0>) 2018-02-26 12:45:35,049 -
> pwatcher.fs_based - DEBUG - Deleting jobs for jobids from known ([])
> 2018-02-26 12:45:35,052 - pwatcher.fs_based - DEBUG - Failed to kill
> job for heartbeat 'heartbeat-P76645cb57cfd20': IOError(2, 'No such
> file or directory') 2018-02-26 12:45:35,083 - pwatcher.fs_based -
> DEBUG - Cannot remove heartbeat: OSError(2, 'No such file or
> directory') 2018-02-26 12:45:35,084 - pypeflow.simple_pwatcher_bridge
> - DEBUG - In notifyTerminate(), result of delete:None
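The log above only reports that Node(0-rawreads) exited with code 256; it does not show why. It does show, however, that each job is launched from `mypwatcher/jobs/<jobid>` with `1>|stdout 2>|stderr`, so the underlying error message should be in that directory's `stderr` file. A small sketch for printing those files (`show_job_stderr` is a hypothetical helper, and the demo error text is a placeholder, not real FALCON output):

```shell
# show_job_stderr RUN_DIR: print the last lines of every stderr file
# that the process watcher wrote under RUN_DIR/mypwatcher/jobs/.
show_job_stderr() {
    for _f in "$1"/mypwatcher/jobs/*/stderr; do
        [ -f "$_f" ] || continue   # glob matched nothing
        echo "==> $_f"
        tail -n 20 "$_f"
    done
}

# Demo on a fake run directory that mimics the layout seen in the log.
demo_run=$(mktemp -d)
mkdir -p "$demo_run/mypwatcher/jobs/Pdemo"
echo "example error line (placeholder)" > "$demo_run/mypwatcher/jobs/Pdemo/stderr"
show_job_stderr "$demo_run"
```

On the real run this would be called with the run directory from the log, `/home/data/bioinf_resources/programming_tools/falcontest/raw`.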

Here is a tutorial for Falcon if you have not seen it.

I will try this and see if I can get it to complete. Thanks.

I got the test data to run using the config file provided in the tutorial, and it looks like it was successful. The raw data was FASTA. I just tested with FASTQ data and the run ends with an error. ENA quite often provides the data as FASTQ, so it looks like I can use this tool, https://github.com/zyndagj/FALCON-formatter, to convert FASTQ data to the format FALCON requires. Do you have any experience of using FASTQ data or the raw h5 files instead?
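For reference, the sequence part of a FASTQ to FASTA conversion can be sketched with awk alone. This assumes strict four-line FASTQ records (no wrapped sequence lines), and `fastq_to_fasta` is a hypothetical helper name. It does not rename reads: as far as I understand, FALCON's read database step expects PacBio-style read names, which is the part the FALCON-formatter tool linked above is meant to handle, so treat this only as an illustration of the format difference.

```shell
# fastq_to_fasta: read 4-line FASTQ records on stdin, write FASTA on stdout.
# Line 1 of each record (@name) becomes >name, line 2 is the sequence,
# and lines 3 and 4 (separator and qualities) are dropped.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }'
}

# Demo with two made-up reads.
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | fastq_to_fasta
```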


