falcon-unzip installation and testing using PacBio data
6.3 years ago
rob234king ▴ 610

I have installed falcon-unzip from the pre-compiled binaries as shown below, and the installation seems fine:

cd /home/data/bioinf_resources/programming_tools/
#use virtualenv-2.7
unset PYTHONPATH
source falcontest/bin/activate
tar xvzf falcon-2017.11.02-16.04-py2.7-ucs4.tar.gz -C falcontest
export LD_LIBRARY_PATH=falcontest/lib:${LD_LIBRARY_PATH}
export PYTHONPATH=/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7
pip install pandas
easy_install --upgrade numpy

# set PATH for MUMmer's nucmer and show-coords
export PATH=/home/data/bioinf_resources/programming_tools/mummer-3.9.4alpha:$PATH
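
A quick way to confirm that the environment set up above exposes the expected tools is to check each command name against PATH. This is a minimal sketch: `check_tools` is a hypothetical helper name, and the tool names in the example call are the ones mentioned in this post (`fc_run.py`, `nucmer`, `show-coords`).

```shell
# Report which of the named commands cannot be found on PATH.
# Prints one "MISSING: name" line per absent command and
# returns 1 if at least one command is missing, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (run after activating the virtualenv and exporting PATH):
# check_tools fc_run.py nucmer show-coords
```

No output and a zero exit status mean every listed command was found; it does not prove that the tools themselves run correctly.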

How do I run the example data? Which commands and config files should I use? I can download the raw data from ENA using the project codes below.

Arabidopsis data: PRJNA314706
V. vinifera cv. Cabernet Sauvignon: PRJNA316730

I've downloaded the config files for a test run from:
https://github.com/PacificBiosciences/FALCON_unzip/tree/master/examples

And changed the line in fc_unzip.cfg to:
smrt_bin=/home/data/bioinf_resources/programming_tools/falcontest/bin/

I've downloaded the example assembly files, which come with the config files used: fc_run.cfg, input.fofn, fc_unzip.cfg and unzip.sh. I assume I should download the raw data and put the paths in "input.fofn", but then how do I start the run?

I have downloaded the raw data for the Arabidopsis assembly as a first test. I have updated input.fofn with the file locations and set the smrtanalysis/bin location in fc_unzip.cfg.
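
For reference, a fofn ("file of file names") is a plain text file with one read file path per line. The sketch below writes one from a directory of reads. It assumes the reads are FASTA files ending in `.fasta`; `make_fofn` is a hypothetical helper name, and writing absolute paths is a cautious choice rather than a documented requirement.

```shell
# Write the absolute path of every *.fasta file in a directory
# to a fofn file, one path per line (sorted by file name).
make_fofn() {
    data_dir=$1
    fofn=$2
    # Resolve the directory to an absolute path; fail if it does not exist.
    abs_dir=$(cd "$data_dir" && pwd) || return 1
    # Start from an empty output file.
    : > "$fofn"
    for f in "$abs_dir"/*.fasta; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        echo "$f" >> "$fofn"
    done
}

# Example call (the directory name is a placeholder):
# make_fofn ./raw_data input.fofn
```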

The first command in unzip.sh has changed in the pre-built binaries. This is the file used in the paper to run from start to finish, but I get an error at the very first command.

"fc_track_reads.py" has become "fc_track_reads_htigs0.py"

I made that change, but when I run the first command I get the error below:

fc_track_reads_htigs0.py
No handlers could be found for logger "pypeflow.simple_pwatcher_bridge"
Traceback (most recent call last):
  File "/home/data/bioinf_resources/programming_tools/falcontest/bin/fc_track_reads_htigs0.py", line 11, in <module>
    load_entry_point('falcon-unzip==0.4.0', 'console_scripts', 'fc_track_reads_htigs0.py')()
  File "/scratch/cdunn/fork/.git/LOCAL4/lib/python2.7/site-packages/falcon_unzip/mains/track_reads_htigs0.py", line 338, in main
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 273, in refreshTargets
    self._refreshTargets(updateFreq, exitOnFailure)
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 339, in _refreshTargets
    raise Exception(msg)
Exception: Some tasks are recently_done but not satisfied: set([Node(0-rawreads), Node(1-preads_ovl)])

UPDATE:

I did not have pypeflow installed, which I now do (although I didn't see anywhere that said I needed it). I am now trying a small dataset, E. coli, and just running FALCON with the command below:

fc_run.py fc_run.cfg

but I get the following error in local mode:

> 2018-02-26 12:45:34,727 - fc_run - INFO - Setup logging from file "None".
> 2018-02-26 12:45:34,822 - fc_run - INFO - fc_run started with configuration fc_run.cfg
> 2018-02-26 12:45:34,832 - fc_run - INFO - No target specified, assuming "assembly" as target
> 2018-02-26 12:45:34,833 - pypeflow.simple_pwatcher_bridge - WARNING - In simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.fs_based' from '/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.pyc'>
> 2018-02-26 12:45:34,834 - pypeflow.simple_pwatcher_bridge - INFO - In simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.fs_based' from '/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.pyc'>
> 2018-02-26 12:45:34,855 - pypeflow.simple_pwatcher_bridge - INFO - job_type='local', job_queue='', sge_option='-pe smp 8 -q your_queue', use_tmpdir=False, squash=False, job_name_style=0
> 2018-02-26 12:45:34,867 - pypeflow.simple_pwatcher_bridge - DEBUG - Created PypeTask('0-rawreads/raw-fofn-abs', '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs', "{'o_fofn': PLF('0-rawreads/raw-fofn-abs/input.fofn', None)}", "{'i_fofn': PLF('input.fofn', None)}")
> 2018-02-26 12:45:34,868 - pypeflow.simple_pwatcher_bridge - DEBUG - Added PRODUCERS['0-rawreads/raw-fofn-abs'] = PypeTask('0-rawreads/raw-fofn-abs', '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs', "{'o_fofn': PLF('0-rawreads/raw-fofn-abs/input.fofn', None)}", "{'i_fofn': PLF('input.fofn', None)}")
> 2018-02-26 12:45:34,869 - pypeflow.simple_pwatcher_bridge - DEBUG - Built PypeTask('0-rawreads/raw-fofn-abs', '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs', "{'o_fofn': PLF('input.fofn', '0-rawreads/raw-fofn-abs')}", "{'i_fofn': PLF('input.fofn', None)}")
> 2018-02-26 12:45:34,869 - pypeflow.simple_pwatcher_bridge - DEBUG - New Node(0-rawreads/raw-fofn-abs) needs set([])
> 2018-02-26 12:45:34,891 - pypeflow.simple_pwatcher_bridge - INFO - Num unsatisfied: 0, graph: 1
> 2018-02-26 12:45:34,893 - pypeflow.simple_pwatcher_bridge - DEBUG - Created PypeTask('0-rawreads', '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads', "{'length_cutoff': PLF('0-rawreads/length_cutoff', None),\n 'raw_reads_db': PLF('0-rawreads/raw_reads.db', None),\n 'rdb_build_done': PLF('0-rawreads/rdb_build_done', None),\n 'run_jobs': PLF('0-rawreads/run_jobs.sh', None)}", "{'input_fofn': PLF('input.fofn', '0-rawreads/raw-fofn-abs')}")
> 2018-02-26 12:45:34,895 - pypeflow.simple_pwatcher_bridge - DEBUG - Added PRODUCERS['0-rawreads'] = PypeTask('0-rawreads', '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads', "{'length_cutoff': PLF('0-rawreads/length_cutoff', None),\n 'raw_reads_db': PLF('0-rawreads/raw_reads.db', None),\n 'rdb_build_done': PLF('0-rawreads/rdb_build_done', None),\n 'run_jobs': PLF('0-rawreads/run_jobs.sh', None)}", "{'input_fofn': PLF('input.fofn', '0-rawreads/raw-fofn-abs')}")
> 2018-02-26 12:45:34,898 - pypeflow.simple_pwatcher_bridge - DEBUG - Built PypeTask('0-rawreads', '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads', "{'length_cutoff': PLF('length_cutoff', '0-rawreads'),\n 'raw_reads_db': PLF('raw_reads.db', '0-rawreads'),\n 'rdb_build_done': PLF('rdb_build_done', '0-rawreads'),\n 'run_jobs': PLF('run_jobs.sh', '0-rawreads')}", "{'input_fofn': PLF('input.fofn', '0-rawreads/raw-fofn-abs')}")
> 2018-02-26 12:45:34,898 - pypeflow.simple_pwatcher_bridge - DEBUG - New Node(0-rawreads) needs set([Node(0-rawreads/raw-fofn-abs)])
> 2018-02-26 12:45:34,901 - pypeflow.simple_pwatcher_bridge - INFO - Num unsatisfied: 1, graph: 2
> 2018-02-26 12:45:34,901 - pypeflow.simple_pwatcher_bridge - INFO - About to submit: Node(0-rawreads)
> 2018-02-26 12:45:34,901 - pypeflow.simple_pwatcher_bridge - DEBUG - enque nodes: set([Node(0-rawreads)])
> 2018-02-26 12:45:34,967 - pypeflow.simple_pwatcher_bridge - DEBUG - In rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads', sge_option='-pe smp 8 -q your_queue', __sge_option='-pe smp 8 -q your_queue'
> 2018-02-26 12:45:34,967 - pwatcher.fs_based - DEBUG - run(jobids=<1>, job_type=local, job_queue=)
> 2018-02-26 12:45:34,968 - pwatcher.fs_based - DEBUG - jobs: {'P76645cb57cfd20': Job(jobid='P76645cb57cfd20', cmd='/bin/bash run.sh', rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads', options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue', 'job_type': 'local'})}
> 2018-02-26 12:45:34,968 - pwatcher.fs_based - INFO - starting job Job(jobid='P76645cb57cfd20', cmd='/bin/bash run.sh', rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads', options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue', 'job_type': 'local'})
> 2018-02-26 12:45:34,969 - pwatcher.fs_based - DEBUG - Wrapped "python2.7 -m pwatcher.mains.fs_heartbeat --directory=/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads --heartbeat-file=/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20 --exit-file=/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/exits/exit-P76645cb57cfd20 --rate=10.0 /bin/bash run.sh || echo 99 >| /home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/exits/exit-P76645cb57cfd20"
> 2018-02-26 12:45:34,969 - pwatcher.fs_based - DEBUG - Writing wrapper "/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/wrappers/run-P76645cb57cfd20.bash"
> 2018-02-26 12:45:35,002 - pwatcher.fs_based - DEBUG - CD: '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20' <- '/home/data/bioinf_resources/programming_tools/falcontest/raw'
> 2018-02-26 12:45:35,012 - pwatcher.fs_based - DEBUG - dir: '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20' call: '/bin/bash /home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/wrappers/run-P76645cb57cfd20.bash 1>|stdout 2>|stderr & '
> 2018-02-26 12:45:35,019 - pwatcher.fs_based - DEBUG - pid=40352 pgid=40352 sub-pid=40573
> 2018-02-26 12:45:35,020 - pwatcher.fs_based - DEBUG - CD: '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20' -> '/home/data/bioinf_resources/programming_tools/falcontest/raw'
> 2018-02-26 12:45:35,022 - pwatcher.fs_based - INFO - Submitted backgroundjob=MetaJobLocal(MetaJob(job=Job(jobid='P76645cb57cfd20', cmd='/bin/bash run.sh', rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads', options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue', 'job_type': 'local'}), lang_exe='/bin/bash'))
> 2018-02-26 12:45:35,023 - pypeflow.simple_pwatcher_bridge - DEBUG - Result of watcher.run()={'submitted': ['P76645cb57cfd20']}
> 2018-02-26 12:45:35,023 - pypeflow.simple_pwatcher_bridge - DEBUG - N in queue: 1 (max_jobs=8)
> 2018-02-26 12:45:35,024 - pwatcher.fs_based - DEBUG - query(which='list', jobids=<1>)
> 2018-02-26 12:45:35,041 - pwatcher.fs_based - DEBUG - Unable to remove heartbeat '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20' when sentinal was found in exit-sentinels listdir.
> Traceback (most recent call last):
>   File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.py", line 565, in get_status
>     os.remove(heartbeat_path)
> OSError: [Errno 2] No such file or directory: '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20'
>
> 2018-02-26 12:45:35,045 - pwatcher.fs_based - DEBUG - Status EXIT 256 for heartbeat:heartbeat-P76645cb57cfd20
> 2018-02-26 12:45:35,045 - pypeflow.simple_pwatcher_bridge - ERROR - Task Node(0-rawreads) failed with exit-code=256
> 2018-02-26 12:45:35,046 - pypeflow.simple_pwatcher_bridge - DEBUG - recently_done: [(Node(0-rawreads), False)]
> 2018-02-26 12:45:35,046 - pypeflow.simple_pwatcher_bridge - DEBUG - Num done in this iteration: 1
> 2018-02-26 12:45:35,047 - pypeflow.simple_pwatcher_bridge - INFO - recently_satisfied: set([])
> 2018-02-26 12:45:35,047 - pypeflow.simple_pwatcher_bridge - INFO - Num satisfied in this iteration: 0
> 2018-02-26 12:45:35,047 - pypeflow.simple_pwatcher_bridge - INFO - Num still unsatisfied: 1
> 2018-02-26 12:45:35,048 - pypeflow.simple_pwatcher_bridge - ERROR - Some tasks are recently_done but not satisfied: set([Node(0-rawreads)])
> 2018-02-26 12:45:35,048 - pypeflow.simple_pwatcher_bridge - ERROR - ready: set([])  submitted: set([])
> 2018-02-26 12:45:35,049 - pwatcher.fs_based - DEBUG - delete(which='known', jobids=<0>)
> 2018-02-26 12:45:35,049 - pwatcher.fs_based - DEBUG - Deleting jobs for jobids from known ([])
> 2018-02-26 12:45:35,052 - pwatcher.fs_based - DEBUG - Failed to kill job for heartbeat 'heartbeat-P76645cb57cfd20': IOError(2, 'No such file or directory')
> 2018-02-26 12:45:35,083 - pwatcher.fs_based - DEBUG - Cannot remove heartbeat: OSError(2, 'No such file or directory')
> 2018-02-26 12:45:35,084 - pypeflow.simple_pwatcher_bridge - DEBUG - In notifyTerminate(), result of delete:None
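
The log above only reports that Node(0-rawreads) failed with exit-code=256; it does not show why. According to the same log, each job is launched from `mypwatcher/jobs/<jobid>` with its output redirected to files named `stdout` and `stderr`, so the actual error message is probably in that `stderr` file. A minimal sketch for printing the tail of every job's `stderr` follows; `show_job_stderr` is a hypothetical helper name, and the directory layout is inferred from this log only.

```shell
# Print the last N lines of each job's stderr file under a pwatcher directory.
# $1: pwatcher directory (default: mypwatcher)
# $2: number of lines to show per job (default: 20)
show_job_stderr() {
    watch_dir=${1:-mypwatcher}
    n=${2:-20}
    for job_dir in "$watch_dir"/jobs/*/; do
        # Skip job directories without a stderr file (and the unexpanded pattern).
        [ -f "${job_dir}stderr" ] || continue
        echo "== ${job_dir}stderr"
        tail -n "$n" "${job_dir}stderr"
    done
}

# Example call, run from the directory where fc_run.py was started:
# show_job_stderr mypwatcher 30
```

Checking the task's own run directory (here `0-rawreads`, where `run.sh` is executed) for further log files may also help.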

Here is a tutorial for Falcon if you have not seen it.

I will try this and see if I can get it to complete. Thanks.

I got the test data to run using the config file provided in the tutorial, and it looks like it was successful. The raw data was FASTA. I just tested with FASTQ data and the run ends with an error. ENA quite often provides the data as FASTQ, so it looks like I can use this tool, https://github.com/zyndagj/FALCON-formatter, to convert FASTQ data to the format FALCON requires. Do you have any experience of using FASTQ data instead, or the h5 raw files?
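
For the sequence conversion step alone, a plain FASTQ to FASTA conversion can be sketched with awk. This assumes strict four-line FASTQ records (no wrapped sequence lines), and `fastq_to_fasta` is a hypothetical helper name. It only drops the quality lines and does not rewrite read names, so if FALCON also needs PacBio-style read headers, as the FALCON-formatter link suggests, this alone will probably not be enough.

```shell
# Convert a four-line-per-record FASTQ file to FASTA on stdout.
# Line 1 of each record: replace the leading "@" with ">".
# Line 2 of each record: print the sequence unchanged.
# Lines 3 and 4 ("+" separator and qualities) are dropped.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

# Example call (file names are placeholders):
# fastq_to_fasta reads.fastq > reads.fasta
```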
