Output path issue with BUSCO (Python)
21 months ago
Darrill • 0

Hi everyone, I'm currently running BUSCO to find orthologous genes present in all insects in my genomes. To do so, I wrote a bash script to run it on the cluster. Here is the script:

#!/bin/bash
#SBATCH -t 24:00:00
#SBATCH -e path/busco_job.log/busco_job.error
#SBATCH -o path/busco_job.log/busco_job.out
date;hostname;pwd
ASSEMBLY=path/genome.fasta
LINEAGE=path/hymenoptera_odb9
SAMP=my_species
NAME=$SAMP'_BUSCO_v3'
#########################################
# define PATH to software used by BUSCO #
#########################################
# Augustus
export PATH=/bin:/usr/bin:/usr/remote/bin:path/Augustus3.3/bin:path/Augustus3.3/scripts
# hmmer
PATH=$PATH:/path/hmmer-3.2.1/bin
# blast and python
PATH=$PATH:/path/ncbi-blast-2.8.1+/bin
PATH=$PATH:/usr/bin
# augustus
export AUGUSTUS_CONFIG_PATH=/path/Augustus3.3/config

################
# Command line #
################
export PATH=/usr/remote/Python-3.6.5/bin:$PATH
PATH=$PATH:/usr/bin
out_path = path/run_busco
export PYTHONPATH=$PYTHONPATH:~/path/site-packages
python3 /path/busco-masterV3/scripts/run_BUSCO.py -i$ASSEMBLY -o $NAME -l$LINEAGE -m geno -f


The main issue is that busco.py by default writes its output files into the directory from which it is run, but I would like to change the directory where the output files are written. The documentation says that the out_path option can be set in two ways: either by editing the path directly in the config.ini file, or by providing input parameters on the command line, which override those defined in config.ini (this is the solution I want to use). But it does not work, even if I write out_path = my_desired_path in the run.sh file.

Here is the documentation concerning the path:

In this file (config.ini), you must declare the paths to all dependencies (see below) and you can optionally define the required input parameters (described later in this document). Note: providing input parameters through the command line will override those defined in config.ini. The config.ini.default file is extensively commented and self-explanatory.

Here is the beginning of config.ini:

# BUSCO specific configuration
# It overrides default values in code and dataset cfg, and is overridden by arguments in command line
# Uncomment lines when appropriate
[busco]
# Input file
;in = ./sample_data/target.fa
# Run name, used in output files and folder
;out = SAMPLE
# Where to store the output directory
;out_path = ./sample_data
# Path to the BUSCO dataset
;lineage_path = ./sample_data/example
# Which mode to run (genome / protein / transcriptome)
;mode = genome
# How many threads to use for multithreaded steps
;cpu = 1
# Domain for augustus retraining, eukaryota or prokaryota
# Do not change this unless you know exactly why !!!
;domain = eukaryota
# Force rewrite if files already exist (True/False)
;force = False
# Restart mode (True/False)
;restart = False
# Blast e-value
;evalue = 1e-3


So I was wondering why, even if I write out_path = /path/run_busco in my script, the output files still end up in ./sample_data?

Thank you for your help.


Hello,

I don't know the program, but I guess you have to remove the ; before the out_path parameter in the config file, so that whatever you declare there has an effect.

fin swimmer


Yes, I removed the ; but the issue is still the same.


It would be odd if the config file is using ; in some way. But in that case, can you specify the directory you want the output to go to in ;out_path = /path_to_dir_you_want?


It would be odd if the config file is using ; in some way.

The PHP config file php.ini, for example, uses ; to comment out parameters.


Yep, it works if I modify it directly in the config.ini file, of course, but the output path changes depending on the script I use...

I have around 100 scripts to run, with a unique path for each job, which is why I want to set out_path directly in my script and not in the config.ini, which does not change.


Have the script generate/modify the config.ini.
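
A minimal sketch of that idea, assuming each job can keep its own copy of the config: copy a template config.ini, rewrite the out_path line with sed, and point BUSCO at the copy through the BUSCO_CONFIG_FILE environment variable (the BUSCO v3 user guide describes this variable, but check it against your installed version). The helper name make_job_config, the file names, and the throwaway template are made up for illustration; in a real job the template would be your existing config.ini.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: write a per-job copy of a template config.ini
# with out_path set to the given directory.
make_job_config() {
    local template=$1 out_dir=$2 job_config=$3
    mkdir -p "$out_dir"
    # Rewrite the out_path line whether or not it is still commented out with ';'.
    # Assumes out_dir contains no '|' or '&' characters (they are special to sed here).
    sed -E "s|^;?[[:space:]]*out_path[[:space:]]*=.*|out_path = ${out_dir}|" \
        "$template" > "$job_config"
}

# Throwaway template so the sketch runs on its own; a real job would use
# the config.ini that already lists the dependency paths.
workdir=$(mktemp -d)
printf '[busco]\n;out = SAMPLE\n;out_path = ./sample_data\n' > "$workdir/config.ini.template"

make_job_config "$workdir/config.ini.template" \
                "$workdir/run_busco" \
                "$workdir/config.my_species.ini"

# Assumption: BUSCO v3 reads its config location from this variable.
export BUSCO_CONFIG_FILE="$workdir/config.my_species.ini"
cat "$BUSCO_CONFIG_FILE"
```

After that, run_BUSCO.py can be called exactly as in the original script; only the config copy differs between jobs.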

18 months ago
h.mon 25k
Brazil
ERROR   Please do not provide a full path in --out parameter, no slash. Use out_path in the config.ini file to specify the full path.


It is a bit annoying you can't just give a path to --out. I would solve (in fact, it is what I do when I use BUSCO) the issue in a simpler manner than editing the config for every run: I just create and cd into the desired output directory before running BUSCO.

################
# Command line #
################
export PATH=/usr/remote/Python-3.6.5/bin:$PATH
PATH=$PATH:/usr/bin
mkdir path/run_busco
cd path/run_busco
export PYTHONPATH=$PYTHONPATH:~/path/site-packages
python3 /path/busco-masterV3/scripts/run_BUSCO.py -i$ASSEMBLY -o $NAME -l$LINEAGE -m geno -f


Of course, if "ASSEMBLY=path/genome.fasta" and "LINEAGE=path/hymenoptera_odb9" are relative paths, they have to be tweaked to work in the new folder - if they are absolute paths, they will work regardless of where BUSCO is running.
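
If the paths are relative, one way to avoid tweaking them by hand is to resolve them to absolute paths before the cd. A minimal sketch, where abs_path is a hypothetical helper and the directory and file names are placeholders created only so the example runs on its own:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the absolute form of an existing path.
abs_path() {
    local dir base
    dir=$(cd "$(dirname "$1")" && pwd)
    base=$(basename "$1")
    printf '%s/%s\n' "$dir" "$base"
}

# Placeholder layout standing in for the submission directory.
start_dir=$(mktemp -d)
cd "$start_dir"
mkdir -p data
: > data/genome.fasta

# Resolve the relative path while we are still in the starting directory.
ASSEMBLY=$(abs_path data/genome.fasta)

# Create the output directory and move into it, as in the answer above.
OUT_DIR="$start_dir/run_busco"
mkdir -p "$OUT_DIR"
cd "$OUT_DIR"

# data/genome.fasta no longer resolves from here, but $ASSEMBLY still does.
echo "assembly: $ASSEMBLY"
```

The same call would be applied to LINEAGE before changing directory, so the BUSCO command line itself stays unchanged.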