Script error during orthologs analysis
8.3 years ago
Mehmet ▴ 820

Dear All:

I am using a script named ExtractSeq.sh which is located at https://github.com/halexand/Ehux_HD/blob/master/orthoMCL_output/ExtractSeq.sh

But I got an error:

ExtractSeq.sh: line 67: ${outdir}/${myArray[0]}.fa: ambiguous redirect

Any solution?

genome sequence software-error • 1.5k views
8.3 years ago

Try putting double quotes around the variables.
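To show why quoting matters here, below is a minimal sketch that reproduces the message outside of ExtractSeq.sh. The directory name and the `OG1` value are hypothetical and chosen only to trigger the error; they are not taken from your run.

```shell
#!/bin/bash
# Hypothetical values chosen only to reproduce the error.
workdir=$(mktemp -d)
outdir="$workdir/my results"     # a directory name containing a space
mkdir -p "$outdir"
name="OG1"

# Unquoted: the redirect target expands to two words, so bash refuses it.
unquoted_msg=$( { echo ">seq1" >> ${outdir}/${name}.fa; } 2>&1 ) || true
echo "unquoted: $unquoted_msg"

# Quoted: the target stays a single word and the file is written.
echo ">seq1" >> "${outdir}/${name}.fa"
quoted_content=$(cat "${outdir}/${name}.fa")
echo "quoted: $quoted_content"

rm -rf "$workdir"
```

Any whitespace in the expanded value (a space in the path, or a tab inside the array element) has the same effect when the variable is left unquoted.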

#!/bin/bash
# This is a bash script that extracts the sequences for all orthologous groups (OG).
# It takes a list of OG IDs as input and saves all sequences belonging to each group
# from all organisms in a FASTA file named after the OG.
# Note that after the script is executed, there will be 'n' files (where
# n = total number of OGs in the input list).

# Arun Seetharam <aseetharam@purdue.edu>

scriptName="${0##*/}"
outdir=$(pwd)
function printUsage() {
cat <<EOF

Synopsis

$scriptName [-h | --help] [-o dir_name] input_ids_list database

Description

Extracts sequences for all ortholog groups supplied as a list. For each ID in the list,
a file containing the FASTA sequences that belong to that OG will be generated.
Note: this script requires standalone blast+ software.

input_ids_list
Input list should contain orthologous group IDs one per line
These IDs should be generated by "orthomclMclToGroups" command

database
Absolute path for the database should be specified. The database is
generally named 'goodProteins.fasta'.

-o directory_name
directory name to save the output files. By default all files will be saved in
the current directory.

-h, --help
Brings up this help page

Author

Arun Seetharam, Bioinformatics Core, Purdue University.
aseetharam@purdue.edu

EOF
}

if [ $# -lt 1 ] ; then
    printUsage
    exit 1
fi

while getopts ':o:' option; do
    case "$option" in
        o) outdir=$OPTARG
        shift
        ;;
        h) printUsage
        exit
        ;;
        help) printUsage
        exit
        ;;
    esac
done

#module use /apps/group/bioinformatics/modules # uncomment these 2 lines if running this on clusters
#module load blast

mkdir -p $outdir
shift $(( $# - 2 ))
file=${1}
pathdbname=${2}
sed -i 's/://g' ${file}
while IFS=$' ' read -r -a myArray
do
    for i in "${myArray[@]:0}";
    do
        blastdbcmd -entry "$i" -db ${pathdbname} >> ${outdir}/${myArray[0]}.fa;
    done
done <${file}

Here is the script that I am trying to use, but I am quite new to scripting. Should I change anything in it, such as the path names?


Try putting quotes around the variables in the blastdbcmd line:

blastdbcmd -entry "$i" -db "${pathdbname}" >> "${outdir}"/"${myArray[0]}.fa"

If this doesn't work, check that the variables are not empty.
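A minimal sketch of such a check, assuming bash. `require_nonempty` is a hypothetical helper and is not part of ExtractSeq.sh; `pathdbname` is left empty here on purpose to show the failure case.

```shell
#!/bin/bash
# require_nonempty is a hypothetical helper, not part of ExtractSeq.sh.
require_nonempty() {
    local var_name=$1
    # ${!var_name} is bash indirect expansion: the value of the variable named by $1.
    if [[ -z "${!var_name}" ]]; then
        echo "error: \$$var_name is empty" >&2
        return 1
    fi
}

outdir=$(pwd)
pathdbname=""          # deliberately empty to show the failure case

require_nonempty outdir     && echo "outdir ok: $outdir"
require_nonempty pathdbname || echo "pathdbname needs a value"
```

Calling something like this for `outdir`, `file` and `pathdbname` just before the `while` loop would tell you which of them, if any, was never set.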


I did what you suggested. I no longer get the same error, but I now get a new one:

file name is too long.

What parameter should I change?


It's as the error message says: the file name is too long. That can happen when a variable holds a whole space-separated list of names instead of the single name the script expects. Look at what's in the different variables with echo, e.g. echo "$i"
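One possible cause, and this is only a guess since I haven't seen your input: if the ID list is tab-separated rather than space-separated, the `read` call in the script (which splits on spaces only) puts the whole line into `myArray[0]`. Unquoted, that gives "ambiguous redirect"; quoted, it becomes one very long file name. The sketch below uses two made-up input lines to show the difference; `printf '%q'` makes tabs and spaces visible.

```shell
#!/bin/bash
# Two hypothetical input lines: one space-separated, one tab-separated.
space_line='OG1 sp1|g1 sp2|g2'
tab_line=$'OG1\tsp1|g1\tsp2|g2'

# Same kind of read call as in the script: split on spaces only.
IFS=$' ' read -r -a space_fields <<< "$space_line"
IFS=$' ' read -r -a tab_fields   <<< "$tab_line"

# printf '%q' prints each value in a shell-quoted form, so hidden tabs show up.
printf 'space-separated: %d fields, first = %q\n' "${#space_fields[@]}" "${space_fields[0]}"
printf 'tab-separated:   %d fields, first = %q\n' "${#tab_fields[@]}"   "${tab_fields[0]}"
```

If your first field turns out to contain the whole line, converting the tabs in the ID list to spaces (or changing the `IFS` value in the `read` line) should be worth trying.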


By "the file", do you mean the file containing my ID list, or the FASTA names in the database (goodProteins.fasta)?

I am really having trouble with this script. My goal is to get the single-copy genes from the OrthoMCL output and use them to build a phylogenetic tree.

Do you have a better idea than mine?
