Bash: For loop with two statements, output of the first statement as input to the second, with .fasta files
5.2 years ago
rah ▴ 20

I'm working on several .fasta files whose names follow the pattern chr_start_end.fasta. I want to iterate over them and compute the size (end minus start) for each fasta file. Then I want to use that size as an input to another command for the same fasta file, all within the same for loop.

To extract the size from the file name I use, as an example:

echo "chr10_126777139_126791124.fasta" | awk -F'[_.]' '{print $3-$2}'

which yields 126791124 - 126777139 = 13985

Then I want to pass 13985 as input to a genome assembly tool called canu, as in this example:
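
As an alternative to awk, the same subtraction can be sketched with bash parameter expansion alone. The function name genome_size is hypothetical (it does not appear in the thread), and the sketch assumes the file name always ends in _start_end.fasta with plain decimal coordinates that have no leading zeros.

```shell
# Hypothetical helper (not from the thread): print end - start
# for a file named like chr_start_end.fasta
genome_size() {
  local name=${1##*/}      # drop any leading directory
  name=${name%.fasta}      # drop the .fasta extension
  local end=${name##*_}    # text after the last underscore
  local rest=${name%_*}    # everything before the last underscore
  local start=${rest##*_}  # text after the last remaining underscore
  echo $(( end - start ))
}

genome_size "chr10_126777139_126791124.fasta"   # prints 13985
```

Because the two coordinates are read from the right-hand end of the name, this version should also cope with a chromosome name that itself contains underscores, which would shift the field numbers in the awk split.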

canu -assemble -p asstest -d . -genomeSize=13985 -nanopore-raw chr10_126777139_126791124.fasta

I've tried this so far, but I can't get it to work properly:

for f in *.fasta; do Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}') canu -assemble -p asstest -d . genomeSize=$Gen_size -nanopore-raw $f; done

I want to do this for several .fasta files at once. Do any of you have suggestions on how to pass the output of one statement to the next statement within the same for loop? Thanks.

fasta assembly bash sequence • 1.5k views

It looks fine. What is the error you get?

Also, the line starting with "canu" should be on a new line.


You need a semicolon after your echo/awk command and before you call canu:

for f in *.fasta; do Gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}'); canu -assemble -p asstest -d . genomeSize=$Gen_size -nanopore-raw $f; done
                                                                         ^ here
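
The two suggestions above (put canu on its own line, or separate the statements with a semicolon) amount to the same thing: the assignment has to finish before canu starts. A minimal multi-line sketch of the loop follows. The wrapper name print_canu_commands is hypothetical, and the echo in front of canu is an added safety step so the commands can be inspected before anything runs.

```shell
# Hypothetical wrapper (not from the thread) around the corrected loop
print_canu_commands() {
  local f gen_size
  for f in "$@"; do
    [ -e "$f" ] || continue   # skip the literal "*.fasta" when nothing matches
    # First statement: size = end - start, parsed from chr_start_end.fasta
    gen_size=$(echo "$f" | awk -F'[_.]' '{print $3-$2}')
    # Second statement, on its own line; echo only prints the command,
    # remove "echo" to actually run canu
    echo canu -assemble -p asstest -d . "genomeSize=$gen_size" -nanopore-raw "$f"
  done
}

print_canu_commands *.fasta
```

Without the separator, bash reads Gen_size=... canu ... as a single command with a temporary environment assignment, and $Gen_size on that same line is expanded before the assignment takes effect, so canu would receive an empty genomeSize.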

Thank you, nicely noticed.

5.2 years ago
ATpoint 81k

Hope I got your question correctly: wrap it into a function and parallelize with GNU parallel, using $JOBS as the number of parallel jobs. I am not familiar with Canu, so you might have to tune it a bit, because I do not know which parameter to set to define an output name, directory, or whatever.

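function CANU {

  FILE=$1
  # Genome size = end - start, parsed from chr_start_end.fasta
  Gen_size=$(echo "$FILE" | awk -F'[_.]' '{print $3-$2}')

  canu -assemble -p asstest -d . genomeSize="$Gen_size" -nanopore-raw "$FILE"

}; export -f CANU
# JOBS must be set beforehand, e.g. JOBS=4
ls *.fasta | parallel -j "$JOBS" "CANU {}"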
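
On the open point about output names: in the commands in this thread, -p sets the prefix and -d the directory, and both are the same for every file, so runs would presumably share one output location. One possible approach, sketched here as an assumption rather than a tested Canu recipe, is to derive both from the file name. The helper name canu_cmd is hypothetical, and it only prints the command so it can be checked first.

```shell
# Hypothetical helper (not from the thread): print a canu command that uses
# a per-file prefix and directory taken from the file's base name
canu_cmd() {
  local file=$1
  local base=${file##*/}   # drop any leading directory
  base=${base%.fasta}      # drop the .fasta extension
  local size
  size=$(echo "$base" | awk -F'_' '{print $3-$2}')
  # Assumption: using the base name for both -p and -d keeps runs separate
  echo canu -assemble -p "$base" -d "$base" "genomeSize=$size" -nanopore-raw "$file"
}

canu_cmd "chr10_126777139_126791124.fasta"
```

Replacing the echo with a direct call, and using this body inside the exported function, would be the natural next step once the printed commands look right.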

Thanks for your response, I'll give it a try when I'm doing my next round of analysis.
