Bash loop for bbduk
4
0
Entering edit mode
3.0 years ago
avelarbio46 ▴ 30

Hello Everyone! I'm trying to run a loop to trim my rnaseq reads in bbduk. However, I can't seem to input files from a specific directory nor output them to a directory. Probably this is very easy to solve. I'm currently using:

for i in `ls -1 /home/gabriel.gama/Dados_CD_genomics/TrueSeq_dezembro/*_1.fq.gz | sed 's/_1.fq.gz//'`
do
bbduk.sh -Xmx1g in1=$i\_1.fq.gz in2=$i\_2.fq.gz out1=/home/gabriel.gama/Análises/Teste1/bbduk/$i\_clean_1.fq.gz out2=/home/gabriel.gama/Análises/Teste1/bbduk/$i\_clean_2.fq.gz ref=/home/gabriel.gama/bbduk/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10 
done

I'm not getting the sample names as output, even tough the command run. I'm getting the absolute path to file names as output

Maybe I should use something such as

for a in 'basename $i'

to get the basename of the file, and then reference it as such:

out1=/home/gabriel.gama/Análises/Teste1/bbduk/$a\_clean_1.fq.gz 
bash bbduk • 3.4k views
ADD COMMENT
0
Entering edit mode

what error are you getting?

ADD REPLY
0
Entering edit mode

I can't get the "samplename.fq" as output, just absolute paths

ADD REPLY
4
Entering edit mode
3.0 years ago
GenoMax 141k

Please do not use spaces in folder names. It is just simpler to use a _ when you feel like using a space. It would be better name my machine to my_machine.

There can be no spaces in bbduk.sh options. You seem to have a space between ref= and the directory after it.

Try this:

 for i in `ls -1 /home/gabriel.gama/Dados_CD_genomics/TrueSeq_dezembro/*_1.fq.gz`; \
    do dname=$(dirname ${i}); name=$(basename ${i} _1.fq.gz); \
    bbduk.sh -Xmx1g in1=${dname}/${name}_1.fq.gz in2=${dname}/${name}_2.fq.gz \
    out1=/home/gabriel.gama/Análises/Teste1/bbduk/${name}_clean_1.fq.gz out2=/home/gabriel.gama/Análises/Teste1/bbduk/${name}_clean_2.fq.gz \
    ref=/home/gabriel.gama/bbduk/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10 ; 
done
ADD COMMENT
0
Entering edit mode

I removed spaces where possible, but I still can't reference to the sample name, just absolute path. Because of that I can't output the fastq correctly :(

ADD REPLY
2
Entering edit mode
3.0 years ago
Mensur Dlakic ★ 27k

Try setting up basename in your for command:

for i in `ls -1 /home/gabriel.gama/Dados_CD_genomics/TrueSeq_dezembro/*_1.fq.gz | sed 's/_1.fq.gz//' | xargs -i basename {}`
ADD COMMENT
0
Entering edit mode

This makes "$i" become basename, but then I can't use it to define the absolute path in

bbduk.sh -Xmx1g in1=$i\_1.fq.gz
ADD REPLY
1
Entering edit mode

Come on now, you can't expect us to change every single line of code. It helps if you put some effort of your own, which here would be adding a fixed directory part to in1 and in2:

bbduk.sh -Xmx1g in1=/home/gabriel.gama/Dados_CD_genomics/TrueSeq_dezembro/$i\_1.fq.gz in2=/home/gabriel.gama/Dados_CD_genomics/TrueSeq_dezembro/$i\_2.fq.gz out1=/home/gabriel.gama/Análises/Teste1/bbduk/$i\_clean_1.fq.gz out2=/home/gabriel.gama/Análises/Teste1/bbduk/$i\_clean_2.fq.gz ref=/home/gabriel.gama/bbduk/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10 
ADD REPLY
0
Entering edit mode

I agree with you! I'm really trying to learn about loops and more complex functions, but I couldn't get your answer to work before the edits!

ADD REPLY
0
Entering edit mode

See the modified version I added to my answer above.

ADD REPLY
2
Entering edit mode
3.0 years ago

I think code is a little bit complex as you are getting sample read name (ls), converting to the dir name (sed) and then adding read information (in loop). Please check the following code and see if you can improve on it. I have removed the paths, rephrased the code and see if it makes sense. (please add paths wherever applicable). This code works when wherever the reads and adapters are kept in the same directory:

with bash loop:

$ for i in $(ls  *_1.fq.gz);                                                                                                                                                          
do echo bbduk.sh -Xmx1g in1=$i in2=${i/_1/_2} out1=./bbduk_results/${i/_1/_clean_1} out2=./bbduk_results/${i/_1/_clean_2} ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10;
done

bbduk.sh -Xmx1g in1=sample1_1.fq.gz in2=sample1_2.fq.gz out1=./bbduk_results/sample1_clean_1.fq.gz out2=./bbduk_results/sample1_clean_2.fq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10
bbduk.sh -Xmx1g in1=sample2_1.fq.gz in2=sample2_2.fq.gz out1=./bbduk_results/sample2_clean_1.fq.gz out2=./bbduk_results/sample2_clean_2.fq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10

Please consider using find instead of ls.

with parallel:

$ parallel --plus --dry-run bbduk.sh  -Xmx1g in1={} in2={=s/_1/_2/=} out1=./bbduk_results/{=s/_1/_clean_1/=} out2=./bbduk_results/{=s/_1/_clean_2/=} ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10 ::: *_1.fq.gz

bbduk.sh -Xmx1g in1=sample1_1.fq.gz in2=sample1_2.fq.gz out1=./bbduk_results/sample1_clean_1.fq.gz out2=./bbduk_results/sample1_clean_2.fq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10
bbduk.sh -Xmx1g in1=sample2_1.fq.gz in2=sample2_2.fq.gz out1=./bbduk_results/sample2_clean_1.fq.gz out2=./bbduk_results/sample2_clean_2.fq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10

Please also consider using snakemake or any other framework.

ADD COMMENT
0
Entering edit mode
3.0 years ago
avelarbio46 ▴ 30

After a lot of help, I learned one the fastest and easiest way to do so in a for loop. The thing is, manipulating $i is better, such as:

for i in `ls -1 /home/gabriel.gama/Dados_CD_genomics/TrueSeq_dezembro/*_1.fq.gz | sed 's/_1.fq.gz//'` 
do
bbduk.sh -Xmx1g in1=$i\_1.fq.gz in2=$i\_2.fq.gz out1=/home/gabriel.gama/Análises/Teste1/bbduk/${i##*/}\_clean_1.fq.gz out2=/home/gabriel.gama/Análises/Teste1/bbduk/${i##*/}\_clean_2.fq.gz ref=/home/gabriel.gama/bbduk/bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=10 maq=10 
done

The term

${i##*/}

gives the basename of the directory!

Thanks https://stackoverflow.com/users/140750/william-pursell for the response

ADD COMMENT

Login before adding your answer.

Traffic: 2660 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6