Biostar Beta. Not for public use.
for loop over bwa aln
0
Entering edit mode
2.1 years ago
phylofun • 10
@phylofun51674

My for loop around bwa aln and bwa sampe doesn't seem to work. First, I created paths to the read data:

ls -1r  reads/*/cleaned/*READ1.fastq.gz > READ1_list


Next, I made a for loop using READ1_list and READ2_list around the bwa aln function:

for i in READ1_list
do
bwa aln reference/reference.unaligned.fasta $i >${i%READ1.fastq.gz}read1.sai
done

for i in READ2_list
do
bwa aln reference/reference.unaligned.fasta $i >${i%READ2.fastq.gz}read1.sai
done


Then, I use a for loop around bwa sampe:

 for i in (cat READ1_list | sed s'/\-READ1.fastq.gz//' | sort) do bwa sampe reference/reference.unaligned.fastai\-READ1.sai $i\-READ2.sai$i\-READ1.fastq.gz $i\-READ2.fastq.gz \ | samtools view -bS - >$i\-pe.bam
done


Can anyone spot the issue?

bwa alignment next-gen • 330 views
1
Entering edit mode

I can't spot the issue. What happens? Are there error messages?

Unless you reads are shorter than 70bp, you should be using bwa mem instead of bwa aln.

0
Entering edit mode

There is an error message when I run bwa sampe, but I think the issue begins with bwa aln. No error message occurs when I run bwa aln; however, the .sai output file goes to the current working directory instead of following the path in READ1_list, and the sample name is not written for the .sai file (e.g. READ1_listread1.sai instead of sample_name-read1.sai)

0
Entering edit mode

Put an echo in your loop to see how it looks like:

 for i in (cat READ1_list | sed s'/\-READ1.fastq.gz//' | sort) do echo "bwa sampe reference/reference.unaligned.fastai\-READ1.sai $i\-READ2.sai$i\-READ1.fastq.gz $i\-READ2.fastq.gz \ | samtools view -bS - >$i\-pe.bam"
done


Please note that you could also do the following and pipe directly to samtools sort, without samtools view:

bwa sample ref.fa R1 R2 | samtools sort -o alignment.bam

0
Entering edit mode

I see two problems in your code:

1. The second alignment outputs a read1.sai file, it should be read2.sai
2. The final BAM file requires files named as READ but your alignments are read (file names are generally case-sensitive)

Also it is hard to find the errors without any detail, are you running this on Linux/Mac or Windows?, what are the names of the input files?, what is inside the READ1_list and READ2_list files?

0
Entering edit mode
for i in READ1_list


That shouldn't even work. You should instead e.g.

for i in $(cat READ1_list); do ...  Just try e.g. for i in READ1_list; do echo "$i"; done


vs.

for i in $(cat READ1_list); do echo "$i"; done


Only the latter works in bash (as intended)..

But anyway, I wouldn't even use a for loop for this but instead just define a function and use it with find and GNU parallel. Also, why are you aligning the reads separately? Surely it would make more sense to align read pairs properly in the same runs?

0
Entering edit mode
2.1 years ago
Joe 12k
@Joe

This is overkill, but you might be interested in my bash code for aligning/remapping: (You'll mainly want to look at Line 95 onward, everything before that is just picking up commandline args etc.)

With a little more bash magic, you can loop your files and pass the relevant paths as arguments:

paste -- "READS1.txt" "READS2.txt" |
while IFS='\t' read -r reads1 reads2; do bash align_and_remap.sh -r /path/to/reference.fasta -1 "reads1" -2 "\$reads2" -o /path/to/outdir
done


*Disclaimer, this script is not well tested at all. Do some testing of you own before letting it loose on all your data. This is also based on an old setup with bwa aln where you had to provide each read file separately. That doesn't need to happen with bwa mem I believe.