I am trying to write a simple Bash script that emulates what the external software currently used in my lab does: Hepatitis C genotyping.
For each FASTQ file passed to the script, I want it to perform the alignment and indexing, produce a VCF file, and then create a consensus sequence. I have this working with one reference genome.
However, I want to add a degree of complexity that I'm not sure is possible in Bash. Since there are 7 main HepC genotypes, I want to run the alignment, indexing, VCF and consensus steps against all 7 genotype references at the same time (running in sync), so that I end up with 7 consensus sequences for each FASTQ file.
I then want to compare each of these 7 consensus sequences to its respective reference and obtain a % similarity score, so that one can determine which viral genotype is most likely present in the patient. I was thinking of using BLAST for this and awk to extract the necessary data.
I separated the alignment, indexing, etc. for each reference into its own script. Then I ran all these scripts inside a main script using a for loop like this:
for SAMPLE_ID in "$@"
do
    source hcv1script.sh
    source hcv3script.sh
done
This gave me the consensus sequences for Genotype 1 only. How can I fix this? Do I need a nested loop?
Thanks a lot for your help
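A nested loop is one way to get there. The following is only a minimal sketch, not the solution from the thread: the outer loop walks the FASTQ files, the inner loop starts one background job per reference, and `wait` collects them before moving to the next sample. The names `REFS`, `run_genotype` and `genotype_all`, as well as the reference file names, are hypothetical, and `run_genotype` is a stub standing in for the real align/index/VCF/consensus commands. Since the sub-scripts aren't shown, why only Genotype 1 was produced is a guess; one thing worth checking is whether `hcv1script.sh` calls `exit`, because `exit` inside a sourced script ends the calling script as well.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical reference names; replace with the real FASTA files for genotypes 1-7.
REFS=(hcv1.fasta hcv2.fasta hcv3.fasta hcv4.fasta hcv5.fasta hcv6.fasta hcv7.fasta)

# Placeholder for one align/index/VCF/consensus run against a single reference.
run_genotype() {
    local sample=$1 ref=$2 outdir=$3
    local s r
    s=$(basename "${sample%.*}")
    r=$(basename "${ref%.*}")
    # The real pipeline commands would go here; this stub only writes a marker file.
    printf '%s vs %s\n' "$sample" "$ref" > "$outdir/$s.$r.consensus.txt"
}

# Usage: genotype_all OUTDIR SAMPLE...
genotype_all() {
    local outdir=$1; shift
    local sample ref pid failed=0
    mkdir -p "$outdir"
    for sample in "$@"; do              # outer loop: one pass per FASTQ file
        local pids=()
        for ref in "${REFS[@]}"; do     # inner loop: one background job per reference
            run_genotype "$sample" "$ref" "$outdir" &
            pids+=("$!")
        done
        for pid in "${pids[@]}"; do     # wait for all jobs of this sample
            wait "$pid" || failed=1
        done
    done
    return "$failed"
}

# Run only when arguments are given, e.g.: ./genotype_all.sh sample1.fastq sample2.fastq
if [[ $# -gt 0 ]]; then
    genotype_all consensus_out "$@"
fi
```

Calling a function (or running the sub-scripts with `bash script.sh "$SAMPLE_ID"`) instead of `source` keeps each run's variables from leaking into the next one, which is the main design choice here.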
If you can identify the post that helped solve this question, please point it out and we can move it to an answer. You can then accept it to provide closure to this thread.
Note: It may be @cpad0112's original comment since there is a lot of nested material under there that must have been ultimately needed.
Yes, that is the post that solved it. Thanks.
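For the % similarity step mentioned in the question, here is a rough sketch of the awk part, assuming the BLAST results are in the standard tabular format (`-outfmt 6`), where column 2 is the subject ID and column 3 is the percent identity. The function name `best_genotype` is made up for illustration. It reads the combined hits for one sample on stdin and prints the reference with the highest percent identity.

```bash
#!/usr/bin/env bash

# Hypothetical helper: pick the hit with the highest percent identity.
# Input: BLAST tabular rows (qseqid, sseqid, pident, ...) on stdin.
# Output: "<sseqid><TAB><pident>" for the best row, or nothing if there is no input.
best_genotype() {
    awk -F'\t' '
        NF >= 3 && (seen == 0 || $3 + 0 > best) { best = $3 + 0; id = $2; seen = 1 }
        END { if (seen) printf "%s\t%.2f\n", id, best }
    '
}

# Assumed invocation (file names are placeholders):
#   for ref in hcv*.fasta; do
#       blastn -query sample.consensus.fasta -subject "$ref" -outfmt 6
#   done | best_genotype
```

Percent identity alone can be misleading when a hit covers only a short stretch of the genome, so it may be worth also looking at the alignment length (column 4) or the bit score (column 12) before calling a genotype.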