I have .bam files from a human whole genome sequence that I've had to split into individual chromosomes (1-22, X, Y) to make another program run. I now want to merge chromosomes so that I have every possible pairwise combination, e.g. chr1_chr2, chr1_chr3 ... chr1_chrY and finally chrX_chrY. At the moment, each individual chromosome file and its index are in the same directory.
To make one pair, I know that the command is as follows (taken from the samtools manual):
samtools merge output.bam input1.bam input2.bam
e.g.
samtools merge chr1_chr2.bam chr1.bam chr2.bam
But how can I iterate through every possible combination of pairs, rather than writing each line by hand (~300 lines of samtools merge ...)? I don't need pairs to repeat: for example, chr1 can pair with chr2-22, X and Y, but chr2 doesn't then need to merge again with chr1. (Or can this not be avoided, so that I'd have to delete the duplicates later?)
I've got as far as
for file in *_chr*.bam; do samtools merge merge_${file}.bam $file $file; done
I understand that this will find all of my chromosome .bam files and then merge them, but this isn't going to iterate through every combination, is it?
Use two loops instead of one.
Thanks! I'm very new to Unix and other computational languages. Any tips on how to go about that?
https://www.google.com/search?q=bash+loop+tutorial
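A minimal sketch of the two-loop idea, assuming the files are named chr1.bam ... chr22.bam, chrX.bam and chrY.bam as in the example above (adjust the names if yours carry a sample prefix). The variable chroms and the function print_pair_merges are made-up names for illustration. The outer loop takes one chromosome off the front of the list and the inner loop pairs it only with the chromosomes that come after it, so chr1_chr2 is produced but chr2_chr1 is not. The function only prints the samtools merge commands, so they can be checked before anything is run.

```shell
#!/bin/sh
# Chromosome labels; assumes the files are named chr<label>.bam.
chroms="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y"

# Print one "samtools merge" command per unordered pair of chromosomes.
print_pair_merges() {
    # Load the labels into the function's positional parameters ($1, $2, ...).
    set -- $chroms
    # Outer loop: take the first remaining label as the left member of the pair.
    while [ "$#" -gt 1 ]; do
        a=$1
        shift
        # Inner loop: pair it with every label that is still left after it.
        for b in "$@"; do
            echo "samtools merge chr${a}_chr${b}.bam chr${a}.bam chr${b}.bam"
        done
    done
}

print_pair_merges
```

With 24 chromosomes this prints 276 commands. Once the printed list looks right, the same output can be piped into a shell to actually run the merges, e.g. print_pair_merges | sh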
Why do you need pairs?
Do you realise you can use
And get all files merged?
Because a package I'm using downstream is eventually going to iterate through each pair. Alternatively, I could keep the data as one large .bam file, but it's too large and the package exits before completing. This package has supposedly been written specifically for whole genome sequencing analysis, but it doesn't write anything to file; rather, it does everything in memory, reaches capacity, then exits.