Hi all, I have a bit of a dilemma. I have to process a rather large BAM file into an interleaved FASTQ. After sorting and indexing it, I run the samtools collate command as follows:

samtools collate -u -o input_sorted_collated.bam input_sorted.bam

input_sorted.bam is ~147 GB, and after a long execution the process (unsurprisingly) runs out of space on /tmp. Unfortunately, the collate command of samtools doesn't recognize the -T option to direct the temporary files elsewhere. I guess the developers haven't considered the possibility of really large datasets.
Has anyone run into this issue before? Can you suggest a solution/hack/workaround? Any thoughts are very much appreciated!
Sasha
The solution Pierre Lindenbaum provided should work. Always read the manual first.
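For anyone landing here later: per the samtools manual, collate takes an optional positional temp-file prefix as its last argument rather than a -T option, which is presumably what the linked answer points to. A minimal sketch, assuming a scratch filesystem with enough free space (the paths and file names below are placeholders, not from the original post):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths -- substitute your own.
IN=input_sorted.bam
OUT=interleaved.fastq.gz
BIGTMP=/scratch/collate_tmp   # prefix on a filesystem with plenty of space (assumption)

# Exit gracefully on machines without samtools installed.
command -v samtools >/dev/null 2>&1 || { echo "samtools not installed; skipping"; exit 0; }

# collate accepts an optional positional PREFIX as its final argument;
# temporary files are then created as /scratch/collate_tmp.NNNN.bam
# instead of filling up /tmp.
samtools collate -u -o input_collated.bam "$IN" "$BIGTMP"

# Convert the read-name-grouped BAM to interleaved FASTQ on stdout.
samtools fastq input_collated.bam | gzip > "$OUT"
```

As a side note, collate does not require a coordinate-sorted or indexed input (it only groups reads by name), so the upfront sort/index step can likely be skipped entirely, saving a pass over the 147 GB file.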
Just as a comment: better to avoid that kind of statement, especially when the issue is something as basic as memory or disk usage. The samtools developers know what they are doing; these tools are routinely used by thousands of people year-round, and large files are not uncommon. They avoid loading entire files into memory precisely because not every user has a heavy server node available. Typically, if one runs into a standard issue like this, one is doing something wrong. Files in the 100+ GB range are not unusual either; a standard 30-50x short-read human WGS sample, for example. Thousands of such samples exist at NCBI.