Is it possible to parse the read entries in an (unaligned) BAM file one by one, without requiring huge amounts of memory?
I have tried samtools view <bamfile> | <python script> as well as using the pysam.AlignmentFile() parser from inside the script, but on our cluster both solutions end up taking over 60GB of RAM for a 6GB BAM. I believe we have nodes that can handle a lot more RAM than that, but I'm still annoyed by requirements that wouldn't run on a laptop if needed.
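For reference, here is a minimal sketch of the kind of streaming loop I would expect to run in roughly constant memory when fed by samtools view. It is not my actual script: the names iter_sam_reads, SamRead and count_reads_and_bases are made up for illustration, and it assumes plain tab-separated SAM text with the 11 mandatory fields on each line.

```python
from typing import Iterator, NamedTuple, TextIO, Tuple


class SamRead(NamedTuple):
    """Hypothetical minimal record: only the fields needed for unaligned reads."""
    name: str
    flag: int
    sequence: str
    quality: str


def iter_sam_reads(stream: TextIO) -> Iterator[SamRead]:
    """Yield one read at a time from SAM text, never holding more than one line."""
    for line in stream:
        # Header lines start with '@'; a read name (QNAME) cannot.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM has 11 mandatory columns; skip anything shorter (assumed malformed).
        if len(fields) < 11:
            continue
        yield SamRead(fields[0], int(fields[1]), fields[9], fields[10])


def count_reads_and_bases(stream: TextIO) -> Tuple[int, int]:
    """Example consumer: keeps running totals only, so memory stays flat."""
    n_reads = 0
    n_bases = 0
    for read in iter_sam_reads(stream):
        n_reads += 1
        n_bases += len(read.sequence)
    return n_reads, n_bases
```

Called as count_reads_and_bases(sys.stdin) at the end of the pipe, this keeps nothing but two counters. If a loop like this still grows to tens of GB, the growth would presumably come from something the script accumulates per read (a list or dict of all entries) rather than from the parsing itself.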
I've briefly looked around, but nobody seems to be asking this question with regard to simply parsing a BAM. Most memory-related topics for samtools seem to revolve around sorting.
So, is there a more resource-efficient way to parse BAMs progressively, or does the whole thing need to be decompressed into memory first (presumably that's what's happening) before the entries can be accessed sequentially?