I would like to find an efficient way of aligning large fastq files (to the human reference genome) by first splitting each fastq into smaller pieces so that the pieces can be aligned in parallel. I can think of two ways of doing this: splitting the fastq into files with a fixed number of bases (e.g. a billion bases per file) or into files with a fixed number of reads (e.g. 10 million reads per file). Does anyone know which approach is more efficient in terms of run time? This question is particularly relevant when different fastq files have different read lengths.
I suppose another way of asking the question is: do aligners tend to scale with the number of bases or with the number of reads (in terms of run time)? The aligners I am most interested in are BWA, BFAST and stampy.
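To make the two splitting schemes concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration, not a tested pipeline: the names `read_fastq`, `chunk_fastq`, `max_reads` and `max_bases` are my own, and it assumes plain 4-line fastq records (no wrapped sequence lines). A chunk is closed as soon as it reaches the read limit or the base limit, so a base-limited chunk can overshoot by up to one read.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield one record at a time as a tuple of 4 lines (assumes unwrapped fastq)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated fastq record")
        yield record


def chunk_fastq(handle, max_reads=None, max_bases=None):
    """Group records into chunks, closing a chunk once a limit is reached.

    max_reads: close the chunk after this many reads (fixed-reads scheme).
    max_bases: close the chunk once it holds at least this many bases
               (fixed-bases scheme).
    """
    if max_reads is None and max_bases is None:
        raise ValueError("set max_reads and/or max_bases")
    chunk, n_bases = [], 0
    for record in read_fastq(handle):
        chunk.append(record)
        n_bases += len(record[1].strip())  # line 2 of a record is the sequence
        reads_full = max_reads is not None and len(chunk) >= max_reads
        bases_full = max_bases is not None and n_bases >= max_bases
        if reads_full or bases_full:
            yield chunk
            chunk, n_bases = [], 0
    if chunk:  # leftover records form the final, smaller chunk
        yield chunk


# Tiny in-memory example with mixed read lengths (4, 4, 8, 2, 2 bases).
seqs = ["ACGT", "ACGT", "ACGTACGT", "AC", "GT"]
DEMO = "".join(f"@r{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(seqs))

for chunk in chunk_fastq(io.StringIO(DEMO), max_bases=8):
    n_bases = sum(len(rec[1].strip()) for rec in chunk)
    print(len(chunk), "reads,", n_bases, "bases")
```

In practice each chunk would be written to its own file and handed to a separate aligner job; with mixed read lengths the two schemes give chunks of quite different sizes, which is why I am asking which quantity run time follows.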
Many thanks,
Ian