My main goal is to quantify the absolute abundance of bacteria in samples of known composition. My samples contain E. coli, K. pneumoniae, or both. These are lab-grown cultures, so I know which samples contain which bacteria.
To calculate absolute abundance, I spiked in a known amount of a 150 bp fragment of the human GAPDH gene during the library prep stage. I then analyzed the sequencing results by first calculating:
1. No. of bacterial reads = percentage abundance (output from Kraken) x total number of reads (from bwa mem).
2. No. of GAPDH reads = the number of reads that mapped when aligning with bwa mem against a reference GAPDH.fasta.
I then divided the number from (1) by the number from (2) to obtain an absolute abundance value for that sample.
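To make the calculation concrete, here is a minimal Python sketch of the ratio as I understand it. The function name and the input numbers are hypothetical and chosen only for illustration; they are not from my actual data. Note that the result is a unitless ratio of bacterial reads to spike-in reads.

```python
def spike_normalized_abundance(kraken_percent, total_reads, gapdh_reads):
    """Ratio of bacterial reads to GAPDH spike-in reads (hypothetical helper).

    kraken_percent: percentage of reads Kraken assigns to the bacterium (0-100)
    total_reads:    total number of reads in the sample
    gapdh_reads:    number of reads that mapped to the GAPDH reference
    """
    if gapdh_reads <= 0:
        raise ValueError("No GAPDH spike-in reads; the ratio is undefined.")
    # Step (1): convert the Kraken percentage into a read count.
    bacterial_reads = kraken_percent * total_reads / 100
    # Step (2) is the GAPDH read count itself; the final value is (1) / (2).
    return bacterial_reads / gapdh_reads


# Illustrative numbers only (not real data).
print(spike_normalized_abundance(40.0, 1_000_000, 400))
```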
However, so far the method hasn't worked as expected, and I am getting large variation in the absolute abundance between samples of the same condition. For example, for a pure culture of E. coli grown at 30 °C for 2 hours, I get an average absolute abundance of 1096 with a standard deviation of ~300.
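To put the spread in relative terms, the mean and standard deviation above can be turned into a coefficient of variation. This is a small sketch using only those two reported numbers; the function name is hypothetical.

```python
def coefficient_of_variation(mean, stdev):
    """Standard deviation expressed as a fraction of the mean."""
    if mean == 0:
        raise ValueError("Mean is zero; the coefficient of variation is undefined.")
    return stdev / mean


# Values reported above for the pure E. coli culture (stdev is approximate).
cv = coefficient_of_variation(1096, 300)
print(f"CV = {cv:.1%}")
```

With these numbers the standard deviation is a bit over a quarter of the mean.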
Can I check whether the calculations I am doing make sense? Has anyone else had a similar experience when trying to normalize DNA sequencing results?