I have two questions:
1) Can you reliably and robustly infer that a gene is absent from an organism (either missing entirely or non-functional) simply because you do not find it in an assembly built from whole-genome sequencing of a bacterial culture isolated from a clinical sample?
2) If not, is the absence of a gene from an assembly still meaningful evidence that it is absent in the original biological context? Would you bet on the gene being absent from the organism if you did not find it in the assembly, even knowing that absence cannot be rigorously inferred this way?
I suspect the answer to these questions is no, because:
The sampling could have gone wrong (e.g. one clone was sampled from an infection that contains multiple clones, and this particular clone happens to lack the gene while the others carry it).
DNA extraction could have gone wrong, so even though the gene exists in the organism, it might not end up in the DNA that gets successfully extracted.
The library preparation kit, which converts the DNA into a form that can be sequenced on a specific platform, might have performed imperfectly and under-represented some regions.
The library might have been of low complexity.
The gene might be more difficult to sequence than other genes due to sequence composition biases (e.g. extreme GC content).
The reads from that gene might be of too low quality and get removed in the quality filtering step.
The gene might have features that make it difficult to assemble, or it might exist in multiple copies that the assembler collapses, so that the specific variant one is looking for is not detected.
Due to the idiosyncrasies of the assembler, the gene might have been split across many contigs.
The algorithm used to detect the gene from the assembly might have limitations.
The reference database might not contain the gene you were looking for in the first place.
...or any number of other biological or bioinformatics reasons.
In other words, there are so many things that could theoretically have gone wrong that it is unwise to claim that the gene is not in the organism just because it is not in an assembly.
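To illustrate how these failure modes add up, here is a minimal Python sketch. The step names and failure probabilities are made-up placeholders rather than measured values, and the steps are assumed to fail independently, which is unlikely to hold exactly in practice.

```python
import math

# Hypothetical per-step probabilities that a gene truly present in the
# organism is lost at that step. These numbers are illustrative only.
failure_probs = {
    "sampling": 0.05,
    "dna_extraction": 0.01,
    "library_prep": 0.01,
    "sequencing_bias": 0.02,
    "quality_filtering": 0.01,
    "assembly": 0.05,
    "detection_algorithm": 0.03,
    "database_coverage": 0.05,
}


def detection_sensitivity(probs):
    """Probability that a present gene survives every step,
    assuming the steps fail independently of each other."""
    return math.prod(1.0 - p for p in probs)


sensitivity = detection_sensitivity(failure_probs.values())
print(f"Overall sensitivity: {sensitivity:.3f}")
print(f"Chance of missing a present gene: {1.0 - sensitivity:.3f}")
```

Even when each individual step is fairly reliable, the combined chance of missing a gene that is really there can be considerably larger than any single step suggests.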
Is this largely accurate? Would you consider it obviously flawed to conclude absence of a gene in the organism from the mere observation that it is not found in an assembly?
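One way I could make question 2 concrete is as a Bayesian update. The sketch below is only an illustration: the prior probability that the gene is present, the pipeline sensitivity, and the false-positive rate are all hypothetical inputs that would have to be estimated for a real pipeline.

```python
def posterior_absent(prior_present, sensitivity, false_positive_rate=0.0):
    """P(gene absent | gene not detected), by Bayes' rule.

    prior_present:       assumed prior probability that the organism carries the gene
    sensitivity:         assumed P(detected | present) for the whole pipeline
    false_positive_rate: assumed P(detected | absent)
    """
    prior_absent = 1.0 - prior_present
    p_miss_given_present = 1.0 - sensitivity
    p_miss_given_absent = 1.0 - false_positive_rate
    evidence = (p_miss_given_absent * prior_absent
                + p_miss_given_present * prior_present)
    if evidence == 0.0:
        raise ValueError("non-detection is impossible under these inputs")
    return p_miss_given_absent * prior_absent / evidence


# Hypothetical scenario: no prior knowledge (50/50), varying sensitivity.
for sens in (0.5, 0.8, 0.95, 0.99):
    print(f"sensitivity={sens:.2f} -> "
          f"P(absent | not found)={posterior_absent(0.5, sens):.3f}")
```

Under this framing, not finding the gene always shifts the odds toward absence to some degree, but how far depends on the pipeline's sensitivity, which is the quantity my list of failure modes calls into question.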