7.4 years ago
f2369583
I have an isolate, X, that has been sequenced three times, each with its own DNA/library prep and de novo assembly using SPAdes (X1, X2 and X3).
One of the assemblies has far more loci identified against a curated species database than the others (i.e. X1 = 1600 loci, X2/X3 = 1500 loci). The loci that are present in all three assemblies are identical at the sequence level, but I'm not sure how to treat the difference in the number of identified loci.
What factors could cause more (or fewer) loci to be identified, and how could I verify which set is the 'correct' one?
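
To make the comparison concrete, here is a minimal sketch of how the three locus sets could be split into shared and assembly-specific loci. It assumes the locus calls for each assembly are already available as sets of locus identifiers; the function name `compare_loci` and the toy locus names are hypothetical and not taken from any particular tool.

```python
def compare_loci(calls):
    """Split locus calls into shared and assembly-specific sets.

    calls: dict mapping assembly name -> set of locus IDs (hypothetical input).
    Returns (core, variable, unique), where core is found in every assembly,
    variable is found in some but not all, and unique maps each assembly
    to the loci found in that assembly only.
    """
    core = set.intersection(*calls.values())
    union = set.union(*calls.values())
    unique = {}
    for name, loci in calls.items():
        # Everything called in any of the other assemblies
        others = set().union(*(v for k, v in calls.items() if k != name))
        unique[name] = loci - others
    return core, union - core, unique


# Toy data standing in for the real locus calls (made up for illustration)
calls = {
    "X1": {"locA", "locB", "locC", "locD"},
    "X2": {"locA", "locB", "locC"},
    "X3": {"locA", "locB", "locC"},
}
core, variable, unique = compare_loci(calls)
print(sorted(core), sorted(variable), sorted(unique["X1"]))
```

The loci that end up in `unique["X1"]` would be the ones to inspect more closely, for example by checking whether they are absent from X2/X3 altogether or only partially assembled there.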