I am currently performing BLAST searches for a species of interest by blasting a genome assembly against custom databases built from metagenome shotgun data.
Everything seems to work fine, and I get hits, when I create a database from a single sample (i.e. paired reads, converted from fastq to fasta). But as soon as I create a custom database from multiple samples, I only get hits from a single sample, although I know there must be hits in the other samples as well, since I also made a separate database for every single sample and searched those.
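For context, the fastq-to-fasta conversion and pooling step could look roughly like the sketch below. It assumes plain four-line FASTQ records. The function name `fastq_to_fasta` and the convention of prefixing each read ID with the sample name are hypothetical, added only for illustration, and not necessarily what the actual pipeline does.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle, sample):
    """Write four-line FASTQ records as FASTA, prefixing IDs with the sample name.

    Assumption: every record is exactly four lines (header, sequence, '+', quality).
    Returns the number of records written.
    """
    count = 0
    for header in fastq_handle:
        header = header.strip()
        if not header:
            continue  # skip blank lines between or after records
        seq = next(fastq_handle, "").strip()
        next(fastq_handle, "")  # '+' separator line, ignored
        next(fastq_handle, "")  # quality line, ignored
        # Keep only the first whitespace-delimited token of the header as the ID.
        read_id = header[1:].split()[0]
        fasta_handle.write(f">{sample}_{read_id}\n{seq}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up reads.
    demo = io.StringIO("@read1 extra\nACGT\n+\nIIII\n")
    out = io.StringIO()
    fastq_to_fasta(demo, out, "sampleA")
    print(out.getvalue(), end="")
```

Calling this once per sample and appending to the same output file would give one pooled FASTA in which every subject ID still says which sample it came from.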
So for example:
Lactobacillus crispatus, if I blast against individual databases for each sample, gets:
- 578 hits from sample A
- 1606 hits from sample B
- 614 hits from sample C
Combining the three samples in one custom database, I get 614 hits (and the subject IDs in the output are all from sample C). Not a single hit from sample A or B.
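A per-sample breakdown like this can be tallied from the search output with a few lines of Python. This is only a sketch, assuming tabular output (`-outfmt 6`, where the second column is the subject ID) and assuming each subject ID starts with the sample name followed by an underscore. The function name `hits_per_sample` and that ID convention are hypothetical.

```python
from collections import Counter


def hits_per_sample(blast_lines, sep="_"):
    """Count hits per sample from BLAST tabular (-outfmt 6) lines.

    Assumption: subject IDs (column 2) look like '<sample><sep><read id>'.
    """
    counts = Counter()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id = line.split("\t")[1]
        sample = subject_id.split(sep, 1)[0]
        counts[sample] += 1
    return counts


if __name__ == "__main__":
    # Made-up rows, truncated to the first three columns for brevity.
    rows = ["contig1\tA_read5\t99.3", "contig1\tC_read9\t98.7"]
    for sample, n in sorted(hits_per_sample(rows).items()):
        print(sample, n)
```

Running the same tally on the individual-database results and on the combined-database result makes it easy to compare the two side by side.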
I have also had a look at sequence identity between samples (since these are 150 bp reads), and although there are some identical reads, I should still get plenty of unique hits from each of the other samples.
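One way such an identity check could be done is sketched below. It assumes "identical" means an exact, case-insensitive match of the full read sequence and does not consider reverse complements. The helper names `read_fasta_sequences` and `shared_fraction` are hypothetical.

```python
def read_fasta_sequences(lines):
    """Return the set of distinct sequences found in FASTA-formatted lines."""
    seqs, current = set(), []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.add("".join(current).upper())
            current = []
        elif line:
            current.append(line)
    if current:
        seqs.add("".join(current).upper())
    return seqs


def shared_fraction(seqs_a, seqs_b):
    """Fraction of distinct sequences in seqs_a that also occur exactly in seqs_b."""
    if not seqs_a:
        return 0.0
    return len(seqs_a & seqs_b) / len(seqs_a)


if __name__ == "__main__":
    # Made-up reads for illustration only.
    sample_a = read_fasta_sequences([">a1", "ACGT", ">a2", "GGCC"])
    sample_c = read_fasta_sequences([">c1", "acgt", ">c2", "TTTT"])
    print(shared_fraction(sample_a, sample_c))
```

A low shared fraction would support the expectation that most hits in samples A and B are not simply duplicates of reads in sample C.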
Does anyone have an idea as to why it behaves that way? I have the feeling that the hits all come from the sample with the largest number of sequences compared to the others, or alternatively from the one that is alphabetically last in the database.
Any thoughts are much appreciated! Thanks, Christina