I am running tblastn with a protein query against the WGS database on NCBI, to search directly in the genomes of some taxa.
The protein is not present in every genome, and I would like to be able to say "Protein X is present in n of the N organisms in this lineage", so I need to count N, the total number of sequenced genomes per taxon.
I have found two ways to get N, and they give quite different results:
1. Run tblastn with the protein against, say, "Arthropoda", and read the number shown in the corresponding field of the output page:
"wgs (676 databases)"
2. Use this page and retrieve the number. Here, for "Arthropoda", it is
Do you know of any other command-line or online tool to get N? Ideally it would take a taxon ID as input (6656 for Arthropoda).
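To make the kind of tool being asked for concrete, here is a minimal sketch in Python that takes a taxon ID and asks NCBI E-utilities (ESearch) how many records fall under that taxon. It assumes that records in the NCBI "assembly" database are an acceptable proxy for "sequenced genomes", which may not be true: one organism can have several assemblies, and the number need not match the "wgs (… databases)" figure reported by tblastn. The function names (build_count_url, parse_count, count_records) are hypothetical and only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(taxid: int, db: str = "assembly") -> str:
    """Build an ESearch URL for all records under a taxon (descendants included)."""
    params = {
        "db": db,                               # assumption: "assembly" ~ sequenced genomes
        "term": f"txid{taxid}[Organism:exp]",   # "exp" expands to all child taxa
        "retmode": "json",
        "retmax": 0,                            # only the count is needed, not the IDs
    }
    return EUTILS_ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(payload: str) -> int:
    """Extract the total hit count from an ESearch JSON response."""
    return int(json.loads(payload)["esearchresult"]["count"])


def count_records(taxid: int, db: str = "assembly", timeout: float = 30.0) -> int:
    """Query NCBI (network access required) and return the number of matching records."""
    url = build_count_url(taxid, db)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling count_records(6656) would return the number of assembly records under Arthropoda at the time of the query. Counting distinct organisms rather than assemblies would need an extra step, for example fetching the record summaries and deduplicating on the species-level taxon ID.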
Thanks for your help!