Discrepancy in Number of SRA between NCBI Website and BigQuery service (SQL request)
11 weeks ago
marie.harmel ▴ 10

Hello,

I recently came across an inconsistency between the number of Sequence Read Archive (SRA) datasets reported on the NCBI website and the count obtained through a SQL query on BigQuery.

As of February 2024, the NCBI website displays a total of 27,102,173 available SRA records (ncbi_sra).

However, when running the following SQL query on BigQuery:

    SELECT DISTINCT m.acc, m.sample_acc, m.biosample, m.sra_study, m.bioproject
    FROM `nih-sra-datastore.sra.metadata` AS m,
         `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` AS tax
    WHERE m.acc = tax.acc
      AND m.bioproject IS NOT NULL
    ORDER BY m.bioproject, m.sra_study, m.biosample, m.sample_acc

I obtain 25,636,505 SRA records.

I am curious to know if this difference in numbers could be attributed to the timing of updates between the NCBI databases on BigQuery and those accessible directly through the NCBI website.
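Before attributing the whole gap to update timing, it may be worth checking how many runs the query itself excludes: the join to the taxonomy table drops any run with no taxonomy analysis rows, and the `bioproject IS NOT NULL` condition drops runs without a BioProject. The following is only a minimal sketch of that breakdown, run on a toy in-memory SQLite database rather than on BigQuery. The table contents are invented for illustration, the table names are shortened, and the `tax_id` column is an assumption about the taxonomy table's layout.

```python
import sqlite3

# Toy stand-ins for the two BigQuery tables. All rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata (
    acc TEXT, sample_acc TEXT, biosample TEXT, sra_study TEXT, bioproject TEXT
);
CREATE TABLE tax_analysis (acc TEXT, tax_id INTEGER);  -- tax_id is assumed

INSERT INTO metadata VALUES
    ('RUN1', 'S1', 'B1', 'ST1', 'P1'),
    ('RUN2', 'S2', 'B2', 'ST1', 'P1'),
    ('RUN3', 'S3', 'B3', 'ST2', NULL),   -- no BioProject
    ('RUN4', 'S4', 'B4', 'ST3', 'P2');   -- no taxonomy rows

INSERT INTO tax_analysis VALUES
    ('RUN1', 9606), ('RUN1', 562),       -- several taxonomy rows for one run
    ('RUN2', 9606),
    ('RUN3', 562);
""")


def count(sql):
    """Run a query that returns a single number and return that number."""
    return conn.execute(sql).fetchone()[0]


# All runs present in the metadata table.
total_runs = count("SELECT COUNT(DISTINCT acc) FROM metadata")

# Runs excluded by the bioproject IS NOT NULL condition.
null_bioproject = count(
    "SELECT COUNT(DISTINCT acc) FROM metadata WHERE bioproject IS NULL"
)

# Runs excluded by the join because they have no taxonomy rows.
no_tax_rows = count("""
    SELECT COUNT(DISTINCT m.acc)
    FROM metadata AS m
    LEFT JOIN tax_analysis AS tax ON m.acc = tax.acc
    WHERE tax.acc IS NULL
""")

# Same shape as the query in the question, wrapped in a count.
query_result = count("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT m.acc, m.sample_acc, m.biosample, m.sra_study, m.bioproject
        FROM metadata AS m, tax_analysis AS tax
        WHERE m.acc = tax.acc AND m.bioproject IS NOT NULL
    )
""")

print(total_runs, null_bioproject, no_tax_rows, query_result)
```

If the same three counts are run against the real tables and the excluded runs do not cover the difference, the remainder would be more plausibly explained by update timing or by the website counting something other than runs.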

Thank you in advance for your time and assistance.

NCBI SQL BigQuery SRA • 165 views