BioSQL vs. Own DB Schema for a Custom SNP + Annotations Data Storage
13.3 years ago
Chronos ▴ 610

I'm building a database to store SNP data + different-level annotations from several sources (for further analysis).

To avoid reinventing the wheel, I had a look at existing bio-databases and found BioSQL.

However, it seems that the only benefit of using it is DB-level interoperability between Bio(Perl|Python|*).

  1. Are there any other benefits to using BioSQL vs. creating a custom schema, especially for SNP + annotations data? For example, would using BioSQL simplify calls to BioP(erl|ython) functions?
  2. Do you know of examples of BioSQL being used in large-scale [SNP] projects? In other words: is the BioSQL schema good for scaling up and out without losing performance?
database snp annotation • 4.0k views

Who uses BioSQL? Is it popular?


I've seen some projects mention it (this is how I found it), but I cannot recall which project that was...


GBrowse (GMOD project) can use BioSQL, though there's a note on "few users of BioSQL".

13.3 years ago

I used BioSQL as the storage backend for a web interface to archive and search RNAi experiments. While this doesn't touch on SNP representation directly, I can discuss some of the trade-offs of using the general schema for more specific projects. Overall, BioSQL scaled well and we were happy with the performance and understandability of the code. Some pros:

  • BioSQL has general feature and annotation models which are compatible with the Bio* projects (BioPerl, Biopython, and so on). This is very useful if you want to do things like upload features from a GenBank file, or output features from a region as GFF. Since the database objects are compatible with the Bio* objects, it's straightforward to convert them to and from common formats.

  • The schema contains additional goodies, like a solid structure for representing terms and ontologies. We ended up using these to drive a very nice structured query system; if we hadn't been using BioSQL, I would certainly have ended up with an uglier ad-hoc solution.

  • The general nature of BioSQL helps prevent the natural explosion of tables that happens as a project develops. Instead of re-working or adding new tables for each object, you work to organize them into the existing framework.

and cons:

  • It takes some up-front work to understand the schema and structure; this is true of any system you adopt.

  • This is a general framework, so items first have to be converted to generic features before being stored; this adds more layers than if you design a table to directly support your object.
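To make the "more layers" trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3`. It is not the actual BioSQL schema: the `feature` and `feature_qualifier` tables are simplified, hypothetical stand-ins for a generic feature/qualifier layout, and the SNP values are made up for illustration.

```python
import sqlite3

# Hypothetical, simplified tables in the spirit of a generic
# feature/qualifier layout. These are NOT the real BioSQL definitions.
GENERIC_SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    seq_name   TEXT NOT NULL,
    type       TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL
);
CREATE TABLE feature_qualifier (
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    name       TEXT NOT NULL,
    value      TEXT NOT NULL
);
"""


def store_snp_as_feature(conn, seq_name, pos, qualifiers):
    """Convert a SNP into one generic feature row plus key/value qualifiers."""
    cur = conn.execute(
        "INSERT INTO feature (seq_name, type, start_pos, end_pos) "
        "VALUES (?, 'SNP', ?, ?)",
        (seq_name, pos, pos),
    )
    feature_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO feature_qualifier (feature_id, name, value) VALUES (?, ?, ?)",
        [(feature_id, name, value) for name, value in qualifiers.items()],
    )
    return feature_id


def load_snp(conn, feature_id):
    """Rebuild the SNP from the generic rows; this is the extra layer."""
    seq_name, pos = conn.execute(
        "SELECT seq_name, start_pos FROM feature WHERE feature_id = ?",
        (feature_id,),
    ).fetchone()
    qualifiers = dict(
        conn.execute(
            "SELECT name, value FROM feature_qualifier WHERE feature_id = ?",
            (feature_id,),
        ).fetchall()
    )
    return {"seq_name": seq_name, "pos": pos, **qualifiers}


conn = sqlite3.connect(":memory:")
conn.executescript(GENERIC_SCHEMA)
fid = store_snp_as_feature(conn, "chr1", 12345, {"ref": "A", "alt": "G"})
snp = load_snp(conn, fid)
```

With a dedicated SNP table, the same record would be a single row with `ref` and `alt` columns; the generic layout trades that directness for the ability to store any kind of feature without adding tables.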

My specific suggestion would be a hybrid approach where you use BioSQL for general annotation needs and layer on custom tables for SNP storage, borrowing from schemas that Larry recommended.
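One possible shape for such a hybrid, sketched with Python's built-in `sqlite3`: a small controlled-vocabulary `term` table on the generic side, a custom `snp` table, and a link table between them. All table names, columns, source names, and sample values here are hypothetical; in a real setup the generic side would be the BioSQL tables themselves rather than this stand-in.

```python
import sqlite3

SCHEMA = """
-- Generic side: a controlled vocabulary, in the spirit of ontology terms.
CREATE TABLE term (
    term_id  INTEGER PRIMARY KEY,
    ontology TEXT NOT NULL,
    name     TEXT NOT NULL,
    UNIQUE (ontology, name)
);
-- Custom side: a table shaped directly around SNPs.
CREATE TABLE snp (
    snp_id     INTEGER PRIMARY KEY,
    chrom      TEXT NOT NULL,
    pos        INTEGER NOT NULL,
    ref_allele TEXT NOT NULL,
    alt_allele TEXT NOT NULL
);
-- Link table: annotations from any source, typed by a term.
CREATE TABLE snp_annotation (
    snp_id  INTEGER NOT NULL REFERENCES snp(snp_id),
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    source  TEXT NOT NULL,
    value   TEXT
);
"""


def get_or_create_term(conn, ontology, name):
    """Return the id of a term, inserting it first if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO term (ontology, name) VALUES (?, ?)",
        (ontology, name),
    )
    return conn.execute(
        "SELECT term_id FROM term WHERE ontology = ? AND name = ?",
        (ontology, name),
    ).fetchone()[0]


def annotate(conn, snp_id, ontology, name, source, value=None):
    """Attach a term-typed annotation from a given source to a SNP."""
    term_id = get_or_create_term(conn, ontology, name)
    conn.execute(
        "INSERT INTO snp_annotation (snp_id, term_id, source, value) "
        "VALUES (?, ?, ?, ?)",
        (snp_id, term_id, source, value),
    )


def snps_with_term(conn, ontology, name):
    """Structured query: all SNP positions annotated with a given term."""
    return conn.execute(
        "SELECT s.chrom, s.pos FROM snp s "
        "JOIN snp_annotation a ON a.snp_id = s.snp_id "
        "JOIN term t ON t.term_id = a.term_id "
        "WHERE t.ontology = ? AND t.name = ? "
        "ORDER BY s.chrom, s.pos",
        (ontology, name),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO snp (snp_id, chrom, pos, ref_allele, alt_allele) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "chr1", 100, "A", "G"), (2, "chr2", 200, "C", "T")],
)
annotate(conn, 1, "consequence", "missense", source="source_a")
annotate(conn, 2, "consequence", "synonymous", source="source_b")
missense = snps_with_term(conn, "consequence", "missense")
```

The SNP table stays narrow and fast to scan, while new annotation types become rows in `term` instead of new columns or tables.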


Thanks for sharing your experience. I've decided to do something similar to your suggestion: I've already built custom tables for SNPs, but as soon as I get to genome annotations and/or ontologies, I'll take a closer look at BioSQL (or GMOD's Chado, which is a different topic, I guess).

13.3 years ago

I cannot answer your questions 1 and 2 directly; we don't use BioSQL. I do suggest that you have a look around the Human Variome site in case you want to see how they suggest SNPs be annotated. Entities like genes and proteins have pretty good ontologies, but DNA and RNA do not. This is where annotating SNPs can be problematic.


Thanks for the Variome link; I hadn't seen it before.


You should also check out the HGVS, just to see what is there. I don't think they have anything about annotation of variants; about as close as they get is nomenclature rules. It's just good to be aware of.


Is there a link to data at the Human Variome site?


I don't know, because I get my data from elsewhere. What the HV and HGVS groups try to do is start a conversation about how to organize the annotation of SNPs and how to name them. That is important when going from dbSNP to biological function.
