Question

Recommend Your Favorite User Interfaces For Public Web-Based Datasets -- And Tell Why

5

12.4 years ago

Alex Paciorkowski 3.5k

There are a number of ways to display phenotype/genotype data to the public -- our group makes frequent use of some excellent mouse gene expression datasets available at (in no particular order) MGI, LAMHDI, Allen Brain, GENSAT, and BGEM, for example. These data are then compared to results from our own in-house experiments, and we use a number of further tools -- some home-grown, some available through the web -- to address various hypotheses. But all of these analysis pipelines usually begin with mining a fair amount of publicly available data from somewhere.

What have users found to be the best ways to display datasets so that bioinformaticians can use them? What publicly available databases have the best user interfaces? Why? What features are essential to you as a bioinformatician when you need to get specific data, get the data quickly, and in a format you can use?

Some examples of features: How often do you need to (or wish you could) batch queries? Do you prefer a web-based utility that allows you to directly query the database backend? How often do you wish for the ability to download a local copy of data? What about the way the data is presented on the web -- the usual table format -- or graphics? Or both? What file format do you prefer to use to export your data?

Do you prefer a web-based dataset to be standalone -- and you build a local analysis pipeline yourself -- or do you prefer the web-based dataset to be already part of an analysis pipeline that gives the data to other open-access sites for you? (There may be a flexibility trade-off here...pipelines that are locally built can be changed to meet local needs...but what of the time saved and reproducibility of using an open-access established pipe? Where is the balance in your opinion?)

What works for you? Looking forward to hearing your thoughts!

database data visualization • 3.4k views

Updated 12.4 years ago by Andra Waagmeester 3.2k • written 12.4 years ago by Alex Paciorkowski 3.5k

Ram · Answer 1 · 2011-11-29

I'll answer that one:

What features are essential to you as a bioinformatician when you need to get specific data, get the data quickly, and in a format you can use?

I like the public MySQL servers for bioinformatics because, with the help of a well-documented schema, you can query whatever you want.
The SOAP/WSDL-based web services are an underappreciated resource: with this kind of technology you don't need to write a specific parser or client to call the service. See also: http://www.biocatalogue.org/
The REST-based services are great too, but you often need to write a specific parser or interpret the JSON/XML/TSV response yourself.
IMHO, the NCBI E-utilities and BioMart are the most successful services for bioinformatics.
An XML response can be transformed into (almost) whatever you want using XSLT.
A JSON response can be used directly in a JavaScript program on the client side.
A line-based response can be handled by most Unix tools, but it is not suitable for structured data ("I hate the VCF format").
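
To make the comparison of response formats concrete, here is a minimal Python sketch that reads the same two records from JSON, XML and TSV using only the standard library. The payloads, field names and helper functions are hypothetical, not taken from any particular service; the point is only how much interpretation each format needs on the client.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads: the same two gene records in three response formats.
JSON_TEXT = '[{"symbol": "Pax6", "chrom": "2"}, {"symbol": "Shh", "chrom": "5"}]'
XML_TEXT = (
    "<genes>"
    '<gene symbol="Pax6" chrom="2"/>'
    '<gene symbol="Shh" chrom="5"/>'
    "</genes>"
)
TSV_TEXT = "symbol\tchrom\nPax6\t2\nShh\t5\n"


def from_json(text):
    """JSON maps directly onto lists and dicts."""
    return json.loads(text)


def from_xml(text):
    """XML needs a walk over the element tree; attributes become dict entries."""
    return [dict(gene.attrib) for gene in ET.fromstring(text).iter("gene")]


def from_tsv(text):
    """Line-based text splits easily, but every value is a flat string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter="\t")]


# All three parsers yield the same list of dicts for these flat records.
print(from_json(JSON_TEXT) == from_xml(XML_TEXT) == from_tsv(TSV_TEXT))
```

The three formats agree here only because the records are flat. Once a record carries nested or repeated fields, the TSV version has to pack them into a single column, which is the complaint about line-based formats above.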

score 3 · Answer 2 · 2011-11-29

12.4 years ago

Simon Cockell 7.4k

Wow, that's a lot of questions... If I had to hold up an example for how I would like every database to work, I think UniProt would be a pretty good bet.

The data is presented cleanly and simply via the web interface, and there is a useful set of core tools that really improve its utility (the implementation of BLAST, for instance, really adds value).

Finally, all of the data is available over a simple REST API, in a multitude of formats, from simple FASTA sequences, to RDF. Also, the complete data set is available, for those not satisfied with individual entries.
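
As a rough illustration of the "one entry, many formats" pattern, the Python sketch below builds a per-entry URL and parses a FASTA response. The base URL is an assumption (it is the address of the current UniProt REST service, which may differ from the one in use when this answer was written), and `entry_url`, `parse_fasta` and `fetch_fasta` are hypothetical helper names.

```python
from urllib.request import urlopen

# Assumption: base URL of the current UniProt REST service.
UNIPROT_BASE = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession, fmt="fasta"):
    """Build the URL for one entry; the format is chosen by file extension."""
    return f"{UNIPROT_BASE}/{accession}.{fmt}"


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(accession):
    """Download one entry and parse it (needs network access)."""
    with urlopen(entry_url(accession), timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_fasta("P69905")` would request a single entry. For anything beyond a handful of entries, the complete data set download is probably the better route.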

Simon, I agree -- UniProt does seem to have all the necessary features, doesn't it? Access to the complete data set is a plus.

Reply by Alex Paciorkowski 3.5k • 12.4 years ago

score 2 · Answer 3 · 2011-11-29

12.4 years ago

Andra Waagmeester 3.2k

You could consider exposing your data to the "Semantic Web". One entry point could be http://thedatahub.org/. By exposing your data as linked open data, you enable easy integration with other resources.

Andra, thanks for this link -- I didn't know about this site at all.

Reply by Alex Paciorkowski 3.5k • 12.4 years ago

score 2 · Answer 4 · 2011-11-29

What publicly available databases have the best user interfaces? Why?

The databases exposed through BioMart (http://www.biomart.org/) have at least a common interface to query them. For example, you could use Ensembl's Perl API (http://useast.ensembl.org/info/data/api.html) to retrieve data from Ensembl, but you would have to write queries in another format if you want to access, let's say, WormBase (http://wormbase.org/). With BioMart you can use the web-based interface, the REST API, or the Java API to access a whole range of biomedical databases (http://central.biomart.org/).
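
For readers who have not used the BioMart REST route, a query is an XML document passed to a `martservice` URL. The Python sketch below builds such a document and the matching request URL. The endpoint (Ensembl's mart), the dataset, the attribute and filter names, and the helpers `build_query` and `query_url` are assumptions chosen for illustration; each mart documents its own names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: Ensembl's BioMart endpoint; other marts expose their own URL.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def build_query(dataset, attributes, filters=None):
    """Serialise a BioMart query document that asks for TSV output."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict the rows returned; attributes select the columns.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


def query_url(xml_text):
    """Wrap the query document in a GET URL for the martservice endpoint."""
    return MARTSERVICE + "?" + urlencode({"query": xml_text})


# Assumed dataset, attribute and filter names, for illustration only.
xml_text = build_query(
    "mmusculus_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"chromosome_name": "2"},
)
print(query_url(xml_text))
```

The same query document can in principle be pointed at another mart by changing only the endpoint and the dataset, attribute and filter names, which is the "common interface" advantage described above.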

Regarding your other questions, I personally prefer having a web interface so that I can quickly get an idea of the data in a database, then some kind of REST interface for running programmatic queries. Finally, the option to bulk download the whole database is a must if I need to do some serious processing over it.