How Do I Import RDF Data Into R?
13.7 years ago

What approach do you use to import Resource Description Framework (RDF) data into R? The R package Rredland offers minimal support, but it seems rather spartan. There was an interesting project called Rswub, but it has been lost in time. I also noted Rsparql, but that project does not seem to have delivered anything yet. And, of course, I could do something manually... What are your best practices for using RDF data from, for example, Bio2RDF?


Your first link points to the Swedish version of Wikipedia. For the English version: http://en.wikipedia.org/wiki/Resource_Description_Framework


Sorry, you lost me... Swedish RDF?


Oh, crap... OK, fixing... stupid, we're-so-smart-we-know-where-you-live websites... :(


Ah! Sorry about that; fixed now.

13.1 years ago

I started a package for just this purpose yesterday. It is available from CRAN, although functionality is still a bit limited today:

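library(rrdf)
# Load two RDF files into separate models
m1 = load.rdf("one.rdf")
m2 = load.rdf("two.rdf")
# Merge both models into one
m3 = combine.rdf(m1, m2)
# Print a summary of the merged model
summarize.rdf(m3)
# Run a SPARQL query against the merged model
sparql.rdf(m3, "SELECT ?s ?p { ?s ?p ?o }")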

It wraps Jena and uses rJava to interface with it.

There is in fact also a Bioconductor package called Rredland.

Because the rrdf package now also supports SPARQL queries against remote databases, you can also do (following this BioStar answer):

library(rrdf)

endpoint = "http://rdf.farmbio.uu.se/chembl/sparql"

query = "
SELECT ?organism ?instance
WHERE {
  ?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
    <http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
";

data = sparql.remote(endpoint, query)

As of version 1.4 you can also use one of the SPARQL variables to supply the row names. For example, to get a single column with the protein names as row names, you do:

query = "
SELECT ?organism ?title
WHERE {
  ?instance a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;
    <http://purl.org/dc/elements/1.1/title> ?title ;
    <http://rdf.farmbio.uu.se/chembl/onto/#organism> ?organism .
}
";

data = sparql.remote(endpoint, query, rowvarname="title")

This results in an R matrix like:

                                                      organism                       
Maltase-glucoamylase                                  "Homo sapiens"                 
Sulfonylurea receptor 2                               "Homo sapiens"                 
Voltage-gated T-type calcium channel alpha-1H subunit "Homo sapiens"                 
Dihydrofolate reductase                               "Escherichia coli (strain K12)"
Tyrosine-protein kinase ABL                           "Homo sapiens"                 
DNA-directed RNA polymerase beta chain                "Escherichia coli (strain K12)"
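
Once you have such a matrix, plain base R is enough to work with it. The following is only a minimal sketch, assuming the result is a one-column character matrix like the one above. The matrix is rebuilt by hand from the rows shown so that the snippet runs without rrdf or network access, and the names `result`, `counts`, and `df` are purely illustrative:

```r
# Hand-built stand-in for the matrix shown above (values copied from that
# output); in practice this would come from
# sparql.remote(endpoint, query, rowvarname="title").
result <- matrix(
  c("Homo sapiens",
    "Homo sapiens",
    "Homo sapiens",
    "Escherichia coli (strain K12)",
    "Homo sapiens",
    "Escherichia coli (strain K12)"),
  ncol = 1,
  dimnames = list(
    c("Maltase-glucoamylase",
      "Sulfonylurea receptor 2",
      "Voltage-gated T-type calcium channel alpha-1H subunit",
      "Dihydrofolate reductase",
      "Tyrosine-protein kinase ABL",
      "DNA-directed RNA polymerase beta chain"),
    "organism"
  )
)

# Look up the organism of a single protein via its row name.
result["Dihydrofolate reductase", "organism"]

# Count how many targets belong to each organism.
counts <- table(result[, "organism"])

# Turn the matrix into a data frame with the row names as a regular column.
df <- data.frame(
  title = rownames(result),
  organism = unname(result[, "organism"]),
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

A data frame is usually the more convenient shape once you request more than one SPARQL variable, since each variable then becomes its own column.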
13.7 years ago
Michael 54k

The following hints are all far from perfect and will require some experimenting on your side, but here is my best guess (I have only worst practices for language interfaces, not for reading data from BioRDF):

  • The Redland C library has many language bindings (Perl, Python, Ruby). If these bindings are more complete than Rredland, you could use, e.g., the Python binding with RPy or the Perl binding with RSPerl.
  • There are Java libraries out there, see the StackExchange answer. They can be interfaced using, e.g., SJava or (less nicely) JRI.
  • Extending the Rredland package to add the functionality you need (probably the cleanest option, but it will take a lot of your time).

I would probably try the SJava solution first, because there are at least four Java libraries to choose from. I have had mixed experiences with language bindings, but in the end RSPerl and SJava worked for me with Perl and Java, and I have heard that RPy works nicely too. So it should be possible in principle™ to access the libraries too. Whatever solution you come up with will likely be appreciated by the BioC community.


Done, see my own answer.
