How to access GeneCards via XML?
9.9 years ago
Ömer An ▴ 260

I have a list of genes and need to get certain fields of information about them from GeneCards, which is very tedious to do manually. Is it possible to access GeneCards via XML so that I can parse it and extract those fields with a script? I am already using XML parsing for NCBI and UniProt, but I couldn't find any information for GeneCards on the net, except in the GeneCards version log, where they say it can be displayed via XML.

GeneCards XML • 6.5k views
7.3 years ago
support ▴ 40

They have some serious anti-scraping measures in place. I attempted to scrape it with a PHP web crawler tool and quickly ran into an obstacle:

The page is stored inside an iframe. The URL of the page in the iframe:

/_Incapsula_Resource?CWUDNSAI=9&xinfo=9-338452040-0 0NNN RT(1482287544723 2) q(0 -1 -1 -1) r(0 -1) B12(4,315,0) U18&incident_id=47001050619674560-2450421924911580825&edet=12&cinfo=04000000

The most unholy URL I have ever seen in my life. Like something from a horror movie. I'll be having nightmares about this tonight. I'm suffering some PTSD after seeing this URL. I tried pasting it into a browser and ran into a captcha. It might be a hopeless case. Sad, because they have all this data compiled in one place. Might have to go 10 extra miles and scrape everything from the individual sources that they use. Then again, there must be a way.

7.4 years ago
paul.e.gradie ▴ 110

This is old, but there are over 500 views so I'll add an answer -

GeneCards seems to be a privately curated information repository. It is useful, but they charge a fee for access to their database (if you are a commercial user). Making the database available to the public and parsable as XML would undermine this business model, since entities could simply write their own software to collect information as necessary. Instead, they have developed tools that you need to log in to use, either with an academic license or with a commercial license.

I'm an academic user, and I admit that I found this question by asking Google the exact same thing. I've been trying to find a way to parse the website's database in XML or JSON format.

I haven't spent much time digging around to see if this is an option for academic users, but if I find a solution, I'll come back and post it.

Otherwise - if anyone has already found a way to do this, please come back and post it. :D

Cheers, Paul

The R package httr will GET you going with GeneCards:

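library(httr)

# Fetch the GeneCards page for TP53 (the response body is HTML)
res <- GET("http://www.genecards.org/cgi-bin/carddisp.pl?gene=TP53&keywords=TP53")
# Parse the body; for an HTML response this yields an xml_document
cont <- content(res)
cont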

{xml_document}
<html data-ng-app="geneCardsApp">
[1] <head>\n<base href="/">\nTP53 Gene - GeneCards | P53 Protein | P53 Antibody\n<link rel="short ...
[2] <body id="genecards" data-ga-category="Card" data-ga-action="Data Link Click">\r\n        ...
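
Given the Incapsula iframe reported in the answer above, it may be worth checking what actually came back before parsing it. The following is only a minimal sketch in base R: both helper names are hypothetical, the marker string is taken from the iframe URL quoted in this thread, and the two sample HTML strings are made up for illustration rather than real GeneCards responses.

```r
# Hypothetical helper: TRUE if the HTML looks like an Incapsula challenge page.
# The marker is the path seen in the iframe URL quoted earlier in the thread.
is_incapsula_block <- function(html) {
  grepl("_Incapsula_Resource", html, fixed = TRUE)
}

# Hypothetical helper: pull the <title> text out of an HTML string with a
# plain regular expression; returns NA when no <title> element is found.
extract_title <- function(html) {
  m <- regmatches(
    html,
    regexec("<title[^>]*>([^<]*)</title>", html, ignore.case = TRUE)
  )[[1]]
  if (length(m) < 2) NA_character_ else trimws(m[2])
}

# Made-up sample inputs (assumptions, not captured from the live site)
blocked <- '<html><body><iframe src="/_Incapsula_Resource?CWUDNSAI=9"></iframe></body></html>'
card <- "<html><head><title>TP53 Gene - GeneCards</title></head><body></body></html>"

for (page in list(blocked, card)) {
  if (is_incapsula_block(page)) {
    message("Looks like an anti-bot challenge page; nothing to parse.")
  } else {
    message("Page title: ", extract_title(page))
  }
}
```

With httr, the raw HTML string for these helpers could be obtained with content(res, as = "text", encoding = "UTF-8"). Note that this only detects the block; it does not get around it.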

GeneCards uses information from other databases (e.g. STRING, UniProt, ...), so depending on the type of information you want from GeneCards, it might be possible and more convenient to access the 'source' of the data directly.
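
As a rough illustration of going to a source directly, here is a base R sketch that builds a UniProt REST search URL and parses a tab-separated result. Treat the details as assumptions to verify against UniProt's own documentation: the helper name is hypothetical, the query fields (gene_exact, organism_id) and return fields are what I believe the current REST API accepts, and the sample TSV is typed in by hand so the example runs without network access.

```r
# Hypothetical helper: build a UniProt REST search URL for one gene symbol.
# Endpoint, query fields and return fields are assumptions; check them
# against the UniProt documentation before relying on this.
uniprot_search_url <- function(gene, taxon_id = 9606,
                               fields = c("accession", "gene_names", "protein_name")) {
  query <- sprintf("gene_exact:%s AND organism_id:%d", gene, taxon_id)
  paste0(
    "https://rest.uniprot.org/uniprotkb/search?query=",
    URLencode(query, reserved = TRUE),
    "&fields=", paste(fields, collapse = ","),
    "&format=tsv"
  )
}

# Parse a TSV response body (a single string) into a data frame,
# keeping the column headers exactly as they appear in the response.
parse_uniprot_tsv <- function(tsv_text) {
  read.delim(text = tsv_text, stringsAsFactors = FALSE, check.names = FALSE)
}

url <- uniprot_search_url("TP53")

# Hand-written sample of what such a response might look like (assumption).
sample_tsv <- "Entry\tGene Names\tProtein names\nP04637\tTP53 P53\tCellular tumor antigen p53\n"
hits <- parse_uniprot_tsv(sample_tsv)
print(url)
print(hits)
```

To run this against the live service, the body could be fetched with httr, e.g. content(GET(url), as = "text", encoding = "UTF-8"), and passed to parse_uniprot_tsv(). Looping uniprot_search_url() over a gene list would then cover the original use case, at least for the fields UniProt provides.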

2.2 years ago
Shicheng Guo ★ 9.4k

Did you try this tutorial before? https://rpubs.com/janisharris/geneCards-summaryAndGeneTables
