Question

How to convert a URL into a DOI programmatically

1

7.6 years ago

entheologist33 ▴ 100

I have a database of references for my site. Many of the references are PubMed links, but many others come from other journals. I need a way to automatically convert the URL of a study from any of these journals into a DOI, and I also need to get the abstract. I have a Firefox plugin called Zotero: you enter the link to a journal article and it automatically gets the metadata for you, so there's clearly a way to do this, but I don't know how they do it. With PubMed articles it's easy because they have an API and the PMID is in the URL. But I don't know what to do with articles from other journals.

For example:

http://www.sciencedirect.com/science/article/pii/S0014483502920427
http://onlinelibrary.wiley.com/doi/10.1038/sj.bjp.0702844/full
http://www.ingentaconnect.com/content/ben/mrmc/2007/00000007/00000006/art00004

They all have different URL structures. Is there an API where I can feed the link into it, and it gives me back metadata such as DOI, PMID and also the abstract?

pubmed doi url • 41k views

1

Have you tried using the Crossref API?
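As a rough illustration of what such a lookup could look like, here is a minimal sketch that assumes the public Crossref REST endpoint `https://api.crossref.org/works` and its `query.bibliographic` parameter. Note that it searches by citation text (for example a title), not by article URL, so the title has to be obtained some other way first. The function names are made up for this example, and the `abstract` field is only present when the publisher deposited one.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Crossref REST API.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_query(citation_text, rows=3):
    """Return a /works URL that searches Crossref by free-text citation."""
    params = {"query.bibliographic": citation_text, "rows": rows}
    return CROSSREF_WORKS + "?" + urlencode(params)


def parse_crossref_response(payload):
    """Extract (doi, title, abstract) tuples from a decoded /works response."""
    items = payload.get("message", {}).get("items", [])
    results = []
    for item in items:
        titles = item.get("title") or [""]  # Crossref returns titles as a list
        results.append((item.get("DOI"), titles[0], item.get("abstract")))
    return results


def lookup(citation_text, rows=3, timeout=10):
    """Query Crossref over the network and return the parsed candidates."""
    url = build_crossref_query(citation_text, rows)
    with urlopen(url, timeout=timeout) as resp:
        return parse_crossref_response(json.load(resp))
```

The candidates come back ranked by relevance, so the first hit is not guaranteed to be the right paper; comparing the returned title against the one you searched for is a sensible sanity check.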

7.6 years ago by roy.granit ▴ 880

1

DOIs are issued through registration agencies (such as Crossref) under the International DOI Foundation, not by Zotero. They identify documents and are never assigned to URLs, period.

I guess what you're asking for is a tool that visits the URL and finds any DOIs on that page. Such a web-scraping tool could be made. However, DOIs do not seem to have a format that is easy to find and parse out of free text: http://stackoverflow.com/questions/27910/finding-a-doi-in-a-document-or-page
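To make the difficulty concrete, here is a heuristic sketch of such a scan. The pattern is adapted from a regular expression Crossref has suggested for modern DOIs; it is not a formal grammar and will miss or mangle some identifiers. The suffix list and the function name are assumptions made for this example.

```python
import re

# Heuristic pattern for modern DOIs (prefix "10.", 4-9 digits, slash, suffix).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

# Hypothetical list of path segments that publishers append after a DOI in URLs.
URL_SUFFIXES = ("/full", "/abstract", "/pdf", "/epdf")


def find_dois(text):
    """Return candidate DOIs found in free text or a URL, in order, deduplicated."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Sentence punctuation is legal inside a DOI, so it can only be
        # trimmed from the end of the match.
        doi = match.group(0).rstrip(".;:")
        for suffix in URL_SUFFIXES:
            if doi.lower().endswith(suffix):
                doi = doi[: -len(suffix)]
                break
        if doi not in found:
            found.append(doi)
    return found


print(find_dois("http://onlinelibrary.wiley.com/doi/10.1038/sj.bjp.0702844/full"))
# ['10.1038/sj.bjp.0702844']
print(find_dois("http://www.sciencedirect.com/science/article/pii/S0014483502920427"))
# []
```

Of the three example URLs in the question, only the Wiley one embeds a DOI, and even there the trailing `/full` has to be stripped by a site-specific rule. The other two require fetching the page or querying a metadata service.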

Zotero works on a select number of websites where the formatting is known and a site-specific parser can be written by its community of web-scraping experts. This is a pretty huge task, but it's also why Zotero is so awesome. But if Zotero isn't working on some of your URLs (which has led you to ask this question), I'm afraid it's unlikely any of us will be able to do better :(

7.6 years ago by John 13k

1

I agree with John. If there are no standards, you will have to figure out for yourself how the URL is built for every data source you want to include. And if they decide to change it one day, you'll have to change your program as well. What I don't know is whether there is any way (read: hack) to use what Zotero is doing for your purpose. But again, I don't know, so you will have to find that out yourself.

The Crossref API that roy posted looks like it might be what you want. But I have no idea if it covers the sources that you are looking for.

7.6 years ago by LLTommy ★ 1.2k

0

Hi, I am not sure how comfortable you are with programming (Java/Python), but if you are confident enough you can write your own code to extract the DOI along the following lines:

If you have the URL of your article, first get the TITLE by scraping the web page (this is very easy to do in Python or Java).
Then send that TITLE to the Crossref search page to get the DOI. To send the query you can use a web automation tool (Selenium WebDriver), and then scrape the results page to get the DOI of your article.
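The first of these steps can be sketched with the Python standard library alone. This is a minimal example, assuming the page HTML has already been downloaded into a string; the class and function names are invented for illustration. Page titles often carry extra text such as the journal or site name, so the result may need cleaning before it is used as a search query.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with runs of whitespace collapsed, or ''."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

For the second step, Crossref also exposes a REST API, so sending the title to it with a plain HTTP request may be simpler than driving a browser with Selenium.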

Otherwise, you need to know how each journal displays the DOI on its web pages, which is much more complex in terms of regular expressions.
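One partial shortcut, before writing per-journal patterns: many (though by no means all) publisher pages embed the DOI in machine-readable `<meta>` tags such as `citation_doi`. The sketch below checks for those first. The list of tag names is an assumption and not exhaustive, and the helper names are made up for this example.

```python
from html.parser import HTMLParser

# Meta tag names that commonly carry a DOI (assumed, not exhaustive).
DOI_META_NAMES = {"citation_doi", "dc.identifier", "prism.doi"}


class DoiMetaParser(HTMLParser):
    """Collect the content of <meta> tags whose name suggests a DOI."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        content = attr.get("content", "").strip()
        if name in DOI_META_NAMES and content:
            self.candidates.append(content)


def extract_meta_doi(html):
    """Return the first meta-tag value that looks like a DOI, or None."""
    parser = DoiMetaParser()
    parser.feed(html)
    parser.close()
    for value in parser.candidates:
        if value.lower().startswith("doi:"):
            value = value[4:].strip()
        if value.startswith("10."):  # every DOI starts with the "10." prefix
            return value
    return None
```

When this returns `None`, the page either has no such tags or fills them in with JavaScript, and a site-specific pattern is still needed.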

Also, if a journal uses dynamic web pages to generate its content, you need to use Selenium WebDriver to capture that information.

7.6 years ago by Pallab Bhowmick ▴ 20