Best Practice For Extracting Gene Names From Pubmed Abstracts
11.3 years ago
gundalav ▴ 380

I have a collection of several hundred PubMed abstracts. I want to extract all the gene names mentioned in them.

What's the best way to do this programmatically? Afterwards I'd simply create a hash with PubMed IDs as keys and the gene names found in each abstract as values.

pubmed • 5.7k views
5
11.3 years ago

There is a web service for Whatizit: http://www.ebi.ac.uk/webservices/whatizit/helpws.jsp

It comes with an example:

String pipelineName = "whatizitSwissprot";
String pmid = "9879"; // The number is a Pubmed accession key
String xml = whatizit.queryPmid(pipelineName, pmid);
System.out.println(xml);
5
11.3 years ago

You can get a list of NCBI Gene IDs based on Pubmed IDs using the GNAT webservice: http://textmining.ls.manchester.ac.uk:8081/. To call this webservice, provide a PMID list to the base URL with the pmid parameter as follows:

http://textmining.ls.manchester.ac.uk:8081/?pmid=21483786,21483692

For more information about this webservice, see this paper.
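
For scripting this, a minimal Python sketch of the URL construction described above might look like the following. Only the base URL and the `pmid` parameter come from this answer; the function names are made up for illustration, and since the response format is not described here, the fetch helper simply returns the raw text and assumes the service is still reachable.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as given in this answer.
GNAT_BASE_URL = "http://textmining.ls.manchester.ac.uk:8081/"


def build_gnat_url(pmids):
    """Return the GNAT query URL for an iterable of PubMed IDs."""
    pmid_list = ",".join(str(pmid) for pmid in pmids)
    # safe="," keeps the commas literal, matching the example URL above.
    return GNAT_BASE_URL + "?" + urlencode({"pmid": pmid_list}, safe=",")


def fetch_gnat_raw(pmids, timeout=30):
    """Fetch the raw response text (hypothetical helper).

    No assumption is made about the response format; parse it
    only after inspecting what the service actually returns.
    """
    with urlopen(build_gnat_url(pmids), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(build_gnat_url([21483786, 21483692]))
```

Batching several PMIDs into one request, as in the example URL, should keep the number of calls down for a few hundred abstracts.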

0

Hi, thanks Casey. Do you have a version that also recognizes miRNAs?

1

At the time we made this version of the NCBI gene name dictionaries, miRNAs were not included in Entrez Gene. It looks like they are now, though, so we should consider adding them to the next version of the GNAT dictionaries.

4
11.3 years ago

Reflect (http://reflect.ws/) can be used programmatically and is designed to extract biological (and chemical) entities from text. It has a huge dictionary of synonyms behind it and could be useful for you. Check out the "About" page on the Reflect website for more info.

1
11.3 years ago
Pappu ★ 2.1k

There are several papers on PubMed text mining, e.g. papers that report extracting protein-protein interactions. Did you search PubMed/Google before posting?

Once you have a list of gene names, you can easily build a dictionary in Python that maps each PMID to the gene names found in its abstract. Also have a look at the PubMed API.
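
As a rough sketch of that dictionary step in Python: assuming whichever tool you use gives you (PMID, gene name) pairs, something like this groups them by PMID. The function name and the example pairs are hypothetical placeholders, not real annotations.

```python
from collections import defaultdict


def group_genes_by_pmid(pairs):
    """Map each PMID to a sorted list of its unique gene names.

    `pairs` is any iterable of (pmid, gene_name) tuples, e.g. parsed
    from the output of a gene-recognition tool.
    """
    genes_by_pmid = defaultdict(set)
    for pmid, gene in pairs:
        # A set drops repeated mentions of the same gene in one abstract.
        genes_by_pmid[str(pmid)].add(gene)
    return {pmid: sorted(genes) for pmid, genes in genes_by_pmid.items()}


if __name__ == "__main__":
    # Placeholder data for illustration only.
    example_pairs = [
        ("PMID_1", "GENE_A"),
        ("PMID_1", "GENE_B"),
        ("PMID_2", "GENE_A"),
        ("PMID_1", "GENE_A"),
    ]
    for pmid, genes in group_genes_by_pmid(example_pairs).items():
        print(pmid, genes)
```

Storing the names as a set first means a gene mentioned several times in one abstract is only recorded once.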

1
11.3 years ago
Evangelos ▴ 10

+1 to aidan-budd! +1 to the Reflect API developers for providing us with such a stable service.

On Reflect's "About" page you can find a link to the Reflect REST API (http://reflect.ws/REST_API.html). The "GetEntities" method, invoked with the "pmid" parameter, should give you a solution.

0

You should add this as a comment to the post.

0
10.1 years ago

(Added Feb 2014) http://www.ncbi.nlm.nih.gov/pubmed/23736528

Nunes T, Campos D, Matos S, Oliveira JL. BeCAS: biomedical concept recognition services and visualization. Bioinformatics. 2013 Aug 1;29(15):1915-6. doi: 10.1093/bioinformatics/btt317. Epub 2013 Jun 4.

It provides a nice REST API: http://bioinformatics.ua.pt/becas
