Fetching NCBI gene symbols from NCBI protein ids or GI identifiers
6.5 years ago

I have the following information from BLASTX annotations of bacterial genes predicted by Prodigal:

Sequence name   Sequence desc.  Sequence length Hit desc.   Hit ACC
gene_1_contig_1 excinuclease ABC subunit A  228 gi|1055624747|ref|WP_067265422.1|excinuclease ABC subunit A [Sulfitobacter sp. HI0054] gi|1024544140|gb|KZY51396.1| excinuclease ABC subunit A [Sulfitobacter sp. HI0054]   WP_067265422, KZY51396
gene_2_contig_1 excinuclease ABC subunit A  210 gi|1055651942|ref|WP_067291557.1|excinuclease ABC subunit A [Sulfitobacter sp. EhC04] gi|1032103716|gb|OAN76192.1| excinuclease ABC subunit A [Sulfitobacter sp. EhC04] WP_067291557, OAN76192
gene_3_contig_1 MFS transporter 432 gi|1055624744|ref|WP_067265419.1|MFS transporter [Sulfitobacter sp. HI0054] gi|1024544139|gb|KZY51395.1| hypothetical protein A3734_05250 [Sulfitobacter sp. HI0054]    WP_067265419, KZY51395
gene_4_contig_1 MFS transporter 561 gi|1055624744|ref|WP_067265419.1|MFS transporter [Sulfitobacter sp. HI0054] gi|1024544139|gb|KZY51395.1| hypothetical protein A3734_05250 [Sulfitobacter sp. HI0054]    WP_067265419, KZY51395

I wish to fetch gene symbols using the information from the BLASTX results (either the GI identifiers or the protein accessions), perhaps using Entrez efetch.

The desired result would look like this:

Gene Name                         Gene symbol
excinuclease ABC subunit A        UvrA

See the link here. However, I am not sure how to proceed in this case. Can anybody suggest something?
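As a first step, the GI numbers and accessions can be pulled out of the "Hit desc." column programmatically. The following is a minimal Python sketch, assuming the descriptions always follow the `gi|<number>|ref|<accession>|` or `gi|<number>|gb|<accession>|` layout shown in the table above; the function name `extract_ids` is made up for illustration.

```python
import re

# GI numbers appear as "gi|<digits>|"; accessions follow a "ref|" or "gb|" tag.
# Assumption: accessions look like letters/underscore, digits, optional ".version".
GI_RE = re.compile(r"gi\|(\d+)\|")
ACC_RE = re.compile(r"(?:ref|gb)\|([A-Z_]+\d+)(?:\.\d+)?\|")

def extract_ids(hit_desc):
    """Return (gi_numbers, accessions) found in one BLAST hit description."""
    return GI_RE.findall(hit_desc), ACC_RE.findall(hit_desc)

# Hit description copied from the first row of the table above.
hit = ("gi|1055624747|ref|WP_067265422.1|excinuclease ABC subunit A "
       "[Sulfitobacter sp. HI0054] gi|1024544140|gb|KZY51396.1| "
       "excinuclease ABC subunit A [Sulfitobacter sp. HI0054]")

gis, accessions = extract_ids(hit)
print(gis)         # ['1055624747', '1024544140']
print(accessions)  # ['WP_067265422', 'KZY51396']
```

The version suffix (`.1`) is dropped so the accessions match the "Hit ACC" column.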

efetch NCBI entrez gene • 2.9k views

Hi Vijay, did you try using BioMart? It has some useful functions for fetching gene symbols.


The gene symbol appears to have been included in the description: https://www.ncbi.nlm.nih.gov/protein/1055624747/


Unfortunately, that is not true for all of my entries. It would have saved a lot of time.


How about db2db, which can convert a RefSeq Protein Accession to a Gene ID? https://biodbnet-abcc.ncifcrf.gov/db/db2db.php


You could do something like:

esearch -db protein -query "1055624747" | efetch -format docsum | xtract -pattern Title -element Title

The problem is that you are dealing with WP_* entries, which are non-redundant protein records shared across multiple strains, so the gene symbol is not separately annotated.
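For the GenBank accessions (KZY51396 and similar), the record itself may carry a `/gene` qualifier. The sketch below is one possible approach rather than a tested pipeline: it builds an E-utilities efetch URL for GenPept text and pulls out any `/gene="..."` values. The helper names and the sample record fragment are hypothetical, and whether a given record has a `/gene` qualifier at all depends on how it was annotated.

```python
import re
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(ids, db="protein"):
    """Build an efetch URL that requests GenPept flat-file text."""
    params = {"db": db, "id": ",".join(ids), "rettype": "gp", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

def gene_symbols(genpept_text):
    """Collect /gene="..." qualifier values in order, without duplicates."""
    seen = []
    for symbol in re.findall(r'/gene="([^"]+)"', genpept_text):
        if symbol not in seen:
            seen.append(symbol)
    return seen

# Hypothetical record fragment for illustration only, not real NCBI output.
sample = '''
     Protein         1..228
                     /product="excinuclease ABC subunit A"
     CDS             1..228
                     /gene="uvrA"
'''

print(efetch_url(["KZY51396", "OAN76192"]))
print(gene_symbols(sample))
```

The text returned from the URL (for example via `urllib.request.urlopen`) would be passed to `gene_symbols`; an empty list means the record has no gene symbol annotated.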

6.5 years ago

Hi,

You can use the GI IDs to retrieve the associated information with UniProt's "Retrieve/ID mapping" tool. Select "GI number" as the source and "UniProtKB" as the target; the output is a table containing the information you need, and you can choose which columns to display.

Another option is to use Batch Entrez to download the GenBank records and then parse out the information you need.
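Once the mapping table has been downloaded as tab-separated text, turning it into an identifier-to-symbol lookup takes only a few lines. This is a rough sketch: the column names `From` and `Gene Names` are assumptions about the export and should be adjusted to the columns you actually selected, and the sample rows are invented for illustration.

```python
import csv
import io

def symbols_from_mapping(tsv_text, id_col="From", gene_col="Gene Names"):
    """Map each submitted identifier to the first listed gene name, if any."""
    result = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        names = (row.get(gene_col) or "").split()
        result[row[id_col]] = names[0] if names else None
    return result

# Hypothetical export for illustration; these values are not real UniProt output.
sample = (
    "From\tEntry\tGene Names\n"
    "1024544140\tENTRY1\tuvrA A3734_05245\n"
    "1024544139\tENTRY2\t\n"
)

mapping = symbols_from_mapping(sample)
print(mapping)
```

Entries without an annotated gene name map to `None`, which makes it easy to see which hits still need another lookup route.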
