How to get annotation information from identifiers in NCBI nt database?
5.0 years ago
O.rka ▴ 710

I have some OTUs that I suspect are contaminants from faulty primers. I blasted them against the nt database, and the results look like this (with outfmt=6):

Otu000056       gi|1163074592|gb|CP020116.1|    99.52   209     1       0       4       212     4182852 4183060 4e-102  381
Otu000056       gi|1163034213|gb|CP020058.1|    99.52   209     1       0       4       212     2494582 2494790 4e-102  381
Otu000056       gi|1163027546|gb|CP020055.1|    99.52   209     1       0       4       212     2523673 2523465 4e-102  381
Otu000056       gi|1163005269|gb|CP020048.1|    99.52   209     1       0       4       212     2155734 2155942 4e-102  381
Otu000056       gi|1162933287|gb|CP020107.1|    99.52   209     1       0       4       212     559506  559298  4e-102  381
Otu000056       gi|1162922101|gb|CP020106.1|    99.52   209     1       0       4       212     5006373 5006581 4e-102  381
Otu000056       gi|1162894325|gb|CP020092.1|    99.52   209     1       0       4       212     4746448 4746656 4e-102  381

I have thousands of these hits. How can I go from gi|1163074592|gb|CP020116.1| to, for example, "Escherichia coli strain AR_0104, complete genome", i.e. the actual annotation of the hit? Preferably with a command-line tool that I can feed my blast6 output into.

assembly • 718 views
5.0 years ago
$ for G in 1163074592 1163034213 1163027546 1163005269; do echo -n "$G " && wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=${G}" | xmllint --xpath '//eSummaryResult/DocSum/Item[@Name="Title"]/text()' - && echo ; done

1163074592 Escherichia coli strain AR_0104, complete genome
1163034213 Escherichia coli strain AR_0061, complete genome
1163027546 Escherichia coli strain AR_0069, complete genome
1163005269 Escherichia coli strain AR_0118, complete genome
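The loop above uses a hard-coded list of GI numbers. A minimal sketch of how that list could be pulled out of the blast6 file itself, assuming the subject IDs in column 2 all follow the gi|NUMBER|gb|ACCESSION| layout shown in the question (the function name extract_gis and the file name hits.blast6 are only illustrative):

```shell
# Print the unique GI numbers found in column 2 of a BLAST outfmt 6 file.
# Assumption: subject IDs look like gi|1163074592|gb|CP020116.1|
# Hypothetical usage: extract_gis hits.blast6 > gi_list.txt
extract_gis() {
    awk '{
        split($2, f, "|")              # f[1]="gi", f[2]=GI number, f[4]=accession
        if (f[1] == "gi") print f[2]   # skip IDs that do not start with gi|
    }' "$1" | sort -u                  # thousands of hits usually share few subjects
}
```

De-duplicating first keeps the number of requests to NCBI down, since many hits typically point at the same subject sequence.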
5.0 years ago
GenoMax 141k

Using Entrez Direct:

$ for G in 1163074592 1163034213 1163027546 1163005269; do esearch -db nuccore -query $G | efetch -format docsum | xtract -pattern DocumentSummary -element Caption,Title; sleep 3; done
CP020116    Escherichia coli strain AR_0104, complete genome
CP020058    Escherichia coli strain AR_0061, complete genome
CP020055    Escherichia coli strain AR_0069, complete genome
CP020048    Escherichia coli strain AR_0118, complete genome
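To get the titles back onto the original hits, the accession/title pairs can be joined to the blast6 table. This is only a sketch under a few assumptions: the lookup file holds tab-separated Caption and Title columns as printed above, the blast6 file is tab-delimited, and the accession sits in the fourth |-separated field of column 2. The function name annotate_hits and the file names are hypothetical.

```shell
# Append the subject title as an extra column to each blast6 line.
# $1: lookup table, one "accession<TAB>title" per line (accession without version)
# $2: tab-delimited BLAST outfmt 6 file
# Hypothetical usage: annotate_hits titles.tsv hits.blast6 > hits.annotated.tsv
annotate_hits() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { title[$1] = $2; next }   # first file: load the lookup table
        {
            split($2, f, "|")                # gi|1163074592|gb|CP020116.1|
            acc = f[4]
            sub(/\.[0-9]+$/, "", acc)        # CP020116.1 -> CP020116
            print $0, (acc in title ? title[acc] : "NA")
        }' "$1" "$2"
}
```

Hits whose accession is missing from the lookup table get NA, so they are easy to spot and re-query.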