Question

How to change the type of the returned accession number


5.9 years ago

erans995 • 0

Hello everyone

I have the following code that prints an accession number to a file based on the Gene ID the user entered:

use LWP::Simple;

#assemble the URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "efetch.fcgi?db=nucleotide&id=$ARGV[0]&rettype=acc";

#post the URL
$output = get($url);
my $filename = 'acc_num.txt';

open(FH, '>', $filename) or die $!;

print FH $output; 

close(FH);

The accession number of the GID 3773153 written to the output file is: AI211211.1

I want the printed accession number to be: NC_007596.2

What am I supposed to change in my code?

Thanks in advance!!

NCBI accession number type perl • 2.1k views

ADD COMMENT • link 5.8 years ago by erans995 • 0


I am not sure why you think NC_007596.2 should be returned instead of AI211211.1, as the two accessions do not seem to be linked on the NCBI web pages.

ADD REPLY • link 5.9 years ago by Sej Modha 5.3k


"The accession number of the GID 3773153 written to the output file is: AI211211.1

I want the printed accession number to be: NC_007596.2"

Wanting something is fine but is the example given real? Because AI211211 does not seem to have any correlation to NC_007596.2.

ADD REPLY • link 5.9 years ago by GenoMax 141k


As I said, the accession number AI211211.1 is the output of the program. This web page shows the link between the GID and the desired accession number: https://www.ncbi.nlm.nih.gov/gene/?term=3773153

ADD REPLY • link 5.9 years ago by erans995 • 0


You should stop using GI numbers (GIDs). While NCBI uses them internally, it stopped supporting their external use two years back. You should always use actual accession numbers.

ADD REPLY • link 5.9 years ago by GenoMax 141k


Look, I've got the code I posted from the NCBI API page. It doesn't matter if it's still relevant or not, I just need to modify the code to print the desired accession number. Can you help me with that?

ADD REPLY • link 5.9 years ago by erans995 • 0


I guess you are searching a different database in your script. I get the following output using the NCBI command-line eutils:

esearch -db gene -query 3773153|efetch


1. CYTB
Cytb [Mammuthus primigenius (woolly mammoth)]
Other Designations: Cytb; cytochrome b
Mitochondrion: MT
Annotation: Chromosome MT NC_007596.2 (14151..15286)
ID: 3773153

ADD REPLY • link 5.9 years ago by Sej Modha 5.3k


The following error is printed to the output file:

<entrezgene-set>

Cytb [Mammuthus primigenius (woolly mammoth)]

Cytb

<dl class="details"/><dl class="details"><dt class="desig">Other Designations: </dt><dd class="desig">Cytb; cytochrome b</dd></dl><dl class="details"><dt class="desig">Mitochondrion: </dt><dd class="desig">MT</dd></dl><dl class="details"><dt class="desig"> Annotation: </dt><dd class="desig">Chromosome MT, NC_007596.2 (14151..15286)</dd></dl>

</entrezgene-set>

If I'm using Perl, how am I supposed to access the gene database?

ADD REPLY • link 5.9 years ago by erans995 • 0


You'd need to change the db parameter to db=gene in the following line:

 $url = $base . "efetch.fcgi?db=nucleotide&id=$ARGV[0]&rettype=acc";

ADD REPLY • link 5.9 years ago by Sej Modha 5.3k
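
For reference, the suggested change could look roughly like this in Perl. This is only a sketch: the helper name efetch_url and the stricter file handling are assumptions added for illustration, not part of the original script, and the request is only sent when an ID is given on the command line.

```perl
use strict;
use warnings;

my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Hypothetical helper (not in the original script): build an efetch URL
# for a given database, ID and rettype.
sub efetch_url {
    my ($db, $id, $rettype) = @_;
    return $base . "efetch.fcgi?db=$db&id=$id&rettype=$rettype";
}

# Only contact NCBI when an ID is supplied on the command line.
if (@ARGV) {
    require LWP::Simple;    # loaded here so the helper works without LWP

    # db=gene instead of db=nucleotide, as suggested above
    my $url    = efetch_url('gene', $ARGV[0], 'acc');
    my $output = LWP::Simple::get($url);
    die "Request failed: $url\n" unless defined $output;

    my $filename = 'acc_num.txt';
    open(my $fh, '>', $filename) or die "Cannot open $filename: $!";
    print {$fh} $output;
    close($fh) or die "Cannot close $filename: $!";
}
```

Note that this still writes the whole gene record text to the file, not just the accession number.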


How can I extract just the accession number from the output file using Perl?

ADD REPLY • link 5.9 years ago by erans995 • 0


Up...

How can I extract just the accession number from the output file using Perl?

ADD REPLY • link 5.8 years ago by erans995 • 0

score 0 · Answer 1 · 2018-06-21


5.9 years ago

GenoMax 141k

Where is that page located? Can you provide a link?

Here is your problem. Depending on which database you query, the same ID returns a different result.

$ efetch -db nuccore -id "3773153" -format acc
AI211211.1

$ efetch -db gene -id "3773153" -format acc

1. CYTB
Cytb [Mammuthus primigenius (woolly mammoth)]
Other Designations: Cytb; cytochrome b
Mitochondrion: MT
Annotation: Chromosome MT NC_007596.2 (14151..15286)
ID: 3773153

ADD COMMENT • link 5.9 years ago by GenoMax 141k


Yes, here it is: https://www.ncbi.nlm.nih.gov/books/NBK25498/ Scroll down to application 1, the code I posted is a modified version of it.

ADD REPLY • link 5.9 years ago by erans995 • 0


See the comment above. If you use the gene database instead of nucleotide, you get what you want.

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3773153&rettype=acc

At this time, if you use 3773153 with the nucleotide database, you get the accession number AI211211.1.

If you need just the accession number, then:

$ efetch -db gene -id "3773153" -format docsum | xtract -pattern DocumentSummary -element ChrAccVer
NC_007596.2

ADD REPLY • link 5.9 years ago by GenoMax 141k
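
A rough Perl counterpart of the efetch/xtract pipeline above might look like the sketch below. It rests on assumptions: that the E-utilities esummary endpoint with version=2.0 returns gene document summaries as XML containing a ChrAccVer element (as the xtract command implies), and that a simple regex is good enough for a single ID. The helper name chr_acc_ver is made up for illustration; a real XML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first <ChrAccVer> value found in a
# document-summary XML string, or undef if there is none.
sub chr_acc_ver {
    my ($xml) = @_;
    return $xml =~ m{<ChrAccVer>\s*([^<\s]+)\s*</ChrAccVer>} ? $1 : undef;
}

# Only contact NCBI when a Gene ID is supplied on the command line.
if (@ARGV) {
    require LWP::Simple;

    # Assumption: esummary with version=2.0 yields the same XML layout
    # that "efetch -format docsum" shows for the gene database.
    my $url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'
            . "esummary.fcgi?db=gene&id=$ARGV[0]&version=2.0";
    my $xml = LWP::Simple::get($url);
    die "Request failed: $url\n" unless defined $xml;

    my $acc = chr_acc_ver($xml);
    die "No ChrAccVer element found for $ARGV[0]\n" unless defined $acc;

    my $filename = 'acc_num.txt';
    open(my $fh, '>', $filename) or die "Cannot open $filename: $!";
    print {$fh} "$acc\n";
    close($fh) or die "Cannot close $filename: $!";
}
```

If a record carries more than one ChrAccVer element, this sketch only reports the first one.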


How can I extract just the accession number from the output file using Perl?

ADD REPLY • link 5.9 years ago by erans995 • 0


I don't think you can without modifying the script that NCBI provides. We don't know how many IDs you have or what kind of response they are going to return (it looks like it will be multi-line).

You may want to look into either using the eUtils option I noted above or parsing the results you retrieve with the NCBI script downstream.

ADD REPLY • link 5.8 years ago by GenoMax 141k


The output file looks just like what you posted yesterday:

CYTB Cytb [Mammuthus primigenius (woolly mammoth)] Other Designations: Cytb; cytochrome b Mitochondrion: MT Annotation: Chromosome MT NC_007596.2 (14151..15286) ID: 3773153

How can I modify the script so that just the accession number will be written to the file?

ADD REPLY • link 5.8 years ago by erans995 • 0
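
One possible way to do the downstream parsing in Perl is a regex on the "Annotation:" part of the saved text. This is a sketch only: the helper name extract_accession is hypothetical, and the pattern assumes a RefSeq-style accession (two capital letters, an underscore, digits and an optional version) follows "Annotation:", as in the output shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first RefSeq-style accession that
# appears after "Annotation:" in the text, or undef if none is found.
sub extract_accession {
    my ($text) = @_;
    return $text =~ /Annotation:.*?\b([A-Z]{2}_\d+(?:\.\d+)?)/s ? $1 : undef;
}

# Usage: perl extract_acc.pl acc_num.txt
# (the script name is only an example; the argument is the file
# written by the efetch script)
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close($in);

    my $acc = extract_accession($text);
    die "No accession found in $ARGV[0]\n" unless defined $acc;
    print "$acc\n";
}
```

For a file holding several gene records, this only reports the first accession; looping over the matches with the /g modifier would be one way to extend it.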