Question

Where Can I Download Corresponding Nucleotide Sequences For Genes In Nr And Env_Nr?

1

12.2 years ago

Tianyang Li ▴ 500

Hi,

Is there any place where I can directly download the nucleotide sequences for all the genes in nr and env_nr?

I want to save some energy by not having to extract nucleotide sequences from databases using accession numbers in nr and env_nr by myself.

I did some Googling but haven't found anything so far.

Thanks for your help!

database nucleotide sequence gene • 3.5k views

ADD COMMENT • link updated 12.2 years ago by Neilfws 49k • written 12.2 years ago by Tianyang Li ▴ 500

1

Please tell me what you want to do with the sequences; then it will be easier to help.

ADD REPLY • link 12.2 years ago by Michael 54k

0

The answers so far are based on this part of your question: "extract nucleotide sequences from databases using accession numbers." You did not state that you want the entire database. See comments and edits below (and try to be more specific in your question).

ADD REPLY • link 12.2 years ago by Neilfws 49k

score 2 · Answer 1 · 2012-01-26

2

12.2 years ago

Neilfws 49k

All of the Bio* projects (BioPerl, Biopython, BioRuby, etc.) provide programmatic tools to retrieve sequences by identifier. For example, BioPerl ships with a script named bp_fetch, which you can run like this:

bp_fetch net::genbank:NM_001205816.1

You can also use Batch Entrez to upload a list of GIs or accession numbers and retrieve records from most of the NCBI Entrez databases.
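For a scripted alternative to Batch Entrez, the same kind of lookup can go through NCBI's E-utilities efetch endpoint. Below is a minimal Python sketch, assuming the nuccore database and FASTA output are what you want. The function names efetch_url and fetch_fasta are made up for illustration and are not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(ids, db="nuccore", rettype="fasta"):
    """Build an efetch URL for a list of accessions or GIs (hypothetical helper)."""
    params = {
        "db": db,
        "id": ",".join(ids),  # efetch accepts a comma-separated ID list
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(ids, db="nuccore"):
    """Download the records as FASTA text; needs network access."""
    with urlopen(efetch_url(ids, db=db)) as response:
        return response.read().decode("utf-8")
```

Calling fetch_fasta(["NM_001205816.1"]) should return the same record as the bp_fetch example. For long ID lists, split them into small batches and pause between requests, since NCBI limits the request rate.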

EDIT: your original question is unclear and implies that you want to retrieve specific sequences by accession number. See comment below and improve your question.

ADD COMMENT • link 12.2 years ago by Neilfws 49k

0

I'd like to get the nucleotide sequences for all entries in nr and env_nr, so I'm looking for a faster way of getting them.

ADD REPLY • link 12.2 years ago by Tianyang Li ▴ 500

0

Your original question states that you don't want to "extract nucleotide sequences from databases using accession numbers." If you want all of nr and env_nr, then you should download the sequences from the NCBI FTP site and index them.
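What "index them" means is left open here; it could just as well mean building a BLAST database. As one possible reading, a downloaded FASTA file can be indexed by byte offset so that single records are read without scanning the whole file. A rough Python sketch, with hypothetical function names and an uncompressed FASTA file assumed:

```python
def index_fasta(path):
    """Map each record's first identifier to the byte offset of its header line."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            pos = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # the identifier is the header text up to the first whitespace
                fields = line[1:].split()
                if fields:
                    offsets[fields[0].decode()] = pos
    return offsets


def read_record(path, offset):
    """Return (header, sequence) for the record that starts at the given offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        header = handle.readline().decode().rstrip("\r\n")
        chunks = []
        for line in handle:
            if line.startswith(b">"):
                break  # next record begins
            chunks.append(line.decode().strip())
    return header[1:], "".join(chunks)
```

For files the size of nr, the in-memory dictionary gets large, so an on-disk index or the BLAST+ tools are probably the more practical choice.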

ADD REPLY • link 12.2 years ago by Neilfws 49k

0

So I guess there's nothing available then.

ADD REPLY • link 12.2 years ago by Tianyang Li ▴ 500

score 1 · Answer 2 · 2012-01-26

Another answer, because you didn't read the first one carefully according to your comment:

As I said, neither nr nor env_nr contains nucleotide entries; both contain amino acid entries. To retrieve all or some entries from a BLAST database with the BLAST+ tools, you can use blastdbcmd.

To get all entries in FASTA format, for example, try the following:

blastdbcmd -db nr -entry all

>gi|15674171|ref|NP_268346.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis Il1403] >gi|116513137|ref|YP_812044.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris SK11] >gi|125625229|ref|YP_001033712.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris MG1363] >gi|281492845|ref|YP_003354825.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis KF147] >gi|13878750|sp|Q9CDN0.1|RS18_LACLA RecName: Full=30S ribosomal protein S18 >gi|122939895|sp|Q02VU1.1|RS18_LACLS RecName: Full=30S ribosomal protein S18 >gi|166220956|sp|A2RNZ2.1|RS18_LACLM RecName: Full=30S ribosomal protein S18 >gi|12725253|gb|AAK06287.1|AE006448_5 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis Il1403] >gi|116108791|gb|ABJ73931.1| SSU ribosomal protein S18P [Lactococcus lactis subsp. cremoris SK11] >gi|124494037|emb|CAL99037.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris MG1363] >gi|281376497|gb|ADA65983.1| SSU ribosomal protein S18P [Lactococcus lactis subsp. lactis KF147] >gi|300072039|gb|ADJ61439.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris NZ9000] >gi|326407763|gb|ADZ64834.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis CV56] >gi|354692797|gb|EHE92602.1| hypothetical protein LLCRE1631_01913 [Lactococcus lactis subsp. cremoris CNCM I-1631] >gi|358750736|gb|AEU41715.1| SSU ribosomal protein S18p [Lactococcus lactis subsp. cremoris A76]
MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ
N
......
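Because nr is non-redundant, one sequence carries the headers of every identical protein, joined by " >" as in the output above. If you need the individual identifiers, the merged header can be split apart. A small Python sketch, assuming the gi|number|db|accession| layout shown above; split_defline is a hypothetical name:

```python
def split_defline(header):
    """Split a merged nr header into (gi, database, accession, description) tuples."""
    entries = []
    # entries are joined by " >" in blastdbcmd FASTA output
    for part in header.lstrip(">").split(" >"):
        ident, _, description = part.partition(" ")
        fields = ident.split("|")
        # expected layout: gi|<number>|<db>|<accession>|<optional locus>
        if len(fields) >= 4 and fields[0] == "gi":
            entries.append((fields[1], fields[2], fields[3], description.strip()))
    return entries
```

This is only a heuristic: a description that itself contains " >" would be split in the wrong place, and headers in any other layout are skipped.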

To retrieve specific entries, use the options -entry <gi:identifier> or -entry_batch <fileWithGis>. To modify the output format, use the -outfmt option. For example, to create a mapping of accession number to NCBI taxon ID, separated by a semicolon, for one GI entry, you could use:

blastdbcmd -db nr -entry 'gi|15674171' -outfmt '%a;%T'
NP_268346.1;272623
YP_812044.1;272622
YP_001033712.1;416870
YP_003354825.1;684738
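Output in this accession;taxid form is easy to load into a lookup table. A minimal Python sketch, assuming one pair per line with a semicolon separator as in the output above; parse_mapping is a hypothetical helper:

```python
def parse_mapping(text, sep=";"):
    """Turn 'accession;taxid' lines into a dict of accession -> taxon ID."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        accession, _, taxid = line.partition(sep)
        # keep None when the taxon ID field is missing or not numeric
        mapping[accession] = int(taxid) if taxid.isdigit() else None
    return mapping
```

The text argument could come from a saved output file or from capturing the blastdbcmd output with subprocess.run.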