Where Can I Download Corresponding Nucleotide Sequences For Genes In Nr And Env_Nr?
12.2 years ago
Tianyang Li ▴ 500

Hi,

Is there any place where I can directly download the nucleotide sequences for all the genes in nr and env_nr?

I want to save some energy by not having to extract nucleotide sequences from databases using accession numbers in nr and env_nr by myself.

I did some Googling but haven't found anything yet.

Thanks for your help!

database nucleotide sequence gene • 3.5k views
ADD COMMENT
1

Please tell me what you want to do with the sequences; then it will be easier to help.

ADD REPLY
0

The answers so far are based on this part of your question: "extract nucleotide sequences from databases using accession numbers." You did not state that you want the entire database. See comments and edits below (and try to be more specific in your question).

ADD REPLY
2
12.2 years ago
Neilfws 49k

All of the Bio* projects (Bioperl, BioPython, BioRuby etc.) provide programmatic tools to retrieve sequences using identifiers. As an example, Bioperl ships with a script named bp_fetch, which you can run like this:

bp_fetch net::genbank:NM_001205816.1

You can also use Batch Entrez to upload a list of GIs or accession numbers and retrieve records from most of the NCBI Entrez databases.
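For a scripted alternative to Batch Entrez, here is a minimal Python sketch that builds an NCBI E-utilities efetch request for a list of accession numbers. The function names build_efetch_url and fetch_fasta are made up for illustration, and the choice of the nuccore database is an assumption that only fits nucleotide accessions such as the one in the bp_fetch example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nuccore"):
    """Return an efetch URL asking for FASTA records of the given accessions.

    The default db="nuccore" is an assumption for nucleotide accessions.
    """
    params = {
        "db": db,
        "id": ",".join(accessions),  # efetch accepts a comma-separated list
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accessions, db="nuccore"):
    """Download the records as FASTA text (hypothetical helper, needs network)."""
    with urlopen(build_efetch_url(accessions, db)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_fasta(...) to actually download.
    print(build_efetch_url(["NM_001205816.1"]))
```

This is only practical for modest lists of identifiers. For anything approaching the size of a whole database, a bulk download is the better route.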

EDIT: Your original question is unclear and implies that you want to retrieve specific sequences by accession number. See the comment below and improve your question.

ADD COMMENT
0

I'd like to get the nucleotide sequences for all entries in nr and env_nr, so I'm looking for a faster way of getting them.

ADD REPLY
0

Your original question states that you don't want to "extract nucleotide sequences from databases using accession numbers." If you want all of nr and env_nr, then you should download the sequences from the NCBI FTP site and index them.
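As a rough illustration of what "index them" could mean for a large downloaded FASTA file, here is a minimal Python sketch that records the byte offset of every record so single entries can be read back without scanning the whole file. This is one generic approach, not necessarily what was meant above (the BLAST tools ship their own database formatting), and the names index_fasta and read_record are hypothetical.

```python
import io


def index_fasta(handle):
    """Map each record's first header token to its byte offset.

    `handle` must be a seekable file object opened in binary mode.
    """
    index = {}
    offset = 0
    for line in handle:
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                index[fields[0].decode("ascii", "replace")] = offset
        offset += len(line)
    return index


def read_record(handle, offset):
    """Return the record (header plus sequence lines) that starts at offset."""
    handle.seek(offset)
    lines = [handle.readline()]  # the header line itself
    for line in handle:
        if line.startswith(b">"):  # next record begins, stop here
            break
        lines.append(line)
    return b"".join(lines).decode("ascii", "replace")


if __name__ == "__main__":
    # Tiny in-memory stand-in for a downloaded FASTA file.
    demo = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2 second\nGGCC\n")
    idx = index_fasta(demo)
    print(idx)
    print(read_record(demo, idx["seq2"]), end="")
```

With a real file you would pass open(path, "rb") instead of the in-memory buffer and keep the index on disk between runs.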

ADD REPLY
0

So I guess there's nothing available then.

ADD REPLY
1
12.2 years ago
Michael 54k

Another answer, because you didn't read the first one carefully according to your comment:

As I said, there are no nucleotide entries in nr or env_nr; both contain amino acid sequences. To retrieve all or some entries from a BLAST database with the BLAST+ tools, you can use blastdbcmd.

To get all entries in FASTA format, for example, try the following:

blastdbcmd -db nr -entry all

>gi|15674171|ref|NP_268346.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis Il1403] >gi|116513137|ref|YP_812044.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris SK11] >gi|125625229|ref|YP_001033712.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris MG1363] >gi|281492845|ref|YP_003354825.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis KF147] >gi|13878750|sp|Q9CDN0.1|RS18_LACLA RecName: Full=30S ribosomal protein S18 >gi|122939895|sp|Q02VU1.1|RS18_LACLS RecName: Full=30S ribosomal protein S18 >gi|166220956|sp|A2RNZ2.1|RS18_LACLM RecName: Full=30S ribosomal protein S18 >gi|12725253|gb|AAK06287.1|AE006448_5 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis Il1403] >gi|116108791|gb|ABJ73931.1| SSU ribosomal protein S18P [Lactococcus lactis subsp. cremoris SK11] >gi|124494037|emb|CAL99037.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris MG1363] >gi|281376497|gb|ADA65983.1| SSU ribosomal protein S18P [Lactococcus lactis subsp. lactis KF147] >gi|300072039|gb|ADJ61439.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris NZ9000] >gi|326407763|gb|ADZ64834.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis CV56] >gi|354692797|gb|EHE92602.1| hypothetical protein LLCRE1631_01913 [Lactococcus lactis subsp. cremoris CNCM I-1631] >gi|358750736|gb|AEU41715.1| SSU ribosomal protein S18p [Lactococcus lactis subsp. cremoris A76]
MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ
N
......

To retrieve specific entries, use the options -entry <gi:identifier> or -entry_batch <fileWithGis>. To modify the output format, use the -outfmt option. For example, to create mapping output containing the accession number and NCBI taxon ID, separated by a semicolon, for one GI entry you could use:

blastdbcmd -db nr -entry 'gi|15674171' -outfmt '%a;%T'
NP_268346.1;272623
YP_812044.1;272622
YP_001033712.1;416870
YP_003354825.1;684738
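If you want to use that mapping output in a script, here is a minimal Python sketch that runs the same command and parses the 'accession;taxid' lines into a dictionary. It assumes blastdbcmd is on your PATH and that a local nr database is available; the helper names run_blastdbcmd and parse_accession_taxid are made up for illustration.

```python
import subprocess


def run_blastdbcmd(entry, db="nr", outfmt="%a;%T"):
    """Run blastdbcmd with the options shown above and return its stdout.

    Assumes the blastdbcmd executable and the database are installed locally.
    """
    args = ["blastdbcmd", "-db", db, "-entry", entry, "-outfmt", outfmt]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def parse_accession_taxid(output):
    """Turn 'accession;taxid' lines into a {accession: taxid} dictionary."""
    mapping = {}
    for line in output.splitlines():
        accession, _, taxid = line.strip().partition(";")
        if accession and taxid.isdigit():  # skip blank or malformed lines
            mapping[accession] = int(taxid)
    return mapping


if __name__ == "__main__":
    # Parse the sample output from the post instead of calling blastdbcmd.
    sample = "NP_268346.1;272623\nYP_812044.1;272622\n"
    print(parse_accession_taxid(sample))
```

For a real run, pass the result of run_blastdbcmd('gi|15674171') to parse_accession_taxid.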
ADD COMMENT
0

Actually, I'd like to get all the nucleotide sequences in one big file similar to nr and env_nr. I just feel retrieving them like this is a bit inefficient, so I'm wondering if I could retrieve them in one big file.

ADD REPLY
0

Why not use the nt database then? It is available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/. I guess you want to run blastn?

ADD REPLY
0

I guess I'll have to filter them out myself.

ADD REPLY
