nr protein database
8.7 years ago
Eva_Maria

Hi

I want to download the entire nr protein database from NCBI. Is there a link available for this?

blast Assembly next-gen-sequencing

Either download the entire FASTA file and build your own database from it (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz), or download the preformatted database, which is provided in multiple chunks.
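
A minimal sketch of the first route, assuming wget and BLAST+ are installed (BLAST+ provides makeblastdb, the successor to the legacy formatdb) and that the FASTA file is still published at the URL above. The function name build_nr_from_fasta is hypothetical.

```shell
# Hypothetical helper: fetch the single nr FASTA file and build a local
# protein BLAST database from it. Assumes wget and BLAST+ are on PATH.
build_nr_from_fasta() {
    local url='ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz'
    local out="${1:-nr}"    # name of the database to create (default: nr)

    # -c resumes a partially downloaded file instead of starting over
    wget -c "$url" || return 1

    # Keep nr.gz and write a plain FASTA copy for makeblastdb
    gunzip -c nr.gz > nr.fasta || return 1

    # -dbtype prot: protein database; -parse_seqids keeps accessions searchable
    makeblastdb -in nr.fasta -dbtype prot -parse_seqids -out "$out"
}
```

Usage: build_nr_from_fasta nr. nr is very large, so expect this to need a lot of disk space and time; downloading the preformatted chunks skips the makeblastdb step entirely.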

Hi @geek_y, the FTP directory that @glihm provided seems to be updated regularly, with new files added over time. Is the single-file version you mentioned above updated as well, or is it always an old snapshot? Thanks.

Hi, is there a way to download just a file with the taxonomy information? I mean, a tab-delimited file with:

name_of_protein   organism_source (plant, bacteria, other)

I need the organism source, but the nr database itself has a huge header for each protein, and there is no obvious pattern for extracting it.
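
The nr FASTA headers do follow a pattern that can be exploited: identical sequences from different sources are merged into one record whose deflines are joined by a Ctrl-A character (octal 001), and each defline usually ends with the organism name in square brackets. A rough sketch, assuming that layout, that turns the headers into accession<TAB>organism lines. The function name nr_header_organisms is hypothetical, and deflines without a trailing [organism] get an empty second column.

```shell
# Hypothetical helper: read nr-style FASTA on stdin and print one
# "accession<TAB>organism" line per defline. Assumes deflines are joined by
# Ctrl-A (\001) and that the organism is the last [bracketed] text.
nr_header_organisms() {
    awk 'BEGIN { OFS = "\t" }
        /^>/ {
            n = split(substr($0, 2), parts, "\001")   # one entry per merged defline
            for (i = 1; i <= n; i++) {
                p = parts[i]
                acc = p
                sub(/ .*/, "", acc)                   # accession = first word
                org = ""
                if (substr(p, length(p), 1) == "]") {
                    # walk back to the last "[" to cut out the organism name
                    for (j = length(p); j > 0; j--)
                        if (substr(p, j, 1) == "[") break
                    if (j > 0) org = substr(p, j + 1, length(p) - j - 1)
                }
                print acc, org
            }
        }'
}
```

Usage: zcat nr.gz | nr_header_organisms > nr_accession_organism.tsv. This gives organism names, not groups such as plant or bacteria.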

Hi,

I know this question is rather old, but maybe someone will need this information anyway: use epost and efetch (NCBI's E-utilities, via EDirect) to obtain the lineage. Something like this could help:

# One accession per line in the file named by $ListWithAccessionNumbers.
# efetch -format xml returns full protein records (Seq-entry), from which
# xtract pulls organism name, lineage, sequence and accession.
epost -db protein < "$ListWithAccessionNumbers" |
    efetch -format xml |
    xtract -pattern Seq-entry -element Org-ref_taxname OrgName_lineage NCBIeaa Textseq-id_accession \
    > SummaryTable.tsv

You will find more detailed information and suggestions on what exactly to run here: How to get summary for acc.no. not starting with 'WP_' ?
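
For long accession lists, querying NCBI record by record may be slow. An offline alternative, sketched here under the assumption that you have downloaded NCBI's prot.accession2taxid.gz mapping file (tab-separated columns accession, accession.version, taxid, gi), is to join your list against it locally. The function name acc2taxid is hypothetical.

```shell
# Hypothetical helper: print "accession.version<TAB>taxid" for every accession
# listed (one per line) in file $1, reading the mapping table on stdin.
# Assumes the prot.accession2taxid layout: accession, accession.version, taxid, gi.
acc2taxid() {
    awk 'BEGIN { FS = OFS = "\t" }
        NR == FNR { want[$1] = 1; next }   # first input: the accession list
        ($2 in want) { print $2, $3 }      # then stream the mapping table
    ' "$1" -
}
```

Usage: zcat prot.accession2taxid.gz | acc2taxid my_accessions.txt > accession_taxid.tsv. Accessions must include the version suffix (e.g. .1) to match column 2. Turning taxids into a group such as plant or bacteria additionally needs the NCBI taxonomy dump (nodes.dmp and names.dmp), which this sketch does not cover.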

Hi everyone, where can I download the 2016 nr.fa file that still contains GI numbers?

Archival copies of BLAST databases are not available from NCBI, so there is no easy way to obtain or recreate nr as it existed in 2016. GI numbers have also been deprecated for end users; use accession numbers instead.

Please do not post new questions in the answer field; it is reserved for answers only. Using the answer field to post a new question is strongly discouraged. Please post a new question with the relevant details instead.

8.7 years ago
glihm

Hi there,

All of the databases are available on the NCBI FTP site (URL, in case the link does not work: ftp://ftp.ncbi.nlm.nih.gov/blast/db/).

The README in that directory describes each of these databases.

For instance:

nr.*tar.gz                    | Non-redundant protein sequences from GenPept, 
                                Swissprot, PIR, PDF, PDB, and NCBI RefSeq

Thank you for your reply.

I want to download the entire nr protein database as a single file.

Try this:

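# Download every nr volume; the quotes let wget expand the FTP wildcard itself
wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.*.tar.gz'
# Stream all volumes through one tar; -i skips the end-of-archive blocks between them
cat nr.*.tar.gz | tar -zxvi -f - -C .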
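
A truncated or corrupted volume leaves the database incomplete, so it can be worth checking each download before unpacking it. A sketch, assuming GNU md5sum is available and that each volume has a companion nr.NN.tar.gz.md5 file in md5sum format, fetched the same way (wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.*.tar.gz.md5'). The function name extract_nr_volumes is hypothetical.

```shell
# Hypothetical helper: verify each nr.NN.tar.gz against its .md5 file, then
# unpack it into the current directory. Stops at the first failure.
extract_nr_volumes() {
    local vol found=0
    for vol in nr.*.tar.gz; do
        [ -e "$vol" ] || continue              # glob matched nothing
        found=1
        md5sum -c "$vol.md5" || { echo "checksum failed: $vol" >&2; return 1; }
        tar -xzf "$vol" || return 1
    done
    [ "$found" -eq 1 ] || { echo "no nr.*.tar.gz volumes found" >&2; return 1; }
}
```

Run it in the directory holding the downloads: extract_nr_volumes. BLAST+ also ships update_blastdb.pl, which downloads the volumes and checks their checksums itself, e.g. update_blastdb.pl --decompress nr.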

The files are huge, so you cannot have all the data in one file. The solution proposed by Eliad lets you download all of the "nr" database subfiles with a single command.

Hi, how do I format the nr database subfiles before using them with BLAST?

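Hi, I found a command for formatting the nr database:

# legacy BLAST 2.2.x: -i input FASTA, -p T protein, -o T parse SeqIds and index them
./blast-2.2.18/bin/formatdb -i NR -p T -o T

Source: http://zhanglab.ccmb.med.umich.edu/bbs/?q=node/100 (an older post; I haven't tested it myself).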
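
formatdb belongs to the legacy BLAST 2.2.x toolkit; in current BLAST+ its role is taken by makeblastdb. The nr.*.tar.gz volumes are already formatted BLAST databases, so after unpacking them no formatting step should be needed. A quick sanity check with BLAST+'s blastdbcmd, sketched below under the assumption that BLAST+ is on PATH (the function name check_blast_db is hypothetical), reports the database title, sequence count and volumes.

```shell
# Hypothetical helper: confirm that an unpacked, preformatted BLAST database
# is readable. Assumes BLAST+ (blastdbcmd) is installed and on PATH.
check_blast_db() {
    local db="${1:-nr}"    # database name as passed to -db (default: nr)
    blastdbcmd -db "$db" -info
}
```

Usage: check_blast_db nr, run from the directory containing the unpacked files (or with BLASTDB pointing there). If the summary looks right, the database can be searched directly, e.g. blastp -db nr -query query.fa -outfmt 6 -out hits.tsv.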
