Difference between NCBI non-redundant and refseq database
8.5 years ago
hdy ▴ 180

What is the difference between nr and RefSeq? According to NCBI's own definition, "RefSeq database is a non-redundant set of reference standards derived from the INSDC databases that includes chromosomes, complete genomic molecules (organelle genomes, viruses, plasmids), intermediate assembled genomic contigs, curated genomic regions, mRNAs, RNAs, and proteins", so RefSeq is also non-redundant. But when you run a BLAST search, you can select either nr/nt or RefSeq, so I assume there is a difference.

8.5 years ago

The nr database combines sequences from both non-curated and curated databases:

Non-curated databases (low quality):

  • GenBank/GenPept - unreviewed sequences submitted by individual laboratories and large-scale sequencing projects. Because these records are owned by the original submitters and cannot be altered by NCBI, GenBank may contain many low-quality sequences.
  • TrEMBL - the unreviewed section of UniProt. It is a computer-annotated supplement to Swiss-Prot holding the translations of EMBL nucleotide sequence entries not yet integrated into Swiss-Prot.

Curated databases (high quality):

  1. RefSeq - GenBank sequences that are manually curated by the NCBI staff. RefSeq records are owned by NCBI and can be updated as needed to maintain current annotation or to incorporate additional information.
  2. SwissProt - manually annotated and reviewed protein sequences
  3. PIR - non-redundant annotated protein sequence database
  4. PDB - experimentally-determined structures of proteins, nucleic acids, and complex assemblies
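
Because nr mixes RefSeq records with records from the other sources, it can help to separate the two in a list of BLAST hits. RefSeq accessions have a distinctive format: a two-letter prefix, an underscore, then the identifier (for example NP_ or XP_ for proteins, NM_ for mRNAs). Below is a minimal Python sketch that sorts accessions on that basis. The function names and the example accession strings are my own illustration, chosen only for their format, and the regular expression is a heuristic on the accession shape rather than any NCBI API:

```python
import re

# Two-letter prefixes used by RefSeq accessions (genomic, transcript, protein).
REFSEQ_PREFIXES = (
    "AC", "NC", "NG", "NT", "NW", "NZ",   # genomic
    "NM", "NR", "XM", "XR",               # transcripts
    "AP", "NP", "YP", "XP", "WP",         # proteins
)

# Prefix, underscore, identifier, optional ".version" suffix.
REFSEQ_PATTERN = re.compile(
    r"^(?:%s)_[A-Z0-9]+(?:\.\d+)?$" % "|".join(REFSEQ_PREFIXES)
)


def is_refseq(accession: str) -> bool:
    """Return True if the accession has the RefSeq prefix format."""
    return bool(REFSEQ_PATTERN.match(accession.strip()))


def split_hits(accessions):
    """Split accessions into (refseq, other), preserving input order."""
    refseq, other = [], []
    for acc in accessions:
        (refseq if is_refseq(acc) else other).append(acc)
    return refseq, other


if __name__ == "__main__":
    # Accession-shaped example strings: RefSeq-style, GenBank-style,
    # Swiss-Prot-style and PDB-style (chain suffix after the underscore).
    hits = ["NP_000509.1", "AAA16334.1", "P68871", "1A3N_B", "XP_508242.1"]
    refseq, other = split_hits(hits)
    print("RefSeq:", refseq)
    print("Other: ", other)
```

Note that PDB-style identifiers also contain an underscore, which is why the check is anchored on the known two-letter prefixes instead of just looking for "_".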