Downloading viral database
3.0 years ago

Dear Colleagues, I am planning to download the viral data from NCBI for later use in my virus identification pipeline. However, I found several file types for both nucleotide and protein data. My plan is to download all three files of one type and combine them into a single file. Which of the following groups of files should I select for nucleotide data?

viral.1.1.genomic.fna.gz; viral.2.1.genomic.fna.gz; viral.3.1.genomic.fna.gz

viral.1.genomic.gbff.gz; viral.2.genomic.gbff.gz; viral.3.genomic.gbff.gz

Which of the following groups of files should I select for protein data?

viral.1.protein.faa.gz; viral.2.protein.faa.gz; viral.3.protein.faa.gz

viral.1.protein.gpff.gz; viral.2.protein.gpff.gz; viral.3.protein.gpff.gz

Thanks in advance for any comments.

Best regards,
Adnan


Why is this a Job type post?


Dear Ram, I am new to Biostars, so I may have chosen the wrong post type. Thanks for your question. Regards, Adnan

3.0 years ago
vkkodali_ncbi ★ 3.7k

For genomic sequences in FASTA format, use the files with the suffix genomic.fna.gz; for genomic sequences in GenBank flatfile format, use the files with the suffix genomic.gbff.gz; and for protein sequences in FASTA or GenPept flatfile format, use the files with the suffixes protein.faa.gz and protein.gpff.gz, respectively.
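To make the suffix-to-format mapping concrete, here is a minimal Python sketch that sorts a list of file names into the four groups described above. The helper name `group_by_format` and the format labels are hypothetical (my paraphrase of the answer, not anything NCBI defines); names that match none of the suffixes are simply skipped.

```python
# Suffix -> content and format, paraphrasing the answer above (labels are illustrative).
SUFFIX_FORMATS = {
    "genomic.fna.gz": "genomic sequences, FASTA",
    "genomic.gbff.gz": "genomic sequences, GenBank flatfile",
    "protein.faa.gz": "protein sequences, FASTA",
    "protein.gpff.gz": "protein sequences, GenPept flatfile",
}


def group_by_format(filenames):
    """Hypothetical helper: group file names by their suffix-defined format.

    Names that end in none of the known suffixes are ignored.
    """
    groups = {fmt: [] for fmt in SUFFIX_FORMATS.values()}
    for name in filenames:
        for suffix, fmt in SUFFIX_FORMATS.items():
            if name.endswith(suffix):
                groups[fmt].append(name)
                break  # each file belongs to exactly one group
    return groups
```

Applied to the file names from the question, `viral.1.1.genomic.fna.gz`, `viral.2.1.genomic.fna.gz` and `viral.3.1.genomic.fna.gz` all end up in the "genomic sequences, FASTA" group, which is the set to combine if nucleotide FASTA is what the pipeline needs.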

To have a complete set of, say, viral genome sequences, you should download all of the genomic.fna.gz files and concatenate them.
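A minimal Python sketch of the concatenation step, assuming the parts have already been downloaded to a local directory. It relies on the gzip format allowing several compressed members back to back, so appending the raw bytes of each `.gz` part produces one valid `.gz` file without decompressing anything. The function name `concatenate_gz` is hypothetical.

```python
import shutil


def concatenate_gz(parts, out_path):
    """Hypothetical helper: append the raw bytes of each .gz part to out_path.

    A gzip file may contain multiple members one after another, so the
    result decompresses to the contents of all parts, in the given order.
    """
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream copy; no decompression
```

For example, `concatenate_gz(sorted(pathlib.Path(".").glob("viral.*.genomic.fna.gz")), "viral.genomic.fna.gz")` would merge the nucleotide FASTA parts (the output name deliberately does not match the glob pattern, so rerunning does not pick it up). The combined file can then be read with `gzip.open` or `zcat` like any single-member archive.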


Dear vkkodali, Thanks for your prompt advice. Regards
