Opening A Fasta File In Windows
12.1 years ago
Vivek • 0

Hi all,

I am a beginner with BLAST+ and I am using Windows. My aim for now is to download the nr protein sequences in FASTA format and format them using makeblastdb, then extract the first 1000 characters from the nr file into a separate file (say qa.fasta) and query it against the whole database.

I downloaded the nr database in FASTA format from this link:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz (are these the original FASTA files?)

Then I ran the makeblastdb command like this:

makeblastdb -in nr -dbtype prot -out outnr -> This resulted in the nr file being split into different parts, nr.00 to nr.03. (Is this normal?)

Now I need help extracting the first 1000 characters from the nr file. But how do I open a FASTA file in Windows? How do I proceed?
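
For reference, the "first 1000 characters" step does not need an editor at all. Below is a minimal Python sketch that reads only the start of the file, so the size of nr does not matter. The function name `head_chars` and the output name qa.fasta are illustrative, and the sketch assumes the file is either plain text or the gzipped download. Note that cutting at a fixed character count will usually stop partway through a sequence.

```python
import gzip


def head_chars(path, out_path, n=1000):
    """Copy the first n characters of a plain or gzipped text file to out_path."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    # latin-1 maps every byte to one character, so decoding cannot fail.
    # newline="" keeps the original line endings untouched.
    with opener(path, "rt", encoding="latin-1", newline="") as src:
        chunk = src.read(n)  # only the beginning of the file is read
    with open(out_path, "w", encoding="latin-1", newline="") as dst:
        dst.write(chunk)
    return chunk
```

Calling `head_chars("nr", "qa.fasta")` (or with `"nr.gz"` as the first argument) would write the test query file.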

fasta blast makeblastdb • 47k views

Why do you need the first 1000 char? Why did you put bioperl in the tags?


I've removed the bioperl tag.

12.1 years ago
Geparada ★ 1.5k

FASTA files are plain text files; you can open them with Notepad or even Word.

If you'll often do this kind of thing, you should use Unix. Life is too short to use Windows.


In the long term switching to using a UNIX style system may make sense. However there is a learning curve to take into account... I suggest trying a biology targeted Linux distribution, see http://en.wikipedia.org/wiki/BioLinux, in a virtual machine, for example using VirtualBox (https://www.virtualbox.org/) as a starting point.

12.1 years ago

Hi. First, I'm not sure "original" is the right term, but if you mean "do these FASTA files correspond exactly to the official nr db sequences?", the answer is yes.

Second, it is normal for the db files to be split. Nevertheless, I doubt the db building process ran to completion: personally, I've never tried it on nr, but NCBI provides a ready-to-go nr BLAST db whose volumes go up to nr.05. Do you have the alias file (nr.pal) created?

Finally, as Geparada told you, FASTA files are text files, so open them with any text editor. A text editor is better than a word processor, by the way: you don't want grammar correction, or a Times New Roman font for ids and Arial Italic for sequences, and more importantly you want to save your first 1000 aa as text, not doc, rtf, etc. The difficulty is actually not the type of file but its size. I've never tried on Windows, but a former coworker used Notepad++ and seemed happy with it.


The 'nr' BLAST database from NCBI contains additional information not present in the FASTA sequence data, since it is generated from the ASN.1. To ensure maximum compatibility, NCBI likely also uses a smaller part size, which avoids problems with some filesystems. So it isn't surprising that a manual generation would give fewer parts.


See http://en.wikipedia.org/wiki/List_of_text_editors for a list of text editors, many of which are available for MS Windows. You may find reading http://en.wikipedia.org/wiki/Text_editor helpful since it contains a definition of a text editor.

12.1 years ago
Swbarnes2 ★ 1.6k

If you want to stick with Windows, use gvim, or something like it for Windows. It's more powerful than Notepad and has no problem handling very large text files (and I think it's easier on the eyes than Notepad).


+1. Also, native text editors on different systems treat some whitespace characters a bit differently. A line break is '\n' on Unix (and OS X), but '\r\n' on Windows and '\r' on classic Mac OS, for example.

12.1 years ago
ALchEmiXt ★ 1.9k

I don't understand why you didn't directly download the preformatted databases from NCBI in the first place. You can BLAST against them directly and get literally any info from them using the provided utilities. Even on Windows.
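
As a rough illustration of querying a formatted database from a script, here is a small Python sketch. It assumes the BLAST+ program blastp is installed and on the PATH; the helper names and the file names qa.fasta and hits.txt are made up for the example. The database name is whatever the database was built or downloaded as (outnr for the makeblastdb command in the question, nr for the preformatted download).

```python
import shutil
import subprocess


def blastp_command(query, db, out):
    """Build the argument list for a basic protein-protein BLAST+ search."""
    return ["blastp", "-query", query, "-db", db, "-out", out]


def run_blastp(query, db, out):
    """Run blastp, failing early with a clear message if it is not installed."""
    if shutil.which("blastp") is None:
        raise FileNotFoundError("blastp was not found on the PATH")
    # check=True raises CalledProcessError if blastp exits with an error.
    subprocess.run(blastp_command(query, db, out), check=True)
```

For example, `run_blastp("qa.fasta", "outnr", "hits.txt")` would search the test query against the database built in the question.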

Ideally, use an editor that can handle line-ending conversion. Line endings differ between Windows and Unix, and some tools will fail on the wrong ones. Not all Windows-to-Unix converters handle this accurately. I personally prefer Notepad++, where you can also interconvert line endings.
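
If no suitable editor is at hand, the conversion itself is simple to script. The following is a minimal Python sketch, not a replacement for a dedicated tool: the function name `crlf_to_lf` is illustrative, it only handles Windows-style CRLF endings, and it processes the file line by line so a large FASTA file never has to fit in memory.

```python
def crlf_to_lf(src_path, dst_path):
    """Copy src_path to dst_path, turning Windows CRLF line endings into LF."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Iterating a binary file yields chunks that end at each b"\n".
        for line in src:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
            dst.write(line)
```

Working in binary mode means every byte other than the carriage return before a newline is copied through unchanged.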

12.1 years ago

When opening large fasta files, I have been more than satisfied with JWrite. All other editors used to crash from time to time, especially when handling really large datasets.

12.1 years ago
Vivek • 0

Hi all,

Thanks for the replies. Apologies for being late to get back.

I am working on a research project with my professor. That's why I downloaded the FASTA files, as I was asked to do so :)

The file is too big to be opened in Windows (by any editor), hence I need to extract the first 1000 characters, just to take one sequence, so that I can run a BLAST with a test query.

Manu Prestat - Yes, I have the nr.pal file created.
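
Since the goal is really "one sequence to use as a test query", a variant that copies whole records avoids cutting a sequence in half. This is a hedged Python sketch under a few assumptions: the function name `first_records` is illustrative, a record is taken to start at a line beginning with ">", and a ".gz" suffix is taken to mean a gzipped file. It stops reading as soon as the requested records are written.

```python
import gzip


def first_records(path, out_path, count=1):
    """Write the first `count` complete FASTA records of path to out_path."""
    opener = gzip.open if path.endswith(".gz") else open
    seen = 0
    # newline="\n" on the output keeps Unix line endings, even on Windows.
    with opener(path, "rt", encoding="latin-1") as src, \
            open(out_path, "w", encoding="latin-1", newline="\n") as dst:
        for line in src:
            if line.startswith(">"):
                if seen == count:
                    break  # the next record starts here, so stop reading
                seen += 1
            if seen:  # skip anything that precedes the first header
                dst.write(line)
    return seen
```

For instance, `first_records("nr", "qa.fasta")` would write only the first header line and its sequence lines.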
