Biostar Beta. Not for public use.
0
Entering edit mode
22 months ago
gbl1 • 70

Hi,

I have a "complete" genome of a plant in a fasta file (16Go) I need to find some part of it, so I use blast. My first issue, when creating a database, it only accept 1Go… meaning most of my reads are not considered. How to proceed to get all my reads in a given database. Then, If I blast a sequence from another species in my db, how to get the output in a fasta file? By the way, is it possible to use a local blast through a browser, looking like NCBI, instead of a terminal? Eventually, is it possible to obtain a consensus sequence from those sequences? Or doing a local assembly.

0
Entering edit mode

Can you clarify what you mean with 'Go'?

1
Entering edit mode

giga octet?

Is that obsolete word?

3
Entering edit mode

I don't think so ... though somewhat french-centric ;)

just not very common (if existing at all) in english language , where it's apparently even translated as 'gigabyte'

0
Entering edit mode

I'm guessing he means 'Gb'

0
Entering edit mode

" is it possible to use a local blast through a browser"

Yes you can.

here is the tool: https://openwetware.org/wiki/Wwwblast

bonus: NCBI workbench but it have more than blast. https://www.ncbi.nlm.nih.gov/tools/gbench/

0
Entering edit mode

I have a "complete" genome of a plant in a fasta file (16Go)

Is this "complete" plant genome an assembled genome? Or raw sequencing reads?

4
Entering edit mode
21 months ago
colindaven ♦ 1.3k
Hannover Medical School

To answer one part of your question, there is a graphical blast web client which can be easily installed locally using docker. It is still a bit buggy though. It may also be able to index your sequence properly, not sure what your issue is there really.

https://github.com/wurmlab/sequenceserver

0
Entering edit mode

installation failed...

1
Entering edit mode

That was not really informative enough to help any further. Just a recommendation anyway.

1
Entering edit mode
17 months ago
gb • 780

Lots of questions, maybe you should break it down. There is also an other post that has the same question about the webblast. A: local web BLAST server

If the reference genome is to big (your pc has not enough memory) you could split your reference plant genome and make more smaller databases. To BLAST against everything you can make a .nal file https://www.ncbi.nlm.nih.gov/books/NBK279693/. You can also add all the database in de command like.

blastn -query query.fa -db "part1 part2 part3" -out output


Here you can see how you can get the output in a fasta file: Blast+ results file parsing to fasta file you need to use a custom output and format it in a fasta format afterwards.

Not sure about the consensus part, maybe you should use something like bowtie or BWA to map and use samtools to make a consensus. With blast you only have two sequences the target and the query. For a consensus you need at least 3?

1
Entering edit mode

As complementary to the answer, here is a small article regarding the size of rams and building blast db:

http://etutorials.org/Misc/blast/Part+IV+Industrial-Strength+BLAST/Chapter+11.+BLAST+Databases/11.2+BLAST+Databases/

0
Entering edit mode

That is a nice page. I am not a big fan of how ncbi explains BLAST

1
Entering edit mode
22 months ago
gbl1 • 70

Hi,

So, let me tell you how it went:

-First, I found a small program called "fasta-splitter.pl" that helped me having smaller and suitable fasta files.

-I could make 6 databases, meaning: progress, but the 10 more are still missing…

-making an alias DB failed…

-for getting close enough to fasta: blastn -query file.txt -outfmt '6 qseqid sseqid sseq evalue' -db "db1 db2 db3 …" has worked

0
Entering edit mode

Good to hear that you made progress, also you are getting more familiar with blast, tools and the command line. It is always hard to start with new things like this. And yes it is exhausting sometimes. Keep up the work, you can do it =)

0
Entering edit mode

I am familiar with commande lines… but… I need to get what are the arguments to use… thats another issue!

blastn -query file.txt -outfmt '6 qseqid sseqid sseq evalue' -db "db1 db2 db3 …" is not what I look for: it only returns the aligned sequence… My aim is to know what just outside! No point in computing what I already know!

Is there any arguments of blast that would return all the full sequences that match?

1
Entering edit mode

I do not see that option in the outfmt possibilities, you can write a small script for that. But this page did not helped you? Blast+ results file parsing to fasta file

0
Entering edit mode

I read it several time… It does not do what I want for it only gives a fasta with the part of the sequence that are matching… technically, it gives me only what I allready have!

0
Entering edit mode

oh I see it now. You could do something like this in python:

from Bio import SeqIO
hits = set()
with open("blastoutput", "r") as blastoutput:
for hit in blastoutput:

with open("fastafile", "r") as fasta, open("output.fa", "a") as output:
for record in SeqIO.parse(fasta, "fasta"):
if strrecord.id) in hits:
output.write(">"+str(record.description)+"\n")
output.write(">"+str(record.seq)+"\n")


Think it is not the cleanest way, but just give an example on top of my head

0
Entering edit mode

looks cool... how to use that script? where do I tell blast the db to use? and to tell my input file?

0
Entering edit mode

Blast will never return anything else than aligned sequences (actually alligned HSPs not even sequences). If you want to get the full sequences you will need to parse the blast result, get the hit IDs and extract those from the blastDB(s) you used.

blastdbcmd -db" db1 db2 db3" -entry <hit ID>


After which you can then do a global aligned of them using a different software

0
Entering edit mode

#!/bin/sh

awk -F "\t"{print $1}' blastoutput.txt | while read line; do blastdbcmd -db" db1 db2 db3" -entry$line
done

or simply use the -entry_batch option, which takes a file with hit IDs as input