BLAST Against SRA Dataset
10.5 years ago
biolab ★ 1.4k

Dear all, I have downloaded a number of SRA genomic sequence datasets. I am wondering whether it is feasible to run BLAST against these datasets using a number of gene sequences as queries. This analysis would identify short reads homologous to my genes of interest, which could then be assembled (which software can be used for that? This is another question of mine). The SRA data, which are huge, were obtained by Illumina sequencing. I worry that SRA reads are too short for BLAST. Thanks!

blast • 8.1k views

Wouldn't it make more sense to just align the reads against the genes with fairly loose mapping criteria?

What software do you suggest for mapping reads onto genes? As far as I know, SOAP allows a maximum of 3 bp mismatches. I need looser criteria. Any advice would be helpful.

BWA (the first one), bowtie (faster), segemehl (better performance). These are all for alignment. I have used SOAP only for de novo assembly; I don't know whether it can be used for alignment.

bowtie2, bwa, etc. With bowtie2, you might need to lower --score-min and decrease the seed length. Then again, if the organisms aren't that different then the defaults might work OK. Use local alignment, rather than end-to-end.
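A loosened local-alignment run along these lines could be assembled as in the sketch below. The index name, file names, and the particular -L and --score-min values are illustrative assumptions, not tuned recommendations; for reference, bowtie2 documents a 20 bp seed and --score-min G,20,8 as its local-mode defaults.

```python
import shlex


def bowtie2_local_cmd(index, reads_fastq, out_sam,
                      seed_len=15, score_min="G,10,6", threads=4):
    """Build a bowtie2 command line for permissive local alignment.

    seed_len and score_min are hypothetical starting values that are
    looser than the local-mode defaults; adjust them for your data.
    """
    return [
        "bowtie2",
        "--local",                  # local rather than end-to-end alignment
        "-L", str(seed_len),        # shorter seed than the default
        "--score-min", score_min,   # lower minimum-score function
        "-p", str(threads),
        "-x", index,                # index built beforehand with bowtie2-build
        "-U", reads_fastq,          # unpaired reads
        "-S", out_sam,
    ]


cmd = bowtie2_local_cmd("genes_index", "reads.fastq", "reads_vs_genes.sam")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The index would first have to be built from the gene FASTA file with bowtie2-build. Printing the command rather than running it makes it easy to inspect before committing to a long job.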

10.5 years ago

BLAST works fine for shorter sequences if you tune the search: for example, reduce the word size (-W 7 in legacy blastall, or -word_size 7 in BLAST+) and, depending on the database size, raise the expectation value. It is also worth searching for more advice on this, such as guides on primer searches with BLAST.

Note, however, that the computational resources required to perform the alignments will be many orders of magnitude higher than those needed to run a short-read aligner.

Run some tests, evaluate the runtimes, and see whether you have the computational capacity to perform the searches.
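One way to run such a timing test is sketched below. It assumes the BLAST+ blastn program; the database and file names are hypothetical, and the E-value of 1000 is only an example of a raised threshold.

```python
import subprocess
import sys
import time


def blastn_short_cmd(db, query_fasta, out_tsv, word_size=7, evalue=1000):
    """Build a BLAST+ blastn command tuned for short sequences."""
    return [
        "blastn",
        "-task", "blastn-short",
        "-word_size", str(word_size),
        "-evalue", str(evalue),
        "-db", db,
        "-query", query_fasta,
        "-outfmt", "6",          # tabular output
        "-out", out_tsv,
    ]


def time_command(cmd):
    """Run a command and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


cmd = blastn_short_cmd("reads_db", "genes.fasta", "hits.tsv")
print(" ".join(cmd))

# Stand-in command so the sketch runs without BLAST installed;
# pass `cmd` instead to time a real search on a small subset of reads.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"elapsed: {elapsed:.2f} s")
```

Timing a search against a small subset of the reads first gives a rough basis for extrapolating to the full dataset.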

10.5 years ago

You can definitely run BLAST against SRA data. If you want to use the official BLAST tools and databases, you can download them from here:

http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

The online BLAST website automatically uses different parameters for short sequences, e.g. for sequences shorter than 50 or so bases it uses the "blastn-short" settings instead of the "blastn" defaults (in the command-line blastn program this is selected with -task blastn-short). You could calculate the sequence lengths in your SRA files (e.g. manually, or by using something like FastQC) and then decide whether to run "blastn", "blastn-short" or one of the other BLAST programs.
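The length check described above could be scripted roughly as follows. This is a sketch that assumes plain four-line FASTQ records; the 50-base cutoff is taken from the rough figure in this answer rather than from any documented threshold, and the function names are made up for illustration.

```python
def mean_read_length(fastq_lines):
    """Return the mean sequence length of four-line FASTQ records."""
    total = 0
    count = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:               # second line of each record is the sequence
            total += len(line.strip())
            count += 1
    return total / count if count else 0.0


def choose_blastn_task(mean_len, short_cutoff=50):
    """Pick a blastn -task value from the mean sequence length."""
    return "blastn-short" if mean_len < short_cutoff else "blastn"


# Small in-memory example; in practice, pass an open FASTQ file handle.
example = [
    "@read1", "ACGT" * 9, "+", "I" * 36,
    "@read2", "ACGT" * 10, "+", "I" * 40,
]
task = choose_blastn_task(mean_read_length(example))
print(task)
```

The chosen value would then be passed to blastn through its -task option.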

If you have a small amount of data, e.g. a few hundred sequences, you can use the BLAST programs in a "remote" mode (the -remote option), where they access the online databases rather than needing to download the databases to your local machine. But if you have a large amount of data to align, it would probably be better to download the BLAST databases to your local machine (they are about 400GB uncompressed for the main databases).

You can also use the BLAST programs to do alignments against your own databases rather than using the official BLAST databases, but you need to format them in a special way before BLAST can use them (in BLAST+ this is done with makeblastdb).
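For reads stored as FASTQ, building a custom database might look like the sketch below: the reads are first converted to FASTA and the resulting file is then passed to makeblastdb. It assumes four-line FASTQ records and the BLAST+ makeblastdb program; the file and database names are hypothetical.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines (header, sequence) from four-line FASTQ records."""
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            yield ">" + line[1:]     # replace the leading '@' with '>'
        elif i % 4 == 1:
            yield line               # sequence line; '+' and quality are dropped


def makeblastdb_cmd(fasta_path, db_name):
    """Build the makeblastdb command for a nucleotide database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


example = ["@read1 sample\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
fasta_lines = list(fastq_to_fasta(example))
print("\n".join(fasta_lines))
print(" ".join(makeblastdb_cmd("reads.fasta", "reads_db")))
```

Once the database exists, the gene sequences can be used as the query against it with the -db option of blastn.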

Be aware that BLAST runs extremely slowly, especially blastn and blastn-short, which are the most accurate versions. Also, if you are downloading and compiling the tools yourself, the multi-threaded mode might not work (it did not work for me). If you have a multi-core computer, you might find it useful to run the BLAST programs in parallel on different sequences, e.g. by writing a small script to do this, or by using GNU Parallel or a similar tool. You could also try running BLAST on several different computers in parallel. I recently ran BLAST on about 400 computers in parallel, because it was running so slowly that it would otherwise have taken years to complete.
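A small script of the kind mentioned above could look like this sketch, which splits the query sequences into chunks and runs one command per chunk concurrently. The helper names, chunk file names, and database name are assumptions for illustration; GNU Parallel would achieve the same without any Python.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def read_fasta(lines):
    """Yield (header, sequence) tuples from FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records, n_chunks):
    """Distribute records round-robin into at most n_chunks non-empty lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return [chunk for chunk in chunks if chunk]


def run_parallel(commands, workers=4):
    """Run commands concurrently and return their exit codes in order."""
    def run(cmd):
        return subprocess.run(cmd, check=False).returncode
    # Threads suffice here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))


records = list(read_fasta([">g1", "ACGT", "ACGT", ">g2", "TTTT", ">g3", "GGGG"]))
chunks = split_records(records, 2)

# One hypothetical blastn command per chunk file (chunk_0.fasta, chunk_1.fasta, ...).
blast_cmds = [
    ["blastn", "-db", "reads_db", "-query", f"chunk_{i}.fasta",
     "-outfmt", "6", "-out", f"chunk_{i}.tsv"]
    for i in range(len(chunks))
]

# Stand-in commands so the sketch runs without BLAST installed;
# pass blast_cmds instead once the chunk files have been written.
codes = run_parallel([[sys.executable, "-c", "pass"]] * len(chunks), workers=2)
print(codes)
```

Each chunk writes its own output file, so the results can simply be concatenated afterwards.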

If you don't need to use BLAST specifically, but just need to do alignment, then you should consider using a faster alignment tool like Bowtie2 (as others have said).

PS. For assembly you could try Trinity

You mentioned that it's possible to run the search remotely if you have only a few sequences. Would you know how to specify, in command-line BLAST, that you want to use the SRA database, and which SRA experiment to search?

Hi Adrain, I have forgotten my previous blastn command (this post is from 17 months ago), but I think ordinary blastn should be OK (blastn -db <db_file> -query <query_file> -evalue 1e-5 -out <output_file>). It worked on my PC.

I also suggest trying bwa and bowtie, both of which are faster.
