How To Extract Max Score Blast Hit Among A Huge Data Set?
1
0
Entering edit mode
10.2 years ago
HG ★ 1.2k

Hi everyone, I want to extract max score blast sequence among 50 genomic data set : my approach like this:

  1. I did all the denovo assembly of all 50 genome.

  2. Next i predicted all the ORF using Prodigal: (A Microbial Gene Prediction Software)

  3. Now i want to set up a local blast with a query gene sequence vs all 50 genomic ORF.
  4. Finally extract the best blast score sequence among all 50 genome. Can anyone suggest me how to proceed step 3 and step 4

Thank you advance.

blast • 2.5k views
ADD COMMENT
0
Entering edit mode
10.2 years ago
Daniel ★ 4.0k

You can make a blast database of your sequences with the makeblastdb command (in linux):

makeblastdb -in genomesORFs.fas -dbtype nucl -out my_db

Then blast your sequences against it with a command like this, but choose your own parameters (there's loads on info at sites like this and this:

blastall -p blastp -i my_input_file -d my_db -o my_output_file -m 8

You can run it through a perl script to print the best match for whatever parameter you want. Something like this pulled from a different script (NOT THIS):

#!/usr/bin/perl
use strict;
use warnings;

my $BLASTn_file = $ARGV[0] ;

open BLAST, "< $BLASTn_file or die("$BLASTn_file\n$!");
while (my $line = <BLAST>){
    if ( $line !~ /^#/ ) {
            chomp $line;

            my @fields = split(/\t/, $line);
            my $query = $fields[0];
            my $match = $fields[1];
            my $evalue = $fields[10];

            if ((not exists  $Match{$query}) or ($evalue <  $Evalue{$query})){
            # Select lowest eval
                    $Match{$query} = $match;
                    $Evalue{$query} = $evalue;
            }
    }
}

   while (my ($key, $value) = each (%Match)) {
        print "$key\t$value";
  }
ADD COMMENT

Login before adding your answer.

Traffic: 2084 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6