Biostar Beta. Not for public use.
How To Convert Blast Results To Gff
16
Entering edit mode
6.3 years ago
Michael Barton ♦ 1.8k
Akron, Ohio, United States

I'd like to visualise the results of a BLAST search in a genome browser. Is there a simple way to get the results in GFF format without having to write a parser myself?

ADD COMMENTlink
8
Entering edit mode
15 months ago
Darked89 4.2k
Barcelona, Spain

Start with tabulated blast output myfile.blast.out. Then check two-liners from:

http://bergman-lab.blogspot.com/2009/12/ncbi-blast-tabular-output-format-fields.html

Few lines tooutput proper gff are missing, but you may either go for minimalistic gff or try to encode everything in column 9. Also you may try validating your gff3 here:

http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

NOTE: The blog linked above does not seem to exist anymore, here is the content of it from the wayback machine:

NCBI Blast Tabular output format fields

Certainly, with the new NCBI Blast+ tools, you won't need this anymore, but as long as we are sticking with the old blastall programm with its horrible documentation, I keep forgetting the format of the BLAST tabular reports. Tabular format is created when you specify -m 8. This is the most useful format for parsing blast yourself without having to learn strange libraries like BioPerl, BioJava, BioPython or BioErlang (doesn't this exist yet, Mike?)

So here is the meaning of the fields:

queryId, subjectId, percIdentity, alnLength, mismatchCount, 
gapOpenCount, queryStart, queryEnd, subjectStart, subjectEnd, eVal, bitScore

Parsing is then simple

Python:

for line in open("myfile.blast"):
    (queryId, subjectId, percIdentity, alnLength, mismatchCount, 
    gapOpenCount, queryStart, queryEnd, subjectStart, 
    subjectEnd, eVal, bitScore) = line.split("\t")

Perl:

while (<>) {
   ($queryId, $subjectId, $percIdentity, $alnLength, $mismatchCount, 
   $gapOpenCount, $queryStart, $queryEnd, $subjectStart, 
   $subjectEnd, $eVal, $bitScore) = split(/\t/)
}
ADD COMMENTlink
2
Entering edit mode

The blog post referred to above has moved to:

http://bergmanlab.smith.man.ac.uk/?p=41

ADD REPLYlink
1
Entering edit mode
ADD REPLYlink
0
Entering edit mode

Using the examples this was relatively simple to do. The GFF3 validator helped too.

ADD REPLYlink
5
Entering edit mode
17 months ago
Bergen, Norway

http://jperl.googlecode.com/svn-history/r16/trunk/Blast2Gff.pl

You can use the script Pierre found swith a slight modification, actually it is a bit crude and does no real error checking but it works. The error is it does not work if the blast file has a header like this:

# BLASTN 2.2.21 [Jun-14-2009]
# Query: 16383 sequences
# Database: genomedata/GenomeOfDeath.fas
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
GeneOfDeath  GenomeOfDeath  100.00  295     0       0       1       295     152626  152920  4e-168   585

So, one should filter out lines beginning with "#" and it does no harm to skip lines which are empty or contain only white spaces. So edit the file Blast2Gff.pl: in line 149 add:

next if (/^\#/ || /^\s*$/); # filter comments and empty lines

Such that this part looks like below, then try again.

while (<BLASTIN>)
{
  next if (/^\#/ || /^\s*$/); # filter comments and empty lines

$HitNum++;

my ($QryId, $SubId, $PID, $Len, 
    $MisMatch, $GapOpen, 
    $QStart,$QEnd, $SStart, $SEnd,
    $EVal, $BitScore) = split(/\t/);
ADD COMMENTlink
1
Entering edit mode
14 months ago
London, UK

Have you tried these scripts: http://gmod.org/wiki/Load_BLAST_Into_Chado, http://www.bioperl.org/pipermail/bioperl-l/2002-November/010223.html ??

maybe the PSL format is better to represent an alignment. You can also look at the BED format so later you can play with BedTools

ADD COMMENTlink
0
Entering edit mode

I don't have bioperl on my computer and the documentation states that installing bioperl is "non-trivial" which is not encouraging. Perhaps it would be simpler just to write a parser myself.

ADD REPLYlink
1
Entering edit mode
15 months ago
Malcolm.Cook ♦ 1.0k
kansas, usa

If your genome browser of choice displays psl tracks (as does IGV, wink wink), the save your blast results as XML and convert them in two steps to psl using the Jim Kent utility blastXmlToPsl

Here's an example call:

blastXmlToPsl -pslx -tName=Hit_id -qName=query-def0  blast_hits.xml   blast_hits.psl

Else, take it another step, and generate a .bed, with

pstToBed blast_hits.psl blast_hits.bed
ADD COMMENTlink
0
Entering edit mode
6.3 years ago
nuria • 0

Hi! I tried something similar to what Darked89 explained, but I have a problem:

The subject end is not always larger than the subject start (plus/minus), and bedtools complains:

"Error: malformed BED entry at line 3. Start was greater than end. Exiting."

Does anyone know a way to change start and end values when end is smaller?

Thank you

ADD COMMENTlink
0
Entering edit mode

This should be rather a comment, not an answer to the question, to solve the issue, swap start and stop if start > stop: by some code like: ($start, $stop) = sort ($start, stop)

ADD REPLYlink
0
Entering edit mode
6.3 years ago
nuria • 0

I solved my problem (wasn't too complicate) so I'm posting my solution here. In case someonle else want to turn a blast tabular output into a simple bed file, but has problems with subject start values larger than end values.

I mean turn this:

scaffold01109    994    1071    HWUSI-EAS1662:3:114:11269:10361#5/2
scaffold01109    1018    1071    HWUSI-EAS1662:3:1:4638:10404#5/1
scaffold01109    1071    1009    HWUSI-EAS1662:3:23:15834:10646#5/1
scaffold01109    2289    2245    HWUSI-EAS1662:3:24:9622:2914#5/1

I got this file doing:

cat blast+tabularfile.input | awk '{print $2,$9,$10,$1}'> almostbedfile.output

into this:

scaffold01109   994     1071    HWUSI-EAS1662:3:114:11269:10361#5/2
scaffold01109   1018    1071    HWUSI-EAS1662:3:1:4638:10404#5/1
scaffold01109 1009 1071 HWUSI-EAS1662:3:23:15834:10646#5/1
scaffold01109 2245 2289 HWUSI-EAS1662:3:24:9622:2914#5/1

 

cat input.tsv | awk '{if ($3<$2) print  $1,$3,$2,$4; else if ($3>$2)print $0}' > output.bed

But remember to sort it before use it!

ADD COMMENTlink
0
Entering edit mode
17 months ago
cmdcolin ♦ 1.2k
United States

The tool bp_search2gff.pl can do this fairly well.

Install bioperl (cpanm --notest BioPerl)

Then run blast and the bp_search2gff.pl

blastn -query query.fa -db blast.fa > blast.txt
bp_search2gff.pl --input blast.txt --addid --version 3 --type hit

If you use the --match option it can group HSP too

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3