How To Find The Best Hit From Output.Psl
10.4 years ago
alok.helix ▴ 120

After running standalone BLAT in my Linux terminal, I want to find the best hit in the output.psl file it generated.

I have looked at pslReps and at the filtering criteria described in this paper: http://dnaresearch.oxfordjournals.org/content/early/2013/11/25/dnares.dst049.full but it is not clear to me how the parameters are set.

Please advise on how to proceed.

blat alignment • 6.2k views
10.4 years ago
5heikki 11k

Is the output tabular, with the following fields?

MATCHES - Number of non-repeat matches.
MISMATCHES - Number of mismatches.
REPMATCHES - Number of repeat matches.
NCOUNT - Number of Ns.
QNUMINSERT - Number of inserts in query.
QBASEINSERT - Number of bases inserted in query.
SNUMINSERT - Number of inserts in subject.
SBASEINSERT - Number of bases inserted in subject.
STRAND - Strand.
Q_ID - Query ID.
Q_LEN - Query length.
Q_BEG - Query begin.
Q_END - Query end.
S_ID - Subject ID.
S_LEN - Subject length.
S_BEG - Subject begin.
S_END - Subject end.
BLOCKCOUNT - Block count.
BLOCKSIZES - Block sizes.
Q_BEGS - Query sequence blocks begins.
S_BEGS - Subject sequence blocks begins.

If yes, you would probably begin by sorting on the query ID (column 10):
sort -k10,10 output.psl
To sort by query and then, within each query, by mismatches from fewest to most:
sort -k10,10 -k2,2g output.psl
To keep only the first line per query, i.e. the "best hit" under this criterion:
sort -k10,10 -k2,2g output.psl | sort -u -k10,10 --merge > bestHitsWithThisCriteria.psl
But really, you have to decide what makes a best hit.
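The same "fewest mismatches per query" selection can be sketched in Python, which also makes it easy to skip the header lines that blat writes by default. This is only an illustration of the criterion above, not a recommended definition of a best hit; the function name best_hits_fewest_mismatches is hypothetical, and the code assumes tab-separated PSL rows with the 21 fields listed above.

```python
def best_hits_fewest_mismatches(lines):
    """Return {query_id: fields}, keeping the row with the fewest mismatches per query.

    Hypothetical helper: assumes tab-separated PSL rows with 21 columns.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header and blank lines: data rows have 21 columns and
        # start with an integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        query_id = fields[9]        # Q_ID is column 10 (index 9)
        mismatches = int(fields[1])  # MISMATCHES is column 2 (index 1)
        # Strict "<" keeps the first row seen when mismatches are tied.
        if query_id not in best or mismatches < int(best[query_id][1]):
            best[query_id] = fields
    return best


if __name__ == "__main__":
    import sys

    # Usage (assumed file name): python best_hits.py output.psl > best.psl
    with open(sys.argv[1]) as handle:
        for fields in best_hits_fewest_mismatches(handle).values():
            print("\t".join(fields))
```

Ties are resolved by file order here, so a different tie-breaker (for example, more matches) would need an extra comparison.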


Yes, the output is in tabular form. I was going through this paper: http://dnaresearch.oxfordjournals.org/content/early/2013/11/25/dnares.dst049.full Please look at how they selected the region with the best hit. I do not understand how they applied this to their .psl output.


Maybe they used some other output? Their formula includes a "match score", which is missing from your output if it is as above.

10.4 years ago
Prakki Rama ★ 2.7k

You can get the BLAT output in NCBI BLAST tabular format with -out=blast8 and take the best hit based on the highest bit score.
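As a rough sketch of that selection, the snippet below keeps the row with the highest bit score for each query. It assumes the usual 12-column BLAST tabular layout (query ID first, bit score last); the function name best_hits_by_bitscore is made up for illustration.

```python
def best_hits_by_bitscore(lines):
    """Return {query_id: fields}, keeping the row with the highest bit score per query.

    Hypothetical helper: assumes 12 tab-separated columns with the
    query ID in column 1 and the bit score in column 12.
    """
    best = {}
    for line in lines:
        # Skip blank lines and any comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query_id = fields[0]
        bit_score = float(fields[11])
        # Strict ">" keeps the first row seen when bit scores are tied.
        if query_id not in best or bit_score > best[query_id][0]:
            best[query_id] = (bit_score, fields)
    return {query_id: fields for query_id, (_, fields) in best.items()}
```

Calling it on an open file handle, e.g. `best_hits_by_bitscore(open("output.blast8"))` (file name assumed), returns one row per query.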

7.2 years ago
osowiecki • 0

The BLAT score, as calculated in biopython/Bio/SearchIO/BlatIO.py:

def _calc_score(psl, is_protein):
    # calculates score
    # adapted from http://genome.ucsc.edu/FAQ/FAQblat.html#blat4
    size_mul = 3 if is_protein else 1
    return size_mul * (psl['matches'] + (psl['repmatches'] >> 1)) - \
            size_mul * psl['mismatches'] - psl['qnuminsert'] - psl['tnuminsert']

In short, when size_mul = 1 and x is one PSL row split into its fields:

match_score = int(x[0]) + (int(x[2]) >> 1) - int(x[1]) - int(x[4]) - int(x[6])  # from psl; repmatches count half, as in _calc_score

My guess is that the score displayed in IGV is 1000 - match_score.
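To tie this back to the original question, here is a minimal sketch that applies the same scoring rule to every row of a PSL file and keeps the highest-scoring row per query. The names psl_score and best_hits_by_score are hypothetical; the code assumes tab-separated PSL rows with 21 columns and mirrors the _calc_score logic quoted above rather than calling Biopython.

```python
def psl_score(fields, is_protein=False):
    """Score one PSL row (list of string fields) like the _calc_score snippet above."""
    size_mul = 3 if is_protein else 1
    matches = int(fields[0])
    mismatches = int(fields[1])
    repmatches = int(fields[2])
    qnuminsert = int(fields[4])
    tnuminsert = int(fields[6])
    # Repeat matches count half (integer shift), inserts cost one each.
    return (size_mul * (matches + (repmatches >> 1))
            - size_mul * mismatches - qnuminsert - tnuminsert)


def best_hits_by_score(lines, is_protein=False):
    """Return {query_id: (score, fields)} with the highest-scoring row per query."""
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header and blank lines: data rows have 21 columns and
        # start with an integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        score = psl_score(fields, is_protein)
        query_id = fields[9]
        # Strict ">" keeps the first row seen when scores are tied.
        if query_id not in best or score > best[query_id][0]:
            best[query_id] = (score, fields)
    return best
```

Whether the top score alone is the right criterion depends on the data; queries with several near-equal hits may need an additional filter such as a minimum score gap.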
