Hi everyone,
I'm using the SearchIO module from Biopython to parse my HMMER3 domtab files.
I run hmmscan and get domtblout files. The problem is that when I try to parse my results, I get an assertion error: assert len(cols) == 23
I found the following lines in hmmer3_domtab.py, so I can see where the assertion comes from:
def _parse_row(self):
    """Returns a dictionary of parsed row values."""
    assert self.line
    cols = [x for x in self.line.strip().split(' ') if x]
    # if len(cols) > 23, we have extra description columns
    # combine them all into one string in the 19th column
    if len(cols) > 23:
        cols[22] = ' '.join(cols[22:])
    elif len(cols) < 23:
        cols.append('')
        assert len(cols) == 23
What I don't understand is why I get this error for the last hit of every file, or when there is only one hit in the file (even though every hit has the same number of columns). Here is an example of my output:
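To check the column counts myself, here is a minimal sketch that splits a line the same way the quoted _parse_row does. The helper name count_columns is made up for illustration, and the data row is the first hit line from my output. The '#' lines are an assumption on my part about what the parser might be reading after the last hit.

```python
def count_columns(line):
    """Split a line on single spaces and drop empty fields, like _parse_row."""
    return len([x for x in line.strip().split(' ') if x])


# First hit line from my domtblout file
row = ("profil1 - 834 hitname|-|1356354..1357713 - 452 2.6e-08 18.2 0.0 1 1 "
       "3.7e-08 3.7e-08 17.7 0.0 439 530 198 289 159 295 0.84 -")

print(count_columns(row))                   # 23
print(count_columns("#"))                   # 1
print(count_columns("# Program: hmmscan"))  # 3
```

So the hit line itself has exactly 23 columns. A line with fewer than 22 columns would still be too short after the single cols.append(''), which would make the assertion fail.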
# --- full sequence --- -------------- this domain ------------- hmm coord ali coord env coord
# target name accession tlen query name accession qlen E-value score bias # of c-Evalue i-Evalue score bias from to from to from to acc description of target
# ------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
profil1 - 834 hitname|-|1356354..1357713 - 452 2.6e-08 18.2 0.0 1 1 3.7e-08 3.7e-08 17.7 0.0 439 530 198 289 159 295 0.84 -
profil1 - 834 hitname|-|1357766..1360262 - 831 1.5e-147 478.8 16.2 1 1 1.6e-147 1.6e-147 478.6 16.2 4 833 2 830 1 831 0.96 -
#
# Program: hmmscan
# Version: 3.1b2 (February 2015)
# Pipeline mode: SCAN
# [ok]
I get the error for the last hit, but not for the first one.
Here is part of the code I use:
try:
    for qresult in SearchIO.parse(handle, 'hmmscan3-domtab'):
        query_info = qresult.id  # sequence ID from fasta
        query_info = query_info.split('|')
        query_ID = query_info[0]
        query_strand = query_info[1]
        query_pos = query_info[2]
        query_pos = query_pos.split('..')
        query_start = query_pos[0]
        query_end = query_pos[1]
        query_len = qresult.seq_len
        hits = qresult.hits
        align = qresult.fragments
        num_hits = len(hits)
        count = 0
        longueur_align_query = 0
        longueur_align_hit = 0
except AssertionError:
    print('better luck next time')
    pass
I catch the exception so that the script keeps going, because otherwise it would just stop. The problem is that this way most of my results never get parsed. Does anyone know how to fix this?
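One workaround I am considering, as a rough sketch only: if the failing line turns out to be one of the '#' lines that HMMER writes after the last hit (I have not confirmed that this is the cause), I could strip comment lines before handing the file to the parser. The helper name strip_comment_lines is hypothetical and not part of Biopython.

```python
import io


def strip_comment_lines(handle):
    """Return a new text handle without blank lines and '#' comment lines."""
    kept = [line for line in handle
            if line.strip() and not line.startswith('#')]
    return io.StringIO(''.join(kept))


# Tiny stand-in for a domtblout file: header, one data row, trailing comments
sample = io.StringIO(
    "# target name accession tlen\n"
    "profil1 - 834\n"
    "#\n"
    "# Program: hmmscan\n"
    "# [ok]\n"
)
print(strip_comment_lines(sample).read())
```

The cleaned handle would then replace handle in SearchIO.parse(handle, 'hmmscan3-domtab'), assuming the parser does not need the header lines to be present.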
PS: Sorry for my broken English :'(