Question

Parse NCBIXML Output Into A Python List Of Hits

12.3 years ago

Zach Powers ▴ 340

I would like to parse an NCBIXML file to obtain a list of the format:

known_results[i]=(query title, (hit_name,hit_name,hit_name....))

However I am having trouble getting the slice operator to work:

    knowns = "output.xml" 
    i=0
    for record in NCBIXML.parse(open(knowns)): 
        print record.query_id
        known_results[i] = record.query_id     
        known_results[i][1] = (align.title for  align in i.record.alignment)     
        i+=1

which results in:

list assignment index out of range

Since I can do known_results[1] = "sample text", I think the problem is that I cannot use the slice method with a variable.

Can anyone suggest an alternative way to create this list?

Thanks, Zach cp

Crossposted, with answer, at Stack Overflow.

There are two good answers on Stack Overflow. The first uses list.append(), the second uses dictionaries. The major problem with my construct is that you cannot assign values to positions in a list that have not been created yet.
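For later readers, here is a minimal sketch of the list.append() approach mentioned above. It does not read a real BLAST file: `Alignment`, `Record`, and the sample IDs are hypothetical stand-ins that model only the `query_id`, `alignments`, and `title` attributes, so the example runs without Biopython. With real data you would loop over `NCBIXML.parse(handle)` instead of `records`.

```python
from collections import namedtuple

# Hypothetical stand-ins for the records yielded by NCBIXML.parse();
# only the attributes used below are modelled.
Alignment = namedtuple("Alignment", ["title"])
Record = namedtuple("Record", ["query_id", "alignments"])

records = [
    Record("query_1", [Alignment("hit_a"), Alignment("hit_b")]),
    Record("query_2", [Alignment("hit_c")]),
]

# Assigning to an index that does not exist yet raises IndexError,
# which is the "list assignment index out of range" message.
empty = []
try:
    empty[0] = "sample text"
    assignment_failed = False
except IndexError:
    assignment_failed = True

# append() grows the list by one element per record instead.
known_results = []
for record in records:
    hit_titles = tuple(align.title for align in record.alignments)
    known_results.append((record.query_id, hit_titles))

print(known_results)
```

Each element of `known_results` is a `(query_id, (hit, hit, ...))` tuple, which matches the format asked for in the question.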

biopython list • 3.6k views

ADD COMMENT • link updated 12.3 years ago by Damian Kao 16k • written 12.3 years ago by Zach Powers ▴ 340

Is known_results a list, or a list of lists?

ADD REPLY • link 12.3 years ago by Niek De Klein ★ 2.6k

It's a list of lists. The answer on the best ways to do this is at the StackExchange link.

ADD REPLY • link 12.3 years ago by Zach Powers ▴ 340

score 1 · Answer 1 · 2011-12-28

It's probably better to use a dictionary in this case:

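    from Bio.Blast import NCBIXML

    knowns = "output.xml"
    known_results = {}
    # Map each query ID to the list of its hit titles
    with open(knowns) as handle:
        for record in NCBIXML.parse(handle):
            print(record.query_id)
            known_results[record.query_id] = [align.title for align in record.alignments]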

Now if you want to access your data, you can:

    for queryID, alignments in known_results.items():
        print(queryID)
        for alignment in alignments:
            # do stuff with each alignment title here
            print(alignment)
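If the list-of-tuples layout from the question is still needed, the dictionary can be converted afterwards. This is only a sketch: the dictionary contents are hypothetical sample data standing in for the parsed results, and `known_list` is a name introduced here for illustration.

```python
# Hypothetical sample of the dictionary built above (query ID -> hit titles).
known_results = {
    "query_2": ["hit_c"],
    "query_1": ["hit_a", "hit_b"],
}

# One (query_id, (hit, hit, ...)) tuple per query, ordered by query ID.
known_list = [
    (query_id, tuple(titles))
    for query_id, titles in sorted(known_results.items())
]

print(known_list[0])
```

Sorting by query ID is a choice made for this sketch so that the order is predictable; drop `sorted()` to keep the dictionary's own order.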