NCBIXML parser get hit sequence
6.6 years ago
yarmda ▴ 40

I want to retrieve the complete sequences of the BLAST hits stored in an XML file. I can see that Hsp_hseq holds only the aligned portion of the hit, not the complete sequence. Is there an easier way to do this than pulling each identifier and downloading the record with Entrez?

<Hit>
  <Hit_num>2</Hit_num>
  <Hit_id>gi|755995789|gb|CP009605.1|</Hit_id>
  <Hit_def>Bacillus cereus strain S2-8, complete genome</Hit_def>
  <Hit_accession>CP009605</Hit_accession>
  <Hit_len>5271178</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>366858</Hsp_bit-score>
      <Hsp_score>198661</Hsp_score>
      <Hsp_evalue>0</Hsp_evalue>
      <Hsp_query-from>3859112</Hsp_query-from>
      <Hsp_query-to>4061503</Hsp_query-to>
      <Hsp_hit-from>311811</Hsp_hit-from>
      <Hsp_hit-to>109360</Hsp_hit-to>
      <Hsp_query-frame>1</Hsp_query-frame>
      <Hsp_hit-frame>-1</Hsp_hit-frame>
      <Hsp_identity>201234</Hsp_identity>
      <Hsp_positive>201234</Hsp_positive>
      <Hsp_gaps>132</Hsp_gaps>
      <Hsp_align-len>202488</Hsp_align-len>
      <Hsp_qseq>AATAATAATAATTAAAATAAAAAAACTTTAGAATTTTCTTATTTCAAAACAGTAGACAAAATTCAAAAAATTGTGTTAGAATTTGTTTCAATATCATATCGCTTGTTAAATTCCTTTCAAAAGGAAAATAGGTACACGAACATTTCGTTTCGTGTTTAAAAGGGAAGCTTGGTGAAACTCCAACACGGTCCCGCCACTGTAAATGCTGAGATTTCTTTTTGATACCAC$
      <Hsp_hseq>AATAAGAATAATAAAAATAAAAAAACTTTAGAATTTTCTTATTTCAAAACAGTAGACAAAATTCCAAAAATTGTGTTAGAATTTGTTTCAATATCATATCGCTTGTTAAATTCCTTTCAAAAGGAAAATAGGTACACGAACATTTCGTTTCGTGTTTAAAAGGGAAGCTTGGTGAAACTCCAACACGGTCCCGCCACTGTAAATGCTGAGATTTCTTTGTAGTGCCAC
biopython blast ncbixml parse • 1.5k views
6.6 years ago

Use blastdbcmd with your BLAST database to extract the full hit sequences by identifier.
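As a rough illustration of that suggestion, here is a minimal Python sketch that reads the Hit_accession values from the XML file and asks blastdbcmd for each full sequence. The function names, the file name and the database name are placeholders made up for this example, and it assumes that blastdbcmd is on your PATH and that the database was built with -parse_seqids so entries can be looked up by accession.

```python
import subprocess
import xml.etree.ElementTree as ET


def hit_accessions(xml_path):
    """Yield the Hit_accession of every <Hit> in a BLAST XML (outfmt 5) file."""
    # iterparse streams the file, which helps with very large BLAST outputs.
    for _, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "Hit":
            yield elem.findtext("Hit_accession")
            elem.clear()  # free the memory held by this hit's HSPs


def blastdbcmd_args(db, accession):
    """Build the blastdbcmd command line for one database entry."""
    return ["blastdbcmd", "-db", db, "-entry", accession]


def fetch_sequence(db, accession):
    """Run blastdbcmd and return the FASTA record as text (needs BLAST+ installed)."""
    result = subprocess.run(
        blastdbcmd_args(db, accession),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def fetch_all_hits(xml_path, db):
    """Map each distinct hit accession in the XML file to its FASTA record."""
    return {acc: fetch_sequence(db, acc) for acc in set(hit_accessions(xml_path))}


# Hypothetical usage; "results.xml" and "my_blast_db" are placeholder names:
# records = fetch_all_hits("results.xml", "my_blast_db")
```

If you only need the region covered by an HSP rather than the whole record, blastdbcmd also accepts -range (for example 109360-311811) and -strand minus, which would match the hit coordinates and the -1 hit frame shown in the question.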
