How to convert output from rpsblast XML format to CSV format?
7.4 years ago

Hi,

Does anyone have a script that converts output from rpsblast XML format to CSV format? This is a fragment of my XML result:

    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gnl|CDD|289286</Hit_id>
      <Hit_def>pfam12505, DUF3712, Protein of unknown function (DUF3712).  This domain family is found in eukaryotes, and is approximately 130 amino acids in length.</Hit_def>
      <Hit_accession>289286</Hit_accession>
      <Hit_len>124</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_bit-score>93.4135</Hsp_bit-score>
          <Hsp_score>233</Hsp_score>
          <Hsp_evalue>9.11946e-22</Hsp_evalue>
          <Hsp_query-from>846</Hsp_query-from>
          <Hsp_query-to>970</Hsp_query-to>
          <Hsp_hit-from>2</Hsp_hit-from>
          <Hsp_hit-to>122</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>39</Hsp_identity>
          <Hsp_positive>60</Hsp_positive>
          <Hsp_gaps>4</Hsp_gaps>
          <Hsp_align-len>125</Hsp_align-len>
          <Hsp_qseq>PLGQIAMPNVSLAGDVGADLNIDAAFAVADVGHLTDFTTYLLTQPSFTWQIYGQNLAVSALGITVPGISILKNVVLDGMDGFKGLVKIESFDLPANDPAGGITLTLATSLTNPSSVGVALSQIGF</Hsp_qseq>
          <Hsp_hseq>PFATVPLPGIKAAGN-GTTLVVDQTLDITDVDAFTDFAKALVFSESFTLSVKGKT-DLKLGGLPFSGVTLDKTVTLKGLNNLKG-FSITDFDLP-LPPADGINLVATATIPNPSVLTIELGNVTL</Hsp_hseq>
          <Hsp_midline>P   + +P +  AG+ G  L +D    + DV   TDF   L+   SFT  + G+   +   G+   G+++ K V L G++  KG   I  FDLP   PA GI L    ++ NPS + + L  +  </Hsp_midline>

I would like a table in CSV in this form:

    query id,subject id,% identity,alignment length,mismatches,gap opens,q. start,q. end,s. start,s. end,evalue,bit score,subject description
    S89_g3,gnl|CDD|109488,43.59,39,22,0,247,285,6,44,3.98E-05,548,457,pfam00432: Prenyltrans: Prenyltransferase and squalene oxidase repeat.

That way I can work with it in Excel.
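For reference, here is a minimal Python sketch that writes the requested columns directly from the XML using only the standard library (`xml.etree.ElementTree` and `csv`). It assumes the classic BLAST XML layout produced with `-outfmt 5` (`BlastOutput > Iteration > Hit > Hsp`, as in the fragment above), and it takes the query id to be the first word of `Iteration_query-def`. Both of these are assumptions. The XML has no mismatch or gap-open fields, so those two columns are derived from `Hsp_qseq`/`Hsp_hseq`. The names `hsp_rows`, `blast_xml_to_csv` and `blastxml2csv.py` are hypothetical.

```python
import csv
import re
import sys
import xml.etree.ElementTree as ET

# Column order taken from the example table in the question.
HEADER = ["query id", "subject id", "% identity", "alignment length",
          "mismatches", "gap opens", "q. start", "q. end", "s. start",
          "s. end", "evalue", "bit score", "subject description"]


def hsp_rows(root):
    """Yield one row per HSP from a parsed BLAST XML tree (root = <BlastOutput>)."""
    for iteration in root.iter("Iteration"):
        # Assumption: the query id is the first word of Iteration_query-def.
        words = iteration.findtext("Iteration_query-def", default="").split()
        query_id = words[0] if words else ""
        for hit in iteration.iter("Hit"):
            hit_id = hit.findtext("Hit_id")
            hit_def = hit.findtext("Hit_def")
            for hsp in hit.iter("Hsp"):
                qseq = hsp.findtext("Hsp_qseq", default="")
                hseq = hsp.findtext("Hsp_hseq", default="")
                align_len = int(hsp.findtext("Hsp_align-len"))
                identity = int(hsp.findtext("Hsp_identity"))
                # Mismatch: aligned column, no gap on either side, residues differ.
                mismatches = sum(1 for q, h in zip(qseq, hseq)
                                 if q != "-" and h != "-" and q != h)
                # Gap opens: number of runs of '-' in the query plus the subject.
                gap_opens = len(re.findall(r"-+", qseq)) + len(re.findall(r"-+", hseq))
                yield [query_id, hit_id, f"{100.0 * identity / align_len:.2f}",
                       align_len, mismatches, gap_opens,
                       hsp.findtext("Hsp_query-from"), hsp.findtext("Hsp_query-to"),
                       hsp.findtext("Hsp_hit-from"), hsp.findtext("Hsp_hit-to"),
                       hsp.findtext("Hsp_evalue"), hsp.findtext("Hsp_bit-score"),
                       hit_def]


def blast_xml_to_csv(xml_path, csv_path):
    root = ET.parse(xml_path).getroot()
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)  # quotes fields that contain commas
        writer.writerow(HEADER)
        writer.writerows(hsp_rows(root))


if __name__ == "__main__":
    # Usage: python blastxml2csv.py rpsblast.xml rpsblast.csv
    blast_xml_to_csv(sys.argv[1], sys.argv[2])
```

`csv.writer` quotes any field that contains a comma. This means a description such as the `pfam12505, DUF3712, ...` Hit_def stays in a single column when the file is opened in Excel.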


Have you tried any of the regular BLAST XML to tabular conversion scripts? For example, https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py


Yes, but I need CSV or XLS format. With Pierre's script I'm almost there.


Well, if you have a TSV file, you can open it directly in Excel. Converting from TSV to CSV is also simple with a sed command:

    sed 's/\t/,/g' file.tsv > file.csv
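There is one caveat with a plain tab-to-comma substitution. rpsblast hit descriptions often contain commas themselves (the Hit_def in the question is `pfam12505, DUF3712, ...`), so a blind swap can split one column into several. Below is a minimal sketch of a quote-aware alternative using Python's standard `csv` module. The function name `tsv_to_csv_text` and the script name `tsv2csv.py` are hypothetical.

```python
import csv
import io
import sys


def tsv_to_csv_text(tsv_text):
    """Convert tab-separated text to CSV text, quoting fields that contain commas."""
    # QUOTE_NONE: read the input strictly as tab-delimited, with no quote handling.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # Default QUOTE_MINIMAL: only fields containing a comma, quote or newline get quoted.
    csv.writer(out, lineterminator="\n").writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    # Usage: python tsv2csv.py file.tsv > file.csv
    with open(sys.argv[1], newline="") as handle:
        sys.stdout.write(tsv_to_csv_text(handle.read()))
```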


Actually, I have an rpsblast result in XML and I want it listed cleanly as a table in XLS, just as in the example in my question. So I would like to convert the XML output to either TSV or CSV, which would make it easy to use in XLS. I have a script that works great with blastp output (from BLAST+), but it does not work with rpsblast output. I've tried to fix this, without success.

7.4 years ago

I wrote blast2tsv: https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/blast2tsv.xsl. You can modify it to get the columns you want.

Usage:

    xsltproc --novalid blast2tsv.xsl blast.xml

Almost. It does not return the "Hit_def" column.

I made this change:

    59 <xsl:value-of select="Hsp_qseq"/>
    60 <xsl:text> </xsl:text>

to:

    59 <xsl:value-of select="Hit_def"/>
    60 <xsl:text> </xsl:text>

but the sequences still appear in the output.

Where am I going wrong?
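A likely cause, assuming the stylesheet is processing an `Hsp` element at that point (the relative `select="Hsp_qseq"` suggests it is): XPath paths are evaluated relative to the current node. `Hit_def` is a child of `Hit`, not of `Hsp`, so `select="Hit_def"` matches nothing there. In XSLT the usual fix would be `select="../../Hit_def"` (up through `Hit_hsps` to `Hit`) or `select="ancestor::Hit/Hit_def"`. The sketch below reproduces the same lookup rule with Python's ElementTree on a trimmed copy of the fragment from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <Hit> fragment from the question.
FRAGMENT = """<Hit>
  <Hit_id>gnl|CDD|289286</Hit_id>
  <Hit_def>pfam12505, DUF3712, Protein of unknown function (DUF3712).</Hit_def>
  <Hit_hsps>
    <Hsp><Hsp_num>1</Hsp_num><Hsp_qseq>PLGQIAMPNVSLAG</Hsp_qseq></Hsp>
  </Hit_hsps>
</Hit>"""

hit = ET.fromstring(FRAGMENT)
hsp = hit.find("Hit_hsps/Hsp")

# Relative to the Hsp node: Hit_def is not one of its children, so nothing is found.
print(hsp.findtext("Hit_def"))  # None
# Relative to the enclosing Hit node, the description is found.
print(hit.findtext("Hit_def"))
```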


Not sure; just delete lines 58 to 63?


Hey Pierre, I sent you an email at plindenbaum@yahoo.fr about some errors I had. I sent it there because it included some files, and I don't know how to attach them here.

7.4 years ago

I got it! I replaced blast2 with the latest version of NCBI BLAST+ and used this script to convert the XML to CSV:

https://github.com/Sunhh/NGS_data_processing/blob/master/annot_tools/blast_xml_parse.py.

That's all!

Thanks for everything!
