Parsing tabular BLAST output
7.2 years ago
nh75 • 0

Hello to all!

I have seen that Biopython recommends XML output for parsing BLAST files. However, I need to run my BLAST with -outfmt 7.

How can I parse a file with tabular output? For each organism hit, I want to list the queries that matched it, with the query names at the end of the line.

Input:

Query= R1_sam10_filt_denovo_18-02-17_c1    cov=4.55 len=204 gc=41.67 nseq=6
ref|XR_001550329.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XR_001550328.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XM_006654251.2|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XM_006654251.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...  48.2    0.064
Query= R1_sam10_filt_denovo_18-02-17_c2    cov=4.54 len=198 gc=52.78 nseq=6
emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...  48.2    0.064

Desired output:

emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...   c1     c2
ref|XR_001550329.1|  PREDICTED: Oryza brachyantha beta-glucosidas...   c1

Thanks for your answers!
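For reference: real BLAST+ -outfmt 7 output is tab-separated, with "#" comment lines (# Query:, # Fields:, ...) and, with the default fields, the query ID in column 1 and the subject ID in column 2. The sample above, with its Query= lines, appears to be the description summary of a pairwise report rather than -outfmt 7. A minimal awk sketch for the true tabular format, assuming the default column order, could group queries by subject like this (subjects_to_queries is a hypothetical name):

```shell
# Group BLAST+ -outfmt 7 hits by subject, listing the queries that hit each subject.
# Assumes default column order: column 1 = query ID, column 2 = subject ID,
# tab-separated fields, comment lines starting with "#".
subjects_to_queries() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                                 # skip "# Query:", "# Fields:", ... lines
        NF < 2 { next }                               # skip blank or malformed lines
        {
            pair = $2 SUBSEP $1
            if (pair in seen) next                    # same query/subject pair (extra HSP)
            seen[pair] = 1
            if ($2 in queries) queries[$2] = queries[$2] "," $1
            else { order[++n] = $2; queries[$2] = $1 }   # first time this subject is seen
        }
        END { for (i = 1; i <= n; i++) print order[i], queries[order[i]] }
    ' "$@"
}
```

Usage, e.g.: subjects_to_queries results.txt > hits_by_subject.tsv (file names are placeholders). Several HSPs for the same query/subject pair are reported only once, and subjects come out in first-seen order.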

7.2 years ago

This is not the output you asked for (but that output format is a bad idea anyway :-) ):

$ awk '/^Query=/ {n=split($2,a,/_/);Q=a[n];next;} {print $1,$2, Q;} ' input.blast

ref|XR_001550329.1| PREDICTED: c1
ref|XR_001550328.1| PREDICTED: c1
ref|XM_006654251.2| PREDICTED: c1
ref|XM_006654251.1| PREDICTED: c1
emb|LN590686.1| Cyprinus c1
emb|LN590686.1| Cyprinus c2
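The same one-liner, written out with comments as a shell function (hits_with_query_suffix is a hypothetical name), with tab-separated output and blank lines skipped. It assumes the Query= layout shown in the question:

```shell
# Print "accession<TAB>first description word<TAB>query suffix" for every hit line.
# Arguments: optional input files (default: stdin).
hits_with_query_suffix() {
    awk -v OFS='\t' '
        /^Query=/ {
            n = split($2, parts, "_")   # R1_sam10_filt_denovo_18-02-17_c1 -> ..., "c1"
            q = parts[n]                # keep only the last "_"-separated piece
            next
        }
        NF > 0 { print $1, $2, q }      # hit line: accession, first word, query suffix
    ' "$@"
}
```

Usage, e.g.: hits_with_query_suffix input.blast > hits.tsv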

Thank you very much, Pierre! That's almost what I expected. Do you have an idea how to merge the last two lines of the output into a single line listing 'c1, c2'? (I have already grepped the original BLAST output.)


Sure: a simple awk script, bedtools groupby, datamash, ...

But it's usually a bad idea to reformat such data.
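A minimal awk sketch of the merge, assuming the three space-separated columns produced by the one-liner above (accession, first description word, query suffix). Unlike groupby it does not need sorted input; it keeps accessions in first-seen order and drops duplicate queries. collapse_queries is a hypothetical name:

```shell
# Merge lines that share the same accession (column 1) into one line,
# joining their queries (column 3) with ",". Input need not be sorted.
# Arguments: optional input files (default: stdin).
collapse_queries() {
    awk -v OFS='\t' '
        NF < 3 { next }                                   # skip short/blank lines
        {
            if (!($1 in qs)) { order[++n] = $1; word[$1] = $2; qs[$1] = $3 }
            else if (index("," qs[$1] ",", "," $3 ",") == 0) qs[$1] = qs[$1] "," $3
        }
        END { for (i = 1; i <= n; i++) print order[i], word[order[i]], qs[order[i]] }
    ' "$@"
}
```

Usage, e.g.: awk '/^Query=/ {n=split($2,a,/_/);Q=a[n];next;} {print $1,$2, Q;}' input.blast | collapse_queries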


OK, with groupby:

bedtools groupby -i output.txt -grp 1 -c 2 -o collapse

What would the equivalent awk command be?
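A sketch of an awk counterpart to that groupby command (groupby_collapse is a hypothetical name). Like bedtools groupby, it reads tab-delimited input, writes the group column followed by a comma-separated list, and only merges consecutive lines with the same key, so unsorted input should be sorted on the key first:

```shell
# Sketch of an awk counterpart to: bedtools groupby -i output.txt -grp 1 -c 2 -o collapse
# Arguments: group column, value column, then optional input files (default: stdin).
groupby_collapse() {
    grp=$1 col=$2
    shift 2
    awk -F'\t' -v OFS='\t' -v g="$grp" -v c="$col" '
        NR > 1 && $g != key { print key, vals }              # key changed: flush previous group
        NR == 1 || $g != key { key = $g; vals = $c; next }   # start a new group
        { vals = vals "," $c }                               # same key: append value
        END { if (NR > 0) print key, vals }                  # flush the last group
    ' "$@"
}
```

Usage: groupby_collapse 1 2 output.txt mirrors -grp 1 -c 2. To sort first: sort -t "$(printf '\t')" -k1,1 output.txt | groupby_collapse 1 2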

Other questions related to the output file:

  • How can I get tabs between the fields?
  • How can I write the query information (cov=4.55 len=204 gc=41.67 nseq=6) to a new file?
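One way to handle both points, as a sketch: setting OFS to a tab makes print separate fields with tabs, and an awk print ... > file redirection sends the query metadata to a second file. The function name and file names are hypothetical, and it assumes the Query= layout shown in the question:

```shell
# Like the one-liner, but tab-separated (OFS) and with query metadata sent to a file.
# Arguments: metadata output file, then optional input files (default: stdin).
split_hits_and_queries() {
    meta=$1
    shift
    awk -v OFS='\t' -v meta="$meta" '
        /^Query=/ {
            n = split($2, parts, "_"); q = parts[n]       # query suffix, e.g. c1
            info = $3
            for (i = 4; i <= NF; i++) info = info OFS $i  # cov=... len=... gc=... nseq=...
            print q, $2, info > meta                      # one metadata line per query
            next
        }
        NF > 0 { print $1, $2, q }                        # hit lines go to stdout
    ' "$@"
}
```

Usage, e.g.: split_hits_and_queries queries.tsv input.blast > hits.tsv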
