I am using 'bwa' to map Illumina reads to a large database and would like to be able to store (and analyse) the information on multiple hits. I have forced this very brutishly using the following:
bwa samse -n 1000 database.fasta aln_sa.sai reads.fasta > aln_1000.sam
However, the output is no longer in classical .sam format ( _i.e._ no flags, no CIGAR string, ...). Would someone know how to force a real .sam output?
Just as an example, here is the kind of output I get:
>1-815511 3 3 bta-mir-21 +8 0 mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21 +18 0 19 +11033079 0 >2-592417 3 3 bta-mir-21 +8 0 mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21 +18 0 19 +11033079 0 ...
What I have tried so far with no luck:
- use the xa2multi.pl script which parses the information in the XA:Z: (information on other hits when their number is inferior to the value of the -n option) to output one line per hit -- this scripts is now part of the 'bwa' package.
Unfortunately, the output of a normal:
bwa samse database.fasta aln_sa.sai reads.fasta > aln.sam
does not contain any XA:Z: field... Even for read 2-592417 which has several hits:
1-815511 0 mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21 18 0 23M * 0 0 TAGCTTATCAGACTGATGTTGAC * XT:A:R NM:i:0 X0:i:3 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:23 2-592417 0 19 11033079 0 22M * 0 0 TAGCTTATCAGACTGATGTTGA * XT:A:R NM:i:0 X0:i:3 X1:i:0 XM:i:0 XO:i:0 XG:i:0MD:Z:22
- install the previous bwa-0.5.7 version for which a patch doing exactly what I need has been developed at Broad Institute. But the patch has vanished from its ftp site and is now unavailable.