Biostar Beta. Not for public use.
How To Force 'Bwa Samse' To Output Multiple Hits In .Sam Format?
2
Entering edit mode
12 months ago
Liège, Belgium

I am using 'bwa' to map Illumina reads to a large database and would like to be able to store (and analyse) the information on multiple hits. I have forced this very brutishly using the following:

bwa samse -n 1000 database.fasta aln_sa.sai reads.fasta > aln_1000.sam

However, the output is no longer in classical .sam format ( _i.e._ no flags, no CIGAR string, ...). Would someone know how to force a real .sam output?

Just as an example, here is the kind of output I get:

>1-815511 3 3
bta-mir-21    +8    0
mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21    +18    0
19    +11033079    0
>2-592417 3 3
bta-mir-21    +8    0
mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21    +18    0
19    +11033079    0
...

What I have tried so far with no luck:

  • use the xa2multi.pl script which parses the information in the XA:Z: (information on other hits when their number is inferior to the value of the -n option) to output one line per hit -- this scripts is now part of the 'bwa' package.

Unfortunately, the output of a normal:

bwa samse database.fasta aln_sa.sai reads.fasta > aln.sam

does not contain any XA:Z: field... Even for read 2-592417 which has several hits:

1-815511    0    mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21    18    0    23M    *    0    0    TAGCTTATCAGACTGATGTTGAC    *    XT:A:R    NM:i:0    X0:i:3    X1:i:0    XM:i:0    XO:i:0    XG:i:0    MD:Z:23
2-592417    0    19    11033079    0    22M    *    0    0    TAGCTTATCAGACTGATGTTGA    *    XT:A:R    NM:i:0    X0:i:3    X1:i:0    XM:i:0    XO:i:0    XG:i:0MD:Z:22
  • install the previous bwa-0.5.7 version for which a patch doing exactly what I need has been developed at Broad Institute. But the patch has vanished from its ftp site and is now unavailable.

Any ideas?

bwa sam • 4.3k views
ADD COMMENTlink
2
Entering edit mode

I just tried this on a test data generated from the yeast genome and bwa version 0.5.9 seems to fill out all the SAM fields as expected - what version are you running?

ADD REPLYlink
0
Entering edit mode

I was actually running 0.5.5 (!) but have now installed 0.6.1 and will run it to see if the problem is solved. Thanks a lot!

ADD REPLYlink
0
Entering edit mode

Perfect! In the latest bwa version, the output is in a fully populated SAM file (XA:Z: field is now present). I just then run the xa2multi.pl on it to obtain one hit per line.

ADD REPLYlink
0
Entering edit mode

Could you please add this as an answer. Thanks.

ADD REPLYlink
4
Entering edit mode
12 months ago
Liège, Belgium

Thanks to Istvan's suggestion, in the latest bwa version, the output is in a fully populated SAM file (XA:Z: field is now present). I just then run the xa2multi.pl on it to obtain one hit per line.

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1