Question

Help understanding wgsim_eval.pl output

9.2 years ago

Daniel Standage 4.1k

I used wgsim to simulate some Illumina reads, and then I mapped the reads back to the original sequence using several different aligners. The wgsim code distribution includes a wgsim_eval.pl script for, I presume, evaluating alignment accuracy. I'm getting output like this...

06x            0 / 149888              149888  0.000e+00
05x            0 / 29                  149917  0.000e+00
04x            0 / 83                  150000  0.000e+00
03x            0 / 0                   150000  0.000e+00
02x            0 / 0                   150000  0.000e+00
01x            0 / 0                   150000  0.000e+00
00x            0 / 0                   150000  0.000e+00

and this.

04x            0 / 138592              138592  0.000e+00
03x            0 / 529                 139121  0.000e+00
02x           18 / 6503                145624  1.236e-04
01x           11 / 1580                147204  1.970e-04
00x          207 / 1202                148406  1.590e-03

As far as I can tell the wgsim documentation doesn't describe the output format. Can anyone explain this output to me?

wgsim bwa bam alignment • 2.7k views

score 1 · Answer 1 · 2015-02-02

9.2 years ago

Devon Ryan 104k

The output format is kind of strange. Anyway, the first column is just the MAPQ. The second is the number of correct alignments at that MAPQ, followed by a "/" and then the total number of alignments at that MAPQ. The fourth value is the cumulative sum of the total number of alignments. The final column is the ratio of the cumulative correct alignments to the cumulative total alignments.
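To make the column layout concrete, here is a minimal Python sketch that parses the second output block from the question and recomputes the last two columns from the per-MAPQ counts. The function names and the regular expression are my own and are not part of wgsim; the sketch only assumes the layout described above (count before the slash, per-MAPQ total after it, then the running total and the running ratio).

```python
import re

# One output line: "02x   18 / 6503   145624  1.236e-04"
LINE_RE = re.compile(r"^(\d+)x\s+(\d+)\s*/\s*(\d+)\s+(\d+)\s+(\S+)$")

SAMPLE = """\
04x            0 / 138592              138592  0.000e+00
03x            0 / 529                 139121  0.000e+00
02x           18 / 6503                145624  1.236e-04
01x           11 / 1580                147204  1.970e-04
00x          207 / 1202                148406  1.590e-03
"""


def parse_eval_output(text):
    """Return one dict per recognised line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        rows.append({
            "mapq_label": m.group(1) + "x",
            "count": int(m.group(2)),      # value before the slash
            "total": int(m.group(3)),      # alignments at this MAPQ
            "cum_total": int(m.group(4)),  # running total as printed
            "ratio": float(m.group(5)),    # running ratio as printed
        })
    return rows


def recompute(rows):
    """Rebuild the cumulative total and cumulative ratio from columns 2 and 3."""
    cum_count = 0
    cum_total = 0
    out = []
    for row in rows:
        cum_count += row["count"]
        cum_total += row["total"]
        ratio = cum_count / cum_total if cum_total else 0.0
        out.append((row["mapq_label"], cum_total, ratio))
    return out


if __name__ == "__main__":
    for label, cum_total, ratio in recompute(parse_eval_output(SAMPLE)):
        print(f"{label} {cum_total:>10d} {ratio:.3e}")
```

The recomputed values can be compared against the `cum_total` and `ratio` fields that were parsed from the printed output, which is a quick way to check that you are reading the columns the same way the script writes them.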

Perhaps you need to tweak the -g setting to get correct results, unless the aligner is really just not handling the reads well. You can also manually check where wgsim_eval.pl expects a read to align by looking at the read name. The format is chromosome_startForward_startReverse, where startReverse is the expected start position if the alignment should have flag bit 0x10 (reverse strand) set.
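For the manual check, a small Python sketch along these lines can pull the expected position out of a read name. It assumes only the chromosome_startForward_startReverse prefix described above; the example read name, the function name, and the regular expression are hypothetical, and this is not the logic wgsim_eval.pl itself uses.

```python
import re

# Prefix "chromosome_startForward_startReverse_"; anything after it is ignored.
NAME_RE = re.compile(r"^(\S+?)_(\d+)_(\d+)_")

REVERSE_BIT = 0x10  # SAM flag bit for a reverse-strand alignment


def expected_start(read_name, flag):
    """Return (chromosome, expected start) for a read name and SAM flag.

    Follows the description above: use startReverse when bit 0x10 is set,
    otherwise startForward.
    """
    m = NAME_RE.match(read_name)
    if m is None:
        raise ValueError(f"unrecognised read name: {read_name!r}")
    chrom = m.group(1)
    start_forward = int(m.group(2))
    start_reverse = int(m.group(3))
    if flag & REVERSE_BIT:
        return chrom, start_reverse
    return chrom, start_forward


if __name__ == "__main__":
    # Hypothetical read name, for illustration only.
    name = "chr1_1000_1500_0:0:0_0:0:0_1a"
    print(expected_start(name, 0))
    print(expected_start(name, 16))
```

Comparing the returned position with the RNAME and POS fields of the corresponding SAM record shows how far off a given alignment is. Note that the non-greedy chromosome match would need adjusting for reference names that themselves contain an underscore followed by digits.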

Hi Devon,

Thanks for your patient explanation. Sorry to say, I am still a little confused by wgsim_eval.pl. I have the following questions:

1) When should I use the -p or -g option, and how?
2) How can we explain their outcomes?
3) Which column should I focus on if I want to get the mapping accuracy?

04x            0 / 138592              138592  0.000e+00
03x            0 / 529                 139121  0.000e+00
02x           18 / 6503                145624  1.236e-04
01x           11 / 1580                147204  1.970e-04
00x          207 / 1202                148406  1.590e-03

(Is the final column the "ratio of the cumulative correct alignments over the cumulative total alignments", as you said?)

4) And how can I calculate the mapping accuracy?

Looking forward to your reply.

He

Reply • 8.1 years ago by allrev