Differences In Alignments When Run Twice (Using The Same Reads/Reference As Input)
11.3 years ago
Stroehli ▴ 40

Hi, is it possible to get different alignments from two runs that use the same reads and the same reference as input?

I ran the alignment step twice with BWA, and the BAM files from the two runs differ slightly.

Can BWA's decision about where to align a read differ between two runs that use exactly the same parameters and input?

Are there any "random" steps in the algorithm? Does multi-threading affect the alignment?

Furthermore, I found that the regions that differ between the two BAMs show strikingly low read quality, but I don't know whether this is related to the observed problem.

Any help or further insight is appreciated.

Cheers, Stroehli

bam bwa alignment • 3.2k views
11.3 years ago

Yes, bwa calls some C random-number functions (lrand48 and drand48, seeded through srand48 with a fixed seed, as the listing shows):

$ grep rand bwa-0.6.2/*.c | grep -v strand

bntseq.c:            if (c >= 4) c = lrand48()&3;
bntseq.c:    bns->seed = 11; // fixed seed for random generator
bntseq.c:    srand48(bns->seed);
bwa.c:    // count number of hits; randomly select one alignment
bwa.c:        if (drand48() * (p->l - p->k + 1 + cnt) > (double)cnt) {
bwa.c:            one->sa = p->k + (bwtint_t)((p->l - p->k + 1) * drand48());
bwape.c:    srand48(bns->seed);
bwase.c:            if (drand48() * (p->l - p->k + 1 + cnt) > (double)cnt) {
bwase.c:                s->sa = p->k + (bwtint_t)((p->l - p->k + 1) * drand48());
bwase.c:         * number of random hits. */
bwase.c:                    double p = 1.0, x = drand48();
bwase.c:    srand48(bns->seed);
bwtsw2_aux.c:            if (p->flag&1) q->qual = 0; // this is a random hit
bwtsw2_aux.c:            if (c >= 4) { c = (int)(drand48() * 4); ++k; } // FIXME: ambiguous bases are not properly handled
bwtsw2_aux.c:            if (c >= 4) c = (int)(drand48() * 4);
bwtsw2_core.c:    { // choose a random one
bwtsw2_core.c:        j = (int)(i * drand48());
bwtsw2_main.c:    srand48(11);
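To illustrate what the grep output implies, here is a minimal Python sketch (not BWA's code) of seeded random tie-breaking among equally good hits. The function `place_reads`, the read names, and the candidate positions are all hypothetical. With a fixed seed and the same processing order, the picks repeat exactly; if the order in which reads consume random numbers changes, which is one way multi-threading could matter (an assumption here, not something the listing proves), a repetitive read may get a different placement.

```python
import random

def place_reads(read_hits, seed=11):
    """Pick one candidate position per read, breaking ties with a seeded RNG.

    read_hits is a list of (read_name, [candidate positions]) pairs.
    The seed of 11 mirrors the fixed seed in the listing, purely for illustration.
    """
    rng = random.Random(seed)
    placements = {}
    for name, hits in read_hits:
        # Every read draws one number from the shared generator, so the
        # draw a read receives depends on how many reads came before it.
        placements[name] = hits[rng.randrange(len(hits))]
    return placements

# Hypothetical reads: read2 is unique, the others have several equally good hits.
reads = [
    ("read1", [100, 5000, 90000]),
    ("read2", [42]),
    ("read3", [7, 700, 7000, 70000]),
]

run_a = place_reads(reads)                   # first run
run_b = place_reads(reads)                   # same order, same seed
run_c = place_reads(list(reversed(reads)))   # same seed, different order

print(run_a == run_b)   # identical: same seed and same order
print(run_a, run_c)     # repetitive reads may be placed differently
```

Uniquely mapping reads such as `read2` are unaffected either way; only reads with several candidate positions can move.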
11.3 years ago
Fred ▴ 780

The BWA documentation states that:

"sampe [...] Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly." This could explain the differences, especially for the low-quality reads.


Thanks for your reply. Could you elaborate on that? Do I understand correctly that repetitive read pairs are reads that map equally well to more than one region of the reference? How does that fit with my observation that the affected reads have low quality? Are you saying that a low-quality read is more likely to map equally well (or rather equally badly) to more than one position, and is therefore placed randomly more often?


Repetitive hits are reads that map at multiple positions on the reference.

Concerning the quality, I did mean that low-quality reads may tend to map to more than one position, but this needs to be verified, because the bwa documentation states that:

"Base quality is NOT considered in evaluating hits"
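One way to check this on real data is to list the reads that were placed differently in the two runs and look at their mapping quality (MAPQ). The sketch below is a plain-Python illustration that parses SAM text directly (for BAM files, convert first, e.g. with `samtools view -h`). The function names and the toy records are hypothetical. If most differing reads have MAPQ 0, that would be consistent with randomly placed repetitive hits, though it says nothing about base quality.

```python
def load_placements(sam_lines):
    """Map (read name, mate bits) -> (reference name, position, MAPQ)."""
    placements = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary and supplementary alignments
        mate = flag & 0xC0  # distinguishes first and second read of a pair
        placements[(fields[0], mate)] = (fields[2], int(fields[3]), int(fields[4]))
    return placements

def differing_reads(sam_a, sam_b):
    """Return reads whose reference name or position differs between two runs."""
    a = load_placements(sam_a)
    b = load_placements(sam_b)
    diffs = []
    for key in sorted(a.keys() & b.keys()):
        if a[key][:2] != b[key][:2]:
            diffs.append((key[0], a[key], b[key]))
    return diffs

def sam(qname, flag, rname, pos, mapq):
    """Build a toy SAM record (hypothetical data, for illustration only)."""
    return "\t".join([qname, str(flag), rname, str(pos), str(mapq),
                      "4M", "*", "0", "0", "ACGT", "IIII"])

run1 = ["@HD\tVN:1.0", sam("r1", 0, "chr1", 100, 37), sam("r2", 0, "chr1", 500, 0)]
run2 = ["@HD\tVN:1.0", sam("r1", 0, "chr1", 100, 37), sam("r2", 0, "chr2", 900, 0)]

diffs = differing_reads(run1, run2)
for name, first, second in diffs:
    print(name, first, second)
```

In this toy input only `r2` moves between runs, and it has MAPQ 0 in both.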

