Question

Different results in bwa mem paired sequential and threaded versions

2

Entering edit mode

9.2 years ago

chemaster ▴ 20

Hi all,

first all, I am new in bioinformatics, I came from the High Performance Computing world. I am trying the bwa tool to check it performance and I am facing the following problem. If I execute bwa mem paired in sequential mode (this is, without the -t option), then later I execute the program again with the same input and with 8 threads ( -t 8 ) the resulting sam file has a lot of differences with the sequential one. These are all the steps I did.

1.- Download the reference from: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

2.- Create the index with bwa index -a bwtsw

3.- Download the reads from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/NA12750/sequence_read/ERR000589_1.filt.fastq.gz and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/NA12750/sequence_read/ERR000589_2.filt.fastq.gz

4.- Execute bwa in sequential mode with bwa mem Reference ERR000589_1.filt.fastq ERR000589_2.filt.fastq > ExitSequential.sam (I previously unzipped the input files)

5.- Same as in point 4 but adding the -t 8 option

6.- I made a diff between the exits and I found a lot of differences in the FLAG, MAPQ, RNEXT and PNEXT fields of the sam files

Can somebody please help me with this? Thanks in advance.

The bwa version is 0.7.12-r1039 from github

bwa mem paired thread • 4.1k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.2 years ago by chemaster ▴ 20

1

Entering edit mode

This sort of behaviour has been reported on and off for the past year or so (see this thread: Bwa Mem Have Different Alignment Result When Using Different Threads ). Keep in mind that if you compare multimapping alignments between the two settings that you shouldn't necessarily expect the same results.

ADD REPLY • link 9.2 years ago by Devon Ryan 104k

0

Entering edit mode

According to the internet, in case of equally good multiple alignments, BWA will pick an alignment randomly. Thus, you wouldn't probably get identical results from two sequential runs either, unless maybe there's a fixed seed for every thread.. I don't know.

ADD REPLY • link 9.2 years ago by 5heikki 11k

0

Entering edit mode

I've read the post Devon Ryan wrote. But they talk about a bug in the 0.7.5a version that is already solved, and other people said that they have the same results in the sequential and threaded version. I think these people tried it with a small quantity of reads and not with a big data entry.

In the case that 5heikki said, I've executed the sequential version several times and I've obtained the exact same output. This is because bwa inits always with the same seed (I think it is a 11) and the randomly selected alignment is always the same because the init seed does not change.

This could be the reason why the results are different (each thread inits its own random number generator with the same seed, and after that the random numbers are different in the sequential version and in each thread, I don't know if I am explaining this well), but I am not sure of this.

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.2 years ago by chemaster ▴ 20

1

Entering edit mode

Exactly, the question is whether the RNG is reinitialized per thread, per alignment, or per run. If it's initialized with a new seed for each alignment then the results should be the same regardless of thread number (presuming the seed is some function of the alignment). In the other two cases, however, different results would be produced.

If you filter out the multimappers, do you still obtain discordant alignments between the two runs?

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.2 years ago by Devon Ryan 104k

0

Entering edit mode

Yes, if I filter with samtools view -q1 I still have discordances. Here goes a discordant alignment example:

-> ERR000589.781818    97    18    63487149    60    51M    1    55326423    0    GCAGAACCCGTGTTCCTTTGCTTCCCTAAAGTGCGAAAGCTTTCAGAAATG    IIIIHIIIIIIII@ICII87GG8HG38A,41-15I+)$?#11201
1*(,+3    NM:i:1    MD:Z:39A11    AS:i:46    XS:i:20
<- ERR000589.781818    65    18    63487149    60    51M    7    148149099    0    GCAGAACCCGTGTTCCTTTGCTTCCCTAAAGTGCGAAAGCTTTCAGAAATG    IIIIHIIIIIIII@ICII87GG8HG38A,41-15I+)$?#11201
1*(,+3    NM:i:1    MD:Z:39A11    AS:i:46    XS:i:20

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.2 years ago by chemaster ▴ 20

0

Entering edit mode

That's not a different alignment, its mate is just aligned differently. Is the mate multimapped?

ADD REPLY • link 9.2 years ago by Devon Ryan 104k

0

Entering edit mode

No, it is not multimapped.

One of the entries is from the sequential version, another one from one launched with 16 threads.

ADD REPLY • link 9.2 years ago by chemaster ▴ 20

0

Entering edit mode

That's not actually the question I asked. I realize that you showed an alignment of the same read from the single and multithreaded run. The question pertains to that reads' mate in each run. The alignment you showed is the same between each run. The differences pertain only to its mate. So, if the mate is multimapping then you'll get this, but it's not a bug or otherwise odd behaviour.

ADD REPLY • link 9.2 years ago by Devon Ryan 104k

0

Entering edit mode

As an aside, this would indicate that the RNG isn't reseeded with each alignment. This makes sense, since that'd be computationally wasteful.

ADD REPLY • link 9.2 years ago by Devon Ryan 104k