bbmap paired-end alignment problem
0
0
Entering edit mode
6.1 years ago
mark.rose ▴ 50

Hi All

I'm having a problem with aligning Illumina PE reads with BBMap version 35.92.

Execution of

bbmap.sh overwrite=t ref=Z.fasta in=R1_001_val_1.100.fq in2=R2_001_val_2.100.fq minid=0.25 mappedonly=true out=out.sam  outu=unaligned_reads.fa maxindel=100000

yields out.sam which contains only alignments of read1 even though the log info seems to suggest read2 reads were also aligned

Pairing data:           pct reads       num reads       pct bases          num bases

mated pairs:            100.0000%             100       107.4763%              59558
bad pairs:                0.0000%               0         0.0000%                  0
insert size avg:          412.82


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%             100       100.0000%              29779
unambiguous:            100.0000%             100       100.0000%              29779
ambiguous:                0.0000%               0         0.0000%                  0
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0000%               0         0.0000%                  0
semiperfect site:        35.0000%              35        35.1758%              10475
rescued:                  0.0000%               0

Match Rate:                   NA               NA        88.8176%              26449
Error Rate:              65.0000%              65         4.4763%               1333
Sub Rate:                65.0000%              65         0.3627%                108
Del Rate:                 0.0000%               0         0.0000%                  0
Ins Rate:                49.0000%              49         4.1136%               1225
N Rate:                 100.0000%             100         6.7061%               1997


Read 2 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%             100       100.0000%              25636
unambiguous:            100.0000%             100       100.0000%              25636
ambiguous:                0.0000%               0         0.0000%                  0
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0000%               0         0.0000%                  0
semiperfect site:        26.0000%              26        25.3745%               6505
rescued:                  2.0000%               2

Match Rate:                   NA               NA        86.8232%              22258
Error Rate:              74.0000%              74         4.9852%               1278
Sub Rate:                74.0000%              74         0.8894%                228
Del Rate:                 0.0000%               0         0.0000%                  0
Ins Rate:                42.0000%              42         4.0958%               1050
N Rate:                 100.0000%             100         8.1916%               2100

As a test I then tried

bbmap.sh overwrite=t ref=Z.fasta in=R2_001_val_2.100.fq minid=0.25 mappedonly=true out=out.sam  outu=unaligned_reads.fa maxindel=100000

Here read2 reads were aligned.

Any ideas what is wrong with the first command line for the paired-end reads?

Thanks

Mark

bbmap align paired-end • 3.8k views
ADD COMMENT
0
Entering edit mode

What is the output of:

samtools view -f 0x0040 out.sam | wc -l

and

samtools view -f 0x0080 out.sam | wc -l
ADD REPLY
0
Entering edit mode
[rosema1@demeter trunk]$ samtools view -f 0x0040 out.sam | wc -l
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "out.sam".
0
[rosema1@demeter trunk]$ samtools view -f 0x0080 out.sam | wc -l
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "out.sam".
0

I also tried

samtools view -o out.bam out.sam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "out.sam".

I just noticed that the sam file has as many lines as it should given paired-end reads and that there are 2 lines for every read ID but they are both labeled, for instance, "M00645:16:000000000-B666D:1:1101:18619:5889 1:N:0:48", though they are different sequences. Going back to the fastq files I find the sequence for read1 in the read1 fastq file ( with, of course the ID above). When I look for read2 in the read2 file, I find the base ID above and the sequence is as it is in the sam file except it is reverse complemented.

ADD REPLY
0
Entering edit mode

It is as I suspected, bbmap removes the trailing /1 and /2, but keeps track of first read and second read using sam flags. However, I don't know why the samtools command is failing, which version of samtools are you using?

ADD REPLY
0
Entering edit mode

I checked the FLAG and you are indeed correct, one is read1 and the other read2 despite their QNAMEs. So I guess all is fine, if unexpected, in regard to the representation of paired-ends. Thanks for your help.

As for samtools, I'm using version 0.1.19-44428cd.

ADD REPLY
0
Entering edit mode

You really need to upgrade to the latest samtools (now v.1.7). That version is ancient and can only lead to other troubles.

ADD REPLY
0
Entering edit mode

mark.rose : If you have samtools available in your $PATH you can directly create bam files by specifying out=myfile.bam when running bbmap.

ADD REPLY
0
Entering edit mode

Thanks for the tip

ADD REPLY
0
Entering edit mode

Not that it matters for the original question, but with .sam files, it should be samtools view -S -f 0x0040 out.sam, or has that changed in a recent version?

ADD REPLY

Login before adding your answer.

Traffic: 2785 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6