Is "paired reads have different names" error due to ! at beginning of line in fastq
9.2 years ago

I'm running a Perl script (clipPairedEnd.pl) that uses cutadapt to trim Illumina adapters from paired-end FASTQ files. I then use bwa aln, bwa sampe, and samtools view to generate aln.bam, but this BAM file has only 248 lines. When I run the same process on the untrimmed FASTQ files, I get 5M lines in the BAM file. After some digging in my log files I found this:

[bwa_sai2sam_pe_core] print alignments...
[bwa_sai2sam_pe_core] paired reads have different names: "HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898", "HWI-ST1293:246:HFG23ADXX:1:1101:9432:1843"
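The message says that bwa sampe reached a pair whose two read names differ, which would be consistent with the two FASTQ files having fallen out of step somewhere upstream. A minimal Python sketch for locating the first such pair follows. The helper names (`read_name`, `first_name_mismatch`) are hypothetical, and the rule of comparing only the text before the first whitespace (minus any `/1` or `/2` suffix) is an assumption for illustration, not bwa's exact logic. It also assumes strict four-line FASTQ records.

```python
import io
from itertools import zip_longest

def read_name(header):
    """Reduce a FASTQ header line to a bare read name (assumed comparison rule)."""
    fields = header.split()
    name = fields[0] if fields else ""
    if name.startswith("@"):
        name = name[1:]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def names(handle):
    """Yield the read name of every 4-line record in an open text handle."""
    for i, line in enumerate(handle):
        if i % 4 == 0:  # first line of each record is the header
            yield read_name(line)

def first_name_mismatch(r1_handle, r2_handle):
    """Return (record_index, name1, name2) for the first differing pair, else None.

    If one file has fewer records, the missing name is reported as None.
    """
    pairs = zip_longest(names(r1_handle), names(r2_handle))
    for idx, (n1, n2) in enumerate(pairs, start=1):
        if n1 != n2:
            return idx, n1, n2
    return None

# Tiny in-memory demonstration with made-up reads
demo_r1 = "@a 1:N:0\nACGT\n+\nIIII\n@b 1:N:0\nAC\n+\nII\n"
demo_r2 = "@a 2:N:0\nACGT\n+\nIIII\n@c 2:N:0\nAC\n+\nII\n"
print(first_name_mismatch(io.StringIO(demo_r1), io.StringIO(demo_r2)))
```

With real data, the two handles would come from `open("R1.fastq")` and `open("R2.fastq")`; the returned record index multiplied by four gives the approximate line number to inspect with less.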

When I look up this read in the FASTQ files (before and after adapter trimming), here is what I see:

less R1.fastq

495 +
496 #1=DDFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJFHIJHH>GIIIIJJIJJIGHHCEHFFFDBDFEEDDBB##############################################################################
497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 1:N:0:TCGCAGG
498 GGCTTTCCGGGTGTGTGTTTAAATTTTTTTTCTATTTAATAATGTTTTTTATTTGTGTTGTAGAATGCCAGAGGACTTGGATCTGAGCTAAAGGACAGTATTCCAGTTACTGAACTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACT
less R2.fastq

495 +
496 #######################################################################################################################################################
497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 2:N:0:TCGCAGG
498 AGTTCAGTAACTGGAATACTGTCCTTTAGCTCAGATCCAAGTCCTCTGGCATTCTACAACACAAATAAAAAACATTATTAAATAGAAAAAAAATTTAAACACACACCCGGAAAGCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG

Adapter trimmed fastq

less R1.fastq

495 +
496 !
497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 1:N:0:TCGCAGG
498 GGCTTTCCGGGTGTGTGTTTAAATTTTTTTTCTATTTAATAATGTTTTTTATTTGTGTTGTAGAATGCCAGAGGACTTGGATCTGAGCTAAAGGACAGTATTCCAGTTACTGAACT
less R2.fastq
495 +
496 #######################################################################################################################################################
497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 2:N:0:TCGCAGG
498 AGTTCAGTAACTGGAATACTGTCCTTTAGCTCAGATCCAAGTCCTCTGGCATTCTACAACACAAATAAAAAACATTATTAAATAGAAAAAAAATTTAAACACACACCCGGAAAGCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG

Can anyone tell me whether the ! is causing the "paired reads have different names" error message? If so, any ideas on how to fix it? I find about 2000 lines that begin with ! in my adapter-trimmed R1.fastq and none in R2.fastq.
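A line consisting only of ! sits in the quality position of its record (it follows a + line), so it is a one-character quality string, which suggests the corresponding read was trimmed down to a single base. Rather than searching for lines that start with !, which could also match the start of an ordinary quality string, a sketch like the following counts very short records by parsing the file in four-line groups. The function name `short_records` and the `max_len` parameter are hypothetical, and strict four-line records are assumed.

```python
import io

def short_records(handle, max_len=1):
    """Yield (record_index, name, sequence, quality) for reads of length <= max_len."""
    record = []
    index = 0
    for line in handle:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            index += 1
            header, seq, _plus, qual = record
            if len(seq) <= max_len:
                # keep only the part of the header before the first space
                yield index, header.split(" ")[0], seq, qual
            record = []

# Made-up example: the second read has been reduced to one base
demo = "@r1 1:N:0\nACGT\n+\nIIII\n@r2 1:N:0\nN\n+\n!\n@r3 1:N:0\nGG\n+\nII\n"
for hit in short_records(io.StringIO(demo)):
    print(hit)
```

Running the same function over both trimmed files would show whether the roughly 2000 short reads in R1.fastq have a record at the same index in R2.fastq, which is what a paired-end aligner needs.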

Here is my trimming command

clipPairedEnd.pl -m1 read1.fastq -m2 read2.fastq -o1 R1.fastq -o2 R2.fastq -a1 AGATCGGAAGAGCACACGTCTGAACTCCAGTC -a2 TCTAGCCTTCTCGCAGCACATCC -s1 R1.stat -s2 R2.stat
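One thing that can be checked directly from the data posted above is whether the -a2 sequence occurs in the R2 read at all, since the R2 record is identical before and after trimming. The sketch below tests the -a2 string as typed, its complement, its reverse, and its reverse complement against the last bases of the R2 read shown at line 498. The variable names are illustrative only; the sequences are copied from the post.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Base-by-base complement, keeping the original orientation."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Complement read in the opposite direction."""
    return complement(seq)[::-1]

# -a2 value from the trimming command
a2 = "TCTAGCCTTCTCGCAGCACATCC"
# last 52 bases of the posted R2 read (line 498 of R2.fastq)
r2_tail = "CACACACCCGGAAAGCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG"

candidates = {
    "as given": a2,
    "complement": complement(a2),
    "reverse": a2[::-1],
    "reverse complement": reverse_complement(a2),
}
# True where that form of the adapter is present in the read
found = {label: seq in r2_tail for label, seq in candidates.items()}
for label, hit in found.items():
    print(f"{label:20s} {candidates[label]}  found={hit}")
```

For this read, only the complement form (AGATCGGAAGAGCGTCGTGTAGG) is present, so the -a2 value as typed may be why R2 was left untrimmed while R1 was cut. Whether that also explains the differing read names is not something the posted lines can confirm.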

Seeing lines 491-502 might be helpful for a little more context. There's nothing obviously wrong with the files from what you have posted, although that exclamation point was not an original quality score, and the reads were trimmed to different lengths, which is odd.
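To make the point about the exclamation mark concrete: assuming the files use the Phred+33 encoding (the quality strings shown top out at J, which fits that encoding), each quality character maps to a score of its ASCII code minus 33, so ! is score 0 and # is score 2. A short sketch, with `phred33` as a hypothetical helper name:

```python
def phred33(quality):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality]

print(phred33("!"))          # the one-character quality line from the trimmed R1
print(phred33("#1=DDFFF"))   # first eight characters of the untrimmed R1 quality line
```

A score of 0 does not appear anywhere in the untrimmed quality strings shown, which fits the observation that the ! was written by the trimming step rather than by the sequencer.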


I just realized you asked for lines 491-502. That seems like quite a few lines, so here are lines 499-502.

R1.FASTQ

499 +
500 CCCFFFDFHHHDFGHHHIIJJJJJIJJJJJJJJIIJJIJJJJJJIJJJJHHFFFFDFEEDEEEEDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEECDEEDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDC
501 @HWI-ST1293:246:HFG23ADXX:1:1101:10485:1959 1:N:0:TCGCAGG
502 CATATGCATGGCCTGGCATTTCTAGAAGAGAACTACTCCCATCAGAATGCCAAGAAGATCGTGGCCACCCACCAGCTTCTTGGTGATGTGCAGAGAGTGATTGAGGTTCTGCATGGCCTGCAGCTCAAGATGAGCATCTTGCAGTAAGTGT

R2.fastq

499 +
500 CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFFFEEEEEEDDDDDDDEDDDBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD09?CDDDCDD@AC:CCD>
501 @HWI-ST1293:246:HFG23ADXX:1:1101:10485:1959 2:N:0:TCGCAGG
502 GCTTTCCAATTTCTCAGATTTACTCAGCCCCCAGACCATGCCAAACAGACTGCTCCCAGCACTGCAGGTGCCACACTTACTGCAAGATGCTCATCTTGAGCTGCAGGCCATGCAGAACCTCAATCACTCTCTGCACATCACCAAGAAGCTG

Trimmed fastqs

R1.fastq

499 +
500 CCCFFFDFHHHDFGHHHIIJJJJJIJJJJJJJJIIJJIJJJJJJIJJJJHHFFFFDFEEDEEEEDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEECDEEDDDDDDDDDDD
501 @HWI-ST1293:246:HFG23ADXX:1:1101:10485:1959 1:N:0:TCGCAGG
502 CATATGCATGGCCTGGCATTTCTAGAAGAGAACTACTCCCATCAGAATGCCAAGAAGATCGTGGCCACCCACCAGCTTCTTGGTGATGTGCAGAGAGTGATTGAGGTTCTGCATGGCCTGCAGCTCAAGATGAGCATCTTGCAGTAAGTGT

R2.fastq

499 +
500 CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFFFEEEEEEDDDDDDDEDDDBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD09?CDDDCDD@AC:CCD>
501 @HWI-ST1293:246:HFG23ADXX:1:1101:10485:1959 2:N:0:TCGCAGG
502 GCTTTCCAATTTCTCAGATTTACTCAGCCCCCAGACCATGCCAAACAGACTGCTCCCAGCACTGCAGGTGCCACACTTACTGCAAGATGCTCATCTTGAGCTGCAGGCCATGCAGAACCTCAATCACTCTCTGCACATCACCAAGAAGCTG
9.0 years ago
mark.ziemann ★ 1.9k

Skewer works really well for simultaneous adapter clipping and quality trimming of paired-end data. Here is a blog post on it.
