Question

Paired-end reads merge tool: Multiple @ lines in merged output of FLASH tool

5.1 years ago

Ankit ▴ 500

Hi everyone,

I have a question about merging paired-end read files. I am using FLASH to merge my data, and I ran it as follows:

./flash sample_rep1_R1.fastq sample_rep1_R2.fastq -m 5 -t 5 -o sample_merge 2>&1 | tee flash.log

In sample_merge.extendedFrags.fastq I noticed some quality-score lines that contain multiple @ characters. For example:

> > <B/B-:@@D@D@:-:@-D@-:@D@@D-DDD@D@:@---D--::D:DDD BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFDDDDDDDDDDDDD#::DDDDDDDD@@DDDDDDDDDDDDDDDDDDDDDDDDF#FFFFFFFFFFFFFFFFF<<
> BBBBBF/BFFFBFFFFFDDDD:DDDDDDDDDDDDDDDDD@D:D@DDDDDDD:@D::@@DD@D-D@DDDDDFDB@FFFDF@:::
> BBBBBFFFFFDDD-D@DDDD@-D@-:D:@DDDDD-@DDDDDDD@:D@D-@D-D-@-D-5D@D@FFFFFFFFFF<<<-:7:
> BB@@@DF<FFF<DDDDDDDDDDFDDDDFFFFFDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<DDDDDDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBB
> BBBBBFFFFBFFFFB/FFFFFFFF<FFFFFFFFFFFFFFFFFFFFFB<FBFFFBFFBFBBFFFFBD@DDDDDDDD@D#::
> BB@@@DDDDDDDDDD@DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFFF#FFFFFFFF#F#FFFFFFDDF@DD@D@DDDDDDDDDDDDDDDDDDDDDDDDDDDD@DDDDDDDDDDDDFBB

The unmerged files, sample.notCombined_1.fastq and sample.notCombined_2.fastq, do not have these lines.

I am wondering whether these multi-@ lines in extendedFrags.fastq are normal or are related to the parameters I have chosen.

My reads are 2 × 125 bp.

It would be very helpful if someone could guide me.

Thanks

Ankit

flash paired-end read merge fastq • 2.5k views

updated 5.1 years ago by gb ★ 2.2k • written 5.1 years ago by Ankit ▴ 500

score 1 · Answer 1 · 2019-03-29

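I am not fully sure I understand you, but the "@" character is simply one of the symbols used to encode a quality score. Its ASCII code is 64, which in the Phred+33 encoding corresponds to a quality score of 31 (https://www.drive5.com/usearch/manual/quality_score.html). Your quality strings also contain characters such as "#" and "-", which can only occur in Phred+33. If you look up the merged read in sample_rep1_R1.fastq and sample_rep1_R2.fastq, you will see those "@" characters in the line after the line starting with a "+".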
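To make the mapping concrete, here is a minimal Python sketch that decodes a quality string, assuming Phred+33 encoding. The function name phred33_scores is just an illustrative helper, not part of FLASH or any library.

```python
def phred33_scores(quality_line: str) -> list[int]:
    """Convert a FASTQ quality string to Phred scores.

    Assumes Phred+33 encoding: score = ASCII code of the character - 33.
    """
    return [ord(ch) - 33 for ch in quality_line]


# '@' has ASCII code 64, so under Phred+33 it encodes a score of 31.
print(phred33_scores("@"))       # [31]
# A few of the characters that appear in the quality lines quoted above.
print(phred33_scores("#-:@BF"))  # [2, 12, 25, 31, 33, 37]
```

So "@" is an ordinary, fairly good quality value, and seeing many of them in a quality line is not a sign of a broken file.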

A possible reason why you rarely see them in the non-merged files is that FLASH does not merge a read pair when the mismatch density in the overlap exceeds --max-mismatch-density, that is, when there are too many mismatches. Mismatches often arise because the sequencer called a base incorrectly, and such miscalled bases mostly have a lower quality score. The "@" character stands for a relatively high score.

You should read up on how to interpret FASTQ files. Where the two reads disagree within the overlap, FLASH chooses the base with the higher quality score.
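The two ideas above (rejecting overlaps with too many mismatches, and keeping the higher-quality base at a mismatch) can be sketched roughly as follows. This is a simplified illustration, not FLASH's actual implementation: the function merge_overlap is hypothetical, it assumes the overlapping parts of the two reads are already aligned and of equal length, the 0.25 threshold is what I believe the FLASH default to be, and breaking ties in favour of read 1 is my own assumption.

```python
from typing import Optional


def merge_overlap(seq1: str, qual1: str, seq2: str, qual2: str,
                  max_mismatch_density: float = 0.25) -> Optional[str]:
    """Merge two already-aligned overlap segments of equal length.

    Returns None when the mismatch density exceeds the threshold,
    otherwise the merged bases for the overlap.
    """
    if not (len(seq1) == len(qual1) == len(seq2) == len(qual2)):
        raise ValueError("sequences and quality strings must have equal length")
    if not seq1:
        raise ValueError("overlap must not be empty")

    # Mismatch density = mismatching positions / overlap length.
    mismatches = sum(b1 != b2 for b1, b2 in zip(seq1, seq2))
    if mismatches / len(seq1) > max_mismatch_density:
        return None  # too many mismatches: the pair would stay unmerged

    # Keep the base with the higher quality character at each position
    # (ties go to read 1, which is an assumption of this sketch).
    merged = [b1 if ord(q1) >= ord(q2) else b2
              for b1, q1, b2, q2 in zip(seq1, qual1, seq2, qual2)]
    return "".join(merged)


print(merge_overlap("ACGT", "FFFF", "ACTT", "FF#F"))  # ACGT
print(merge_overlap("AAAA", "FFFF", "ACGT", "FFFF"))  # None
```

Because low-quality characters such as "#" tend to lose out at mismatched positions, the merged reads end up with relatively more high-quality characters such as "@", "D" and "F".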