PEAR seems to have sense to me
0
0
Entering edit mode
5 weeks ago
ManuelDB ▴ 80

I have conducted a simulation to determine the optimal tool for merging paired-end reads. Currently, I am evaluating PEAR.

The procedure involves generating two FASTQ files, each containing 153-base reads. These reads share identical sequences in the 50-base segments at both ends, while the three central bases are varied to facilitate the simulation. Specifically, for the R1 reads, the middle three bases are GGG, and for the R2 reads, they are AAA, as illustrated below.

 (RACP) [mdb1c20@cyan51 combinatory_model]$ head simulated_R1_dif.fastq
@read_1 R1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII!!!IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read_2 R1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII!!"IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read_3 R1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
(RACP) [mdb1c20@cyan51 combinatory_model]$ head simulated_R2_dif.fastq
@read_1 R2
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII!!!IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read_2 R2
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII"!!IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read_3 R2
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

In all R1 and R2 reads, the sequences are consistent as previously described. I employ a combinatorial method to generate all possible quality scores for these reads. The objective of this simulation is to observe the outcome of the reads merged using PEAR. Based on my understanding, I anticipate that in the merged reads, the base selected should be the one from either R1 or R2 that possesses the highest quality score. Importantly, I expect this process to result in the retention of the original bases from R1 or R2 based on their quality scores, rather than the introduction of a new variant. This expectation aligns with the principle that merged read quality should reflect the highest confidence base calls from the pair, without generating novel base variations not present in the original reads.

 (RACP) [mdb1c20@cyan51 combinatory_model]$ head ./pear_data/pear_output.fastq.assembled.fastq
@read_1 R1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII!!!IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read_2 R1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII!!"IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read_3 R1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Why do I get T in the merged read when in R1 I have G and in R2 I have A?? I don't understand...

The code used for PEAR

 pear \
-f ./simulated_R1_dif.fastq \
-r ./simulated_R2_dif.fastq \
-o ./pear_data/pear_output.fastq

NGmerge seems to do what is expected

       R1 Base-Quality (Base 1) R2 Base-Quality (Base 1) Merged Base-Quality (Base 1)  ... R1 Base-Quality (Base 3) R2 Base-Quality (Base 3) Merged Base-Quality (Base 3)
0                          G-0                      A-0                          G-0  ...                      G-0                      A-0                          G-0
1                          G-0                      A-1                          G-0  ...                      G-1                      A-0                          G-1
2                          G-0                      A-2                          G-0  ...                      G-2                      A-0                          G-2
3                          G-0                      A-3                          G-0  ...                      G-3                      A-0                          G-3
4                          G-0                      A-4                          G-0  ...                      G-4                      A-0                          G-4
...                        ...                      ...                          ...  ...                      ...                      ...                          ...
68916                     G-40                     A-36                         G-40  ...                     G-36                     A-40                         G-36
68917                     G-40                     A-37                         G-40  ...                     G-37                     A-40                         G-37
68918                     G-40                     A-38                         G-40  ...                     G-38                     A-40                         G-38
68919                     G-40                     A-39                         G-40  ...                     G-39                     A-40                         G-39
68920                     G-40                     A-40                         G-40  ...                     G-40                     A-40                         G-40
PEAR • 214 views
ADD COMMENT
0
Entering edit mode

Why do I get T in the merged read when in R1 I have G and in R2 I have A?? I don't understand...

Is PEAR allowing one error in the overlap?

ADD REPLY

Login before adding your answer.

Traffic: 1479 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6