23 months ago
snishtala03 • 10
@snishtala0337463

Hello,

I have some paired-end (2x150 bp) RNA-Seq reads of a viral genome, sequenced on a MiSeq. I need to merge the reads for a downstream analysis. (I also noticed that many read pairs overlap substantially, so merging them makes sense.)

bbmerge.sh in1=R1.fastq in2=R2.fastq out=merged.fastq outu1=R1_unmerged.fastq outu2=R2_unmerged.fastq
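For context, read-merging tools look for an overlap between R1 and the reverse complement of R2 and, when they find a convincing one, write a single read spanning the whole insert. The Python sketch below illustrates only that core idea under simplifying assumptions (mismatch counting without quality scores, R1's base kept inside the overlap, no adapter read-through, and a crude tie rule for ambiguity). It is not how bbmerge, vsearch or flash are implemented, and `merge_pair`, `min_overlap` and `max_mismatches` are hypothetical names chosen for this example.

```python
# Toy overlap-based merging of one read pair (not bbmerge's actual algorithm).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10, max_mismatches: int = 1):
    """Return the merged sequence, or None for 'no solution' / 'ambiguous'.

    Assumes the insert is at least as long as R1 (no adapter read-through)
    and keeps R1's base inside the overlap instead of weighing quality scores.
    """
    rc2 = revcomp(r2)
    hits = []  # (offset of rc2 within r1, mismatches in the overlap)
    for offset in range(max(0, len(r1) - len(rc2)), len(r1) - min_overlap + 1):
        overlap = len(r1) - offset
        mismatches = sum(a != b for a, b in zip(r1[offset:], rc2[:overlap]))
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    if not hits:
        return None  # no acceptable overlap
    hits.sort(key=lambda hit: hit[1])  # fewest mismatches first
    if len(hits) > 1 and hits[0][1] == hits[1][1]:
        return None  # two equally good overlaps: treat as ambiguous
    offset = hits[0][0]
    # R1 covers the start of the insert; rc2 supplies the bases past R1's end.
    return r1 + rc2[len(r1) - offset:]


if __name__ == "__main__":
    insert = "ACGTACGGTTCAGGATCCAT"
    r1, r2 = insert[:12], revcomp(insert)[:12]
    print(merge_pair(r1, r2, min_overlap=4, max_mismatches=0))  # ACGTACGGTTCAGGATCCAT
    print(merge_pair("ATATATATAT", "ATATATATAT", min_overlap=4, max_mismatches=0))  # None
```

In this toy version a repetitive insert yields several equally good overlaps and is rejected, which gives some intuition for bbmerge's "Ambiguous" category, although bbmerge's real criteria are more involved.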


Here is the terminal output I get from bbmerge:

Pairs:                  3328768
Joined:                 2925342         87.881%
Ambiguous:              370409          11.128%
No Solution:            33017           0.992%
Too Short:              0               0.000%
Avg Insert:             176.0
Standard Deviation:     44.0
Mode:                   147

Insert range:           35 - 293
90th percentile:        243
75th percentile:        204
50th percentile:        167
25th percentile:        142
10th percentile:        126
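The insert-size figures above can be turned into expected overlap lengths. A minimal sketch, assuming equal-length 150-bp reads as stated in the question (the example reads shown below are actually 151 bases long, so pass `read_len=151` to match them): when the insert is longer than a read, R1 and R2 share 2 * read_len - insert bases; when it is shorter, both reads cover the entire insert and continue into adapter sequence. `overlap_length` is a hypothetical helper name.

```python
def overlap_length(insert: int, read_len: int = 150) -> int:
    """Number of insert bases read by both R1 and R2 (0 if the reads don't meet)."""
    if insert <= read_len:
        return insert  # each read spans the whole insert, then runs into adapter
    return max(0, 2 * read_len - insert)


# Insert sizes copied from the bbmerge report above.
for label, insert in [("10th pct", 126), ("median", 167), ("mean", 176),
                      ("90th pct", 243), ("max", 293)]:
    print(f"{label:>8}: insert {insert} bp -> overlap {overlap_length(insert)} bp")
```

With 150-bp reads this gives overlaps of 126, 133, 124, 57 and 7 bases respectively, so all but the very longest inserts overlap by tens of bases or more.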


Next, I use bwa to align the reads to my reference genome, allowing secondary alignments, and in many cases a read aligns to multiple regions of the genome. While going over the alignments, I found some strange behaviour in the merged reads, for example:

@M02091:32:000000000-C28N4:1:1106:22793:14654 1:N:0:7
GTCTTTGGGTATACATTTGAACCCTAATAAAACCAAACGTTGGGGCTACTCCCTTAACTTCATGGGATATGTAATTGGAAGTTGGGGTACTTTACCACAGGAACATATTGTAATGAAACTCAAGCAATGTTTTCGGAAACTGCCTGTAAAT
+
DCEEEFFFEBFFGGGGGGGGGGHGHHHHHHHHHGHHHHGHHHHGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGGGHHHHHHHHHHHH

@M02091:32:000000000-C28N4:1:1106:22793:14654 2:N:0:7
AAAGAATTGTGGGTCTTTTGGGCTTTGCTGCCCCTTTTACACAATGTGGCTATCCTGCTTTGACAGACTTTCCAATCAATAGGTCTATTTACAGGCAGTTTCCGAAAACATTGCTTGAGTTTCATTACAATATGTTCCTGTGGTAAAGTAC
+
CCDDDFFFFFFCGGGGGGGGGGHHHHHHHHHHHGHHHHHHHHHGHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHGHHHHHGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHF

AAAGAATTGTGGGTCTTTTGGGCTTTGCTGCCCCTTTTACACAATGTGGCTATCCTGCTTTGA

To compare results, I also tried the alternative merging tools vsearch and flash. Interestingly, both flash and vsearch merge this read pair correctly (see below), but a similar problem comes up with a different example:
AAAGAATTGTGGGTCTTTTGGGCTTTGCTGCCCCTTTTACACAATGTGGCTATCCTGCTTTGACAGACTTTCCAATCAATAGGTCTATTTACAGGCAGTTTCCGAAAACATTGCTTGAGTTTCATTACAATATGTTCCTGTGGTAAAGTACCCCAACTTTCAATTACATAACCCATGAAGTTAAGGGAGTAGCCCCAACGTTTGGTTTTATTAGGGTTCAAATGTATACCCAAAGAC

My vsearch command line is:
vsearch --fastq_mergepairs R1.fastq --reverse R2.fastq --eetabbedout error_stats --fastqout merged.fastq --fastqout_notmerged_fwd fw_unmerged.fastq --fastqout_notmerged_rev rev_unmerged.fastq

My flash command line is:
flash R1.fastq R2.fastq -M 151

2. This question is not about merging but about the nature of my reads. As the example above shows, my R1 read appears reverse-complemented, which suggests that R1.fastq and R2.fastq each contain a mix of forward and reverse reads. Is there a way to solve this and put all R1 reads in one file and all R2 reads in the other? I am trying to remove duplicates after aligning my reads, and this mix prevents the reads from being deduplicated correctly.
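One way to check the orientation of a merged read is to see which read it starts with: in R1 orientation it begins with the start of R1 and ends with the reverse complement of the start of R2, and in R2 orientation the roles are swapped. For strand-independent comparison of sequences (for example, when collapsing exact duplicates), a common trick is to key each sequence on the lexicographically smaller of itself and its reverse complement. This is a sketch with hypothetical helper names (`merged_orientation`, `canonical`) and an arbitrary prefix length `k`; it does not replace position-based duplicate marking on the alignments.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merged_orientation(merged: str, r1: str, r2: str, k: int = 20) -> str:
    """Return 'R1' or 'R2' depending on which read the merged sequence starts with."""
    if merged.startswith(r1[:k]) and merged.endswith(revcomp(r2[:k])):
        return "R1"
    if merged.startswith(r2[:k]) and merged.endswith(revcomp(r1[:k])):
        return "R2"
    return "unknown"  # neither read's first k bases matched exactly


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of seq and its reverse complement."""
    return min(seq, revcomp(seq))


if __name__ == "__main__":
    insert = "ACGTACGGTTCAGGATCCAT"
    r1, r2 = insert[:12], revcomp(insert)[:12]
    print(merged_orientation(insert, r1, r2, k=5))           # R1
    print(merged_orientation(revcomp(insert), r1, r2, k=5))  # R2
    print(canonical(insert) == canonical(revcomp(insert)))   # True
```

Applied to the vsearch merged sequence shown earlier, this check reports R2 orientation: that sequence begins with the first 20 bases of R2 and ends with the reverse complement of the first 20 bases of R1.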
bbmerge fastq alignment flash vsearch
23 months ago
h.mon 25k
@h.mon6093

Which version of BBTools are you using? I just tested the sequences you provided as an example, and bbmerge.sh (BBTools 38.43) merged the pair correctly:

@M02091:32:000000000-C28N4:1:1106:22793:14654 1:N:0:7
GTCTTTGGGTATACATTTGAACCCTAATAAAACCAAACGTTGGGGCTACTCCCTTAACTTCATGGGATATGTAATTGGAAGTTGGGGTACTTTACCACAGGAACATATTGTAATGAAACTCAAGCAATGTTTTCGGAAACTGCCTGTAAATAGACCTATTGATTGGAAAGTCTGTCAAAGCAGGATAGCCACATTGTGTAAAAGGGGCAGCAAAGCCCAAAAGACCCACAATTCTTT
+
DCEEEFFFEBFFGGGGGGGGGGHGHHHHHHHHHGHHHHGHHHHGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHGHHHHHHHHHGHHHHHHHHHHHGGGGGGGGGGCFFFFFFDDDCC


In addition, the merged read you showed as an example has a very strange substitution at position 160. At this position the reads do not overlap, so the consensus should match read 1. However, the consensus read has a T there, while the original read 1 has a C. Is the example you showed from vsearch or flash? Does either of them perform some form of error correction?
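The reasoning about position 160 can be written down as a small check. For a merged read whose length equals the insert, the read it starts with covers positions 1 to read_len, and the other read covers the last read_len positions; only positions covered by both reads have two bases to reconcile. A sketch under those assumptions (equal read lengths, no trimming, 1-based positions), with `covering_reads` as a hypothetical helper name:

```python
def covering_reads(pos: int, insert: int, read_len: int) -> str:
    """Which read(s) cover 1-based position `pos` of a merged read.

    'first' is the read the merged sequence starts with, 'second' the other one.
    """
    if not 1 <= pos <= insert:
        raise ValueError("position lies outside the merged read")
    in_first = pos <= read_len
    in_second = pos > insert - read_len
    if in_first and in_second:
        return "both"
    if in_first:
        return "first"
    return "second" if in_second else "neither"


if __name__ == "__main__":
    # Toy example: 20-bp insert, 12-bp reads -> overlap at positions 9-12.
    print([covering_reads(p, 20, 12) for p in (5, 10, 15)])  # ['first', 'both', 'second']
```

With the 151-base reads from the example, any position past 151 can only come from the second read. Because the vsearch merged sequence begins with R2, that second read is R1 (reverse-complemented), so a base there that differs from R1 has no support from either read.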


Thank you for your response. I was using an older version; after updating to the current one (38.44), I get correctly merged reads!

I used vsearch for that example. I think it does account for errors, but I'm not sure.


Which older version of BBTools? It would be interesting to know the version affected.
