How to align Trimmomatic unpaired reads with BWA?
2.5 years ago
mcff23 • 60

Hi everyone!

I trimmed the adapters from my Illumina paired-end (PE) reads with _Trimmomatic_. As expected, this produced four files: sample.R1.trimmed.fastq, sample.R2.trimmed.fastq, sample.R1.unpaired.fastq and sample.R2.unpaired.fastq.

Then I aligned the trimmed.fastq pair with BWA just fine. But when I tried to align the unpaired reads I got this:

[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (4, 1, 1, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "HWI-1KL178:67:HAE0RADXX:1:1101:2363:2000", "HWI-1KL178:67:HAE0RADXX:1:1101:11567:2000"

This is the command line:

bwa/bin/bwa mem -aM -t 6 ${REF_BWA_INDEX}/genome.fa ${SAMPLE}.R1.unpaired.fastq ${SAMPLE}.R2.unpaired.fastq > ${i}.sam

My goal is to align the trimmed and unpaired files separately, because BWA does not support them together.
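The error comes from how bwa mem reads two FASTQ files: it pairs the first record of one file with the first record of the other, the second with the second, and so on, so both files must list the same reads in the same order. The unpaired files do not. A read lands in sample.R1.unpaired.fastq because its mate was discarded, so there is no matching record in sample.R2.unpaired.fastq. Below is a minimal bash sketch for checking whether two FASTQ files are in sync. It assumes uncompressed FASTQ with 4 lines per record. `check_pairing` and `fastq_names` are hypothetical helper names and are not part of bwa or Trimmomatic.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of bwa or Trimmomatic).
# Assumes uncompressed FASTQ with exactly 4 lines per record.

# Print one read name per record: first word of the header line,
# without the leading "@" and without a trailing /1 or /2.
fastq_names() {
  awk 'NR % 4 == 1 { n = $1; sub(/^@/, "", n); sub(/\/[12]$/, "", n); print n }' "$1"
}

# Compare two FASTQ files record by record and report how many names differ.
# paste leaves a field empty when one file runs out, so unequal record
# counts are reported as mismatches too.
check_pairing() {
  paste <(fastq_names "$1") <(fastq_names "$2") |
    awk -F '\t' '$1 != $2 { bad++ } END { printf "%d of %d records mismatched\n", bad, NR }'
}
```

For example, `check_pairing sample.R1.trimmed.fastq sample.R2.trimmed.fastq` should report 0 mismatches. The two unpaired files will generally report many.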

Thanks in advance!

Monica

9 weeks ago
University Park, USA

Align each unpaired file separately. When bwa mem is given a single FASTQ file, it treats the reads as single-end:

bwa/bin/bwa mem -aM -t 6 ${REF_BWA_INDEX}/genome.fa ${SAMPLE}.R1.unpaired.fastq > R1.unpaired.sam

...
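If you want the same single-end command for both mates, a loop saves typing it twice. This is a minimal sketch. It assumes bwa is on the PATH (set the BWA variable to use another binary, e.g. BWA=bwa/bin/bwa) and uses the file names from this thread. `align_unpaired` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# align_unpaired (hypothetical helper): align each Trimmomatic unpaired file
# on its own. With a single FASTQ input (and no -p), bwa mem treats every
# read as single-end, so no mate-name matching takes place.
# Assumes bwa is on PATH; set BWA to use another binary, e.g. BWA=bwa/bin/bwa.
align_unpaired() {
  local ref=$1 sample=$2 bwa=${BWA:-bwa} mate
  for mate in R1 R2; do
    # Same options as in the answer's command; one SAM file per mate.
    "$bwa" mem -aM -t 6 "$ref" "${sample}.${mate}.unpaired.fastq" \
      > "${sample}.${mate}.unpaired.sam" || return 1
  done
}
```

Calling `align_unpaired ${REF_BWA_INDEX}/genome.fa ${SAMPLE}` would write ${SAMPLE}.R1.unpaired.sam and ${SAMPLE}.R2.unpaired.sam.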

Be careful when combining paired and unpaired data.

Information from a read pair usually cannot (and should not) be combined with information from two unpaired reads. A read pair measures the same DNA fragment twice, once from each end. Two unpaired reads measure two different DNA fragments.


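Just a note that the latest bwa-mem supports this:

(seqtk mergepe sample.R?.trimmed.fastq; cat sample.R?.unpaired.fastq) | bwa mem -p ${REF_BWA_INDEX}/genome.fa -

That is, with -p you can feed paired and unpaired reads in one stream, as long as the two reads of each pair are next to each other. bwa mem treats two adjacent reads with the same name as a pair and any other read as single-end.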
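If seqtk is not installed, you can build an equivalent interleaved stream with standard tools. This is a substitute for seqtk mergepe, not its implementation. It assumes uncompressed 4-line FASTQ files that list the same pairs in the same order, as Trimmomatic's paired outputs do. `interleave_fastq` is a hypothetical name.

```shell
#!/usr/bin/env bash
# interleave_fastq (hypothetical helper): print read 1 and read 2 of each
# pair alternately, the layout bwa mem -p expects for pairs.
# Assumes uncompressed 4-line FASTQ files listing the same pairs in the
# same order, with no tab characters inside records.
interleave_fastq() {
  # "paste - - - -" folds each 4-line record into one tab-separated line;
  # pasting both files side by side puts mates on one line, and tr turns
  # the tabs back into newlines.
  paste <(paste - - - - < "$1") <(paste - - - - < "$2") | tr '\t' '\n'
}

# Example (file names from this thread):
# (interleave_fastq sample.R1.trimmed.fastq sample.R2.trimmed.fastq; cat sample.R?.unpaired.fastq) |
#   bwa mem -p ${REF_BWA_INDEX}/genome.fa - > sample.sam
```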


Thanks, Istvan, for your quick response!

I am a bit lost. My main goal here is to call variants. What do you suggest I do with these unpaired files once I have aligned them separately? I was going to merge them with the trimmed ones and then call the variants...

Do I have to take them into account, or should I only use the trimmed ones?

Thanks!

Monica


Check the documentation of your variant caller to see whether it handles mixed (paired and single-end) data. We usually discard the unpaired reads to keep things simple. They are typically no more than a few percent of the data and won't noticeably affect the results.


Hi Istvan,

Could you give a rough number for "a few percent"? About 8% of my reads ended up unpaired and were filtered out. Will losing this much data affect the downstream analysis?

Thank you!


8% is not all that much, but it depends on how much data you have left. The general rule is that it is better to get rid of bad data than to try to salvage it. In my opinion, better data, even if there is less of it, is preferable to salvaged data.

That is because errors rarely come in isolation. We may think that trimming off the bad bases fixed everything. But whatever caused those errors in some regions of the flowcell may also have affected reads that look reliable.
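To put a figure like 8% in context, it helps to count what is left after trimming. For PE runs, Trimmomatic prints its own summary (Input Read Pairs, Both Surviving, Forward Only Surviving, Reverse Only Surviving, Dropped). The sketch below recomputes comparable numbers from the output files. It assumes uncompressed 4-line FASTQ and this thread's file names. `count_reads` and `trim_summary` are hypothetical helpers, and "share of surviving reads" is only one way to express the unpaired fraction.

```shell
#!/usr/bin/env bash
# count_reads (hypothetical): number of records in an uncompressed 4-line FASTQ file.
count_reads() { echo $(( $(wc -l < "$1") / 4 )); }

# trim_summary (hypothetical): given a sample prefix that follows this
# thread's Trimmomatic output names, report how many reads are still paired
# and what share of the surviving reads are unpaired.
trim_summary() {
  local s=$1 p u1 u2
  p=$(count_reads "$s.R1.trimmed.fastq")    # surviving pairs (R1 and R2 counts match)
  u1=$(count_reads "$s.R1.unpaired.fastq")
  u2=$(count_reads "$s.R2.unpaired.fastq")
  awk -v p="$p" -v u="$((u1 + u2))" 'BEGIN {
    total = 2 * p + u
    printf "paired reads: %d, unpaired reads: %d (%.1f%% of surviving reads)\n", 2 * p, u, (total ? 100 * u / total : 0)
  }'
}
```

For example, `trim_summary sample` reads sample.R1.trimmed.fastq, sample.R1.unpaired.fastq and sample.R2.unpaired.fastq.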

14 months ago

If your unpaired reads are generated by Trimmomatic's palindrome mode (i.e., the forward and reverse reads contain the same sequence after adapter trimming), try enabling the keepBothReads option of ILLUMINACLIP.
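Below is a sketch of a PE run with keepBothReads enabled. According to the Trimmomatic manual, this is the last ILLUMINACLIP field. When palindrome mode detects read-through, Trimmomatic drops the reverse read by default because it carries the same information as the forward read, and setting the field to true keeps it. Several values here are placeholders, not recommendations: the jar location (TRIMMOMATIC_JAR), the input names ${s}.R1.fastq and ${s}.R2.fastq, the adapter file TruSeq3-PE.fa and the numeric thresholds. `trim_keep_both` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# trim_keep_both (hypothetical helper): Trimmomatic PE run with keepBothReads on.
# Assumptions: jar path in TRIMMOMATIC_JAR, inputs named <sample>.R1.fastq and
# <sample>.R2.fastq; adapter file and thresholds are illustrative placeholders.
trim_keep_both() {
  local s=$1 jar=${TRIMMOMATIC_JAR:-trimmomatic.jar}
  # ILLUMINACLIP fields: adapters:seedMismatches:palindromeClipThreshold:
  #   simpleClipThreshold:minAdapterLength:keepBothReads
  # Output order for PE: R1 paired, R1 unpaired, R2 paired, R2 unpaired.
  java -jar "$jar" PE \
    "$s.R1.fastq" "$s.R2.fastq" \
    "$s.R1.trimmed.fastq" "$s.R1.unpaired.fastq" \
    "$s.R2.trimmed.fastq" "$s.R2.unpaired.fastq" \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:true
}
```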
