bwa mem skip orientation when aligning a small subset of reads
1
0
Entering edit mode
9 months ago
Yihang • 0

For pair-end read files, read1.fastq, read2.fastq, I use the following command cat read1.fastq | head -n 4000 > read1_first1000.fastq (and cat read2.fastq | head -n 4000 > read2_first1000.fastq) to extract first 1000 reads, and align with bwa mem hg19.fa read1_first1000.fastq read2_first1000.fastq. I find that all four orientations, FF, FR, RF, RR are skipped because there are not enough pairs, and bwa will map these reads without pair-end. Therefore, the same read pair will be mapped to a different place as when I align with bwa mem hg19.fa read1.fastq read2.fastq.

If I use the command cat read1.fastq | head -n 400000 > read1_first100000.fastq (and cat read2.fastq | head -n 400000 > read2_first100000.fastq) to extract first 100000 reads and align them again, bwa will detect the orientation FR and then the results will be the same as aligning with bwa mem hg19.fa read1.fastq read2.fastq. (for the first 100000 reads).

Therefore, I am curious about the following questions:

  1. How does bwa compute unique pairs for (FF, FR, RF, RR)?
  2. How many candidate unique read pairs would be enough for detecting each orientation? Is this related to the parameter -K (fixed chunk size)?
  3. I find that usually, for a batch of e.g. 100000 read pairs, only around half of them will become candidate unique pairs for the orientations. If e.g., FR has significantly more candidate read pairs than other orientations, will bwa assume all reads (rather than only candidate read pairs of FR) have the orientation FR and map them accordingly?

Thanks!

fastq alignment bwa • 454 views
ADD COMMENT
1
Entering edit mode
9 months ago
Asaf 10k

Yes, it does guess the orientation and the average insert size from the data. It is hard-coded and cannot be set using parameters. The minimal number of pairs of each orientation to be tested is 10 but they can fail other steps of filtering. If you share your log (look for messages such as skip orientation ... or analyzing insert size distribution for orientation ...) we could tell you more.

ADD COMMENT

Login before adding your answer.

Traffic: 1739 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6