Extract R1 and R2 bam files from merged bam file
5.8 years ago
c_u ▴ 520

Hi,

I started with R1 and R2 fastq files and ran them through a pipeline (https://github.com/ArimaGenomics/mapping_pipeline/blob/master/Arima_Mapping_UserGuide.pdf), which combined them into a single merged BAM file (it also filters by mapping quality, adds read groups and removes PCR duplicates).

The files are from a Hi-C experiment, and I want to analyze them with HiC-Pro. However, HiC-Pro cannot work with a merged BAM file; it needs separate BAM files for R1 and R2. Is there a way to split the merged BAM file back into the corresponding R1 and R2 BAM files? Searching online, I mostly found ways to convert the BAM file back to fastq files (which I could do, and then map fastq to BAM again, but that seems wasteful).

Any help would be great. Suggestions for improvement are welcome.

RNA-Seq samtools • 5.6k views

Use samtools view with the "first in pair" or "second in pair" flag. See "How To Know From Which File (R1 Or R2) A Read Is Coming From Based On Sam Output".

5.8 years ago
samtools view -hbf 64 mydata.bam > R1.bam
samtools view -hbf 128 mydata.bam > R2.bam
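For anyone wondering what 64 and 128 mean here: they are bits in the SAM FLAG field, where 64 marks the first read in a pair and 128 marks the second. The sketch below only illustrates that bit test in plain shell; `mate_of_flag` is a made-up helper name, not a samtools command. As a side note, if the merged BAM happens to contain secondary or supplementary alignments, a read name can appear more than once in each output, and adding `-F 2304` to the commands above would exclude those records.

```shell
# Report which mate a SAM FLAG value marks.
# Bit 64 = first in pair (R1), bit 128 = second in pair (R2).
# mate_of_flag is a hypothetical helper for illustration, not part of samtools.
mate_of_flag() {
    flag=$1
    if [ $(( flag & 64 )) -ne 0 ]; then
        echo R1
    elif [ $(( flag & 128 )) -ne 0 ]; then
        echo R2
    else
        echo unpaired
    fi
}

mate_of_flag 99    # 99  = 1 + 2 + 32 + 64  -> prints R1
mate_of_flag 147   # 147 = 1 + 2 + 16 + 128 -> prints R2
```

`samtools view -f 64` keeps exactly the records for which this test would print R1, and `-f 128` the ones that would print R2.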

Thanks a lot for this!!


Hi,

I tried the commands you mentioned and they produced the R1 and R2 BAM files. But when I try to run the Hi-C analysis on them, the step in the pipeline that is supposed to merge the two BAM files fails with this error:

## Merging forward and reverse tags ... Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.

In other words, the two files (R1 and R2) are not paired. I tried sorting each of them individually, but got the same error (I also looked at the files after sorting, and their lines did not match). Can you suggest something that would make the R1 and R2 files come out paired?


If the original fastqs contain every single read, and each BAM contains every read once and only once, sorting both BAMs by read name should line things up.
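One way to test that assumption is to name-sort both files with `samtools sort -n`, dump them to headerless SAM text with `samtools view`, and compare the read-name column. The function below is a rough sketch of that last comparison step only; `check_paired` and the file names in the comments are hypothetical, and it assumes the read names carry no mate suffix such as /1 or /2.

```shell
# Check that two headerless SAM text files list the same read names
# in the same order. check_paired is a hypothetical helper.
# The inputs are assumed to have been produced along these lines:
#   samtools sort -n -o R1.byname.bam R1.bam
#   samtools view R1.byname.bam > R1.sam      (and the same for R2)
check_paired() {
    tmp=$(mktemp)
    cut -f1 "$2" > "$tmp"                 # read names from the second file
    if cut -f1 "$1" | cmp -s - "$tmp"; then
        echo paired
    else
        echo "not paired"
    fi
    rm -f "$tmp"
}

# Usage (file names are placeholders):
#   check_paired R1.sam R2.sam
```

If this prints "not paired" even after name sorting, some read is probably missing from one file or present more than once, which is worth checking before rerunning the merge step.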
