Split BAM into 1 BAM per contig
1
0
Entering edit mode
8.8 years ago
Marvin ▴ 220

Hi,

I have mapped reads against a reference with bwa and got a big bam file now.

I would like to retrieve a separate bam file for every contig.

I do not mean "I want 1 bam for every chromosome". A chromosome is not a contig since my coverage is 0 in many parts of the genome (aDNA reads).

In this situation e.g. (I hope he doesn't mess up when I post this):

--------------------------------------
  ---   ---               ---   ---
    ----                    -----

I would like to get 2 bam files.

How do I do that?

sam contig • 3.3k views
ADD COMMENT
2
Entering edit mode
8.8 years ago

Use genomecov (http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html) to get a bed with >0 coverage. Then, loop over the bed to extract the reads.

awk '{printf("%s:%s-%s\n",$1,$2,$3);}' cov.bed | while read F; do samtools view -o "${F}.bam" -b your.bam "${F}"; done
ADD COMMENT
0
Entering edit mode

Thank you! Don't know how to use that bedtools genomecov but I will find out.

ADD REPLY
0
Entering edit mode

I'm struggling, how do I achieve this?

ADD REPLY
0
Entering edit mode

I just can't figure it out. The problem is that he opens a new bed line as soon as the coverage within a contig changes. But what I need is 1 line per contig. How is that possible with a single genomecov command? I know how to do it with a pipe but from your answer it sounds like it could be done immediately?

ADD REPLY
0
Entering edit mode

Oh my god I finally made it.

Thank you for helping me alot by pointing me towards the right direction (bedtools is the right choice [didn't know that program before], then awk with samtools as posted by you). However that genomecov suggestion gave me struggles because it didn't get me what I wanted ... all I need is:

bedtools merge -i aln_sorted.bam

This gives me 1 line per contig. Afterwards I can apply the awk-samtools-pipe that you have posted :)

Thanks again!

ADD REPLY
0
Entering edit mode

Can you share the code you used and the files you used as input? I would also like to obtain individual bam files per contig. I tried this process posted here but I am getting zero output files after running the awk command (without genomecov).

I instead used bedtools merge -i aln_sorted.bam > mycontigs.txt then used the awk command and instead of cov.bed I used mycontigs.txt

Do I need a bed file at all for this?

ADD REPLY

Login before adding your answer.

Traffic: 2393 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6