Convert a 10 GB .bam file into smaller 1 GB .bam files
9.5 years ago
kavitarege ▴ 10

Dear All,

This is my first post on this site. I have BAM files of different sizes: 10 GB, 50 GB, 200 GB, ... 600 GB. I want to split each of these .bam files into 1 GB pieces (for example, a 10 GB .bam file into ten 1 GB .bam files) and then insert all of these chunks into a database. I am using threading to insert the files. If I try to insert the whole 10 GB at once, it fills up the memory and hangs; the machine has 8 GB of RAM. Kindly help.

genome split bam • 4.7k views

Why on earth do you want to insert a BAM file into a database?

9.5 years ago

It's a bad idea to split the files and put them in a database.

But if you still want to, you should use "samtools view".

That will carry the header information over to each piece (use -b to write BAM, or -h when writing SAM), so every piece remains a valid file.

On a sorted, indexed file, viewing by genomic coordinate is fast, so it is easier to extract chr1 from base 1 to 1 megabase than byte 1 to 1 gigabyte.
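A rough sketch of the region-based approach in Python, assuming `samtools` is on the PATH and the input is coordinate-sorted and indexed (`samtools index`). The function names, the output naming scheme, and the region list are hypothetical. Note that a read overlapping two adjacent regions is returned for both, and unmapped reads are not picked up by a coordinate query.

```python
import subprocess


def region_command(bam_path, region, out_path):
    """Build the samtools call that extracts one region into its own BAM."""
    # -b: write BAM (the header is carried into the output); -o: output file.
    return ["samtools", "view", "-b", "-o", out_path, bam_path, region]


def split_by_region(bam_path, regions, prefix, run=subprocess.run):
    """Run one extraction per region and return the output paths.

    `run` is injectable so the command list can be inspected without
    actually calling samtools.
    """
    outputs = []
    for i, region in enumerate(regions):
        out_path = f"{prefix}.{i:04d}.bam"
        run(region_command(bam_path, region, out_path), check=True)
        outputs.append(out_path)
    return outputs
```

For example, `split_by_region("sample.bam", ["chr1:1-1000000", "chr1:1000001-2000000"], "sample.part")` would write `sample.part.0000.bam` and `sample.part.0001.bam`. Equal-width regions do not give equal file sizes, since coverage varies along the genome.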


Personally, I'd just write a small program to iterate through a BAM file, writing reads to a new file as I go. It'd be easy enough to check the output file size every X reads, then close the file and open a new one once it gets close to the target size.
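Something along these lines, as a minimal sketch rather than a tested tool. It assumes pysam is installed; `split_by_size`, `split_bam`, the `check_every` default, and the chunk naming are all made up for illustration. The rotation logic is kept separate from pysam so it can be tried on any iterable of records.

```python
import os


def split_by_size(reads, open_chunk, max_bytes, check_every=100_000):
    """Write reads to successive chunk files, rotating near max_bytes.

    reads      -- any iterable of records
    open_chunk -- callable(index) -> (writer, path); the writer needs
                  write() and close()
    Returns the list of chunk paths that were written.
    """
    paths = []
    writer = None
    path = None
    for n, read in enumerate(reads):
        if writer is None:
            writer, path = open_chunk(len(paths))
            paths.append(path)
        writer.write(read)
        # Stat the output only every `check_every` reads, not on each write.
        if (n + 1) % check_every == 0 and os.path.getsize(path) >= max_bytes:
            writer.close()
            writer = None
    if writer is not None:
        writer.close()
    return paths


def split_bam(in_path, prefix, max_bytes=10**9):
    """pysam wrapper around split_by_size (assumes pysam is available)."""
    import pysam  # imported lazily so split_by_size works without it

    with pysam.AlignmentFile(in_path, "rb") as src:
        def open_chunk(i):
            path = f"{prefix}.{i:04d}.bam"
            # template=src copies the header into every chunk.
            return pysam.AlignmentFile(path, "wb", template=src), path

        # until_eof=True streams the whole file and needs no index.
        return split_by_size(src.fetch(until_eof=True), open_chunk, max_bytes)
```

Usage would be something like `split_bam("sample.bam", "sample.part")`. Because BAM output is compressed and buffered, the size on disk lags behind what has been written, so chunks will overshoot the target somewhat; a smaller `check_every` or a lower `max_bytes` leaves more headroom.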

Having said that, you and Pierre are absolutely correct. The whole thing is a bad idea.

9.5 years ago
Renesh ★ 2.2k
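  1. Convert the BAM file to SAM.
  2. Count the number of lines and divide by 10 (round up to an integer and call it num).
  3. Split the file into 10 files:

    split -l num file.sam

This splits the SAM file into 10 pieces, which you can then convert back to BAM. Only the first piece contains the header lines, so add the header to the other pieces before converting them. You can also use the -b option of split to split by size in bytes, but that can cut a record in the middle of a line; with GNU split, -C limits the size of each piece while keeping lines whole.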
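A minimal sketch of the line-based split in Python, assuming a plain-text SAM file as input. It copies the `@` header lines into every piece, since `samtools` needs them to convert a piece back to BAM. The function name `split_sam` and the output naming are hypothetical.

```python
def split_sam(sam_path, prefix, records_per_chunk):
    """Split a SAM file into pieces that each start with the full header.

    Returns the list of piece paths. The file is streamed line by line,
    so memory use does not depend on the input size.
    """
    header = []
    paths = []
    out = None
    count = 0
    with open(sam_path) as sam:
        for line in sam:
            # Header lines start with '@' and come before any record.
            if out is None and not paths and line.startswith("@"):
                header.append(line)
                continue
            if out is None:
                path = f"{prefix}.{len(paths):04d}.sam"
                out = open(path, "w")
                out.writelines(header)
                paths.append(path)
                count = 0
            out.write(line)
            count += 1
            if count == records_per_chunk:
                out.close()
                out = None
    if out is not None:
        out.close()
    return paths
```

Each piece can then be converted with `samtools view -b -o piece.bam piece.sam`. Uncompressed SAM is several times larger than the BAM it came from, so `records_per_chunk` has to be tuned by trial if the goal is a particular BAM size.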


Thank you all for the suggestions. Since my project requires it, I have to do it. I am using the pysam package to extract the required data and insert it into the database; with 1 GB of data it works fine. I will try the idea of splitting the SAM file by number of bytes and converting each piece back into a BAM file.
