Large BAM file to GRanges object and then save as RData
10.0 years ago

Hi all,

I have several BAM files that I would like to work with in the R environment. These are big files with more than 100M reads each. Is there any way to convert them into GRanges objects and then save them as compressed RData files?

So far what I've tried is the following R code:

library(GenomicAlignments)   ## provides readGAlignments()
library(Rsamtools)           ## provides ScanBamParam()

args <- commandArgs(TRUE)
filename <- args[1]
name <- args[2]

## Read the alignments, keeping only the query name and flag fields
param <- ScanBamParam(what = c("qname", "flag"))
b <- readGAlignments(filename, param = param)

## Coerce to GRanges (keeping the qname/flag metadata columns)
## and save it as a compressed RData file
gr <- granges(b, use.mcols = TRUE)
save(gr, file = paste0(name, ".RData"), compression_level = 9)

This code gave me error messages because R was not able to allocate vectors larger than roughly 800-1000 Mb, which I understand, but is there any way to create a GRanges object from a large BAM file?

Thanks for your help!!

GRanges RData R BAM • 5.7k views
10.0 years ago
Michael 54k

It works, you just need a lot of memory and coffee. I have tested this with a 21GB BAM file containing ~400 million aligned reads: the process grows to ~70GB in size and the loading takes several hours. Even though it works in principle, it might be a good idea to do a basic summarization outside of R, e.g. read counting or coverage calculation. To save a little space you could maybe omit the read names. Another possibility might be to split the input by chromosome, but then again, maybe R is not the optimal software for this volume of data.
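
If coverage is all you need and you do want to stay in R, the file does not have to be loaded in one piece: BamFile() accepts a yieldSize, so the alignments can be streamed in chunks and the coverage accumulated as an RleList. A minimal sketch of that idea (not code from my test run); "big.bam" and the chunk size of 1e6 records are placeholders, and the BAM is assumed to be sorted:

library(GenomicAlignments)
library(Rsamtools)

## Stream the file in chunks of 1e6 records instead of loading it whole
bf <- BamFile("big.bam", yieldSize = 1e6)
open(bf)
cov <- NULL
while (length(chunk <- readGAlignments(bf)) > 0) {
    chunk_cov <- coverage(chunk)
    cov <- if (is.null(cov)) chunk_cov else cov + chunk_cov
}
close(bf)
save(cov, file = "coverage.RData", compression_level = 9)

Peak memory here is bounded by one chunk plus the coverage vectors, not by the number of reads in the file.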

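And a minimal sketch of the split-by-chromosome idea, assuming the BAM is coordinate-sorted and indexed (a .bai file alongside it); the file name is again a placeholder, and qname is omitted to save space as suggested above:

library(GenomicAlignments)
library(Rsamtools)

bf <- BamFile("big.bam")      # requires big.bam.bai alongside it
chrlens <- seqlengths(bf)     # chromosome lengths from the BAM header

## Read one chromosome at a time; omitting "qname" saves memory
per_chrom <- lapply(names(chrlens), function(chr) {
    region <- GRanges(chr, IRanges(1L, chrlens[[chr]]))
    param <- ScanBamParam(which = region, what = "flag")
    granges(readGAlignments(bf, param = param), use.mcols = TRUE)
})
gr <- do.call(c, per_chrom)
save(gr, file = "big.RData", compression_level = 9)

Note that concatenating at the end still needs memory for the full object; saving each chromosome's GRanges to its own .RData file instead would keep the peak footprint to one chromosome at a time.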

Thanks for your answer, Michael,

I know that R is not the best environment for this, but my analysis needs to pass through this step. Just one question: when you loaded your 21GB BAM file, did you change your code compared to mine? I want to know if I'm missing something, thanks!!
