Split BAM Files in Galaxy
12.2 years ago

Hey guys, I'm quite new to bioinformatics, and for the last few weeks I've been struggling to work with large BAM files (80 GB). Can anyone tell me whether it is possible to split these BAM files into smaller ones by chromosomal location in Galaxy?

bam galaxy

"samtools view" can easily extract the alignments for specific regions and convert them back to BAM, but I don't know how to do it in Galaxy.
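To illustrate the `samtools view` route outside Galaxy, here is a minimal sketch that splits one BAM into one file per reference sequence. It assumes the BAM is coordinate-sorted (required by `samtools index`) and that `samtools` is on the PATH; `ref_names` and `split_bam_by_chrom` are hypothetical helper names, not samtools commands.

```shell
#!/usr/bin/env bash
# Sketch: split one coordinate-sorted BAM into one BAM per reference sequence.

# Read `samtools idxstats` output on stdin and print the reference names,
# skipping the "*" line that counts reads with no assigned reference.
ref_names() {
  awk -F'\t' '$1 != "*" { print $1 }'
}

# Usage: split_bam_by_chrom input.bam [output_dir]
split_bam_by_chrom() {
  local bam=$1 outdir=${2:-.} chrom
  mkdir -p "$outdir" || return 1
  # Region queries need a .bai index, and indexing needs a coordinate-sorted BAM.
  if [ ! -e "$bam.bai" ]; then
    samtools index "$bam" || return 1
  fi
  samtools idxstats "$bam" | ref_names | while read -r chrom; do
    # -b writes BAM output, which always carries the full header.
    samtools view -b "$bam" "$chrom" > "$outdir/$chrom.bam"
  done
}
```

For a single chromosome the core call is just `samtools view -b input.bam chr11 > chr11.bam` after indexing. The region name has to match an `@SQ` name in the header exactly (check with `samtools view -H input.bam`). Because the index lets samtools jump to each region, this avoids decompressing the whole 80 GB file once per chromosome.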

12.2 years ago
Michael 54k

It might be possible in principle, but to do so you'd have to install Galaxy locally, because the public instance will never allow you to upload 80 GB. Given that this would be more complex than installing Samtools, and you might have to install Samtools anyway, I'd go with Ketil's advice.

Edit: please see the comments below; it is possible to upload up to 250 GB. Unfortunately, I didn't find a tool that lets you run samtools view with the parameters needed to filter by chromosome. It might be possible somehow, e.g. by installing a new tool or by building a bam2sam -> filter data -> sam2bam workflow, but that would be very inefficient and could use up the 250 GB quota. I didn't find out how either approach would work.
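The bam2sam -> filter -> sam2bam workflow mentioned above can be sketched on the command line as follows, assuming `samtools` and `awk` are available. `filter_sam_by_chrom` and `bam_to_chrom_bam` are hypothetical names for illustration. It shows why the approach is inefficient: every record in the BAM is decompressed to SAM text and scanned, instead of using the index to jump to one chromosome.

```shell
#!/usr/bin/env bash
# Keep SAM header lines (starting with "@") plus records whose RNAME (column 3)
# equals the requested reference name. Reads SAM text on stdin.
filter_sam_by_chrom() {
  awk -v chr="$1" 'BEGIN { FS = OFS = "\t" } /^@/ || $3 == chr'
}

# Hypothetical end-to-end use: BAM -> SAM -> filter -> BAM.
# -h keeps the header in the SAM stream; -S tells samtools 0.1.x that stdin is SAM
# (newer versions auto-detect the input format and ignore -S).
bam_to_chrom_bam() {
  local bam=$1 chr=$2 out=$3
  samtools view -h "$bam" | filter_sam_by_chrom "$chr" | samtools view -bS - > "$out"
}
```

Usage: `bam_to_chrom_bam input.bam chr11 chr11.bam`. The output matches what an indexed `samtools view -b input.bam chr11` would give, but the whole file is streamed once per chromosome.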


Thanks for your answer.

I'm reading my genome alignment directly from the FTP server, and when I run the command

samtools view -bh ftp://ftp-mouse.sanger.ac.uk/current_bams/PWK_PhJ.bam chr11

it fails with a segmentation fault. But if I just print the same region to the screen, it works!

Do you know why I'm having this problem?


Did you try to download the file first, then run samtools locally? How much memory does your computer have?
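A minimal sketch of the "download first, then run samtools locally" suggestion, assuming `wget` and `samtools` are installed and there is enough local disk space for each ~80 GB BAM. It builds the index locally rather than assuming a `.bai` is available on the server. `out_name` and `fetch_and_extract` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Derive a local output name such as "PWK_PhJ.chr11.bam" from a BAM URL and a region.
# Colons in regions like "chr11:1-1000000" are replaced so the name is filesystem-safe.
out_name() {
  local url=$1 region=$2
  local base=${url##*/}
  printf '%s.%s.bam\n' "${base%.bam}" "${region//:/_}"
}

# Download one BAM into the current directory (wget -c resumes an interrupted
# transfer), index it locally, and extract one region into its own BAM.
fetch_and_extract() {
  local url=$1 region=$2
  local bam=${url##*/}
  wget -c "$url" || return 1
  if [ ! -e "$bam.bai" ]; then
    samtools index "$bam" || return 1
  fi
  samtools view -b "$bam" "$region" > "$(out_name "$url" "$region")"
}
```

Usage: `fetch_and_extract ftp://ftp-mouse.sanger.ac.uk/current_bams/PWK_PhJ.bam chr11` would write `PWK_PhJ.chr11.bam`. For the other genomes on the FTP site, call it once per URL. If the local run also crashes, the problem is not just the remote access.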


500 GB. No, I didn't download it, because I thought streaming it would be simpler; otherwise I would have to download each of the 17 genomes available on the FTP site. But if this is the problem, I can do it!


80 GB files can be loaded into the public Galaxy. You will have to use FTP, but it can be done. The quota for registered users on usegalaxy.org is 250 GB (see http://wiki.g2.bx.psu.edu/Main#User_data_and_job_quotas). Of course, you may run out of space pretty quickly once you are into your analysis.


Sorry I didn't know that!


But I still don't see how you could call samtools view with these options from inside Galaxy. Maybe it is possible somehow, but I don't see how.


Leandro, cool machine with half a terabyte of RAM; that really should be enough. I would just try downloading the file and running samtools on it again. I don't know whether that will help, but it's worth a try. By the way, which version of samtools are you using?


Thanks Dave, but even after loading the data into Galaxy, as Michael said, I don't know how to do this in Galaxy either. And it's true that I would run out of space. Michael, I'm using the latest one, 0.1.18, right? I'm sorry for all these questions, but no one in my lab has ever analyzed NGS data, and I'm starting without any specific bioinformatics or programming background.


Leandro, please open a new question describing your samtools problem with an 80 GB BAM file, and/or send an email to samtools-help@lists.sourceforge.net including the exact command, the samtools version, a link to the data, the output, the output of 'uname -a', and your machine's memory specs. Maybe the samtools authors have more insight.
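A small sketch for gathering the system details requested above in one go, assuming a Linux machine (memory is read from `/proc/meminfo`). `collect_diagnostics` is a hypothetical helper name. It relies on samtools printing its usage, including a `Version:` line, to stderr when run without arguments.

```shell
#!/usr/bin/env bash
# Print the details typically requested in a samtools bug report.
collect_diagnostics() {
  echo "== uname -a =="
  uname -a
  echo "== samtools version =="
  if command -v samtools >/dev/null 2>&1; then
    # With no arguments samtools prints its usage (including "Version:") to stderr.
    samtools 2>&1 | grep -i '^version' || true
  else
    echo "samtools not found on PATH"
  fi
  echo "== memory =="
  if [ -r /proc/meminfo ]; then
    grep -E '^(MemTotal|MemFree|SwapTotal):' /proc/meminfo
  else
    echo "/proc/meminfo not available on this system"
  fi
}
```

Usage: `collect_diagnostics > report.txt`, then paste `report.txt` into the email together with the exact command and its output.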
