Question: Split BAM Files in Galaxy

Hey guys, I'm quite new to bioinformatics and I've been struggling to work with large BAM files (80 GB) for the last few weeks. Can anyone tell me whether it is possible to split these BAM files into smaller ones by chromosomal location in Galaxy?

Leandro Batista, 7.9 years ago (updated 7.9 years ago by Michael Dondrup)

"samtools view" can easily extract the alignments for specific regions and convert them back to BAM, but I don't know how to do it in Galaxy.

Ketil, 7.9 years ago
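Ketil's suggestion can be sketched in Python as follows, assuming samtools is installed locally (the 1.x command line) and the BAM file has been downloaded and is coordinate-sorted. The script indexes the file, reads the reference names from "samtools idxstats", and runs one "samtools view -b" per chromosome. The helper names (chromosomes_from_idxstats, split_commands, split_bam) and the CHROM.bam output naming are hypothetical, chosen for illustration.

```python
import subprocess
from typing import List


def chromosomes_from_idxstats(idxstats_text: str) -> List[str]:
    """Return reference names that have mapped reads in `samtools idxstats` output.

    Each line is tab-separated: name, length, mapped reads, unmapped reads.
    The final '*' line counts unplaced unmapped reads and is skipped.
    """
    names = []
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4 or fields[0] == "*":
            continue
        if int(fields[2]) > 0:
            names.append(fields[0])
    return names


def split_commands(bam_path: str, chromosomes: List[str]) -> List[List[str]]:
    """Build one `samtools view` command per chromosome; -b writes BAM, -o names the file."""
    return [
        ["samtools", "view", "-b", "-o", f"{chrom}.bam", bam_path, chrom]
        for chrom in chromosomes
    ]


def split_bam(bam_path: str) -> None:
    """Index a coordinate-sorted BAM, then write one BAM per chromosome."""
    # Region queries (and idxstats) need the .bai index created here.
    subprocess.run(["samtools", "index", bam_path], check=True)
    stats = subprocess.run(
        ["samtools", "idxstats", bam_path],
        check=True, capture_output=True, text=True,
    ).stdout
    for cmd in split_commands(bam_path, chromosomes_from_idxstats(stats)):
        subprocess.run(cmd, check=True)
```

For example, split_bam("PWK_PhJ.bam") would write chr1.bam, chr11.bam, and so on into the current directory. Chromosomes with no mapped reads are skipped, and unmapped reads without a position are not written to any output file.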

It might be possible in principle, but you would have to install Galaxy locally, because the public instance will never allow you to upload 80 GB. Since that would be more complex than installing samtools, and you might have to install samtools in addition anyway, I'd go with Ketil's advice.

Edit: please see the comments below; it is possible to use up to 250 GB. Unfortunately, I didn't find a tool that lets you run samtools view with the parameters needed to filter by chromosome. It might be possible somehow (e.g. by installing a new tool, or by building a workflow bam2sam -> filter data -> sam2bam, which would be very inefficient and use up the 250 GB), but I didn't find out how that would work.

Michael Dondrup, 7.9 years ago
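The "filter data" step of the bam2sam -> filter data -> sam2bam workflow could look roughly like this Python sketch. The function name filter_sam_by_chrom is hypothetical and this is not an existing Galaxy tool. It keeps the SAM header and every alignment whose RNAME (the third column) matches the requested chromosome.

```python
import sys
from typing import Iterable, Iterator


def filter_sam_by_chrom(lines: Iterable[str], chrom: str) -> Iterator[str]:
    """Yield SAM header lines plus alignments whose RNAME equals `chrom`."""
    for line in lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @RG, @PG) keep the output valid SAM.
            yield line
            continue
        # Alignment lines are tab-separated; column 3 (index 2) is RNAME,
        # the reference the read is placed on ('*' for unplaced reads).
        fields = line.split("\t", 3)
        if len(fields) >= 3 and fields[2] == chrom:
            yield line


def main(chrom: str) -> None:
    """Read SAM on stdin and write the filtered SAM to stdout."""
    sys.stdout.writelines(filter_sam_by_chrom(sys.stdin, chrom))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Outside Galaxy, assuming the script is saved as filter_sam.py (a hypothetical name), it could be chained as: samtools view -h in.bam | python filter_sam.py chr11 | samtools view -bS - > chr11.bam. Because it streams every alignment in the file, it is far slower than an indexed region query, which is the inefficiency Michael points out.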

Thanks for your answer. I'm getting my genome alignment from FTP, and when I use the command "samtools view -bh ftp://ftp-mouse.sanger.ac.uk/current_bams/PWK_PhJ.bam chr11", it fails with a segmentation fault. But if I just try to print the same sequence, it works! Do you know why I'm having this problem?

Leandro Batista, 7.9 years ago

Did you try to download the file first, then run samtools locally? How much memory does your computer have?

Michael Dondrup, 7.9 years ago

500 GB. No, I didn't download it, because I thought it would be simpler this way; otherwise I would have to download each of the 17 genomes available on the FTP site. But if this is the problem, I can do it!

Leandro Batista, 7.9 years ago

80 GB files can be loaded into the public Galaxy instance. You will have to upload them via FTP, but it can be done. The quota for registered users on usegalaxy.org is 250 GB (see http://wiki.g2.bx.psu.edu/Main#User_data_and_job_quotas). Of course, you may run out of space pretty quickly once you are into your analysis.

Dave Clements, 7.9 years ago

Sorry, I didn't know that!

Michael Dondrup, 7.9 years ago

But I still don't see how you could call samtools view with these options from inside Galaxy. Maybe it is possible somehow, but I don't see it.

Michael Dondrup, 7.9 years ago

Leandro, cool machine with half a terabyte of RAM; that really should be enough. I would just try downloading the file and running samtools again. I don't know if that will help at all, but it's worth a try. By the way, which version of samtools are you using?

Michael Dondrup, 7.9 years ago

Thanks, Dave, but even after loading the files into Galaxy, as Michael said, I don't know how to do the splitting there either. And it's true that I would run out of space. Michael, I'm using the latest one, 0.1.18, right? I'm sorry for all these questions, but nobody in my lab has ever analyzed NGS data, and I'm starting without any specific bioinformatics or programming background.

Leandro Batista, 7.9 years ago

Leandro, please open a new question describing your samtools problem with the 80 GB BAM file, and/or send an email to samtools-help@lists.sourceforge.net including the exact command, the samtools version, a link to the data, the output, the output of 'uname -a', and your machine's memory specs. Maybe the samtools authors have more insight.

Michael Dondrup, 7.9 years ago
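The details Michael asks for could be gathered with a small Python script like this sketch. It assumes a Unix-like system: platform.uname() only approximates 'uname -a', the memory figure comes from os.sysconf and may be unavailable on some platforms, and bug_report is a hypothetical helper, not part of samtools.

```python
import os
import platform
import shutil
import subprocess
from typing import Optional


def samtools_version() -> str:
    """Return the 'Version:' line from samtools' usage text, if available."""
    exe = shutil.which("samtools")
    if exe is None:
        return "samtools not found on PATH"
    try:
        # With no arguments samtools prints its usage, including a
        # 'Version:' line, and exits non-zero, so check=True is not used.
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return "could not run samtools"
    for line in (result.stderr + result.stdout).splitlines():
        if line.startswith("Version:"):
            return line.strip()
    return "version line not found"


def total_memory_bytes() -> Optional[int]:
    """Physical memory via POSIX sysconf; None where that is unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def bug_report(command: str, data_url: str, output: str) -> str:
    """Assemble the details requested for a samtools bug report."""
    mem = total_memory_bytes()
    mem_text = f"{mem / 1024 ** 3:.1f} GiB" if mem else "unknown"
    uname = " ".join(platform.uname())  # approximates `uname -a`
    return "\n".join([
        f"Command: {command}",
        f"samtools: {samtools_version()}",
        f"Data: {data_url}",
        f"Output: {output}",
        f"uname: {uname}",
        f"Memory: {mem_text}",
    ])
```

Called as print(bug_report("samtools view -bh ftp://ftp-mouse.sanger.ac.uk/current_bams/PWK_PhJ.bam chr11", "ftp://ftp-mouse.sanger.ac.uk/current_bams/PWK_PhJ.bam", "Segmentation fault")), it prints six labelled lines that can be pasted into the email or the new question.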

