Question: Trying To Pull Regions From An Online Bam File Without Downloading
3

I am trying to pull regions from a bunch of BAM files on an online server. I'd like to pull the reads mapping to a certain chunk of 1 kb or so and download them for analysis. The files are far too massive to download in full, and it's impractical even to wget them one at a time and pull the regions out using samtools (I tried it, and it worked, but it took forever). Since I'll have to do this for a number of regions that I won't know in advance, I need a better way.

I noticed that samtools is capable of running 'samtools view' on a web address. Sadly, this data is protected behind an https server, which samtools doesn't know how to handle. I notice that IGV is able to read the BAM files off the net by asking for my login and querying only the specific regions that I bring up to view, but I don't have a way of automating the process on hundreds of files.

Does anyone have any ideas of how to run something like samtools view on specific regions over an https connection?

ADD COMMENTlink 7.5 years ago Wjeck • 480
2

Did you try putting your password in the URL? E.g.: "https://userid:password@anywhere.org/bams/my.bam"
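
A minimal sketch of this suggestion, assuming a samtools build that can open https URLs and a .bai index served next to the BAM file (both are assumptions, not something confirmed here). The helper name `auth_url`, the credentials, and the region are hypothetical placeholders. Characters such as `@` or `:` in the password would need percent-encoding before being embedded.

```shell
# Hypothetical helper: insert user:password after the scheme of a URL.
# $1 = user, $2 = password, $3 = URL such as https://host/path/file.bam
auth_url() {
  scheme=${3%%://*}   # everything before the first "://"
  rest=${3#*://}      # everything after the first "://"
  printf '%s://%s:%s@%s\n' "$scheme" "$1" "$2" "$rest"
}

# Assumed usage (commented out because it needs network access and samtools):
# samtools view "$(auth_url userid password https://anywhere.org/bams/my.bam)" chr17:1000-2000
```

Credentials embedded in a URL usually end up in the shell history and the process list, so this is probably best kept to throwaway scripts.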

ADD REPLYlink 7.5 years ago
Pierre Lindenbaum
120k
1

This does work for me in my case, thanks Pierre

ADD REPLYlink 2.6 years ago
cmdcolin
♦ 1.2k
0

That doesn't seem to work for me. I am not sure that samtools recognizes https as a web address. The response I get is:

open: No such file or directory
[main_samview] fail to open "https://uname: _*_ @website/my.bam" for reading.

(with my website, password etc, of course)

ADD REPLYlink 7.5 years ago
Wjeck
• 480
3

samtools and other direct-BAM-access programs like IGV can open local files as well as remote, publicly served files. The supported remote protocols include ftp and http, but unfortunately not encrypted transfer protocols such as scp, ssh, or https. It's not a matter of authentication, which is solved for http and ftp, but of handling data encryption, which is far more complicated. The only ways to work with all the BAM files you are interested in are either asking the server managers to serve them over http, or downloading them all and working with them locally.

ADD COMMENTlink 7.5 years ago Jorge Amigo 11k
1

Just a note: contrary to this post, IGV does seem to be able to handle https and the associated encryption, but samtools does not.

ADD REPLYlink 7.5 years ago
Wjeck
• 480
0

Good to know. Thanks for the information.

ADD REPLYlink 7.4 years ago
Jorge Amigo
11k
2

This can possibly be done through curl. In the following command, the --negotiate option enables SPNEGO in curl. The -u option is required, but the user name is ignored. The -b and -c options are used to send and store HTTP cookies, respectively. The -s option silences curl's status output. (Try typing the command as is, since the BAM file exists.)

curl --negotiate -u : -b ~/cookienumnumnum.txt -c ~/cookienumnumnum.txt -s http://gasv.googlecode.com/files/Example.bam | samtools view -h - | head

will give

@SQ    SN:chr17    LN:78774742
chr17_15_201_1:0:0_1:0:0_209a6    163    chr17    15    60    50M    =    152    187    GTTCCTGCATAGATAATTGCATGACAATTGCCTTGTCCCTCCTGAATGTG    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:1    SM:i:37    AM:i:23    X0:i:1    X1:i:0    XM:i:1    XO:i:0    XG:i:0    MD:Z:40G9
chr17_45_242_0:0:0_1:0:0_50f5f    99    chr17    45    60    50M    =    193    198    CCTTGTCCCTGCTGAATGTGCTCTGGGGTCTCTGGGGTCTCACCCACGAC    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:0    SM:i:37    AM:i:37    X0:i:1    X1:i:0    XM:i:0    XO:i:0    XG:i:0    MD:Z:50
chr17_123_290_3:0:0_1:0:0_27b3b    163    chr17    123    60    50M    =    241    168    ATAACAAACATATGTCCAGCGAATACCTGCATCCCTAGAAGTGAAGCGAC    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:3    SM:i:25    AM:i:25    X0:i:1    X1:i:0    XM:i:3    XO:i:0    XG:i:0    MD:Z:0T10C35C2
chr17_15_201_1:0:0_1:0:0_209a6    83    chr17    152    60    50M    =    15    -187    CATCCCTAGAAGTGAAGCCACCGCCCAAAGACACGCCCATATCCAGCTTA    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:1    SM:i:23    AM:i:23    X0:i:1    X1:i:1    XM:i:1    XO:i:0    XG:i:0    MD:Z:40G9
chr17_164_380_1:0:0_0:0:0_aa5e4    99    chr17    164    60    50M    =    331    217    TGAAGCCACCGCCCAATGACACGCCCATGTCCAGCTTAACCTGCATCCCT    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:1    SM:i:37    AM:i:37    X0:i:1    X1:i:0    XM:i:1    XO:i:0    XG:i:0    MD:Z:16A33
chr17_45_242_0:0:0_1:0:0_50f5f    147    chr17    193    60    50M    =    45    -198    TCCAGCTTAACCTGCATCCCTAGAAGGGAAGGCACCGCCCAAAGACACGC    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:1    SM:i:37    AM:i:37    X0:i:1    X1:i:0    XM:i:1    XO:i:0    XG:i:0    MD:Z:26T23
chr17_204_401_0:0:0_0:0:0_43ea1    99    chr17    204    60    50M    =    352    198    CTGCATCCCTAGAAGTGAAGGCACCGCCCAAAGACACGCCCATGTCCAGC    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:0    SM:i:23    AM:i:23    X0:i:1    X1:i:1    XM:i:0    XO:i:0    XG:i:0    MD:Z:50
chr17_224_415_1:0:0_1:0:0_a2d53    163    chr17    224    60    50M    =    366    192    GCACCGCCCAAAGACACGCCCATGTCCAGCTTATTCTCCCCAGTTCCTCT    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:1    SM:i:37    AM:i:37    X0:i:1    X1:i:0    XM:i:1    XO:i:0    XG:i:0    MD:Z:37G12
chr17_123_290_3:0:0_1:0:0_27b3b    83    chr17    241    60    50M    =    123    -168    GCCCATGTCCAGCTTATTCTGCCCAGTTCCTCTCCAGATAGGCTGCATGG    22222222222222222222222222222222222222222222222222    XT:A:U    NM:i:1    SM:i:37    AM:i:25    X0:i:1    X1:i:0    XM:i:1    XO:i:0    XG:i:0    MD:Z:38A11
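
Because the BAM arrives on stdin in the pipeline above, samtools has no index to jump to a region, so one way to narrow the stream is to filter the SAM text afterwards. The following is only a rough sketch: `filter_region` is a hypothetical helper, and it tests just the leftmost mapped position (column 4), so reads that start before the window but overlap it are missed. The whole file is still transferred up to the point where the pipeline stops reading.

```shell
# Keep SAM header lines plus records whose leftmost mapped position
# (column 4) lies inside [start, end] on the given reference (column 3).
# $1 = reference name, $2 = start, $3 = end (1-based, inclusive)
filter_region() {
  awk -F '\t' -v chrom="$1" -v start="$2" -v end="$3" '
    /^@/ { print; next }                                    # header lines
    $3 == chrom && $4 + 0 >= start + 0 && $4 + 0 <= end + 0 { print }
  '
}

# Assumed usage (commented out because it needs network access and samtools;
# BAM_URL is a placeholder for the remote file):
# curl -s "$BAM_URL" | samtools view -h - | filter_region chr17 100 250
```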

Best Wishes,

Umer

ADD COMMENTlink 5.7 years ago umer.zeeshan.ijaz ♦ 1.7k
