Isolating reads from specific region from bam file
1
1
Entering edit mode
7.1 years ago
Sara ▴ 240

I have RNA-seq data and would like to get the reads from specific region for example -/+ 50 reads before and after stop site. do you know how I can get it from bam file

RNA-Seq • 7.1k views
ADD COMMENT
1
Entering edit mode

Hello Sara, You may find some info in this post: C: alignment containing sequences from position a to b

ADD REPLY
5
Entering edit mode
7.1 years ago
bruce.moran ▴ 960

Easiest way is SAMtools view, you can specify a single region:

samtools view -h <your.bam> chr1:1000-2000

or you can give a BED file (I think in tab-delimited format):

samtools view -h -L <your.bed> <your.bam>

Make sure to use same chromosomal naming (either with 'chr' or without) as in the BAM. You can check with

samtools  view -H <your.bam>
ADD COMMENT
0
Entering edit mode

Hey Bruce, not sure why this wasn't upvoted (I upvoted just now), as it's the best and most concise answer I've seen on this particular question.

ADD REPLY
1
Entering edit mode

Thanks, I find it particularly useful for IGV or other viewers, instead of trying to load a whole BAM locally.

ADD REPLY
0
Entering edit mode

will this include only the reads the fully lie in the position? i.e. 900-1200 no, and 1100-1400 yes?

ADD REPLY
1
Entering edit mode

No, any that overlap the interval. You can, nevertheless, still use the above command and then pipe the output to, e.g., awk, in order to further filter for reads only in each desired interval.

ADD REPLY
0
Entering edit mode

could please give me an example how to do this? Or a link to something similar?

ADD REPLY
0
Entering edit mode

There is likely some 'random' script out there that can do this. Perhaps even BEDTools has, these days, some way to do this.

What you need to do is start with the POS field in the BAM/SAM file, which represents the left-most co-ordinate to which your read aligns / maps. Then, you have to parse the CIGAR string to understand how many more bases are involved in the alignment.

...or, if all of your reads are 50bp and your interval of interest is 100bp, then you just need to keep any read whose POS value starts between base 1 and 50.

ADD REPLY
0
Entering edit mode

I am having trouble making it work:

samtools view -h -q 10 input.bam chrX:280-350 | awk 'BEGIN{OFS="\t"}{if($1 ~ /^"@"/) {print} else {if($4 >=280 || $4+length($10)-1 <= 350 && length($10) > 60) {print} else {}}}' | samtools view -Sbo output3.bam -

could you please give a look at my question: Extract reads overlapping a specific region in bam file

ADD REPLY
0
Entering edit mode

Hi, I do not have time to check this code, but, what one needs to do is (for each read in BAM):

  1. extract read start position (stored in SAM format)
  2. extract read length
  3. determine read end position
  4. read start and end are within desired range?
ADD REPLY

Login before adding your answer.

Traffic: 1942 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6