Extract reads overlapping a specific region in bam file
1
2
Entering edit mode
2.9 years ago
User000 ▴ 690

Hello,

ref --------------start-----------------------------------------stop-------------------------------
r1                   -------------------------------------------------------
r2             ---------------------------------------------------------------
r3                  -----------------    --------------------------------------
r4              -----------------------------------------------------------------------------

a. I would like to extract reads from bam file that overlap entirely the start and stop region from both directions (from start to stop and from stop to start). From the example above, I need to keep only r1,r2 and r4, but not r3.

How to do this?

I tried this, but I still have some small fragmented reads...

samtools view -b -h -q 10 input.bam chrX:230-330 | awk 'BEGIN{OFS="\t"}{if($1 ~ /^"@"/) {print} else {if($4 >= 230) {print} else {}}}' | samtools view -Sbo output.bam 

I also tried:

samtools view -h -q 10 input.bam chrX:230-330 | awk 'BEGIN{OFS="\t"}{if($1 ~ /^"@"/) {print} else {if($4 >= 230 && length($10) >= 100) {print} else {}}}' | samtools view -Sbo output.bam -
awk samtools • 2.7k views
ADD COMMENT
2
Entering edit mode

If the regions are small you could combine the standard region querying with the filter expression, ie

samtools view -e 'pos <= 230 && pos+rlen >= 330' in.bam chrX:230-330 

I haven't tested it. Maybe it needs to be pos+rlen-1, but you get the jist. I hope it works! If not then maybe it'll need a samtools view | samtools view style command to do it in two passes.

ADD REPLY
0
Entering edit mode

Your diagram and text do not match - r1 is the only read that spans both the start and stop. Is that what you want?

ADD REPLY
0
Entering edit mode

I updated my question, thanks

ADD REPLY
3
Entering edit mode
2.9 years ago

Create a BED file that contains the start and stop positions, then use BEDtools:

bedtools intersect -wa BAM -b BED -F 1.0
ADD COMMENT
1
Entering edit mode

The user wants reads that 100% / completely fall within a desired interval. Does this BEDTools command do that? I guess that, yes, it can do it, and that -F 1.0 is the key.

-f Minimum overlap required as a fraction of A. Default is 1E-9 (i.e. 1bp).

-F Minimum overlap required as a fraction of B. Default is 1E-9 (i.e., 1bp).

User000, can you please try this?

ADD REPLY
0
Entering edit mode

I've tried this, and I get the error:

***** ERROR: Unrecognized parameter: input.bam *****

Also if i try bedtools intersect -wa -a BAM -b BED -F 1.0, I get this error:

ERROR: Received illegal bin number 4294967295 from getBin call.
ERROR: Unable to add record to tree.

Also, I have only one interval of interest, not a multiple, so my bed file contatins only that. It that ok?

ADD REPLY
0
Entering edit mode

What are the names of your BED and BAM files?

ADD REPLY
0
Entering edit mode

Try:

bedtools intersect -wa -a BAM -b BED -F 1.0
ADD REPLY

Login before adding your answer.

Traffic: 1508 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6