Extract parts of reads from BAM file that overlap a specific region of genome
2
3
Entering edit mode
6.8 years ago
Chadi Saad ▴ 100

Assuming that I have a BAM file with aligned reads, how can I extract the part of all reads that cover completely a specific region of a genome. (I tried with samtools view file.bam chr1:13974-22442 , but it reports all reads that intersect with this region, also it reports all the read sequence not only the intersection part...)

For example : (needed region to be extract between pipes)

=============|=====|========================= < Reference
    ---------|-----|--------------------------
-------------|-----|-----------
                         --------------------------------------------
           --|-----|--------------------------

Thank you

bam sequencing alignment • 5.0k views
ADD COMMENT
0
Entering edit mode

why do you want only a part of the reads?

ADD REPLY
0
Entering edit mode

Hello Chadi Saad, I have the same request...I want to extract/trim the reads from bam file that exactly match with specific regions of a bed file...Did you get it?

Chiara

ADD REPLY
2
Entering edit mode
6.8 years ago

using samjs :

samtools view  -bu -F 4  file.bam chr1:13974-22442 |\
java -jar dist/samjs.jar  -e 'record.alignmentStart <= 13974 && record.alignmentEnd >= 22442'

see also : Count reads within region

ADD COMMENT
0
Entering edit mode

thanks , worked perfectly. But it outputs all the reads sequence, while i want just the part of read that match in the required region. Can i do it in a simple way ?

ADD REPLY
1
Entering edit mode
ADD REPLY
0
Entering edit mode

It does what i want, but it have some bugs i think. Sometimes the read is clipped too much (see attachement screenshot, in purple the original read, in red the clipped one, it stops at 14776 instead of 14780). Sometimes reads are shifted by one position (see screenshot 2)...

enter image description here enter image description here

ADD REPLY
0
Entering edit mode

nice to know. please , send an issue to https://github.com/lindenb/jvarkit/issues w a minimal SAM file and a BED file.

ADD REPLY
0
Entering edit mode

I am also facing the same issue. I want the part of reads that match a specific region. How did you do it ?? Please help

ADD REPLY
0
Entering edit mode
21 months ago
Prakki Rama ★ 2.7k

@Chadi Saad @chiaraunivpm @msumaira36

This should do, what you want.

 library(GenomicAlignments) # Load the package

 stack <- stackStringsFromBam("test_4_seqs_sorted.bam", 
                              param=GRanges("Reference_barcodes:54-78")) # Extract the stacked reads from the position

 write.table(stack, Aligned_Reads_at_Region_Of_Interest.txt, quote = FALSE, col.names = F, row.names = F) # save into a text file

Source: Extract from a .bam file the reads that map to specific area in the reference genome

ADD COMMENT

Login before adding your answer.

Traffic: 2653 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6