samtools view region
2
0
Entering edit mode
6.8 years ago
xd_d ▴ 110

Hello everybody,

it is possible to get only the reads if the start coordinates is in a region.

When i run this: samtools view cigarN.bam X:68206675-68208264

I get this:

 FCC0WRYACXX:1:1103:14931:24492 99  X   68204511    255 59M2530N30M =   68207133    2703    TGTTCTTATTATTATGGTATCATTTATTTGTAAAATTTATAATTGTAAAATACAAACTAGTAAGAACCCTATCAGTGTTGTTTTTGTTT   HHJJJJJJJJJJJIJJJFHIJJJJJIJJJJJIJJJJJJJJJJJIJJJJJJJJJJJJJIJJIIIJGGIIIJJJJJJEHFHHHFFFFDEEE   PG:Z:MarkDuplicates NH:i:1  HI:i:1  nM:i:1  AS:i:165

FCC0WRYACXX:1:1304:17536:3030   99  X   68206627    255 47M3479N42M =   68210195    3656    TTTCTCATACTGGCAATAGTATTTCACCCAGGATATTCCTAAAGCCCATTTCTCTTGTGTATAGAGATAGCCTTCAATAGTTGGCTGTC   HHJJJJIJJJJJJJJIIIJHIJJJIJJJJIJIIJIJJIJJIJJIIIJJJJJJJJJJJGHIGGHIIIJHHGEHFFFDEFFECFCEEEDDD   PG:Z:MarkDuplicates NH:i:1  HI:i:1  nM:i:0  AS:i:176

 FCC0WRYACXX:1:1106:15737:96571 83  X   68206628    255 46M3479N43M =   68201680    -8516   TTCTCATACTGGCAATAGTATTTCACCCAGGATATTCCTAAAGCCCATTTCTCTTGTGTATAGAGATAGCCTTCAATAGTTGGCTGTCC   CCCDDCCCDDDCCEDDEDCDEDB9GFFFFGGDGGIIGGEHFAAGFIGGCGDGFCHGIEGGIIHDBEEGIGIJGIIGIIJJGHIHGEFGH   PG:Z:MarkDuplicates NH:i:1  HI:i:1  nM:i:0  AS:i:179

But I want only these reads if the fourth column (start position of the reads) in the region. It is possible to get a solution with samtools ?

in my example the start position of the read FCC0WRYACXX:1:1103:14931:24492 is not in the region X:68206675-68208264.

data: RNA-seq paired-end

RNA-Seq samtools paired-end • 6.1k views
ADD COMMENT
0
Entering edit mode

No, it is not possible with samtools. You will need to write a script and work with the sam file for your purpose.

ADD REPLY
0
Entering edit mode

Given the enormous amount of tools available it's very likely that something exists for this quite specific problem, without the need of writing a custom script. (Although I agree it could be a solution - it's for sure not the easiest for everyone).

ADD REPLY
2
Entering edit mode
6.8 years ago

Would this work, without resorting on additional third party tools?

samtools view aln.bam X:100-1000 | awk -v FS='\t' '$4 >= 100 && $4 <= 1000'

If you want also the bam header:

 samtools view -h aln.bam X:100-1000 | awk -v FS='\t' '$1 ~ "^@" || ($4 >= 100 && $4 <= 1000)'

(Not tested)

ADD COMMENT
2
Entering edit mode

I had this idea also!

ADD REPLY
0
Entering edit mode

Good call. I'm using a variation of this command in order to pull out SNP genotype-specific reads in order to divide my BAM by region and genotype

ADD REPLY
1
Entering edit mode
6.8 years ago
H.Hasani ▴ 990

How about running bedtools - intersect on your bam file. That should you give you what you want. I just noticed, there is already an example of exactly what you are after on that page.

ADD COMMENT
0
Entering edit mode

Sorry... Maybe I'm missing it. Could you point out at the example? I don't think you can tell bedtools "report feature if the start of a is in interval b". Of course you further parse the output but it would make things more complicated then necessary.

ADD REPLY
0
Entering edit mode

-abam Default behavior when using BAM input (deprecated since 2.18.0)

it says deprecated (the -abam), but you still can use -a and -b as given in this example

ADD REPLY
1
Entering edit mode

It says you can use bam as input, but you can't tell bedtools to output a read if the start is inside a target interval. If a read starts before the interval and ends somewhere inside or after the interval, bedtools will print it. The OP wants to discard such reads instead. Did I get it right?

ADD REPLY
0
Entering edit mode

could you point out where did you read that? when it comes to intersection it does not matter which end is obtained by the given list of intervals (at least that's what they say specifically in case of paired-end reads).

ADD REPLY
0
Entering edit mode

it does not matter which end is obtained by the given list of intervals

Exactly, so if the start of an interval (a read) is outside the interval, the OP wants to reject it. The OP wants to keep reads whose start is inside the interval. So if the read has start and end coordinates chr1:5-20 and the target interval is chr1:10-20, the read has to be rejected, but bedtools will instead return it.

By the way, I'm not considering paired-end reads here.

ADD REPLY
0
Entering edit mode

Oh, I see your point now.

ADD REPLY

Login before adding your answer.

Traffic: 2616 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6