How to efficiently remove a list of reads according to their ID and mapped position from BAM file?
1
0
Entering edit mode
5.8 years ago
MarVi ▴ 30

Hello there,

I have a little question, I know that this is very similar to this question already answered: How to efficiently remove a list of reads from BAM file? But, somehow is a slight different. In fact, I have used Picard before to remove the unwanted reads from my bamfile. Now, I want to remove reads not only taking into account their ids, but also their mapped position (HI:I:).

I have some multi-mapped reads and I want to remove only those according to their position. I can't use Picard, because this is going to remove all the multi-mapped reads even those that I want to keep.

So for exemple, I have:

SRR4293695.162674600    272 chr1    10538   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:2  AS:i:27 nM:i:1
SRR4293695.162674600    16  chr1    181068  0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:1  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr9    10896   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:7  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr12   10655   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:3  AS:i:27 nM:i:1
SRR4293695.162674600    256 chr15   101980314   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:4  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr16   10479   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:8  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrX    156029433   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:6  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrY    57215953    0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:5  AS:i:27 nM:i:1

I would like to remove from the bam file the first two alignments in chromosome 1 with HI:i:1 and HI:i:2.

SRR4293695.162674600    272 chr9    10896   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:7  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr12   10655   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:3  AS:i:27 nM:i:1
SRR4293695.162674600    256 chr15   101980314   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:4  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr16   10479   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:8  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrX    156029433   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:6  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrY    57215953    0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:5  AS:i:27 nM:i:1

Does anyone have any suggestion? I could use python to do it by myself and then convert sam to bam file, but maybe there is something more optimized to try.

Thanks in advance!

samtools bam RNA-Seq alignment • 2.1k views
ADD COMMENT
0
Entering edit mode

Hello MarVi,

take a look at bamutils filter.

fin swimmer

ADD REPLY
0
Entering edit mode

but also their mapped position (NH:I:).

I have some multi-mapped reads and I want to remove only those according to their position.

i don't understand. Provide an example please.

ADD REPLY
0
Entering edit mode

Hello, thank you very much for your responses.

Sure, here an example :

SRR4293695.162674600    272 chr1    10538   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:2  AS:i:27 nM:i:1
SRR4293695.162674600    16  chr1    181068  0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:1  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr9    10896   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:7  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr12   10655   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:3  AS:i:27 nM:i:1
SRR4293695.162674600    256 chr15   101980314   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:4  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr16   10479   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:8  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrX    156029433   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:6  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrY    57215953    0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:5  AS:i:27 nM:i:1

I would like remove the first two alignments that are in the first chromosome. At the end in my bam file I would like to have :

SRR4293695.162674600    272 chr9    10896   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:7  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr12   10655   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:3  AS:i:27 nM:i:1
SRR4293695.162674600    256 chr15   101980314   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:4  AS:i:27 nM:i:1
SRR4293695.162674600    272 chr16   10479   0   30M1S   *   0   0   CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8  HI:i:8  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrX    156029433   0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:6  AS:i:27 nM:i:1
SRR4293695.162674600    256 chrY    57215953    0   1S30M   *   0   0   TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8  HI:i:5  AS:i:27 nM:i:1

Thank you very much for your suggestions or ideas. I am going to check bamutils filter.

MarVi

ADD REPLY
0
Entering edit mode
5.8 years ago

using samdjdk in pair mode http://lindenb.github.io/jvarkit/SamJdk.html

 java -jar picard.jar SortSam I=coord.bam O=query.bam SO=queryname

java -jar  dist/samjdk.jar --pair -e 'return records.stream().filter(R->{Integer i=R.getIntegerAttribute("HI"); return i!=null && i>2;}).collect(Collectors.toList());' query.bam
ADD COMMENT
0
Entering edit mode

Hello Pierre! Thanks for your answer. Nevertheless, is not the answer that I am looking for.

In fact, I don't want to keep only those reads that have HI:I > 2. I have a list with reads names associated with their alignment (chr:start) and HI:I. According to this list I want to remove those reads from my bamfile.

Above I just wrote an example. I feel sorry that it was not clear.

MarVi

ADD REPLY
0
Entering edit mode

Then That is still not clear to me. Why do you remove the two first one and not the other ? Furthermore, why are you doing this: in your example you're removing the primary alignment and all the MAQP==0 ....

ADD REPLY
0
Entering edit mode

My example is not a real example and not always are the two first. Removal is based in their positions (chr:start-end).

ADD REPLY

Login before adding your answer.

Traffic: 2694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6