Hello there,
I have a little question, I know that this is very similar to this question already answered: How to efficiently remove a list of reads from BAM file? But, somehow is a slight different. In fact, I have used Picard before to remove the unwanted reads from my bamfile. Now, I want to remove reads not only taking into account their ids, but also their mapped position (HI:I:).
I have some multi-mapped reads and I want to remove only those according to their position. I can't use Picard, because this is going to remove all the multi-mapped reads even those that I want to keep.
So for exemple, I have:
SRR4293695.162674600 272 chr1 10538 0 30M1S * 0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8 HI:i:2 AS:i:27 nM:i:1
SRR4293695.162674600 16 chr1 181068 0 30M1S * 0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8 HI:i:1 AS:i:27 nM:i:1
SRR4293695.162674600 272 chr9 10896 0 30M1S * 0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8 HI:i:7 AS:i:27 nM:i:1
SRR4293695.162674600 272 chr12 10655 0 30M1S * 0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8 HI:i:3 AS:i:27 nM:i:1
SRR4293695.162674600 256 chr15 101980314 0 1S30M * 0 0 TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8 HI:i:4 AS:i:27 nM:i:1
SRR4293695.162674600 272 chr16 10479 0 30M1S * 0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8 HI:i:8 AS:i:27 nM:i:1
SRR4293695.162674600 256 chrX 156029433 0 1S30M * 0 0 TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8 HI:i:6 AS:i:27 nM:i:1
SRR4293695.162674600 256 chrY 57215953 0 1S30M * 0 0 TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8 HI:i:5 AS:i:27 nM:i:1
I would like to remove from the bam file the first two alignments in chromosome 1 with HI:i:1 and HI:i:2.
SRR4293695.162674600 272 chr9 10896 0 30M1S * 0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8 HI:i:7 AS:i:27 nM:i:1
SRR4293695.162674600 272 chr12 10655 0 30M1S * 0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8 HI:i:3 AS:i:27 nM:i:1
SRR4293695.162674600 256 chr15 101980314 0 1S30M * 0 0 TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8 HI:i:4 AS:i:27 nM:i:1
SRR4293695.162674600 272 chr16 10479 0 30M1S * 0 0 CCACCGAAATCTGTGCAGAGGAGAACGCAGA FFFFAFFFFFFF7<FFFFAFFFFFFF<AAAA NH:i:8 HI:i:8 AS:i:27 nM:i:1
SRR4293695.162674600 256 chrX 156029433 0 1S30M * 0 0 TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8 HI:i:6 AS:i:27 nM:i:1
SRR4293695.162674600 256 chrY 57215953 0 1S30M * 0 0 TCTGCGTTCTCCTCTGCACAGATTTCGGTGG AAAA<FFFFFFFAFFFF<7FFFFFFFAFFFF NH:i:8 HI:i:5 AS:i:27 nM:i:1
Does anyone have any suggestion? I could use python to do it by myself and then convert sam to bam file, but maybe there is something more optimized to try.
Thanks in advance!
Hello MarVi,
take a look at bamutils filter.
fin swimmer
i don't understand. Provide an example please.
Hello, thank you very much for your responses.
Sure, here an example :
I would like remove the first two alignments that are in the first chromosome. At the end in my bam file I would like to have :
Thank you very much for your suggestions or ideas. I am going to check bamutils filter.
MarVi