I have a blacklist barcode file and a SAM file. I want to delete an entire row of the SAM file if its first 32 letters match a barcode in the blacklist. I've tried "grep -vf blackList_barcode.txt hg19.sam", but it's too slow. Are there any good suggestions, in any language? (In the following example, the first row of the SAM file will be deleted, because its first 32 letters match the third barcode in the blacklist.)
blackList_barcode.txt
AAAAAAAAAAAAAAAACGTCTAATCAGGACGT
AAAAAAAAAAAAAAAACTAAGCCTTAATCTTA
AAAAAAAAAAACAAAATCCTACGGTATAGCCT
AAAAAAAAAAGATTATTCTTAAAACACAACCA
hg19.sam
AAAAAAAAAAACAAAATCCTACGGTATAGCCT:M01581:1209:000000000-D3YJT:1:1102:16735:1704 1:N:0 163 chr17 21903973 3 15=1X28= = 21904354 425 CAGTGCTTCCCACGGCTGTCTTAGGAACCAGTCCCCGAGGCTTG 1>>>>F3B31B1BAA1AA0BFBA311DE000A11EFEEG///F/ XT:A:R NM:i:1 AM:i:3
CGGCTATGGGACTCCTCGTCTAATGTACTGAC:M01581:1209:000000000-D3YJT:1:1102:14725:1704 1:N:0 73 chr1 71429475 42 44= = 71429475 0 CCTGGTTACTTATGCAAATTTCTGATGCAGGCTTGAATTTCTCC AA1A1@11@D3DAABGFGGC1EHHHF1FGGCEGHGCEGBFGFAE NM:i:0 AM:i:0
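Since the question invites any language, here is a minimal Python sketch (not from the original thread) that loads the blacklist into a set and compares only the first 32 characters of each SAM line. Each line then costs a single hash lookup rather than a scan against every barcode. It assumes every barcode is exactly 32 characters long, as in the example above; the function names and the script name filter_sam.py are hypothetical.

```python
import sys
from typing import Iterable, Iterator, Set

# Assumption: barcodes are exactly 32 characters, as in the example data.
BARCODE_LEN = 32


def load_blacklist(lines: Iterable[str]) -> Set[str]:
    """Collect non-empty blacklisted barcodes into a set for O(1) lookups."""
    return {line.strip() for line in lines if line.strip()}


def filter_sam(sam_lines: Iterable[str], blacklist: Set[str],
               length: int = BARCODE_LEN) -> Iterator[str]:
    """Yield SAM lines whose first `length` characters are not blacklisted.

    Header lines (starting with '@') cannot equal an ACGT barcode,
    so they pass through unchanged.
    """
    for line in sam_lines:
        if line[:length] not in blacklist:
            yield line


# Run as a script only when both file paths are supplied.
if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python filter_sam.py blackList_barcode.txt hg19.sam > filtered.sam
    with open(sys.argv[1]) as bl_file:
        blacklist = load_blacklist(bl_file)
    with open(sys.argv[2]) as sam_file:
        sys.stdout.writelines(filter_sam(sam_file, blacklist))
```

Because the SAM file is streamed line by line, memory use is bounded by the size of the blacklist, not the SAM file.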
How could I use the entire blacklist file if I only grep for the line start? Thanks!
I don't know whether grep has an option for this. You can modify your blacklist file to prepend a ^ to each line, so that each pattern only matches at the start of a line.
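The prepend-^ step can be sketched in Python as follows (a hypothetical helper, not part of the original answer). It writes an anchored copy of the blacklist that grep can then read as its pattern file; the output file name blackList_anchored.txt is an assumption.

```python
from typing import Iterable, List


def anchor_patterns(barcodes: Iterable[str]) -> List[str]:
    """Prefix each non-empty barcode with '^' so grep matches it only at line start."""
    return ["^" + b.strip() + "\n" for b in barcodes if b.strip()]


def write_anchored(src: str, dst: str) -> None:
    """Write an anchored copy of the blacklist file `src` to `dst`."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(anchor_patterns(fin))
```

For example, write_anchored("blackList_barcode.txt", "blackList_anchored.txt") followed by grep -vf blackList_anchored.txt hg19.sam. In the shell, sed 's/^/^/' blackList_barcode.txt produces the same anchored patterns. Whether the anchored grep is fast enough likely depends on the grep implementation and the size of the blacklist; the set-lookup approach avoids regex matching altogether.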