Modifying barcode sequence in fq files
18 months ago
AP • 90

Hello,

I have several .fq files with 5 bp inline barcodes at the beginning of each read, for example (barcodes are marked between *):

@gi|110640213|ref|NC_008253.1|_418_952_1:0:0_1:0:0_0/1
*CCAGG*CAGTGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCATCTGGTAGCGATGAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|110640213|ref|NC_008253.1|_31_476_0:0:0_0:0:0_1/1
*CAGAT*GGTTGGTGATTTTGGCGGGGGCAGAGAGGACGGTGGCCACCTGCCCCTGCCTGGCATTGCTTTCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|110640213|ref|NC_008253.1|_210_743_2:0:0_1:1:0_2/1
*CATTA*CCACCACCATCACCATTACCACAGGAAACGGTGCGGGCTGACGCGTACAGGAAACACCGAAAAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222

I would like to replace these barcodes so that every read starts with the same sequence (here, AAAAA):

@gi|110640213|ref|NC_008253.1|_418_952_1:0:0_1:0:0_0/1
*AAAAA*CAGTGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCATCTGGTAGCGATGAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|110640213|ref|NC_008253.1|_31_476_0:0:0_0:0:0_1/1
*AAAAA*GGTTGGTGATTTTGGCGGGGGCAGAGAGGACGGTGGCCACCTGCCCCTGCCTGGCATTGCTTTCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|110640213|ref|NC_008253.1|_210_743_2:0:0_1:1:0_2/1
*AAAAA*CCACCACCATCACCATTACCACAGGAAACGGTGCGGGCTGACGCGTACAGGAAACACCGAAAAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222

I want to make sure that only the sequence at the beginning of each read is modified, not matches elsewhere in the read. The barcode sequence might also occur within a read, and I don't want to modify it there.

Do you know of an easy way to do this? Thanks!

fastq barcode • 784 views
16 months ago
Gabriel R. ♦ 2.6k
Center for Geogenetik Københavns Univer…

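I assume that the * are not part of the sequence and are just there to highlight the barcodes :-) Then use awk to rewrite only the sequence line of each record (line 2 of every 4). Note that awk's substr is 1-based, so substr($0,6) keeps everything from the sixth character on and only the 5 bp barcode is replaced:

zcat [input FASTQ file] | awk '{if(NR%4==2){print "AAAAA" substr($0,6)}else{print $0}}' | gzip > [output FASTQ].gz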
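
To check the behaviour before running it on real data, here is a minimal sketch of the same idea applied to one uncompressed toy record. The function name `replace_barcode` is only illustrative and not part of any tool. It assumes strict four-line FASTQ records (no wrapped sequences) and a new barcode of the same length as the old one, so the quality line still matches the read length.

```shell
#!/bin/sh
# Hypothetical helper: overwrite the leading barcode of every sequence line.
# Assumes strict 4-line FASTQ records and a replacement barcode of the same
# length as the original, so the quality string can stay untouched.
replace_barcode() {
    # $1 = new barcode; FASTQ is read from stdin and written to stdout
    awk -v bc="$1" '
        NR % 4 == 2 { print bc substr($0, length(bc) + 1); next }  # sequence line
        { print }                                                  # header, "+", quality
    '
}

# Toy record: the old barcode CCAGG also occurs inside the read.
printf '@read1\nCCAGGCAGTCCAGGTT\n+\n2222222222222222\n' | replace_barcode AAAAA
```

The sequence line should come out as AAAAACAGTCCAGGTT: the leading CCAGG is replaced, the internal CCAGG is left alone, and the other three lines pass through unchanged. For gzipped input the function can sit in the same kind of pipeline as above, between zcat and gzip.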

Works like a charm! Thanks Gabriel. I was trying things with awk but I was not successful. This solves my issue. Also yes, the * are not part of the sequence :-)


You are most welcome. Please mark the question as answered :-)

18 months ago
AP • 90

From Gabriel R:

zcat [input FASTQ file] | awk '{if(NR%4==2){print "AAAAA" substr($0,6)}else{print $0}}' | gzip > [output FASTQ].gz
