Question

Altering fastq sequence identifier

0

6.7 years ago

fiona.newberry ▴ 80

I am trying to determine the false positive and false negative rates of various alignments, and want to add a unique identifier to the sequence IDs in each fastq file.

I have ten genomes which I have synthetically sequenced (so 20 fq files). The current sequence identifiers look like this:

@simulated.2618103/1

I want to change it so that it looks like this:

@simulated.2618103/1.1

Each of the ten genomes will get its own identifier, from 1 to 10. I have tried reading about how to do this with awk, but I don't understand the program.

Thanks

fastq • 2.6k views

ADD COMMENT • link updated 6.7 years ago by GouthamAtla 12k • written 6.7 years ago by fiona.newberry ▴ 80

1

This would help, try to extend the answer in these links

ADD REPLY • link 6.7 years ago by venu 7.1k

score 3 · Accepted Answer · 2017-08-07

3

6.7 years ago

GouthamAtla 12k

It's a bit tricky with fastq, as you need to alter only the first line of every record (each record spans four lines).

So, what you can do is:

awk '{ if (NR%4==1) gsub("$",".1",$1); print }' in.fq > renamed_in.fq

Change the replacement string in gsub() according to your needs, for example ".2" for the second genome.
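To cover all ten genomes, the one-liner can be wrapped in a loop. This is only a sketch: the file names genome1_R1.fq through genome10_R2.fq are hypothetical (the question does not give them), add_suffix is an illustrative helper name, and plain four-line fastq records are assumed. It assigns to $1 directly instead of calling gsub(), which should have the same effect:

```shell
#!/bin/sh
# Append ".<tag>" to the first field of each header line (line 1 of every 4).
# add_suffix is a hypothetical helper, not part of awk or any fastq tool.
add_suffix() {
    awk -v tag="$1" 'NR % 4 == 1 { $1 = $1 "." tag } { print }'
}

# Assumed (hypothetical) layout: genome<N>_R1.fq and genome<N>_R2.fq, N = 1..10
for i in 1 2 3 4 5 6 7 8 9 10; do
    for mate in R1 R2; do
        in="genome${i}_${mate}.fq"
        [ -f "$in" ] || continue    # skip files that are not present
        add_suffix "$i" < "$in" > "renamed_${in}"
    done
done
```

Passing the genome number with -v keeps the awk program itself unchanged between files, so only the loop variable differs from one genome to the next.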

ADD COMMENT • link 6.7 years ago by GouthamAtla 12k

0

THANK YOU!

Do you mind explaining the parts of your awk script? I am really struggling to learn this. Do you know of any good learning material?

ADD REPLY • link 6.7 years ago by fiona.newberry ▴ 80

0

Any basic awk tutorial will cover the awk syntax and the built-in variables used here.
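As a starting point, here is the one-liner from the answer run on a tiny sample, with each part annotated. The two reads are made up for illustration, and the variable names sample and renamed are placeholders:

```shell
# Two-record sample fastq (made-up reads, for illustration only)
sample='@simulated.1/1
ACGT
+
IIII
@simulated.2/1
TTGA
+
IIII'

# NR          : awk's running count of input lines, starting at 1
# NR%4==1     : true on lines 1, 5, 9, ... (the header of each 4-line record)
# $1          : first whitespace-separated field of the current line
# gsub(r,s,t) : replace every match of regex r in t with s; "$" matches the
#               empty end of the string, so ".1" is appended to $1
# print       : with no arguments, prints the (possibly modified) line
renamed=$(printf '%s\n' "$sample" |
    awk '{ if (NR%4==1) gsub("$",".1",$1); print }')

printf '%s\n' "$renamed"
```

Only the two header lines should gain the ".1" suffix; the sequence, "+" and quality lines pass through unchanged because the if condition is false for them.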

ADD REPLY • link 6.7 years ago by GouthamAtla 12k