Using flowcell name as file name for FASTQ file
5.4 years ago
Ric ▴ 430

I have two folders, and each of them contains the same file names.

ls -1
10_S0_L001_R1_001.fastq.gz
10_S0_L001_R2_001.fastq.gz
11_S0_L001_R1_001.fastq.gz
11_S0_L001_R2_001.fastq.gz

Is there a way to extract the flowcell name from each dataset and use it to make the file names unique?

Thank you in advance.

sequence next-gen • 2.1k views
5.4 years ago
GenoMax 141k

If you look inside the files, you should see headers that look something like:

@HWI-EAS209_0006_FC706VJ:5:58:5894:21141

The flowcell serial is embedded in the header (e.g. FC706VJ here). The problem with using it for file names is that all samples from the same flowcell would get the same name unless you concatenate it with the sample ID, giving something like Sample_01_FC706VJ.


Thank you, is there a script for it?


You can use standard Unix tools (such as cut, awk, tr, or grep) or non-standard ones (e.g., bioawk) to extract metadata from your FASTQ files. As GenoMax said, you would need to extract some other identifier(s) to make your file names unique.
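
The same idea can also be sketched in Python rather than with shell tools. This is only a minimal sketch, not a tested pipeline: the function names (`flowcell_from_header`, `flowcell_file_name`, `rename_folder`) are made up for illustration. It assumes the header either follows the older `instrument_run_flowcell:lane:tile:x:y` layout shown above or the Casava 1.8+ layout `instrument:run:flowcell:lane:tile:x:y`. Headers that carry no flowcell at all will give a meaningless result. The original file name is kept and the flowcell is only added as a prefix, so samples from the same flowcell stay distinct.

```python
import gzip
from pathlib import Path


def flowcell_from_header(header: str) -> str:
    """Return the flowcell ID from the first header line of a FASTQ record."""
    # Drop the leading '@' and anything after the first space, then split on ':'.
    fields = header.strip().lstrip("@").split()[0].split(":")
    if len(fields) >= 7:
        # Assumed Casava 1.8+ layout: instrument:run:flowcell:lane:tile:x:y
        return fields[2]
    # Assumed older layout, as in the thread: instrument_run_flowcell:lane:tile:x:y
    return fields[0].rsplit("_", 1)[-1]


def flowcell_file_name(fastq_gz: Path) -> str:
    """Build a new name: the flowcell ID prefixed to the original file name."""
    # Only the first line is read, so large files are not decompressed fully.
    with gzip.open(fastq_gz, "rt") as handle:
        first_header = handle.readline()
    return f"{flowcell_from_header(first_header)}_{fastq_gz.name}"


def rename_folder(folder: Path, dry_run: bool = True) -> list:
    """Return (old, new) name pairs; rename the files only if dry_run is False."""
    plan = []
    for path in sorted(folder.glob("*.fastq.gz")):
        new_name = flowcell_file_name(path)
        plan.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return plan
```

Calling `rename_folder(Path("run_A"))` on a hypothetical folder `run_A` only reports the planned names. Pass `dry_run=False` once the plan looks right.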
