Parsing Fastq Files
12.0 years ago

Hi all,

I have FASTQ reads that look like this:

@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918 1:N:0:CGATGT
NACCCTAGAAATTATAAATCTCTTCAAGTGAGATTGTAAGGAGAAGGAGAAACTTGGTCTGGAATTTGTTATAAAAGCACTT
+
#1=DDFFFHHGGHIJJJJJIJJJJJJJJCHGHIIJJEFHIJIJJIIJIIIIJHHIJJFHIIJJJJJJJIJIJIJIIJHEHHHHFFFFFFEEEDEEEDCDDC

I aligned this FASTQ file to a reference genome using Bowtie. How can I identify the sample name from this record?

I have demultiplexed FASTQ files for each sample, and I also have a barcode information file in the format

Sample name    Index sequence
BC1            CGATGT
BC2            CGATGA

When I retrieve the alignment information using $sam->features(), the seqID is returned as

@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918

How can I get the 1:N:0:CGATGT part from the alignment information?

Thanks, Deeps

fastq parsing • 4.5k views
12.0 years ago

I'd suggest that you use SAM Read Groups to track samples. This would be done at the alignment stage....


Good suggestion. It helped me a lot.

12.0 years ago
jingtao09 ▴ 110

If you want to keep the barcode in the SAM file, you can replace the space between the main header and the barcode section with a non-space character, so that

@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918 1:N:0:CGATGT

becomes

@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918:1:N:0:CGATGT

Here I used a colon ":", so when you parse this header you can use a split function to get the barcode. In Python:

header = "@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918:1:N:0:CGATGT"
# The barcode is the last colon-separated field of the rewritten header.
barcode = header.rstrip("\n").split(":")[-1]
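Once the barcode is extracted, it can be looked up in the barcode table from the question to get the sample name. The following is only a sketch: the dictionary is typed in by hand from the two rows shown in the question, and `sample_for_read` is a hypothetical helper name, not part of any library.

```python
# Barcode table from the question (index sequence -> sample name).
# Assumption: entered by hand here; in practice it would be read from the barcode file.
BARCODE_TO_SAMPLE = {"CGATGT": "BC1", "CGATGA": "BC2"}


def sample_for_read(read_name, table=BARCODE_TO_SAMPLE):
    """Return the sample name for a colon-joined read name, or None if the barcode is unknown."""
    # The barcode is the last colon-separated field of the rewritten header.
    barcode = read_name.rstrip("\n").split(":")[-1]
    return table.get(barcode)


print(sample_for_read("HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918:1:N:0:CGATGT"))
```

For the example read this prints BC1; a barcode that is not in the table gives None rather than raising an error.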

Most mappers, e.g. BWA or Bowtie, truncate the read name at the first space, so preprocessing your FASTQ file into this new format will save you a lot of time. Otherwise, if you cannot modify the FASTQ reads, you can open the original FASTQ file and the SAM file at the same time, line up the records between the two, and parse out the barcode.
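The preprocessing step could look roughly like the sketch below. It assumes a plain, uncompressed FASTQ file with exactly four lines per record (no wrapped sequences); `join_header` and `rewrite_fastq` are hypothetical names chosen for illustration.

```python
def join_header(header, sep=":"):
    """Join the read name and the comment of a FASTQ header line with `sep`."""
    # Split only at the first space; everything after it is the comment part.
    name, _, comment = header.rstrip("\n").partition(" ")
    return name + sep + comment if comment else name


def rewrite_fastq(src_path, dst_path):
    """Copy a FASTQ file, rewriting only the header line of each record."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for i, line in enumerate(src):
            if i % 4 == 0:
                # First line of every 4-line record is the header.
                dst.write(join_header(line) + "\n")
            else:
                # Sequence, "+" and quality lines are copied unchanged.
                dst.write(line)
```

Counting lines instead of testing for a leading "@" is deliberate: "@" is also a valid quality character, so a quality line can start with it.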
