Biostar Beta. Not for public use.
Question: How To Split Reads For Different Flowcell Lanes In Fastq Files?
1

My FASTQ file was delivered by the sequencing core as a combined file containing reads from two flow cell lanes. Is there a way to split the reads by lane? The downstream pipeline is TopHat-Cufflinks-Cuffmerge-Cuffdiff.

I've also read the TopHat documentation and did not see an option for splitting reads by lane, so I am asking here. Thanks.

6.6 years ago newDNASeqer • 630 • updated 6.6 years ago Rm 7.8k
0

Is the lane in the ID for each read? If so, you could write a simple Python/Perl script to do that.
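To check quickly whether the IDs carry the lane, a minimal Python sketch that tallies reads per lane might look like the following. It assumes CASAVA 1.8+ style headers (@instrument:run:flowcell:lane:tile:x:y ...), where the lane is the fourth colon-separated field, and four lines per record. The function names lane_of and count_lanes are made up for illustration.

```python
from collections import Counter


def lane_of(header):
    """Return the lane field of a CASAVA 1.8+ style header line.

    Assumption: '@instrument:run:flowcell:lane:tile:x:y ...'. Older Illumina
    headers ('@machine:lane:tile:x:y#index/1') have fewer fields and are rejected.
    """
    fields = header.split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected header layout: {header!r}")
    return fields[3]


def count_lanes(lines):
    """Tally reads per lane from an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # every fourth line, starting with the first, is a header
            counts[lane_of(line.rstrip("\n"))] += 1
    return counts
```

Something like count_lanes(open("my.R1.fastq")) would then show which lane values occur and how many reads each has.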

6.6 years ago
Gabriel R.
♦ 2.6k
0

Do the IDs have any distinguishing marks? (They should.) If you post a brief snippet containing a read from each lane, one of us could probably whip up a quick script or at least help you get started.

6.6 years ago
Alex Reynolds
28k
7

Quick Awk solution to separate a merged FASTQ file by lane (paste reads the file from standard input, hence the <):

paste - - - - < my.R1.fastq | awk -F"\t" '{ split($1, arr, ":"); print $1 "\n" $2 "\n+\n" $4 > ("lane." arr[4] ".R1.fastq") }'
6.6 years ago Rm 7.8k
2

I have a pure Awk solution that is much faster. Like the above solution, let's assume that the records are blocks of 4 lines:

awk 'BEGIN {FS = ":"} {out = "lane." $4 ".fastq" ; print > out ; for (i = 1; i <= 3; i++) {getline ; print > out}}' < my.R1.fastq

Calling getline three times after each header line reads the rest of the 4-line record (here the input comes from standard input, hence the <).
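Both one-liners rely on every record being exactly four lines. As a rough illustration, here is a Python sketch that reads in blocks of four and fails loudly if that assumption does not hold (for example with wrapped sequence lines). The name fastq_records is hypothetical, not from any library.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (header, sequence, plus, quality) tuples, four lines at a time."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return  # clean end of input
        # A valid four-line record starts with '@' and has '+' on its third line.
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"not a four-line FASTQ record: {record!r}")
        yield record
```

Running list(fastq_records(open("my.R1.fastq"))) on a small sample is one way to confirm the layout before trusting the awk commands on the full file.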

6.6 years ago
Frédéric Mahé
♦ 2.9k
0

Thanks for this solution. I tried it and it works quickly and nicely. I'm not familiar with awk, so could you please explain why your solution is faster?

3.7 years ago
DVA
• 520
0

Totally rad. I love one-liners.

6.6 years ago
Dan D
6.8k
0

+1 for the paste

6.6 years ago
Pierre Lindenbaum
120k
0

I'm wondering whether this is a correct way to work with paired-end reads (not just a single FASTQ file). Will the read order be the same in the resulting files containing forward and reverse reads? Or is there a safer solution for paired-end reads?

11 months ago
Denis
• 70
1

Are you referring to reads from multiple lanes in one file or just interleaved R1/R2 reads from a single lane?

It should be fine to use this solution as long as nothing else has been done to the original files. You can do a quick check with repair.sh from the BBMap suite after separating the files to make sure the read order is retained post-split.
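If BBMap is not at hand, the same kind of order check can be sketched in a few lines of Python. This is not how repair.sh works internally. It is only a simple lockstep comparison of read names, assuming four-line records and that mates share a name up to the first space or a trailing /1 or /2. The function names are hypothetical.

```python
from itertools import zip_longest


def read_name(header):
    """Strip the leading '@', any comment after the first space, and a trailing /1 or /2."""
    name = header[1:].partition(" ")[0].strip()
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(r1_lines, r2_lines):
    """Return the 1-based index of the first out-of-sync record, or None if all pairs match."""
    for i, (l1, l2) in enumerate(zip_longest(r1_lines, r2_lines)):
        if l1 is None or l2 is None:
            return i // 4 + 1  # one file ended before the other
        if i % 4 == 0 and read_name(l1) != read_name(l2):
            return i // 4 + 1  # header lines carry different read names
    return None
```

For example, first_mismatch(open("lane.1.R1.fastq"), open("lane.1.R2.fastq")) returning None would suggest the two files are still in the same order.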

11 months ago
genomax
68k
0

Yes, I have two FASTQ files, one with forward and another with reverse reads. Each file contains reads from all 8 Illumina lanes, and I need to split them by lane so that the order of reads in each resulting R1 file and its corresponding R2 file is the same.
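One way to guarantee matching order is to read both files in lockstep and write each pair together. The following is a minimal Python sketch of that idea, assuming uncompressed files, four-line records, and CASAVA 1.8+ headers (lane in the fourth colon-separated field, mates sharing the name before the first space). The function name and the output naming scheme are illustrative only.

```python
from itertools import islice


def split_pairs_by_lane(r1_path, r2_path, prefix="lane"):
    """Write each R1/R2 record pair to lane-specific files, preserving input order."""
    outputs = {}  # lane -> (R1 handle, R2 handle)
    try:
        with open(r1_path) as r1, open(r2_path) as r2:
            while True:
                rec1 = list(islice(r1, 4))
                rec2 = list(islice(r2, 4))
                if not rec1 and not rec2:
                    break  # both files exhausted together
                if len(rec1) < 4 or len(rec2) < 4:
                    raise ValueError("R1 and R2 differ in length or end mid-record")
                name1, name2 = rec1[0].split()[0], rec2[0].split()[0]
                if name1 != name2:
                    raise ValueError(f"pair out of sync: {name1} vs {name2}")
                lane = name1.split(":")[3]  # assumes CASAVA 1.8+ header layout
                if lane not in outputs:
                    outputs[lane] = (open(f"{prefix}.{lane}.R1.fastq", "w"),
                                     open(f"{prefix}.{lane}.R2.fastq", "w"))
                outputs[lane][0].writelines(rec1)
                outputs[lane][1].writelines(rec2)
    finally:
        for h1, h2 in outputs.values():
            h1.close()
            h2.close()
    return sorted(outputs)
```

Because a pair is only written after both mates have been read and their names compared, each lane's R1 and R2 outputs stay in the same order, and the script stops at the first pair that does not match.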

11 months ago
Denis
• 70
4

[Image: a FASTQ record with the lane number "3" highlighted in its header line]

See that highlighted "3" in the first line? That's the lane number in Illumina's FASTQ header convention. If you read in your FASTQ file and direct your reads to different output files based on that value, you'll have different FASTQ files separated by lane.

Do you need help writing the script to do that?
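For reference, a minimal Python sketch of that approach might look like this. It assumes an uncompressed file with four-line records and CASAVA 1.8+ style headers, where the lane is the fourth colon-separated field. The function name and the output file names are made up for illustration.

```python
def split_by_lane(fastq_path, prefix="lane"):
    """Route each four-line record to '<prefix>.<lane>.fastq'; return reads written per lane."""
    outputs = {}  # lane -> open output handle
    counts = {}   # lane -> number of records written
    try:
        with open(fastq_path) as fq:
            while True:
                header = fq.readline()
                if not header:
                    break  # end of file
                lane = header.split(":")[3]  # assumes CASAVA 1.8+ header layout
                if lane not in outputs:
                    outputs[lane] = open(f"{prefix}.{lane}.fastq", "w")
                    counts[lane] = 0
                outputs[lane].write(header)
                for _ in range(3):  # sequence, '+', and quality lines
                    outputs[lane].write(fq.readline())
                counts[lane] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

Calling split_by_lane("my.R1.fastq") would write one file per lane (lane.1.fastq, lane.2.fastq, and so on) in the current directory and return the number of reads sent to each.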

6.6 years ago Dan D 6.8k
