Take A Subset Of A Fastq Paired-End Sample
11.1 years ago
dfernan ▴ 760

Hi,

I have two compressed paired-end FASTQ files from a HiSeq RNA-seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.

The files are very large, so I want to take just a few thousand or a few million reads from each of them (keeping the pairs matched) and use the resulting files for testing/debugging purposes.

The result should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.

I'd be happy to hear some suggestions on how to do this or hear about tools available, thanks!

paired-end fastq rna-seq illumina • 14k views

Thanks Pierre, I didn't realize someone else had asked about it!

11.1 years ago
Rahul Sharma ▴ 660

Hi,

Assuming that the reads are in the same order in both files, I would do it like this (a FASTQ record is four lines, so 4,000,000 lines give the first 1,000,000 reads):

zcat pair.1.fastq.gz | head -n 4000000 > pair_1_millions.fastq
zcat pair.2.fastq.gz | head -n 4000000 > pair_2_millions.fastq

Thanks,
Rahul


Hi, thanks a lot. However, I am not sure whether the reads are in the same order in both files, and I'd like to make sure that I am pairing them correctly...
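
One way to check the ordering before subsetting is to compare the read names of the two files record by record. The following is only a minimal sketch: it assumes four-line FASTQ records and read names that are identical in both files up to the first space or a trailing /1 or /2. The function names read_ids and same_order are made up for illustration.

```shell
#!/bin/sh
# Sketch: check that two gzipped FASTQ files list their reads in the same order.
# Assumes 4-line records (no wrapped sequences) and mate names that differ only
# after the first space or in a trailing /1 or /2.

# Print one normalised read name per record (header lines are lines 1, 5, 9, ...).
read_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Exit status 0 if both files contain the same read names in the same order.
same_order() {
    tmp=$(mktemp) || return 2
    read_ids "$1" > "$tmp"
    read_ids "$2" | cmp -s - "$tmp"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage with the file names from the question:
# same_order pair.1.fastq.gz pair.2.fastq.gz && echo "same order" || echo "order differs"
```

If the check succeeds, taking the same number of leading lines from each file keeps the mates together. If it fails, the files need to be re-paired by read name before any line-based subsetting.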

10 months ago

I believe that a better option for paired-end data is to use fastq-sample from fastq-tools:

fastq-sample -n 5000000 pair_R1.fastq.gz pair_R2.fastq.gz -o pair_5M_R
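
If fastq-tools is not available, a random paired subset can also be sketched with standard command-line tools. The idea is to put each read pair on a single tab-separated line, shuffle the lines, and split them back into two files, so mates can never be separated. This is an illustrative sketch, not a tested pipeline: it assumes both files are already in the same order, four-line records, no tab characters in the read headers, and GNU coreutils for shuf. The function name sample_pairs and its argument order are made up here.

```shell
#!/bin/sh
# Sketch: draw N random read pairs from two gzipped FASTQ files.
# Usage: sample_pairs N R1.fastq.gz R2.fastq.gz OUT_PREFIX
# Writes OUT_PREFIX.1.fastq.gz and OUT_PREFIX.2.fastq.gz.
sample_pairs() {
    n=$1; r1=$2; r2=$3; prefix=$4
    tmp=$(mktemp -d) || return 2

    # Named pipes avoid writing the fully decompressed inputs to disk.
    mkfifo "$tmp/r1" "$tmp/r2"

    # Collapse every 4-line record into one tab-separated line.
    gzip -dc "$r1" | paste - - - - > "$tmp/r1" &
    gzip -dc "$r2" | paste - - - - > "$tmp/r2" &

    # Join mate 1 and mate 2 side by side (8 fields per line), then sample N lines.
    paste "$tmp/r1" "$tmp/r2" | shuf -n "$n" > "$tmp/sample.tsv"
    wait

    # Fields 1-4 are mate 1, fields 5-8 are mate 2; restore 4-line records.
    awk -F '\t' -v OFS='\n' '{ print $1, $2, $3, $4 }' "$tmp/sample.tsv" | gzip -c > "$prefix.1.fastq.gz"
    awk -F '\t' -v OFS='\n' '{ print $5, $6, $7, $8 }' "$tmp/sample.tsv" | gzip -c > "$prefix.2.fastq.gz"

    rm -rf "$tmp"
}

# Usage with the file names from the question:
# sample_pairs 1000000 pair.1.fastq.gz pair.2.fastq.gz pair.test
```

Unlike taking the first lines of each file, this draws pairs from the whole run, which avoids the lower-quality reads that often sit at the start of a file. The sampled pairs come out in random order, but the two output files stay in the same order as each other.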