Extract 1M reads from paired end fastqs
7.9 years ago
acorella

Hi,

I have paired-end reads in two separate fastq files. I want to take a subset of these reads for a bowtie run to estimate the insert size. I know how to split an individual file into sets of 1 million reads (e.g. here: https://www.biostars.org/p/66864/).

My Question: Do I need to ensure my reads are in the same order in each file before I do this? If so, how do I do this?

Thanks!

RNA-Seq

> However, do I need to ensure my reads are in the same order in each file before I do this? If so, how do I do this?

If you have not done anything to the files (other than running a paired-end-aware trimming program), the reads should already be in the same order in the R1 and R2 files.
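If the pairing is intact, one simple (but non-random) way to get a subset is to take the same number of records from the top of each file. A minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; the function name subset_first_pairs is hypothetical:

```bash
# Hypothetical helper: copy the first N read pairs from R1/R2 FASTQ files.
# Assumes 4-line FASTQ records (no wrapped sequence lines) and that
# R1 and R2 are already in the same order, so record i in R1 pairs with record i in R2.
subset_first_pairs() {
  local r1="$1" r2="$2" n="$3" out1="$4" out2="$5"
  local lines=$(( n * 4 ))        # 4 lines per FASTQ record
  head -n "$lines" "$r1" > "$out1"
  head -n "$lines" "$r2" > "$out2"
}
```

Usage: subset_first_pairs r1.fq r2.fq 1000000 sub1.fq sub2.fq (for gzipped files you would pipe through zcat instead). This is not a random sample, and reads near the start of a file may not be representative of the whole run; the seeded sampling approaches further down in this thread avoid that.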

If you suspect that the pairing is broken, the files can be repaired with repair.sh from the BBMap suite. Reads whose mate is missing are written to the singletons file:

repair.sh in1=r1.fq.gz in2=r2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outsingle=singletons.fq.gz

Thank you! That was indeed the question I was trying to ask!

Is there a quick way to tell if the pairing is broken?

reformat.sh from the same package has an option to do that:

reformat.sh in1=r1.fq in2=r2.fq vpair

That will just verify that the names indicate the reads are in the same order in each file. Incidentally, you can also randomly sample 1M pairs from them, like this:

reformat.sh in1=r1.fq in2=r2.fq out1=sampled1.fq out2=sampled2.fq samplereadstarget=1m
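If BBTools isn't installed, a rough version of the name check can be done with standard Unix tools. A sketch, assuming bash (for process substitution), uncompressed four-line FASTQ records, and read names that differ between R1 and R2 only by a trailing /1 or /2 or by the comment after the first space; check_pairing is a hypothetical helper:

```bash
# Hypothetical helper: compare R1/R2 read names record by record.
# Header lines are every 4th line starting at line 1; paste puts the
# R1 and R2 headers for the same record side by side, tab-separated.
check_pairing() {
  paste <(awk 'NR % 4 == 1' "$1") <(awk 'NR % 4 == 1' "$2") |
    awk -F'\t' '
      # Strip the comment after the first whitespace and a trailing /1 or /2.
      function base(name) { sub(/[ \t].*/, "", name); sub(/\/[12]$/, "", name); return name }
      # Report the first mismatch (including a missing mate) and stop.
      base($1) != base($2) { print "Mismatch at pair " NR ": " $1 " vs " $2; bad = 1; exit }
      END { exit bad }'
}
```

Usage: check_pairing r1.fq r2.fq && echo "pairing OK". On failure it prints the first mismatching pair and returns a non-zero status.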

If your read pairs overlap, you can determine the insert size with BBMerge; if not, you'll need to map them.
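If you go the mapping route (e.g. with bowtie, as in the question), the insert size can be summarised from the TLEN column (column 9) of the SAM output. A minimal sketch, assuming an uncompressed SAM file; median_insert_size is a hypothetical helper, and taking the median of positive TLEN values among reads flagged as properly paired is one reasonable choice, not the only one:

```bash
# Hypothetical helper: median insert size from a paired-end SAM file.
# Skips header lines, keeps records with FLAG bit 0x2 (properly paired),
# and counts each pair once by keeping only the positive TLEN of the pair.
median_insert_size() {
  awk -F'\t' '!/^@/ && int($2 / 2) % 2 == 1 && $9 > 0 { print $9 }' "$1" |
    sort -n |
    awk '{ v[NR] = $1 }
         END { if (NR) print (NR % 2 ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2) }'
}
```

Usage: median_insert_size aln.sam. How "properly paired" is decided depends on the aligner's settings, so very large or discordant inserts may be excluded.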

7.9 years ago

seqtk sample with a fixed seed should work for you (use the same seed for both files so the pairs stay in sync). Take a look here:

Selecting Random Pairs From Fastq?
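The fixed-seed trick works because the same seed produces the same sequence of random draws, so the same record positions are picked in R1 and R2. With seqtk that looks like seqtk sample -s100 r1.fq 1000000 > sub1.fq, followed by the same command with the same seed on r2.fq. To illustrate the idea without seqtk, here is a sketch in awk that keeps each record with a given probability; sample_fastq is a hypothetical helper, and it assumes uncompressed four-line FASTQ and that the same awk binary is used for both files:

```bash
# Hypothetical helper: keep each 4-line FASTQ record with probability FRAC.
# One rand() draw is made per record (on its header line), so running this
# on R1 and R2 with the same SEED selects the same record indices in both.
sample_fastq() {
  local in="$1" frac="$2" seed="$3"
  awk -v frac="$frac" -v seed="$seed" '
    BEGIN { srand(seed) }
    NR % 4 == 1 { keep = (rand() < frac) }   # decide once per record
    keep' "$in"                              # print all 4 lines of kept records
}
```

Usage: sample_fastq r1.fq 0.1 42 > sub1.fq, then sample_fastq r2.fq 0.1 42 > sub2.fq. Unlike seqtk sample with an integer count, this gives an approximate number of pairs rather than exactly 1M.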
