Splitting FASTQ files: output size issue

7.5 years ago

adrilu.romero • 0

I have a 22 GB file containing merged paired-end reads; in other words, all R1 and R2 reads are in a single file. After splitting it into two files for input to Trinity, one with all R1 reads and another with all R2 reads, why are the split files so much smaller than the original? The R1 file is 1.22 GB and the R2 file is also 1.22 GB. Thanks!

fastq trinity paired end • 2.1k views

ADD COMMENT • link 7.5 years ago by adrilu.romero • 0

How did you split the file?

ADD REPLY • link 7.5 years ago by Eric Lim ★ 2.1k

using:

  paste - - - - < test.fq \
    | tee >(awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 1:N")) print $1,$2,$3,$4}' > test.r1.fq) \
    | awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 2:N")) print $1,$2,$3,$4}' > test.r2.fq

ADD REPLY • link 7.5 years ago by adrilu.romero • 0
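
Byte sizes are hard to compare directly, so counting records in each file is usually more telling. A rough sketch, assuming uncompressed FASTQ with exactly four lines per record (the same assumption `paste - - - -` makes); `count_records` is a hypothetical helper name used only for illustration:

```shell
# count_records FILE: print the number of FASTQ records in FILE.
# Assumption: uncompressed FASTQ with exactly four lines per record.
count_records() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Example use with the file names from the command above:
#   for f in test.fq test.r1.fq test.r2.fq; do
#       printf '%s\t%s\n' "$f" "$(count_records "$f")"
#   done
```

If the R1 and R2 counts together come to far fewer than the count for test.fq, records are being dropped by the filter rather than just taking up less space.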

I think that's unnecessarily complicated. Have you tried some of the methods from "Fastq Splitter For Paired End Reads"?

ADD REPLY • link 7.5 years ago by Eric Lim ★ 2.1k

"I think that's unnecessarily complicated"

yup

  rm -f R1.fq R2.fq && cat R0.fq  | awk '(NR%4==1) { out = (index($0,"1:N")?"R1.fq":"R2.fq");} { print >> out;}'

Are you sure ALL your reads contain 1:N or 2:N?

ADD REPLY • link 7.5 years ago by Pierre Lindenbaum 161k
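
One way to check whether every read carries 1:N or 2:N is to tally the flags on the header lines. The sketch below assumes Casava 1.8+ style headers, where the part after the space looks like `1:N:0:ATCACG` (read number, filter flag, control bits, index). `tally_flags` is a hypothetical helper name, not part of any tool mentioned in this thread:

```shell
# tally_flags FILE: count each "<read>:<filter>" flag seen on FASTQ header lines.
# Assumption: four-line records and Casava 1.8+ style headers such as
#   @M01:23:FC:1:1101:1000:2000 1:N:0:ATCACG
# Headers that do not match that pattern are counted as "other".
tally_flags() {
    awk 'NR % 4 == 1 {
        if (match($0, / [12]:[YN]:/))
            flag = substr($0, RSTART + 1, 3)   # e.g. "1:N"
        else
            flag = "other"
        n[flag]++
    }
    END { for (f in n) print f "\t" n[f] }' "$1" | LC_ALL=C sort
}

# Example use:
#   tally_flags test.fq
```

Anything reported as `1:Y`, `2:Y`, or `other` would be silently skipped by a filter that only matches ` 1:N` and ` 2:N`.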
