7.5 years ago
adrilu.romero
I have a 22 GB file containing merged paired-end reads; in other words, all R1 and R2 reads are in a single file. After splitting it into two files for input to Trinity, one with all the R1s and the other with all the R2s, why are the split files so much smaller than the original? The R1 file is 1.22 GB and the R2 file is also 1.22 GB. Thanks!
How did you split the file?
Using:
paste - - - - < test.fq \
    | tee >(awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 1:N")) print $1,$2,$3,$4}' > test.r1.fq ) \
    | awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 2:N")) print $1,$2,$3,$4}' > test.r2.fq
I think that's unnecessarily complicated. Have you tried some of the methods from "Fastq Splitter For Paired End Reads"?
yup
Are you sure ALL your reads contain 1:N or 2:N?
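One way to check this is to tally the flags that actually appear in the headers. The sketch below assumes Illumina CASAVA 1.8+ style headers ("@<id> <read>:<filtered>:<control>:<index>") and strict 4-line FASTQ records. The function name count_read_flags is made up for illustration.

```shell
# Tally the read-number/filter flag (e.g. 1:N, 2:N, 1:Y) over all FASTQ headers.
# Assumption: CASAVA 1.8+ headers, where the second space-separated field
# looks like <read>:<filtered>:<control>:<index>, and 4 lines per record.
count_read_flags() {
    awk 'NR % 4 == 1 {
             split($2, f, ":")        # f[1] = read number, f[2] = Y/N filter flag
             n[f[1] ":" f[2]]++       # headers without a second field count under ":"
         }
         END { for (k in n) print k, n[k] }' "$1" | sort
}

# Example (hypothetical file name from the thread):
#   count_read_flags test.fq
```

If anything other than 1:N and 2:N shows up in the output (for example 1:Y, 2:Y, or a bare ":" for headers with no flag field), those reads are dropped by the match on " 1:N" / " 2:N" in the command above, which could account for at least part of the size difference.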