Question

Is there a paired-end read merger that also writes out the reads as they were before merging?

5.1 years ago

oscar.nvergara • 0

Hi, I have a slightly unusual question. I'm looking for software that not only merges paired reads but also writes the original reads of each merged pair to new files.

When a tool merges a pair of reads, it writes the merged read to the output FASTQ (or FASTA), but it gives you no way of knowing which reads were paired. I've checked several tools, but most of them only create a file with the unpaired reads, i.e. the discarded ones. I've been trying to find one that would give output like this:

SAMPLE_MATCHED_FORWARD.fastq
SAMPLE_MATCHED_REVERSE.fastq
SAMPLE_JOINED.fastq

The first two files would contain the reads that go into the merge, i.e. those whose mate has been found. Is there software with an option like that? I know it sounds odd, but it could help me pick out the reads of interest from a pool containing a lot of unwanted sequences from different sources.

Thanks for your time.

Assembly software error sequence • 1.0k views

updated 5.1 years ago by Biostar 20 • written 5.1 years ago by oscar.nvergara • 0

Am I right that you would like one file with the merged reads, plus separate files for the forward and reverse reads of the pairs that could be merged?

If so, I would first do the merge, extract the read names of the merged reads, and then use these names to extract the corresponding forward and reverse reads from the original FASTQ files.

$ bbmerge.sh in1=<read1> in2=<read2> out=<merged reads> outu1=<unmerged1> outu2=<unmerged2>
$ seqkit seq -n -i <merged reads> > id_merged.txt
$ seqkit grep -f id_merged.txt <read1> | bgzip -c  > SAMPLE_MATCHED_FORWARD.fastq.gz
$ seqkit grep -f id_merged.txt <read2> | bgzip -c  > SAMPLE_MATCHED_REVERSE.fastq.gz
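
If seqkit is not at hand, the two ID-based steps can be approximated with plain awk. This is only a rough sketch, not a replacement for the commands above: the helper names `fastq_ids` and `fastq_keep` are made up for illustration, and it assumes uncompressed FASTQ with strict 4-line records and mates that share the same ID (the header text up to the first whitespace). If your headers end in /1 and /2, the IDs will not match without further handling.

```shell
# Sketch only: awk stand-ins for the ID extraction and filtering steps.
# Assumes strict 4-line FASTQ records and identical IDs for both mates.

# fastq_ids FILE: print one read ID per record (hypothetical helper).
fastq_ids() {
    awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$1"
}

# fastq_keep IDS FILE: print only the records of FILE whose ID is listed in IDS
# (hypothetical helper).
fastq_keep() {
    awk '
        # While reading the ID list, remember every ID and skip the other rules.
        FILENAME == ARGV[1] { keep[$1] = 1; next }
        # Header line of a record: strip "@" and decide whether to keep it.
        FNR % 4 == 1 { id = $1; sub(/^@/, "", id); hit = (id in keep) }
        # Print all four lines of a kept record.
        hit
    ' "$1" "$2"
}
```

It would be used roughly like this: `fastq_ids SAMPLE_JOINED.fastq > id_merged.txt`, then `fastq_keep id_merged.txt <read1> > SAMPLE_MATCHED_FORWARD.fastq`, and the same again for the reverse reads.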

fin swimmer

5.1 years ago by finswimmer 16k