Question

Issue when merging FastQ

3.3 years ago

gayal25016 • 0

Hi, I have been merging two paired-end FASTQ files using:

cat file1.fq.gz file2.fq.gz > outputfile.fq.gz

I then mapped the output file and everything went fine. However, when I try to convert the resulting SAM file to a BAM file, I get this error:

[W::sam_read1] Parse error at line 23309928

[main_samview] truncated file.

So I looked at the actual line, and everything seems OK and appears to be in the correct format. However, I think I know what triggers the error, but I don't know why. In the 3rd field (the chr/ref field) I have "GL000214.1", as you can see below:

ERR1019070.25381935     16      GL000214.1      -1      16      100M    *       0       0       CACTATTATTCTCCAAATGATGCGTGCCTCCCTAGAGTCCAGGCTATCTGCATATCTAATTTTTCCCACAAATTACTGTTTTGAATTGCACTGAATTCAA    @C@DB?DFHAHHFGGEDGIG@IGGDFGGGIIIIIIIE?GHGH>G?GDFGHGDFGG<FHGEEHGIIGEH@EAGGCEC>777?CFFCE>>CACCDC3>@5>3    XS:i:1

I checked, and "GL000214.1" is declared in the header of my SAM file, though...

I ran the same process on another dataset and got the same error, (probably) caused by the same issue: the 3rd field contains a "GLXXXXXXX" accession that it does not recognise?
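For readers who want to inspect a suspect line themselves, here is a minimal Python sketch that tests the mandatory fields of one alignment line against the value ranges given in the SAM specification. The helper name `check_sam_line` and its `known_refs` argument are hypothetical, and SEQ and QUAL are abbreviated to `*` in the demo record. Note that the line quoted above carries a POS of -1, which falls outside the range the specification allows; whether that is what samtools objects to here is an assumption and is not established in this thread.

```python
INT32_MAX = 2**31 - 1

# (column index, lowest, highest) for the integer mandatory fields,
# following the ranges listed in the SAM specification.
RANGES = {
    "FLAG": (1, 0, 2**16 - 1),
    "POS": (3, 0, INT32_MAX),
    "MAPQ": (4, 0, 2**8 - 1),
    "PNEXT": (7, 0, INT32_MAX),
    "TLEN": (8, -INT32_MAX, INT32_MAX),
}


def check_sam_line(line, known_refs):
    """Return a list of problems found in one SAM alignment line (hypothetical helper)."""
    # Real SAM is tab-delimited; split() also copes with tabs that became spaces.
    fields = line.split()
    if len(fields) < 11:
        return [f"expected at least 11 fields, found {len(fields)}"]
    problems = []
    for name, (index, low, high) in RANGES.items():
        try:
            value = int(fields[index])
        except ValueError:
            problems.append(f"{name} {fields[index]!r} is not an integer")
            continue
        if not low <= value <= high:
            problems.append(f"{name} {value} outside [{low}, {high}]")
    rname, seq, qual = fields[2], fields[9], fields[10]
    if rname != "*" and rname not in known_refs:
        problems.append(f"RNAME {rname} missing from the @SQ header lines")
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append("SEQ and QUAL differ in length")
    return problems


# The record from the question, with SEQ and QUAL abbreviated to "*".
record = "\t".join(
    ["ERR1019070.25381935", "16", "GL000214.1", "-1", "16", "100M", "*", "0", "0", "*", "*"]
)
problems = check_sam_line(record, known_refs={"GL000214.1"})
print(problems)
```

In this sketch the reference name passes the header check, and the only field flagged is POS.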

Do you know how I can bypass this, or whether I merged the files the wrong way?

Cheers and thank you!

Fastq Merging mapping SAM BAM • 942 views

ADD COMMENT • link updated 3.2 years ago by Biostar 20 • written 3.3 years ago by gayal25016 • 0

score 0 · Answer 1 · 2021-01-18

3.3 years ago

GenoMax 141k

have been merging 2 paired-end fastq files using

Just to be sure: are you merging the R1 and R2 files independently and in the same order, e.g. cat file1_R1.gz file2_R1.gz .. and cat file1_R2.gz file2_R2.gz ..? You can't do cat file1_R1.gz file1_R2.gz.
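As an illustration of the per-mate merge described above, here is a small Python sketch that concatenates gzipped FASTQ files byte-wise (the equivalent of cat) and then checks that the merged R1 and R2 files list the same read names in the same order. The helper names `concat_gz`, `read_names` and `mates_in_sync` are hypothetical, the demo reads are made up, and the sketch assumes plain four-line FASTQ records.

```python
import gzip
import os
import shutil
import tempfile
from itertools import zip_longest


def concat_gz(inputs, output):
    """Byte-wise concatenation of gzip files, like `cat a.gz b.gz > out.gz`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def read_names(path):
    """Yield read names from a gzipped FASTQ, with any /1 or /2 suffix removed."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.strip():  # header line of each 4-line record
                name = line[1:].split()[0]
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def mates_in_sync(r1_path, r2_path):
    """True when both files contain the same read names in the same order."""
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    return all(a == b for a, b in pairs)


def write_fastq(path, names, mate):
    """Write a tiny made-up FASTQ file for the demo."""
    with gzip.open(path, "wt") as handle:
        for name in names:
            handle.write(f"@{name}/{mate}\nACGT\n+\nIIII\n")


tmp = tempfile.mkdtemp()
p = lambda name: os.path.join(tmp, name)
write_fastq(p("file1_R1.gz"), ["r1", "r2"], 1)
write_fastq(p("file1_R2.gz"), ["r1", "r2"], 2)
write_fastq(p("file2_R1.gz"), ["r3", "r4"], 1)
write_fastq(p("file2_R2.gz"), ["r3", "r4"], 2)

# Same input order for both mates keeps the pairs aligned.
concat_gz([p("file1_R1.gz"), p("file2_R1.gz")], p("merged_R1.gz"))
concat_gz([p("file1_R2.gz"), p("file2_R2.gz")], p("merged_R2.gz"))
in_sync = mates_in_sync(p("merged_R1.gz"), p("merged_R2.gz"))

# Swapping the order for one mate breaks the pairing.
concat_gz([p("file2_R2.gz"), p("file1_R2.gz")], p("swapped_R2.gz"))
out_of_sync = mates_in_sync(p("merged_R1.gz"), p("swapped_R2.gz"))

print(in_sync, out_of_sync)
```

Comparing names lazily with `zip_longest` avoids loading both files into memory and also catches files of unequal length.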

ADD COMMENT • link 3.3 years ago by GenoMax 141k

I am doing that because I am using several mapping programs, and one of them doesn't accept PE reads. I was thinking that by doing cat file1_R1.gz file1_R2.gz > outfil1.gz, then mapping this file and then removing duplicates, it should be alright, no?

Do you think I get that error because R1 and R2 cannot be merged using cat?

ADD REPLY • link 3.3 years ago by gayal25016 • 0

I am not sure what aligner you are using. Perhaps it is paying attention to FASTQ headers, and having R2 headers show up in R1 files may be an issue for it.

I was thinking that by doing cat file1_R1.gz file1_R2.gz > outfil1.gz, then map this file and then by removing duplicates it should be alright no?

I am not sure what you mean by that. If your R1/R2 reads are able to merge/overlap, you could simply use a program like bbmerge.sh to generate a single-read representation.

ADD REPLY • link 3.3 years ago by GenoMax 141k

The issue is that, with my long reads, I cannot use overlap to merge the PE data into an SE dataset, which is why I chose the cat method and not bbmerge. I'll try to make sure both headers are in my merged dataset and let you know. Thanks a lot!
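One way to check which mates ended up in a merged file is to count the mate markers in the FASTQ headers. The Python sketch below assumes two common header conventions: a trailing /1 or /2 on the read name, or a comment starting with 1: or 2: after a space (Casava 1.8+ style). The helper names `mate_of_header` and `count_mates` are hypothetical, and headers in other styles are reported as "unknown".

```python
from collections import Counter


def mate_of_header(header):
    """Return "1", "2" or "unknown" for one FASTQ header line."""
    parts = header.strip().split(None, 1)
    if not parts:
        return "unknown"
    name = parts[0]
    if name.endswith("/1"):
        return "1"
    if name.endswith("/2"):
        return "2"
    # Casava 1.8+ style: "@name 1:N:0:INDEX"
    if len(parts) == 2 and parts[1][:2] in ("1:", "2:"):
        return parts[1][0]
    return "unknown"


def count_mates(lines):
    """Count mate markers over the header lines of 4-line FASTQ records."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            counts[mate_of_header(line)] += 1
    return counts


# Made-up records standing in for a merged file; for a real file one could
# pass gzip.open("outfil1.gz", "rt") instead of this list.
demo = [
    "@read1/1\n", "ACGT\n", "+\n", "IIII\n",
    "@read1/2\n", "TTGA\n", "+\n", "IIII\n",
    "@read2 2:N:0:ATCACG\n", "GGCC\n", "+\n", "IIII\n",
]
counts = count_mates(demo)
print(dict(counts))
```

If both mates were concatenated in full, the two counts would be expected to match.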

ADD REPLY • link 3.3 years ago by gayal25016 • 0

Indeed, you should prefer merging PE into SE. As for the cat approach, I always use zcat when working on gz-compressed files.
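For comparison, the gzip format allows several compressed members to sit back to back in one file, so cat on .gz files and a zcat-then-recompress route should decompress to the same content. The Python sketch below shows this with made-up in-memory records; it says nothing about whether a particular downstream tool reads multi-member gzip files correctly, which would need to be checked separately.

```python
import gzip

# Two made-up FASTQ records standing in for two input files.
part1 = b"@read1\nACGT\n+\nIIII\n"
part2 = b"@read2\nTTGA\n+\nIIII\n"
gz1 = gzip.compress(part1)
gz2 = gzip.compress(part2)

# Equivalent of `cat a.gz b.gz > out.gz`: members placed back to back.
cat_style = gz1 + gz2

# Equivalent of `zcat a.gz b.gz | gzip > out.gz`: decompress, join, recompress.
zcat_style = gzip.compress(gzip.decompress(gz1) + gzip.decompress(gz2))

# Both routes decompress to the same bytes.
same_content = gzip.decompress(cat_style) == gzip.decompress(zcat_style) == part1 + part2
print(same_content)
```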

ADD REPLY • link 3.2 years ago by Arsenal ▴ 160