Question

SolexaQA++ lengthsort -c output

6.6 years ago

gtho123 ▴ 260

I am looking for advice on the output of SolexaQA++ lengthsort -c when preprocessing my RNA-Seq data.

Having already used SolexaQA++ dynamictrim to trim by read quality, I then used SolexaQA++ lengthsort to remove any short reads that resulted. As the data are paired-end, I used the -c flag.

Here are the relevant lines from my bash script:

path1="/PATH/TO/trim/Sample1_R1"
path2="/PATH/TO/trim/Sample1_R2"
SolexaQA++ lengthsort -c -l 36 -d "/PATH/TO/sort" $path1$".fastq.trimmed.gz", $path2$".fastq.trimmed.gz"

I expected six output files: paired-end, singleton and discard for each input file (R1 and R2). However, only two were produced: Sample1_R2.trimmed.gz.clean and Sample1_R2.trimmed.gz.paired. What happened to R1?

Has something gone wrong? If so, how? And if not, what do these files contain?

EDIT:

If it helps, the input files are trimmed FASTQ files. Here are the first 8 lines of Sample1_R1.fastq.trimmed when unzipped.

@HWI-7001326F:29:C732HANXX:8:1101:1258:1926 1:N:0:ATCACGAT
TCATGAGAAAAGGAACTCCGTCTCATCTGGCATTGCCAATAAAC
+
FFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@HWI-7001326F:29:C732HANXX:8:1101:1457:1985 1:N:0:ATCACGAT
CAACAACTTTGAAGGGTCTTGAAAGGGCAGGTAGTCCTCTAACTGAAGATTTCTCAACTCTAAAAGGAGTTGGTTTCAAACTCACAGAAGCCATAACTGAAGAGATCGGAAGAGCACACG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB

The output files both appear to be empty.

This was the end of the terminal output:

...    
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20703:101335
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20624:101360
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20904:101266
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20879:101307
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20776:101348
Cleaned from read 2: @HWI-7001326F:29:C732HANXX:8:2316:20940:101369
Paired reads were written to:
/PATH/TO/sort/.clean
/PATH/TO/sort/C732HANXX-1721-01-01-01_L008_R2.fastq.trimmed.gz.clean

100% [==================================================]
Writing files...

Why has this happened?

next-gen sequencing • 2.3k views

ADD COMMENT • link updated 5.0 years ago by Biostar 20 • written 6.6 years ago by gtho123 ▴ 260

Perhaps you could tell us what they contain? Particularly, the first 8 lines of each file would be helpful, as would the number of input reads and the number of reads in each output file... and of course anything the program printed to the screen.
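
In the meantime, one detail of the posted command may be worth checking: the comma after the first filename. A shell separates arguments on whitespace only, so that comma becomes part of the first argument, and the program is handed a path that does not exist as written. Whether this explains the missing R1 output is a guess on my part, not something I have confirmed against SolexaQA++ itself. The sketch below only shows how the shell splits the arguments; it reuses the placeholder paths from the question and never calls SolexaQA++.

```shell
# Placeholder paths copied from the question; no files need to exist.
path1="/PATH/TO/trim/Sample1_R1"
path2="/PATH/TO/trim/Sample1_R2"

# Same comma placement as in the posted command: the comma is glued
# onto the end of the first argument.
set -- "${path1}.fastq.trimmed.gz", "${path2}.fastq.trimmed.gz"
as_written_first="$1"

# Arguments separated by whitespace only.
set -- "${path1}.fastq.trimmed.gz" "${path2}.fastq.trimmed.gz"
fixed_first="$1"

printf 'as written: %s\n' "$as_written_first"
printf 'fixed:      %s\n' "$fixed_first"
```

If the first line printed ends in a comma, removing the comma from the lengthsort call and rerunning would be the first thing to try.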

6.6 years ago by Brian Bushnell 20k

score 0 · Answer 1 · 2017-09-20

Well, I'm not really sure what SolexaQA++ is doing or why it's producing blank output files. But I would suggest that you try BBDuk for quality-trimming and removing short reads, like this (adjusting parameters as desired):

bbduk.sh in1=read1.fq.gz in2=read2.fq.gz out1=trimmed1.fq.gz out2=trimmed2.fq.gz qtrim=r trimq=10 minlen=36

You should adapter-trim prior to quality-trimming, though.
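
As a rough sketch of that ordering, the two passes could be chained as below. The file names, the intermediate outputs and the adapters.fa path are placeholders (assumptions, not taken from the question), and the adapter-trimming parameters are common starting values from the BBDuk documentation rather than settings tuned for this dataset. The small run helper is hypothetical: it only prints each command unless DRY_RUN is set to 0.

```shell
# Sketch only: file names and adapters.fa are placeholders (assumptions).
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Pass 1: adapter-trim both mates together so pairs stay in sync.
run bbduk.sh in1=read1.fq.gz in2=read2.fq.gz \
    out1=adapter1.fq.gz out2=adapter2.fq.gz \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

# Pass 2: quality-trim the right end and drop reads shorter than 36 bp,
# using the same parameters as the command above.
run bbduk.sh in1=adapter1.fq.gz in2=adapter2.fq.gz \
    out1=trimmed1.fq.gz out2=trimmed2.fq.gz \
    qtrim=r trimq=10 minlen=36
```

Running it once with the default DRY_RUN=1 lets you check the printed commands before setting DRY_RUN=0 to execute them.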