fastq file from fastq-dump --split-3 contains no quality score info
8.6 years ago
colonppg ▴ 120

Folks:

I ran fastq-dump --split-3 and got FASTQ files.

When I run tophat:

tophat-2.0.9.Linux_x86_64/tophat \
  -r 150 \
  -p 3 \
  --solexa1.3-quals \
  --fusion-search \
  -o ./tmp \
  --GTF ./genomes/human/annotation/Homo_sapiens.GRCh37.68.fix.gtf \
  ./genomes/human/hg19 \
  ./SRR1265510_1.fastq ./SRR1265510_2.fastq &

I got this error message:

Error running 'prep_reads'
terminate called after throwing an instance of 'int'

I was surprised to find that the .fastq files I got from --split-3 do not have quality format information...

head SRR1265510_1.fastq
@SRR1265510.1 1 length=101
AGGGCATCTCTGGGAAAGGACCTGGGGCTGGTGAGGGGCCCGGAGGAGCCTTTGCCCGCGTGTCAGACTCCATCCCTCCTCTGCCGCCACCGCAGCAGCCC
+SRR1265510.1 1 length=101
@CCFFD?DHHHHGIGIIIIFHI@HICGGGHIDGGDHAEF6@FG@BE1??B;@CCAAC>>B8?&844::(4@ACAABCCBCC:4:@>99<525&&0&2?BB#
@SRR1265510.2 2 length=101
CGCAAGGGCATCTCTGGGAAAGGACCTGGGGCTGGTGACGGGCCCGGAGGAGCCTTTGCCCGCGTGTCAGACTCCATCCCTCCTCTGCCGCCACCGCAGCA
+SRR1265510.2 2 length=101
CCCFFFFFHHHHHJJJJJJJJJJIJJIJJJJJJJJAFHIJJIIJIIHHFDDDDDDDDDDDDDDD>B>BCDDDDDDCDCDDDDBACCCCCBDDDDDDDDDDB

What could be the cause of this? Shall I run fastq-dump again using --split-files?

fastq-dump tophat • 2.8k views

What do you mean by "does not have qc format information"? The fourth line of each record below (the line right after the "+" line) is the quality score line.

@SRR1265510.1 1 length=101
AGGGCATCTCTGGGAAAGGACCTGGGGCTGGTGAGGGGCCCGGAGGAGCCTTTGCCCGCGTGTCAGACTCCATCCCTCCTCTGCCGCCACCGCAGCAGCCC
+SRR1265510.1 1 length=101
@CCFFD?DHHHHGIGIIIIFHI@HICGGGHIDGGDHAEF6@FG@BE1??B;@CCAAC>>B8?&844::(4@ACAABCCBCC:4:@>99<525&&0&2?BB#
@SRR1265510.2 2 length=101
CGCAAGGGCATCTCTGGGAAAGGACCTGGGGCTGGTGACGGGCCCGGAGGAGCCTTTGCCCGCGTGTCAGACTCCATCCCTCCTCTGCCGCCACCGCAGCA
+SRR1265510.2 2 length=101
CCCFFFFFHHHHHJJJJJJJJJJIJJIJJJJJJJJAFHIJJIIJIIHHFDDDDDDDDDDDDDDD>B>BCDDDDDDCDCDDDDBACCCCCBDDDDDDDDDDB
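
To make the four-line layout concrete, here is a minimal Python sketch that splits FASTQ text into records and pulls out the quality line. It assumes the strict four-lines-per-record layout shown above (no wrapped sequences). The function name `read_fastq_records` is made up for illustration, and the sample record is the first read from the question, truncated to 10 bases for brevity.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records.

    Assumption: every record is exactly four lines (header, sequence,
    '+' separator, quality). Wrapped multi-line FASTQ is not handled.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at row {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


# First read from the question, shortened to 10 bases (illustrative only).
sample = """@SRR1265510.1 1 length=10
AGGGCATCTC
+SRR1265510.1 1 length=10
@CCFFD?DHH
"""

for read_id, seq, qual in read_fastq_records(sample.splitlines()):
    print(read_id)
    print("sequence:", seq)
    print("quality: ", qual)
```

Note that the quality line here itself begins with "@", which is why records are located by position (every fourth line) rather than by searching for "@".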


Thanks for your reply. What I mean is that it lacks quality encoding info, which the ################ section usually contains. This causes tophat to stop working.

+SRR1265510.1 1 length=101 #########################
@CCFFD?DHHHHGIGIIIIFHI@HICGGGHIDGGDHAEF6@FG@BE1??B;@CCAAC>>B8?&844::(4@ACAABCCBCC:4:@>99<525&&0&2?BB#
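
A note on the "#" characters: in a FASTQ quality line each character encodes one per-base score, so a "#" is an ordinary quality value rather than a separate encoding field. The Python sketch below decodes the last few characters of the quality line above, assuming the Sanger/Phred+33 convention (ASCII code minus 33). That offset is an assumption about this file, and `decode_phred33` is a hypothetical helper name.

```python
from typing import List


def decode_phred33(quality: str) -> List[int]:
    """Convert a quality string to integer scores, assuming offset 33."""
    return [ord(ch) - 33 for ch in quality]


# Last five quality characters of read SRR1265510.1 as posted above.
tail = "2?BB#"
print(decode_phred33(tail))  # [17, 30, 33, 33, 2]
```

Under that assumption the trailing "#" is simply a score of 2, i.e. a very low-confidence base call.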
8.6 years ago
colonppg ▴ 120
tophat-2.0.9.Linux_x86_64/tophat -r 150 -p 3 -o ./tmp --GTF ./genomes/human/annotation/Homo_sapiens.GRCh37.68.fix.gtf ./genomes/human/hg19 ./SRR1265510_1.fastq ./SRR1265510_2.fastq &

tophat started working once I dropped --solexa1.3-quals. I guess this is probably not good, since it no longer takes the quality score encoding into account...
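
One way to check which encoding a file uses is to look at the lowest ASCII code in its quality lines. The Python sketch below is a rough heuristic, not a definitive test: characters below ";" (code 59) cannot occur in the +64 encodings, so their presence points to Phred+33. The posted read contains "#", "&" and "(" (codes 35, 38, 40), which suggests Phred+33 and would be consistent with the run failing when --solexa1.3-quals (Phred+64) was given. The function name `guess_quality_encoding` and the returned labels are made up for illustration.

```python
from typing import Iterable


def guess_quality_encoding(quality_strings: Iterable[str]) -> str:
    """Guess the FASTQ quality offset from the smallest character seen.

    Heuristic only: a file with uniformly high Phred+33 scores could be
    misread as a +64 encoding, so check more than a handful of reads.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest = min(codes)
    if lowest < 59:   # below ';', impossible for +64 encodings
        return "phred+33"
    if lowest < 64:   # ';' to '?', negative Solexa scores
        return "solexa+64"
    return "phred+64"


# Tail of the quality line of read SRR1265510.1 from the question.
print(guess_quality_encoding(["525&&0&2?BB#"]))  # phred+33
```

In practice the quality strings would come from every fourth line of the FASTQ file, sampled over many reads rather than a single one.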
