ValueError("End of file without quality information.") when using SeqIO
4.9 years ago
Arko ▴ 30

Hi! I'm trying to read in a FASTQ file and carry out some operations on it (demultiplexing it into several FASTQ files based on exact-match barcodes). Midway through the file, I'm getting the following error:

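ValueError: End of file without quality information.

from Bio.SeqIO.QualityIO import FastqGeneralIterator

# assumed to be defined earlier in the script (not shown in the post):
# input_file, unmap (dict), barcode (collection of barcodes), count, output handle f
try:
    for name, seq, qual in FastqGeneralIterator(open(input_file)):
        key = name + '\n' + seq + '\n' + qual
        if key not in unmap.keys():
            unmap[key] = False
        # barcode = index read after '#', minus the '/1' suffix, from position 10 on
        header = name.split(":")[4]
        end_bc = header.split("#")[1]
        seq_barcode = end_bc.split("/")[0][10:]
        if seq_barcode in barcode:
            count = count + 1
            f.write("@{}\n{}\n+\n{}\n".format(name, seq, qual))
except ValueError:
    # the except clause was not included in the post; re-raising reproduces the traceback
    raise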

I have no idea why it's failing. I don't think there's anything wrong with the FASTQ file, but if there were, I wouldn't know how to check: the file is massive and the error doesn't say where it fails.

This is the exact traceback:

>     for name, seq, qual in FastqGeneralIterator(open(input_file)):
  File "/home/software/python/python-3.6.4/lib/python3.6/site-packages/Bio/SeqIO/QualityIO.py", line 914, in FastqGeneralIterator
    raise ValueError("End of file without quality information.")
ValueError: End of file without quality information

What is the output of:

tail <fastq_file>

(where <fastq_file> is the file being read by the python script)
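If tail isn't available (e.g. on Windows), a rough Python stand-in is to stream the file through a collections.deque with maxlen set, which keeps only the last N lines in memory. This is a minimal sketch; the function name last_lines is hypothetical, and it still reads the whole file once, just without holding it in memory.

```python
from collections import deque


def last_lines(lines, n=10):
    """Return the last n items of an iterable of lines, like `tail -n`.

    Pass an open file handle: the file is streamed once and the deque
    drops older lines, so memory stays proportional to n, not file size.
    """
    return list(deque(lines, maxlen=n))
```

Usage: with open(input_file) as handle: print("".join(last_lines(handle)), end="")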

6AAAAEEEEEEEEEEEEEEEE////A
@NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1
GAGGAAGTTCCAGCCAAGGAGATTGA
+NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1
AAAAAEEAEE/AEEAAEEAEE//E/<
@NS500496_727_H373LBGXB:2:23203:1526:5855#ACATCATCCCGAGTGG/1
CAGAAACACCAGGATCCCATGATTGA
+NS500496_727_H373LBGXB:2:23203:1526:5855#ACATCATCCCGAGTGG/1
AAAAAEEEEEEEEEEEEAEEE///EE
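Applying the question's split chain to the first header in this tail output shows what each step yields (FastqGeneralIterator returns the title without the leading @). This is only an illustration: the helper name extract_barcode is hypothetical, and the [10:] offset is taken as-is from the question's code.

```python
def extract_barcode(name):
    """Replicate the question's header parsing on a FASTQ title (no '@')."""
    header = name.split(":")[4]        # e.g. '5855#AGAATAAAAGAGTGAT/1'
    end_bc = header.split("#")[1]      # e.g. 'AGAATAAAAGAGTGAT/1'
    return end_bc.split("/")[0][10:]   # e.g. 'AGTGAT' (last 6 of the 16-base index)


name = "NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1"
print(extract_barcode(name))  # AGTGAT
```

A header with fewer than five colon-separated fields would make this raise IndexError rather than the ValueError seen here, so the parsing itself is unlikely to be the source of the error.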

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

if key not in unmap.keys():

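You'll instead want if key not in unmap:, which is the idiomatic form. (In Python 2, .keys() builds a list, so the check is O(n) rather than O(1); in Python 3, .keys() returns a view whose membership test is already O(1), so there the gain is mostly readability.)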
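The whole check-then-insert can also be collapsed into one dict.setdefault call, which inserts the default only when the key is missing and leaves existing values untouched. A minimal sketch; the records below are made up for illustration.

```python
unmap = {}

records = [
    ("read1", "ACGT", "IIII"),
    ("read2", "TTGA", "IIII"),
    ("read1", "ACGT", "IIII"),  # exact duplicate of the first record
]

for name, seq, qual in records:
    key = name + "\n" + seq + "\n" + qual
    # same effect as: if key not in unmap: unmap[key] = False
    unmap.setdefault(key, False)

print(len(unmap))  # 2
```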


Thanks for the tip!

4.9 years ago
Joe 21k

Do you really need to get your hands dirty in the 'bowels' of Biopython like that?

Why not just use SeqIO.parse() directly as an iterator?

Since you're iterating over the file anyway, you can throw in some print statements to output each sequence name/header and find out how far the iteration got before it choked on the quality line. Then Ctrl-F/grep for that header in the file and check out its neighbourhood for anything funky going on.
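If printing every header is too noisy for a massive file, another option is to check the record structure directly and report the line number where it first breaks, which you can then inspect with grep -n or sed -n. As far as I can tell from the Biopython source of that era, this particular message is raised when the file ends after a record's @ header but before its + separator line. The sketch below is plain Python, not Biopython, and assumes every record is exactly four lines (header, sequence, + line, quality), as in the tail output shown above; wrapped multi-line FASTQ would be flagged even though Biopython accepts it. The function name find_first_bad_record is hypothetical.

```python
from itertools import islice


def find_first_bad_record(lines):
    """Return (line_number, reason) for the first malformed 4-line record,
    or None if every record looks structurally valid.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Line numbers are 1-based, matching `grep -n` and `sed -n 'Np'`.
    Assumes unwrapped FASTQ: exactly four lines per record.
    """
    it = iter(lines)
    line_no = 0
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return None  # clean end of file
        start = line_no + 1  # line number of this record's header
        line_no += len(chunk)
        if len(chunk) < 4:
            return start, "file ends mid-record ({} of 4 lines)".format(len(chunk))
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            return start, "header does not start with '@'"
        if not plus.startswith("+"):
            return start + 2, "expected '+' separator line"
        if len(seq) != len(qual):
            return start + 3, "sequence and quality lengths differ ({} vs {})".format(
                len(seq), len(qual))
```

Usage: with open(input_file) as handle: print(find_first_bad_record(handle)), then look at the lines around the reported number.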


Good idea, I shall have a look at it!


It should be sufficient to do something like:

from Bio import SeqIO

for record in SeqIO.parse('/path/to/input.fastq', 'fastq'):
    # do stuff with the record object, e.g.
    # .description is safer for long headers, but as there's no whitespace here, .id and .description are the same
    # slice from after '#' up to (but excluding) the trailing '/1'
    barcode = record.description[record.description.index('#')+1:-2]
    # per-base Phred scores as a list of ints
    qualities = record.letter_annotations["phred_quality"]
    # etc.
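Whichever parser you use, the routing step of the demultiplexing can be kept separate from the parsing by writing it against an iterable of (name, seq, qual) tuples, which is what FastqGeneralIterator yields. This is an illustrative sketch, not a drop-in replacement: the names demultiplex and index_tail are hypothetical, barcode matching is exact (as in the question), and unmatched reads are only counted, not written anywhere.

```python
from io import StringIO


def demultiplex(records, handles, get_barcode):
    """Route (name, seq, qual) records to per-barcode output handles.

    records:     iterable of (name, seq, qual) tuples, title without '@'
    handles:     dict mapping exact barcode string to a writable text handle
    get_barcode: function mapping a record name to its barcode string
    Returns (per-barcode counts, number of unmatched records).
    """
    counts = {bc: 0 for bc in handles}
    unmatched = 0
    for name, seq, qual in records:
        bc = get_barcode(name)
        out = handles.get(bc)  # exact match only
        if out is None:
            unmatched += 1
            continue
        out.write("@{}\n{}\n+\n{}\n".format(name, seq, qual))
        counts[bc] += 1
    return counts, unmatched


def index_tail(name):
    # Hypothetical: the question's rule (index after '#', minus '/1', from position 10)
    return name.split("#")[1].split("/")[0][10:]


# Demo: two records from the tail output above, written to an in-memory buffer
records = [
    ("NS500496_727_H373LBGXB:2:23203:12170:5855#AGAATAAAAGAGTGAT/1",
     "GAGGAAGTTCCAGCCAAGGAGATTGA", "AAAAAEEAEE/AEEAAEEAEE//E/<"),
    ("NS500496_727_H373LBGXB:2:23203:1526:5855#ACATCATCCCGAGTGG/1",
     "CAGAAACACCAGGATCCCATGATTGA", "AAAAAEEEEEEEEEEEEAEEE///EE"),
]
buffers = {"AGTGAT": StringIO()}
counts, unmatched = demultiplex(records, buffers, index_tail)
print(counts, unmatched)  # {'AGTGAT': 1} 1
```

In a real run the buffers would be open files, e.g. one open("AGTGAT.fastq", "w") per barcode (file names hypothetical), and records would be FastqGeneralIterator(handle).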
