Biostar Beta. Not for public use.
BioJava: get the quality scores of a sequence
16 months ago
t-jim • 30

Hello,

I'm trying to parse a FASTQ file with BioJava, and I need to get the quality score for every base of each sequence. So far I have this:

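    import java.io.File;
    import java.util.LinkedList;
    import java.util.List;

    // BioJava 4+ package names (BioJava 3 used org.biojava3.*)
    import org.biojava.nbio.core.sequence.DNASequence;
    import org.biojava.nbio.sequencing.io.fastq.Fastq;
    import org.biojava.nbio.sequencing.io.fastq.FastqReader;
    import org.biojava.nbio.sequencing.io.fastq.FastqTools;
    import org.biojava.nbio.sequencing.io.fastq.SangerFastqReader;

    FastqReader fastqReader = new SangerFastqReader();
    List<DNASequence> sequences = new LinkedList<DNASequence>();
    File in = new File("fastqfile.fastq");
    // read() parses the whole file (and may throw IOException); one call is enough
    for (Fastq fastq : fastqReader.read(in)) {
        // builds a DNASequence that carries the quality scores as a feature
        DNASequence test = FastqTools.createDNASequenceWithQualityScores(fastq);
        sequences.add(test);
    }
    for (DNASequence seq : sequences) {
        String sequence = seq.getSequenceAsString();
        /* get score sequence */
    }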

I looked through the API, and I know the scores are stored as a QualityFeature in the DNASequence, but I can't figure out how to get them. I would appreciate your help.

16 months ago
WCIP | Glasgow | UK

Maybe you are making things more complicated than they need to be. Wouldn't this work?

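    import java.io.File;
    import org.biojava.nbio.sequencing.io.fastq.Fastq;
    import org.biojava.nbio.sequencing.io.fastq.FastqReader;
    import org.biojava.nbio.sequencing.io.fastq.SangerFastqReader;

    FastqReader fastqReader = new SangerFastqReader();
    File in = new File("fastqfile.fastq");
    for (Fastq fastq : fastqReader.read(in)) {
        String qual = fastq.getQuality();
        for (int i = 0; i < qual.length(); i++) {
            // ASCII code minus the Sanger offset (33) gives the Phred quality score
            int q = qual.charAt(i) - 33;
            System.err.println(q);
        }
    }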

I have already tried that. This gives me the scores as ASCII characters, but I want them as numbers. I use the createDNASequenceWithQualityScores() method because it returns a DNASequence object, converts the ASCII scores into numbers, and stores them in the object. I just need to figure out how to access the scores.
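
For reference, the ASCII-to-number conversion itself needs no BioJava at all. Below is a minimal sketch in plain Java, assuming Sanger (Phred+33) encoding: each quality character's ASCII code minus the offset gives the numeric score, collected into a list with one entry per base. The class and method names (`QualityDecoder`, `decode`) are hypothetical and only for illustration; they are not part of BioJava.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of BioJava): decodes a FASTQ quality
// string into numeric scores using a caller-supplied ASCII offset.
public class QualityDecoder {

    public static final int SANGER_OFFSET = 33; // Phred+33

    /** Returns one score per character: its ASCII code minus the offset. */
    public static List<Integer> decode(String quality, int offset) {
        List<Integer> scores = new ArrayList<>(quality.length());
        for (int i = 0; i < quality.length(); i++) {
            // char - int yields an int, which is autoboxed into the list
            scores.add(quality.charAt(i) - offset);
        }
        return scores;
    }

    public static void main(String[] args) {
        // '!' is ASCII 33 and 'I' is ASCII 73
        System.out.println(decode("!+5?I", SANGER_OFFSET));
    }
}
```

Passing the offset as a parameter keeps the same helper usable for Phred+64 files once you know which encoding you have.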

See edit... Basically, convert each ASCII character to its decimal code and from there to a quality score by subtracting the appropriate offset. Here I subtract 33 to produce Sanger scores. I don't think it is always possible to decide unambiguously, i.e. automatically, which offset should be used, although after reading a few sequences one should be able to tell which encoding has been used.
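
A minimal sketch of the one case that can be decided automatically, assuming the usual offset-64 schemes (Solexa and Illumina 1.3+): Solexa scores start at -5, so no offset-64 file contains a character below ';' (64 - 5 = 59). If any quality character is lower than that, the file must be Phred+33; otherwise the sketch reports the encoding as undecided, because high-quality Sanger data can look exactly like Phred+64. The names `OffsetGuesser` and `guessOffset` are hypothetical, for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical heuristic (not part of BioJava): reports Phred+33 only
// when it can be proven; otherwise returns empty (encoding ambiguous).
public class OffsetGuesser {

    // Lowest character an offset-64 scheme can produce:
    // Solexa scores start at -5, and 64 - 5 = 59, i.e. ';'.
    private static final int LOWEST_OFFSET64_CHAR = ';';

    /**
     * Returns 33 if some quality character lies below ';', which no
     * offset-64 encoding can produce; otherwise returns empty.
     */
    public static OptionalInt guessOffset(List<String> qualities) {
        for (String qual : qualities) {
            for (int i = 0; i < qual.length(); i++) {
                if (qual.charAt(i) < LOWEST_OFFSET64_CHAR) {
                    return OptionalInt.of(33);
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // '#' (ASCII 35) can only occur in Phred+33 data
        System.out.println(guessOffset(Arrays.asList("IIII", "II#I")));
        // all characters >= ';': could be either encoding
        System.out.println(guessOffset(Arrays.asList("hhhh", "BBBh")));
    }
}
```

Scanning more reads only helps in one direction: a single low character settles the question, while any number of high characters never does.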
