Fetching Description And Accession Number From A GenBank Format DNA Sequence File Using BioJava
11.0 years ago
J.Ashley ▴ 10

Hello everyone

I am trying to fetch the accession number and description from a GenBank-formatted DNA sequence file. However, I keep receiving this error:

A Exception Has Occurred During Parsing. 
Please submit the details that follow to biojava-l@biojava.org or post a bug report to http://bugzilla.open-bio.org/ 

Format_object=org.biojavax.bio.seq.io.GenbankFormat
Accession=null
Id=null
Comments=Bad section
Parse_block=
Stack trace follows ....


    at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603)
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 5 more
Caused by: java.lang.NullPointerException
    at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:570)
    ... 7 more
org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92)
    at org.biojavax.bio.seq.io.RichStreamWriter.writeStream(RichStreamWriter.java:66)
    at org.biojavax.bio.seq.RichSequence$IOTools.writeFasta(RichSequence.java:1558)
    at org.biojavax.bio.seq.RichSequence$IOTools.writeFasta(RichSequence.java:1581)
    at hmwktest.main(hmwktest.java:40)
Caused by: org.biojava.bio.seq.io.ParseException:

Here is the code:

import org.biojava.bio.*; 
import org.biojava.bio.seq.io.*;
import org.biojava.bio.seq.*;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.BioEntry;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequence.IOTools;
import org.biojavax.bio.seq.RichSequenceIterator;

import java.io.*; 
import java.util.*; 
import javax.swing.JFileChooser;

public class test {
    private static JFileChooser ourChooser = new JFileChooser(".");

    /** Open a file through a FileChooser */
    public static BufferedReader openFile() {
        int retval = ourChooser.showOpenDialog(null);
        BufferedReader br = null;
        if (retval == JFileChooser.APPROVE_OPTION) {
            File file = ourChooser.getSelectedFile();
            try {
                br = new BufferedReader(new FileReader(file));
            } catch (FileNotFoundException e) {
                System.out.println("trouble reading " + file.getName());
                e.printStackTrace();
            }
        }
        return br;
    }

    public static void main(String[] args) throws BioException, IOException {
        BufferedReader br = openFile();
        RichSequenceIterator it = IOTools.readFastaDNA(br, null);
        int count = 0;
        Namespace ns = RichObjectFactory.getDefaultNamespace();
        while (it.hasNext()) {
            count++;
            RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(br, ns);
            RichSequence.IOTools.writeFasta(System.out, seqs.accession, seqs.description, seqs, ns);
        }
    }
}
genbank dna sequence biojava java • 4.4k views

Why not use stand-alone BLAST? blastdbcmd can do this really quickly if you have the database downloaded from NCBI.

11.0 years ago
Hamish ★ 3.2k

From your code you appear to be attempting to read FASTA format entries and GenBank format entries from the same file (both readFastaDNA and readGenbankDNA are passed the same BufferedReader), which may explain your error.
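To illustrate the general hazard of handing one reader to two consumers, here is a minimal sketch in plain Java with no BioJava involved. The class name SharedReaderSketch, the method consumeOneRecord, and the three-line sample text are made up for illustration. How much BioJava's iterators actually read ahead is not shown in this thread, so treat this as the principle only, not a reproduction of the error.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class SharedReaderSketch {

    /**
     * Hypothetical stand-in for a parser: reads lines up to and including
     * the first "//" terminator and returns how many lines it consumed.
     */
    public static int consumeOneRecord(BufferedReader br) throws IOException {
        int lines = 0;
        String line;
        while ((line = br.readLine()) != null) {
            lines++;
            if (line.startsWith("//")) {
                break;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up, heavily shortened record text; not real GenBank data.
        String text = "LOCUS       XX000001\nACCESSION   XX000001\n//\n";
        BufferedReader br = new BufferedReader(new StringReader(text));

        // Both calls share the same reader, so the second one starts
        // wherever the first one stopped, not at the top of the input.
        System.out.println("first consumer read " + consumeOneRecord(br) + " lines");
        System.out.println("second consumer read " + consumeOneRecord(br) + " lines");
    }
}
```

Whatever the first consumer reads is gone by the time the second one starts, which is why it is safer to give each file exactly one parser.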

Assuming the input data is in the GenBank format, the following code will read the file, parse the GenBank entries into objects, and output the primary accession and description from each entry:

public static void main(String[] args) throws BioException, IOException {
    BufferedReader br = openFile();
    RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(br, null);
    while (seqs.hasNext()) {
        RichSequence seq = seqs.nextRichSequence();
        System.out.println(seq.getAccession());
        System.out.println(seq.getDescription());
    }
}

This has been tested with BioJava 1.7.1, but other versions of legacy BioJava should work as well.
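For orientation, the two values printed above correspond to the ACCESSION and DEFINITION lines of the GenBank flat file. The sketch below pulls them out with plain Java, without BioJava, just to show where they live in the text. The class GenbankHeaderSketch, the method readHeaders, and the sample record are all made up for illustration. It assumes well-formed records that end with "//" and is not a substitute for a real parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class GenbankHeaderSketch {

    /**
     * Returns one {accession, description} pair per record.
     * Assumes each record is terminated by a "//" line; a trailing
     * record without that terminator is ignored by this sketch.
     */
    public static List<String[]> readHeaders(BufferedReader br) throws IOException {
        List<String[]> out = new ArrayList<>();
        String accession = null;
        StringBuilder description = new StringBuilder();
        boolean inDefinition = false;
        String line;
        while ((line = br.readLine()) != null) {
            if (line.startsWith("//")) {
                // End of record: emit what was collected and reset.
                if (accession != null || description.length() > 0) {
                    out.add(new String[] {accession, description.toString()});
                }
                accession = null;
                description.setLength(0);
                inDefinition = false;
            } else if (line.startsWith("DEFINITION")) {
                description.append(line.substring("DEFINITION".length()).trim());
                inDefinition = true;
            } else if (inDefinition && line.startsWith(" ")) {
                // Indented continuation line of a multi-line DEFINITION.
                description.append(' ').append(line.trim());
            } else {
                inDefinition = false;
                if (line.startsWith("ACCESSION") && accession == null) {
                    // The first token is the primary accession.
                    String[] tokens = line.substring("ACCESSION".length()).trim().split("\\s+");
                    if (!tokens[0].isEmpty()) {
                        accession = tokens[0];
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Made-up record for illustration only; not a real GenBank entry.
        String sample =
              "LOCUS       XX000001                  12 bp    DNA     linear   UNA 01-JAN-2000\n"
            + "DEFINITION  Made-up example record\n"
            + "            for this sketch.\n"
            + "ACCESSION   XX000001\n"
            + "VERSION     XX000001.1\n"
            + "ORIGIN\n"
            + "        1 acgtacgtac gt\n"
            + "//\n";
        for (String[] header : readHeaders(new BufferedReader(new StringReader(sample)))) {
            System.out.println(header[0]);
            System.out.println(header[1]);
        }
    }
}
```

For anything beyond a quick look at the header lines, the BioJava iterator shown above is the better choice, since it also handles features, references, and the sequence itself.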
