GeMoMa AnnotationFinalizer: java.lang.NumberFormatException: For input string: "7180000819398"
4.9 years ago
Rohith B S • 0

Issue with GeMoMa AnnotationFinalizer

I was running GeMoMa to predict genes/proteins and annotate my repeat-masked plant genome assembly, but the last step, the AnnotationFinalizer module, throws the following error:

Error:

starting AnnotationFinalizer
java.lang.NumberFormatException: For input string: "7180000819398"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:583)
        at java.lang.Integer.parseInt(Integer.java:615)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.extractInt(AnnotationFinalizer.java:410)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.compare(AnnotationFinalizer.java:400)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.compare(AnnotationFinalizer.java:1)
        at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
        at java.util.TimSort.sort(TimSort.java:234)
        at java.util.Arrays.sort(Arrays.java:1438)
        at projects.gemoma.AnnotationFinalizer.run(AnnotationFinalizer.java:488)
        at projects.gemoma.GeMoMaPipeline$JAnnotationFinalizer.doJob(GeMoMaPipeline.java:1466)
        at projects.gemoma.GeMoMaPipeline$FlaggedRunnable.run(GeMoMaPipeline.java:917)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Resolution Attempts

  1. I checked whether the first column of filtered_prediction.gff contains a plain integer. It does not; the sequence IDs look like jcf7180000819398.

  2. I tried other tools such as gffread to convert the GFF to GTF, so that I could use get_sequence_from_gtf.pl from GeneMark to extract the sequences.

  3. I tried getAnnoFasta.pl from Augustus (it partially works, but the annotations do not appear in the FASTA, and there are no protein sequences).

  4. I tried rtracklayer (failed) and bedtools (produced a FASTA, but again unreliable).

Please help with some leads.
Thank you in advance.

genome annotation GeMoMa AnnotationFinalizer • 1.5k views

Welcome to Biostars and thank you for the contribution! Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.


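I don't know this tool, but the problem is that it is trying to parse that value as a 32-bit integer, which has a maximum value of 2147483647. The value 7180000819398 is far larger than that, so Integer.parseInt throws a NumberFormatException.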
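The failure can be reproduced outside GeMoMa with a few lines of plain Java. This is only a minimal sketch: the class name `ParseDemo` and the helper `fitsInInt` are made up for illustration and are not part of GeMoMa. It shows that the digits taken from the sequence ID overflow `Integer.parseInt`, while `Long.parseLong` handles them.

```java
public class ParseDemo {
    /** Returns true if the digit string can be parsed as a 32-bit signed int. */
    static boolean fitsInInt(String digits) {
        try {
            Integer.parseInt(digits);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for non-numeric input and for values outside the int range.
            return false;
        }
    }

    public static void main(String[] args) {
        String digits = "7180000819398"; // numeric part of jcf7180000819398
        System.out.println(Integer.MAX_VALUE);       // 2147483647
        System.out.println(fitsInInt(digits));       // false
        System.out.println(Long.parseLong(digits));  // 7180000819398
    }
}
```

A 64-bit `long` goes up to 9223372036854775807, so it has room for this ID; IDs with even more digits would need `java.math.BigInteger`.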


Thank you. I will do that next time.

4.9 years ago
Rohith B S • 0

I found out from the developers that the tool sorts the input by the numeric value in the scaffold/contig names, and while doing so it parsed those values as integers, which caused the issue. They mentioned that this will be fixed in the next release.
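For anyone curious what an overflow-safe version of such a sort could look like, here is a rough sketch. It is not GeMoMa's actual code or fix: the class `SeqIdSort`, the helper `numericPart`, and the rule "take all digits in the ID" are my own assumptions for illustration. It compares the numeric parts as `BigInteger`, so the number of digits in the ID no longer matters.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;

public class SeqIdSort {
    /** Collects all digits in the ID (assumed rule); returns -1 if the ID has none. */
    static BigInteger numericPart(String id) {
        String digits = id.replaceAll("\\D", "");
        return digits.isEmpty() ? BigInteger.valueOf(-1) : new BigInteger(digits);
    }

    /** Orders IDs by numeric part, then alphabetically to break ties. */
    static final Comparator<String> BY_NUMBER = (a, b) -> {
        int c = numericPart(a).compareTo(numericPart(b));
        return c != 0 ? c : a.compareTo(b);
    };

    /** Returns a sorted copy and leaves the input array untouched. */
    static String[] sorted(String... ids) {
        String[] copy = ids.clone();
        Arrays.sort(copy, BY_NUMBER);
        return copy;
    }

    public static void main(String[] args) {
        String[] ids = {"jcf7180000819398", "jcf7180000819020", "scaffold12"};
        // prints [scaffold12, jcf7180000819020, jcf7180000819398]
        System.out.println(Arrays.toString(sorted(ids)));
    }
}
```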

We need to use a version of the tool above 1.6.0.

tpoterba, thank you for your help.
