Tool: Exon file creation for GlimmerHMM training set
7.4 years ago

I want to carry out gene prediction for an isolated strain of the fungus Cochliobolus sativus. As no fungal training model is available in GlimmerHMM, I am building one from the JGI genomic data for C. sativus ND90Pr, C. victoriae, C. miyabeanus ATCC 44560 v1.0, C. lunatus, C. heterostrophus and C. carbonum.

When I run trainGlimmerHMM <multifasta_file> <exon_file>, I get an error for specific lines in my dummy exon file. As far as I can tell, the error occurs only for reverse-strand lines. As described in the README, I have separated the genes with a blank line and listed the reverse-strand coordinates in descending order. The error I get is:

ERROR 27: Wrong exon coordinates file. Exon file line: scaffold_0 exon 3002 2420

Below is the dummy exon file:

scaffold_0 3002 2420
scaffold_0 2422 2420
scaffold_0 3933 3078
scaffold_0 4219 3995
scaffold_0 4304 4267
scaffold_0 4397 4357
scaffold_0 4699 4450
scaffold_0 5213 5115
scaffold_0 5575 5264
scaffold_0 5724 5633
scaffold_0 5812 5778
scaffold_0 5921 5864
scaffold_0 5921 5919

scaffold_0 6144 6190
scaffold_0 6144 6146
scaffold_0 6247 6394
scaffold_0 6452 6598
scaffold_0 6596 6598

scaffold_0 7222 7310
scaffold_0 7222 7224
scaffold_0 7365 7461
scaffold_0 7526 7927
scaffold_0 7925 7927

scaffold_0 8253 9230
scaffold_0 8253 8255
scaffold_0 9228 9230

If I run the training command with only the forward-strand exon coordinates, the training set is created successfully. Can anyone please point out where I am going wrong?
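One way to sanity-check a file against the layout described above (genes separated by blank lines, reverse-strand coordinates in descending order) is the minimal Python sketch below. The function names are made up for illustration and nothing here comes from GlimmerHMM itself. It reports the orientation of each gene block and lists entries that sit entirely inside another entry of the same gene, such as the 3-base lines in the file above. Whether trainGlimmerHMM rejects such nested entries is not established in this thread.

```python
from typing import List, Tuple

Exon = Tuple[str, int, int]  # (sequence name, first coordinate, second coordinate)


def read_gene_blocks(text: str) -> List[List[Exon]]:
    """Split exon-file text into genes; a blank line ends a gene."""
    blocks: List[List[Exon]] = []
    current: List[Exon] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            if current:
                blocks.append(current)
                current = []
            continue
        name, first, second = fields  # raises ValueError unless exactly 3 fields
        current.append((name, int(first), int(second)))
    if current:
        blocks.append(current)
    return blocks


def strand_of(block: List[Exon]) -> str:
    """'+' if every entry ascends, '-' if every entry descends, else 'mixed'."""
    directions = {"+" if a <= b else "-" for _, a, b in block}
    return directions.pop() if len(directions) == 1 else "mixed"


def nested_entries(block: List[Exon]) -> List[Exon]:
    """Entries whose span lies strictly inside another entry of the same gene."""
    spans = [(min(a, b), max(a, b)) for _, a, b in block]
    nested = []
    for i, (lo, hi) in enumerate(spans):
        for j, (lo2, hi2) in enumerate(spans):
            if i != j and lo2 <= lo and hi <= hi2 and (lo, hi) != (lo2, hi2):
                nested.append(block[i])
                break
    return nested


# A few lines copied from the dummy exon file in this post.
sample = """scaffold_0 3002 2420
scaffold_0 2422 2420
scaffold_0 3933 3078

scaffold_0 6144 6190
scaffold_0 6144 6146
scaffold_0 6247 6394
"""

for block in read_gene_blocks(sample):
    print(strand_of(block), nested_entries(block))
```

On the sample lines, both the reverse-strand and the forward-strand gene contain one nested 3-base entry, so this check alone does not explain why only the reverse-strand genes fail.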

gene prediction GlimmerHMM training set • 2.9k views
7.4 years ago

Can you check the length of scaffold_0 in the multi-FASTA file?


The length of scaffold_0 is 870365 bases.
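For anyone wanting to repeat this check over a whole file, here is a rough Python sketch that computes sequence lengths from a multi-FASTA file and lists exon lines whose coordinates fall outside the named sequence. The function names are hypothetical, and it assumes 1-based coordinates and plain three-column exon lines.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length}, using the first word of each header."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range(exon_lines: Iterable[str],
                 lengths: Dict[str, int]) -> List[Tuple[int, str]]:
    """List (line number, line) for exons outside 1..length or on unknown sequences."""
    problems = []
    for lineno, line in enumerate(exon_lines, start=1):
        fields = line.split()
        if len(fields) != 3:
            continue  # blank gene separators and other layouts are skipped here
        name, first, second = fields[0], int(fields[1]), int(fields[2])
        limit = lengths.get(name)
        low, high = min(first, second), max(first, second)
        if limit is None or low < 1 or high > limit:
            problems.append((lineno, line.strip()))
    return problems


# Tiny made-up example: a 15-base scaffold and four exon-file lines.
fasta = [">scaffold_0 demo", "ACGTACGTAC", "ACGTA"]
exons = ["scaffold_0 12 3", "", "scaffold_0 10 20", "scaffold_9 1 5"]

lengths = fasta_lengths(fasta)
print(lengths)
print(out_of_range(exons, lengths))
```

With real data, both functions can be given open file handles (for example `open("genome.fa")` and `open("exons.txt")`, both placeholder names). With scaffold_0 at 870365 bases, none of the coordinates in the dummy file above would be flagged.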


The error is most probably generated by this file: https://sourceforge.net/u/djinnome/jamg/ci/85b33b51b8ccdd6eadc8f5c7b8155baa119f4af4/tree/3rd_party/GlimmerHMM/train/trainGlimmerHMM

Search it for "ERROR 27: Wrong exon coordinates file. Exon file line". I am not very good at Perl, so I can't say much, but the relevant line seems to be:

my ($anum,$ex1,$ex2)=/^(\S+)\s*([\>|\<]*\d+)\s*([\>|\<]*\d+\s*)$/;

It looks like $anum, $ex1 or $ex2 is not being set properly by this match.
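To see which lines that pattern accepts, it can be tried outside the script. Below is a small Python sketch. The regex is a direct rendering of the Perl pattern quoted above, on the assumption that Python's `re` module treats it the same way. `parse_exon_line` is a made-up helper, and any further checks the script applies after the match are not reproduced.

```python
import re
from typing import Optional, Tuple

# Python rendering of the Perl pattern quoted in this reply (assumed equivalent).
EXON_LINE = re.compile(r"^(\S+)\s*([>|<]*\d+)\s*([>|<]*\d+\s*)$")


def parse_exon_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (anum, ex1, ex2) if the line matches the pattern, else None."""
    match = EXON_LINE.match(line)
    if match is None:
        return None
    anum, ex1, ex2 = match.groups()
    # The third group also captures trailing whitespace; strip it for display.
    return anum, ex1, ex2.strip()


candidates = [
    "scaffold_0 3002 2420",        # reverse-strand line as posted in the dummy file
    "scaffold_0 6144 6190",        # forward-strand line as posted
    "scaffold_0 exon 3002 2420",   # line as echoed in the error message
]

for line in candidates:
    print(repr(line), "->", parse_exon_line(line))
```

Under that assumption, the three-column lines match in either orientation, while the four-column line echoed in the error message does not. It is not clear from the thread whether the word "exon" is really present in the input file or is added by the script when it prints the message, so this only narrows down where to look.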

Hope it helps somehow.
