Eukaryotic gene prediction using GeneMark-ES
6.2 years ago

I am using GeneMark-ES Suite version 4.33 for eukaryotic gene prediction on a plant genome, and I am stuck.

command

perl gmes_petap/gmes_petap.pl --evidence protein.fa  --cores 40 --sequence genome_assembly.fa --ET transcripts.gff

input files

protein.fa = a multi-fasta file having amino acid sequences from a closely related plant

genome_assembly.fa = genome assembly multi-fasta file having scaffold sequences for which I want to predict the genes

transcripts.gff = gff file for transcript sequences


error message

error, unexpected format found on line: >prot.1

error on call: /gmes_petap/reformat_gff.pl --out data/evidence.gff  --trace info/dna.trace  --in protein.fa  --quiet

I think I am providing the wrong kind of file to the --evidence parameter. The relevant help text is shown below:

--ET           [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
--evidence     [filename]; to use in prediction external evidence (RNA or protein) mapped to genome

What could "(RNA or protein) mapped to genome" mean here? Any ideas?

gene prediction genemark • 3.6k views
6.2 years ago

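I'm not an expert in GeneMark myself, but I'm guessing it expects some kind of alignment file (in GFF format) of the proteins to the genome, not a protein FASTA file (which is apparently what you are providing). That would also explain the error: reformat_gff.pl is being handed protein.fa and fails on the FASTA header line ">prot.1".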
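If it helps, here is a minimal sketch of a sanity check you could run on a file before passing it to --evidence or --ET. It is not part of GeneMark-ES; the function name `sniff_format` is made up for illustration, and it only assumes the generic 9-column, tab-delimited GFF layout with integer start/end in columns 4 and 5. It does not check whatever feature types GeneMark-ES actually requires.

```python
import sys


def sniff_format(lines):
    """Guess whether text lines look like FASTA or tab-delimited GFF.

    Hypothetical helper, not part of GeneMark-ES. Only the first
    non-blank, non-comment line is inspected.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comment/directive lines
        if line.startswith(">"):
            return "fasta"  # FASTA records begin with a '>' header
        fields = line.split("\t")
        # Generic GFF: 9 tab-separated columns, integer start and end
        if len(fields) == 9 and fields[3].isdigit() and fields[4].isdigit():
            return "gff"
        return "unknown"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sniff_format.py protein.fa
    with open(sys.argv[1]) as handle:
        print(sys.argv[1], "looks like:", sniff_format(handle))
```

Run on the protein.fa from the question, this should report FASTA, which is consistent with the "unexpected format" error above.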

An alignment file in gff format sounds alien to me.

Well, I meant alignment as in "HSP coordinates of proteins aligned to the genome" (obtained using e.g. BLAST (not recommended), GenomeThreader, GeneWise, ...). Apologies for the brevity.
