I have an annotation produced with MAKER using RNA-seq evidence, but about 2,000 gene models start with TTG or CTG rather than ATG. For almost all of them the wrong start codon is the only problem, and most have an ATG start codon a few bases downstream within the first exon.
What I want to do, using just the GFF3 of these models and the genome FASTA file, is take the first CDS line of each model (accounting for + or - strand), search that window of the genome for the next ATG, and then correct the start position of that first CDS feature and the end of the 5' UTR to match.
For example, using test.gff below, I would take positions 447 to 554 as the first coding exon (the first CDS line) of the + strand annotation and then search the FASTA file within that window. Does anyone know of a way to script this, or of existing software that corrects start codons?
test.gff
chrom_1_extraction maker three_prime_UTR 2254 2320 . + . ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:three_prime_utr;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction maker five_prime_UTR 295 446 . + . ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:five_prime_utr;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction maker transcript 295 2320 . + . Name=maker-chrom_1-augustus-gene-0.156-mRNA-1;ID=maker-chrom_1-augustus-gene-0.156-mRNA-1;_AED=0.00;_eAED=0.00;_QI=152|0.8|0.83|1|0.8|0.66|6|67|516;Parent=maker-chrom_1-augustus-gene-0.156
chrom_1_extraction maker gene 295 2320 . + . Name=maker-chrom_1-augustus-gene-0.156;ID=maker-chrom_1-augustus-gene-0.156
chrom_1_extraction maker CDS 447 554 . + . ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction maker CDS 616 1002 . + . ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction maker CDS 1050 1755 . + . ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction maker CDS 1803 1903 . + . ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction maker CDS 1955 2054 . + . ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction maker CDS 2105 2253 . + . ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
Fasta file
>chrom_1_extraction
CAACATTGATATCATCAGCAACCTAAGTAGCGGTGAACATGAGACGTACAGTGCGATACGTAGTTGACTGCTTAAACAAGATTGGCTTTTGTTGCAGGGAAGCCTTGCTTCATGATGCTTTTCTGTTAATAGATAATTCTAGAACAGTGTCTTCTAAAGCTCAGCTACCCTATGGCTATGACTTGTTGGATTATAGCCAATCACACAAGCCAAACTACCTAGTCTAGACTAGCGGAGAGGTTTTAGCGTACGTATCCTTGGCTTCCCCGCTATTGCCTTGTTTGCCTGTGTTATCTACCTCACATTTACGCCTGCATGTTACAACATCAGAACTACAGTCGCTTGGCATCTTGCACTTATGAAGCCAGTGAAATGCTGTACCACTTGTCGCCGCCGACACAGGAGATGCGTCACTCAGCCGGGAGCCTCTCAATGTAGCACTTGCCTTGAGTCAGGGCAGGAATGCCAATTCGACAATGACATTCGGTTCAAGCATAGTCATTCAAAAACTGAGAAGCAGTCAAGGAGAGAATGGGCTAAAGTTCCCTCTAAGAGTAAGTCCAAAGCCTGCTTGGCCTCTGGCCTTCAACCTTTCTGTGTATTTTCATGCTGAAGGCTGTAGTCTCTTTCACAGCACCACGCGGAATCGACGGTATGGTGTTGGAGGAATCAGGTAGTAACTCTACAAAGGACGAAGCTGCCAAGAACCTGTCGCAAACTGCGCAGATACCCGAAGAAATGATAGAAGTATCTGTTAATGATCTTGCACAATCTGGACCTCAAGTTGTGCCCTTGAACCCGGAACTCGACTACAGGAAATCAGCTTCCAACTTCATCGCCAACTCTTTAGTTGATGACCCTTCAGTCCGTGACTCTGATGAATATTTCGACCATGCTACAGCCCAAGACCTACCGGTTCAAGTTCATATTTCATCTCCATATGAGTTGACTGAACGAGAAGCCTTTCTTTTCATGATCTATATTTACAAATGTGCACCCTTGGTAAGTTACAGCTGTCAGATGTCGTCTCCACTAACATGACATTTAAGTCTGATGCATGTGACGATGCCCGTCATTTCGAACTCGAAGTTCCCCGATTGGCCCTTCGCCAACCCATGATAATGAACGGTCTACTCGCCCTCGCAAGCCGCTACGATTCTCGATGCATGGACACGTCCAACGACATTGAAAGCACATTTTACCACAATAAATGCATAAAGCTTCTTATAGAAGCTTTTGCTCAACCCCCTGAAACATGGGACTCAACGCTCCTTACAGCCGTTGTAATCGCGCGACTGTATGAGGAGAACGATAACGAGACTGATTCCTATTACCATCATCTCAGTGGAACGCAGAACCTTCTGAATCATGAGGCAGTCGCTAGGTTTGTGATACAGGGGGGATTAGCTGAAGCTGCAAGTTGGGTTCATCTTCGACAAGTAATCTACATCTACGTAGTGCGCAGGAGGCCTATCGAGATATGCCTTGAGAGCTTTGAGAGGTCAACTGTGTTTAGAAGATGTGACGATTCAGCATATGCGAACAGAGCGGTCTATAACTTCGCCAAGATTATGAGGCTATTTCTACAAGTTGAAAATTTGGACAGTGATCAAGACGAGTGGCAGGCAGCTGAGATGGAGGTAGACCGGTGGTATGACGCTAAGCCCGTATCTTTTCAACCTGTATTTCATATTTTGGCGGACCTCTCGGCAAACAGACCGTTCCCGACCCTTTACTTCATTGCATCAGTGCCCGGTAAGTGTGACTTTCAGCTGCTGTGCTCTCTCGCTAACATATCGGAGTCGTTGCAATGCAGTATTACTTCGCAGCCAAGGCCGTTTTATATTTGCATCATTGTAAGAACTTGCAGCAACTGAATAACCATGGAAGGCCAGACTTTGAAGTATTTTGAAGGACTACTCTCGTCAAAGATAACACACTAATTAATTGCCAGACCAAGATATCCTTCTATCTCTTCACTCTCATGGGTCTTGCTCTAT
CCAACTCCCATGTTCTAAACGCATTTTACCTACCTGCACATATGCTTTCATTCTGTACAGTCATCCCCCACCCCTAATGTGACAATGGCTGCTAACCCTTGTAGGTGGATATTGCATAAGAGACCCATGTGAACAGACCCATGCCATTTGTTACCTTGAGAAGGTTAACGAAGTGATTAAGTGGAAGACAAAGGAACTTATTGCAACGCTGAAGGAAAAATGGCATGATGGAGAGAAACATGATTCTCACTAATGGGCCCTCTCTGTTATATAAAAATAGTTCATCAATAAACTGCAAAGGTAGAATTATAAATGGCGCAGAATGGATATCCTGTAAGTGAAACTTTATGATGGAGTTTTGTAATTAATGAGACTTGTGGCCTTGAAGAAATGTCTTTTCTTTTTACTGTCGAATTTTAGTAATACTATAGCTAGGACCATCATTTTTATTCACTAAGAAAGATAACTCGCTAACACATAAGAAAAGGCCAATTATTTTAATTTATCCCTATC
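The steps described above could be sketched in Python roughly as follows. This is a minimal sketch, not a tested tool, and it rests on several assumptions that are not in the post: the GFF3 is tab-delimited with nine columns, each CDS line has a single `Parent=` mRNA, the search is restricted to in-frame ATGs within the first CDS segment (so the downstream reading frame is kept), and a 5' UTR feature directly adjacent to the first CDS already exists. The helper names `read_fasta`, `atg_offset` and `fix_starts` are made up for illustration.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(path):
    """Read a FASTA file into {sequence_name: sequence}, joining wrapped lines."""
    seqs = defaultdict(list)
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
            elif name and line:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def parent_of(attrs):
    """Return the value of Parent= from a GFF3 attribute column, or None."""
    for field in attrs.split(";"):
        if field.startswith("Parent="):
            return field[len("Parent="):]
    return None


def atg_offset(cds_seq):
    """0-based offset of the first in-frame ATG in a 5'->3' CDS segment, or None."""
    for i in range(0, len(cds_seq) - 2, 3):
        if cds_seq[i:i + 3].upper() == "ATG":
            return i
    return None


def fix_starts(rows, genome):
    """Move the start of each model's first CDS to the next in-frame ATG.

    rows: list of 9-column GFF3 rows (lists of strings), edited in place.
    genome: {sequence_name: sequence}. Models without an in-frame ATG in
    the first CDS segment are left unchanged.
    """
    features = [r for r in rows if len(r) == 9]
    cds_by_mrna = defaultdict(list)
    for row in features:
        if row[2] == "CDS":
            cds_by_mrna[parent_of(row[8])].append(row)

    for mrna, cds_rows in cds_by_mrna.items():
        strand = cds_rows[0][6]
        # The first CDS in transcript order is leftmost on +, rightmost on -.
        if strand == "+":
            first = min(cds_rows, key=lambda r: int(r[3]))
        else:
            first = max(cds_rows, key=lambda r: int(r[4]))
        start, end = int(first[3]), int(first[4])
        seq = genome[first[0]][start - 1:end]  # GFF3 is 1-based, inclusive
        if strand == "-":
            seq = revcomp(seq)
        shift = atg_offset(seq)
        if not shift:  # None (no ATG found) or 0 (already starts with ATG)
            continue
        # Extend the 5' UTR piece that touches the old CDS start.
        for row in features:
            if row[2] == "five_prime_UTR" and parent_of(row[8]) == mrna:
                if strand == "+" and int(row[4]) == start - 1:
                    row[4] = str(start + shift - 1)
                elif strand == "-" and int(row[3]) == end + 1:
                    row[3] = str(end - shift + 1)
        # Trim the first CDS so that it begins at the ATG.
        if strand == "+":
            first[3] = str(start + shift)
        else:
            first[4] = str(end - shift)
    return rows
```

To try it, one would load the genome with `read_fasta("genome.fa")`, split each non-comment line of the GFF3 on tabs, pass both to `fix_starts`, and write the rows back out joined with tabs. Gene, transcript and exon coordinates are not touched, since only the CDS/UTR boundary moves. Models whose CDS begins at the very start of the transcript (no 5' UTR line) would need a new UTR feature created, which this sketch does not do.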
Hi, did you solve this problem? Would you mind sharing the solution, please? Thanks.