Correcting start codons in a GFF annotation using the genome FASTA file
8.1 years ago
rob234king ▴ 610

I have an annotation produced with MAKER using RNA-seq evidence, but about 2,000 gene models start with TTG or CTG rather than ATG. Almost all of them are otherwise fine and only have the wrong start codon: in most cases the ATG start codon lies a few bases downstream, within the first exon.

What I want to do, using just the GFF3 for these gene models and the genome FASTA file, is to take the first CDS line of each model (accounting for whether it is on the + or - strand), search that window of the genome for the next ATG, and then correct the start position of that first CDS feature and the end of the 5' UTR accordingly.

For example, using test.gff below, I would take positions 447 to 554 as the first CDS of the + strand annotation and then search the FASTA file within that window. Does anyone know a way of scripting this, or existing software that corrects start codons?
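A minimal sketch of the search-and-correct step described above, not a tested tool. All function names are hypothetical. It assumes 1-based inclusive GFF coordinates, that the 5' UTR directly abuts the first CDS, and that the replacement ATG should be in frame with the annotated start (the `in_frame` switch is an added assumption; set it to `False` to take the very next ATG regardless of frame).

```python
# Complement table for plain DNA letters; other symbols (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_downstream_atg(genome_seq, start, end, strand, in_frame=True):
    """Return the 1-based genomic position of the A of the first ATG
    inside the first CDS segment [start, end], or None if there is none.

    On the - strand the window is reverse-complemented, so the search
    runs in transcript orientation starting from `end`.
    """
    window = genome_seq[start - 1:end]      # 1-based inclusive -> Python slice
    if strand == "-":
        window = revcomp(window)
    window = window.upper()
    step = 3 if in_frame else 1             # codon steps keep the reading frame
    for offset in range(0, len(window) - 2, step):
        if window[offset:offset + 3] == "ATG":
            return start + offset if strand == "+" else end - offset
    return None


def corrected_boundaries(cds, utr5, strand, atg_pos):
    """Shift the CDS/5' UTR boundary to atg_pos.

    cds and utr5 are (start, end) tuples; returns (new_cds, new_utr5).
    On + the CDS start and UTR end move; on - the CDS end and UTR start move.
    """
    if strand == "+":
        return (atg_pos, cds[1]), (utr5[0], atg_pos - 1)
    return (cds[0], atg_pos), (atg_pos + 1, utr5[1])
```

For the example record, the call would be `find_downstream_atg(seq, 447, 554, "+")` with `seq` holding the `chrom_1_extraction` sequence, followed by `corrected_boundaries((447, 554), (295, 446), "+", atg_pos)`. Exon, transcript, and gene lines keep their coordinates, since only the CDS/UTR boundary moves. Models with no annotated 5' UTR would need a new UTR feature instead, which this sketch does not handle.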

test.gff

chrom_1_extraction	maker	three_prime_UTR	2254	2320	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:three_prime_utr;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction	maker	five_prime_UTR	295	446	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:five_prime_utr;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction	maker	transcript	295	2320	.	+	.	Name=maker-chrom_1-augustus-gene-0.156-mRNA-1;ID=maker-chrom_1-augustus-gene-0.156-mRNA-1;_AED=0.00;_eAED=0.00;_QI=152|0.8|0.83|1|0.8|0.66|6|67|516;Parent=maker-chrom_1-augustus-gene-0.156
chrom_1_extraction	maker	gene	295	2320	.	+	.	Name=maker-chrom_1-augustus-gene-0.156;ID=maker-chrom_1-augustus-gene-0.156
chrom_1_extraction	maker	CDS	447	554	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction	maker	CDS	616	1002	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction	maker	CDS	1050	1755	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction	maker	CDS	1803	1903	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction	maker	CDS	1955	2054	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1
chrom_1_extraction	maker	CDS	2105	2253	.	+	.	ID=maker-chrom_1-augustus-gene-0.156-mRNA-1:cds;Parent=maker-chrom_1-augustus-gene-0.156-mRNA-1

Fasta file

>chrom_1_extraction
CAACATTGATATCATCAGCAACCTAAGTAGCGGTGAACATGAGACGTACAGTGCGATACGTAGTTGACTGCTTAAACAAGATTGGCTTTTGTTGCAGGGAAGCCTTGCTTCATGATGCTTTTCTGTTAATAGATAATTCTAGAACAGTGTCTTCTAAAGCTCAGCTACCCTATGGCTATGACTTGTTGGATTATAGCCAATCACACAAGCCAAACTACCTAGTCTAGACTAGCGGAGAGGTTTTAGCGTACGTATCCTTGGCTTCCCCGCTATTGCCTTGTTTGCCTGTGTTATCTACCTCACATTTACGCCTGCATGTTACAACATCAGAACTACAGTCGCTTGGCATCTTGCACTTATGAAGCCAGTGAAATGCTGTACCACTTGTCGCCGCCGACACAGGAGATGCGTCACTCAGCCGGGAGCCTCTCAATGTAGCACTTGCCTTGAGTCAGGGCAGGAATGCCAATTCGACAATGACATTCGGTTCAAGCATAGTCATTCAAAAACTGAGAAGCAGTCAAGGAGAGAATGGGCTAAAGTTCCCTCTAAGAGTAAGTCCAAAGCCTGCTTGGCCTCTGGCCTTCAACCTTTCTGTGTATTTTCATGCTGAAGGCTGTAGTCTCTTTCACAGCACCACGCGGAATCGACGGTATGGTGTTGGAGGAATCAGGTAGTAACTCTACAAAGGACGAAGCTGCCAAGAACCTGTCGCAAACTGCGCAGATACCCGAAGAAATGATAGAAGTATCTGTTAATGATCTTGCACAATCTGGACCTCAAGTTGTGCCCTTGAACCCGGAACTCGACTACAGGAAATCAGCTTCCAACTTCATCGCCAACTCTTTAGTTGATGACCCTTCAGTCCGTGACTCTGATGAATATTTCGACCATGCTACAGCCCAAGACCTACCGGTTCAAGTTCATATTTCATCTCCATATGAGTTGACTGAACGAGAAGCCTTTCTTTTCATGATCTATATTTACAAATGTGCACCCTTGGTAAGTTACAGCTGTCAGATGTCGTCTCCACTAACATGACATTTAAGTCTGATGCATGTGACGATGCCCGTCATTTCGAACTCGAAGTTCCCCGATTGGCCCTTCGCCAACCCATGATAATGAACGGTCTACTCGCCCTCGCAAGCCGCTACGATTCTCGATGCATGGACACGTCCAACGACATTGAAAGCACATTTTACCACAATAAATGCATAAAGCTTCTTATAGAAGCTTTTGCTCAACCCCCTGAAACATGGGACTCAACGCTCCTTACAGCCGTTGTAATCGCGCGACTGTATGAGGAGAACGATAACGAGACTGATTCCTATTACCATCATCTCAGTGGAACGCAGAACCTTCTGAATCATGAGGCAGTCGCTAGGTTTGTGATACAGGGGGGATTAGCTGAAGCTGCAAGTTGGGTTCATCTTCGACAAGTAATCTACATCTACGTAGTGCGCAGGAGGCCTATCGAGATATGCCTTGAGAGCTTTGAGAGGTCAACTGTGTTTAGAAGATGTGACGATTCAGCATATGCGAACAGAGCGGTCTATAACTTCGCCAAGATTATGAGGCTATTTCTACAAGTTGAAAATTTGGACAGTGATCAAGACGAGTGGCAGGCAGCTGAGATGGAGGTAGACCGGTGGTATGACGCTAAGCCCGTATCTTTTCAACCTGTATTTCATATTTTGGCGGACCTCTCGGCAAACAGACCGTTCCCGACCCTTTACTTCATTGCATCAGTGCCCGGTAAGTGTGACTTTCAGCTGCTGTGCTCTCTCGCTAACATATCGGAGTCGTTGCAATGCAGTATTACTTCGCAGCCAAGGCCGTTTTATATTTGCATCATTGTAAGAACTTGCAGCAACTGAATAACCATGGAAGGCCAGACTTTGAAGTATTTTGAAGGACTACTCTCGTCAAAGATAACACACTAATTAATTGCCAGACCAAGATATCCTTCTATCTCTTCACTCTCATGGGTCTTGCTCTAT
CCAACTCCCATGTTCTAAACGCATTTTACCTACCTGCACATATGCTTTCATTCTGTACAGTCATCCCCCACCCCTAATGTGACAATGGCTGCTAACCCTTGTAGGTGGATATTGCATAAGAGACCCATGTGAACAGACCCATGCCATTTGTTACCTTGAGAAGGTTAACGAAGTGATTAAGTGGAAGACAAAGGAACTTATTGCAACGCTGAAGGAAAAATGGCATGATGGAGAGAAACATGATTCTCACTAATGGGCCCTCTCTGTTATATAAAAATAGTTCATCAATAAACTGCAAAGGTAGAATTATAAATGGCGCAGAATGGATATCCTGTAAGTGAAACTTTATGATGGAGTTTTGTAATTAATGAGACTTGTGGCCTTGAAGAAATGTCTTTTCTTTTTACTGTCGAATTTTAGTAATACTATAGCTAGGACCATCATTTTTATTCACTAAGAAAGATAACTCGCTAACACATAAGAAAAGGCCAATTATTTTAATTTATCCCTATC
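To get from files like the two above to the window that needs searching, one possible approach is to load the FASTA into a dictionary and pick, for each transcript, the CDS segment that is first in transcript orientation: lowest start on +, highest end on -. The sketch below is illustrative only and its function names are hypothetical. It assumes each CDS has a single `Parent`, and it splits on any whitespace so that it also accepts a space-separated listing, although real GFF3 is tab-delimited.

```python
from collections import defaultdict


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}, joining wrapped lines."""
    chunks, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]      # ID is the first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def first_cds_per_transcript(gff_lines):
    """Return {parent_id: (seqid, start, end, strand)} for the 5'-most CDS."""
    by_parent = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.strip().split(None, 8)  # 9 GFF columns; attributes stay whole
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(item.split("=", 1)
                     for item in cols[8].strip().split(";") if "=" in item)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        by_parent[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    first = {}
    for parent, segments in by_parent.items():
        strand = segments[0][3]
        if strand == "+":
            first[parent] = min(segments, key=lambda seg: seg[1])
        else:
            first[parent] = max(segments, key=lambda seg: seg[2])
    return first
```

Applied to test.gff, this would select the 447 to 554 segment for `maker-chrom_1-augustus-gene-0.156-mRNA-1`, and the matching genome window is then `genome["chrom_1_extraction"][447 - 1:554]`, where `genome` is the dictionary returned by `read_fasta`.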
gff3 annotation • 3.1k views

Hi, did you solve the problem? Would you mind sharing the solution, please? Thanks.
