4.8 years ago
marcos.sep.ro
While parsing all annotation records from the Pongo abelii (Sumatran orangutan) genome, I noticed that some annotations are repeated: same location, same strand, even the same note tag, but (in some cases) a different product tag. Here's an example from a log:
Species: PONAB | 42202927:42211982(+) | Product: zinc finger protein 155 isoform X1 | Note: By Gnomon.
Species: PONAB | 42202927:42211982(+) | Product: zinc finger protein 155 isoform X1 | Note: By Gnomon.
Species: PONAB | 42202927:42211982(+) | Product: zinc finger protein 155 isoform X1 | Note: By Gnomon.
Species: PONAB | 42202927:42211982(+) | Product: zinc finger protein 155 isoform X1 | Note: By Gnomon.
Species: PONAB | 42204537:42211982(+) | Product: zinc finger protein 155 isoform X2 | Note: By Gnomon.
What does this mean? Are they redundant? Should I remove repeated annotations or not?
Best regards, Marcos
Duplicated features do happen from time to time, and they can often be ignored safely, depending on what you're doing. They may be an artifact of a reannotation process or similar.
It's worth noting that one of those entries has a different start coordinate, so it may be something subtly different (since those are isoforms, it may be the same entity but with a legitimately different start site).
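If you do decide to collapse the exact repeats, here is a minimal Python sketch of one way to do it. It assumes the log format shown in the question; the regular expression and the function names (parse_record, count_records) are made up for illustration and are not part of any library. Records are treated as duplicates only when species, coordinates, strand, product and note all match, so the X2 isoform with the different start site is kept as a separate entry.

```python
import re
from collections import Counter

# Hypothetical pattern matching the log lines shown in the question.
LOG_PATTERN = re.compile(
    r"Species: (?P<species>\S+) \| "
    r"(?P<start>\d+):(?P<end>\d+)\((?P<strand>[+-])\) \| "
    r"Product: (?P<product>.+?) \| Note: (?P<note>.+)"
)

def parse_record(line):
    """Return a hashable key for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line.strip())
    if m is None:
        return None
    return (m["species"], int(m["start"]), int(m["end"]),
            m["strand"], m["product"], m["note"])

def count_records(lines):
    """Count how often each distinct record occurs, in first-seen order."""
    counts = Counter()
    for line in lines:
        key = parse_record(line)
        if key is not None:
            counts[key] += 1
    return counts

log_lines = [
    "Species: PONAB | 42202927:42211982(+) | Product: zinc finger protein 155 isoform X1 | Note: By Gnomon.",
    "Species: PONAB | 42202927:42211982(+) | Product: zinc finger protein 155 isoform X1 | Note: By Gnomon.",
    "Species: PONAB | 42202927:42211982(+) | Product: zinc finger protein 155 isoform X1 | Note: By Gnomon.",
    "Species: PONAB | 42202927:42211982(+) | Product: zinc finger protein 155 isoform X1 | Note: By Gnomon.",
    "Species: PONAB | 42204537:42211982(+) | Product: zinc finger protein 155 isoform X2 | Note: By Gnomon.",
]

unique = count_records(log_lines)
for record, n in unique.items():
    # Each distinct record is reported once, with its number of occurrences.
    print(n, record)
```

Keeping the counts rather than silently dropping the repeats makes it easy to check afterwards which features were duplicated, and how often, before deciding whether they matter for your analysis.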
This may be a glitch from Gnomon (NCBI's eukaryotic gene prediction tool). Let NCBI know by emailing the help desk.