Uniprot to refseq missing entries

8.9 years ago

jacobsen.jeremy ▴ 40

A large proportion of the UniProt database is not linked to a RefSeq nucleotide ID, for instance Q7KZI7-11, Q7KZI7-13, Q496A3-2, Q8NDM7-3, Q8NDM7-2 and Q8NDM7-5. I've counted about 10,000 of these. Why is this, and is there a way to patch it?

Thanks,
-Jeremy

uniprot RNA-Seq refseq • 1.6k views

ADD COMMENT • link updated 15 months ago by Ram 43k • written 8.9 years ago by jacobsen.jeremy ▴ 40

Looks like they are isoforms that differ from the canonical sequence. If you drop the isoform suffix (the -x part), you will get the original sequence ID, which is linked to a RefSeq entry.

ADD REPLY • link 8.9 years ago by GenoMax 141k
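
As a rough illustration of the suffix-dropping suggestion, here is a minimal Python sketch. It assumes isoform accessions always end in a hyphen followed by digits, as in the examples from the question. The names `canonical_accession`, `lookup_with_fallback` and `example_mapping` are hypothetical, and the mapped value is a placeholder rather than a real RefSeq ID.

```python
import re

# Assumption: an isoform suffix is a trailing hyphen plus digits, e.g. "-2" or "-11".
_ISOFORM_SUFFIX = re.compile(r"-\d+$")


def canonical_accession(accession):
    """Return the accession with any trailing isoform suffix removed."""
    return _ISOFORM_SUFFIX.sub("", accession.strip())


def lookup_with_fallback(accession, mapping):
    """Try the accession as given, then its canonical form.

    Returns None when neither form is present in the mapping.
    """
    if accession in mapping:
        return mapping[accession]
    return mapping.get(canonical_accession(accession))


# Hypothetical mapping table; the value is a placeholder, not a real RefSeq ID.
example_mapping = {"Q7KZI7": "REFSEQ_ID_PLACEHOLDER"}

print(canonical_accession("Q7KZI7-11"))
print(lookup_with_fallback("Q7KZI7-11", example_mapping))
```

Note that a hit obtained through the fallback points at the entry for the canonical sequence, which by definition differs from the isoform sequence, so it may be worth flagging such hits separately.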

Thanks, GenoMax. This is definitely the case for most of the missed mappings, but there are still many others that are not isoform accessions, such as O71037, Q9UKH3 and Q6ZUT4. These may account for 500 or so entries (much better than 10,000, anyway).

ADD REPLY • link updated 15 months ago by Ram 43k • written 8.9 years ago by jacobsen.jeremy ▴ 40
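
To get counts like the ones above, the unmapped accessions can be split into isoform-style and other accessions. The following is only a sketch under the assumption that a trailing hyphen plus digits marks an isoform. `split_unmapped` is a hypothetical helper, and the input list simply reuses accessions quoted in this thread.

```python
import re

# Assumption: anything ending in "-<digits>" is an isoform accession.
ISOFORM_PATTERN = re.compile(r"^.+-\d+$")


def split_unmapped(accessions):
    """Split accessions into (isoform-style, everything else), keeping input order."""
    isoforms, others = [], []
    for acc in accessions:
        if ISOFORM_PATTERN.match(acc):
            isoforms.append(acc)
        else:
            others.append(acc)
    return isoforms, others


# Accessions quoted in the thread, used here as a small example input.
unmapped = ["Q7KZI7-11", "Q496A3-2", "Q8NDM7-3", "O71037", "Q9UKH3", "Q6ZUT4"]
isoforms, others = split_unmapped(unmapped)
print(len(isoforms), "isoform accessions;", len(others), "other accessions")
```

The second list is the one that still needs a closer look after the suffix has been dropped from the first.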

The first two entries appear to be some sort of retroviral proteins, and the last one is based on a single mRNA sequence. That is not enough evidence for RefSeq curators to act on. Looks like you may have to exclude these entries from whatever analysis you are doing.

ADD REPLY • link 8.9 years ago by GenoMax 141k

You are correct. After taking a closer look, it appears that many of the remaining entries are either contaminants or "putative uncharacterized proteins". Others have an ENST (Ensembl transcript) identifier but no RefSeq mapping. I have now been able to map 98% one way or another, and I suspect that more than half of the remaining 1000 (not 500) entries are non-human contaminants. This is good enough that I can attempt direct sequence matching or annotate by hand. Thanks a lot, GenoMax!

ADD REPLY • link updated 15 months ago by Ram 43k • written 8.9 years ago by jacobsen.jeremy ▴ 40
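
For the direct sequence matching mentioned above, one simple approach is an exact-match lookup keyed on the protein sequence. This is a minimal sketch, not the method actually used in the thread: it assumes both sets of sequences are already loaded as (identifier, sequence) pairs, it only finds identical sequences, and every identifier and sequence below is a made-up toy value.

```python
from collections import defaultdict


def normalize(sequence):
    """Uppercase, drop whitespace and any trailing stop marker ('*')."""
    return "".join(sequence.split()).upper().rstrip("*")


def build_sequence_index(records):
    """Map each normalized sequence to the list of identifiers that carry it."""
    index = defaultdict(list)
    for identifier, sequence in records:
        index[normalize(sequence)].append(identifier)
    return index


def match_by_sequence(queries, index):
    """Return {query_id: [matching identifiers]}; an empty list means no exact match."""
    return {qid: list(index.get(normalize(seq), [])) for qid, seq in queries}


# Toy records with invented identifiers and sequences, for illustration only.
reference_records = [("REF_A", "MKTAYIAK"), ("REF_B", "MSTNPKPQ")]
query_records = [("QUERY_1", "mktayiak"), ("QUERY_2", "MAAAA")]

matches = match_by_sequence(query_records, build_sequence_index(reference_records))
print(matches)
```

Entries that come back with an empty list would still need an alignment-based search or annotation by hand.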
