I downloaded RefSeq's top_level gff3 file from their ftp site:
but cannot find any documentation on what each of the specific fields contains. I was able to find information on the standard GFF format columns and could probably guess at some of the others, but it would be nice to have a definitive explanation. Does anyone know where I can find this information?
The fields in question are:
gbkey genome mol_type description gene part pseudo product transcript_id gene_synonym partial ncrna_class protein_id exon_number exception transl_except anticodon Target e_value bit_score num_ident blast_aligner pct_identity_gap num_mismatch pct_identity_ungap gap_count pct_coverage pct_coverage_hiqual pct_identity_gapopen_only common_component filter_score weighted_identity rank assembly_bases_seq assembly_bases_aln for_remapping matched_bases matchable_bases lxr_locAcc_currStat_120 matches identity splices consensus_splices product_coverage exon_identity idty merge_aligner map lxr_locAcc_currStat_35 inversion_merge_aligner country isolation-source note tissue-type codons transl_table
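For context, these are the attribute keys from the ninth (attributes) column of the file. A list like this can be collected with a short script; the following is only a minimal sketch, assuming the file follows the usual GFF3 layout of nine tab-separated columns with `key=value` pairs separated by semicolons. The function name `attribute_keys` and the sample lines are made up for illustration and are not taken from the RefSeq file.

```python
from collections import Counter


def attribute_keys(lines):
    """Count attribute keys found in column 9 of GFF3 feature lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line (e.g. FASTA sequence)
        for pair in cols[8].split(";"):
            key, sep, _value = pair.partition("=")
            if sep and key.strip():
                counts[key.strip()] += 1
    return counts


# Made-up sample lines for illustration only (hypothetical values).
sample = [
    "##gff-version 3\n",
    "chr1\tRefSeq\tgene\t1\t100\t.\t+\t.\tID=gene0;gbkey=Gene;gene=ABC\n",
    "chr1\tRefSeq\tmRNA\t1\t100\t.\t+\t.\tID=rna0;Parent=gene0;gbkey=mRNA;product=hypothetical\n",
]
print(sorted(attribute_keys(sample)))
```

To run it on a real file, an open file handle can be passed in place of `sample`, since the function only iterates over lines.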
Thanks so much!