Documentation on RefSeq gff3 columns
18 months ago
summerela • 90
United States

I downloaded RefSeq's top_level gff3 file from their ftp site:

ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/H_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/GFF/ref_GRCh37.p13_top_level.gff3.gz

but I cannot find any documentation on what each of the specific fields in the attributes column (column 9) contains. I was able to glean information on the standard GFF columns and could probably guess at some of the others, but it would be nice to have a definitive explanation. Does anyone know where I can find this information?

The fields in question are:

gbkey
genome
mol_type
description
gene
part
pseudo
product
transcript_id
gene_synonym
partial
ncrna_class
protein_id
exon_number
exception
transl_except
anticodon
Target
e_value
bit_score
num_ident
blast_aligner
pct_identity_gap
num_mismatch
pct_identity_ungap
gap_count
pct_coverage
pct_coverage_hiqual
pct_identity_gapopen_only
common_component
filter_score
weighted_identity
rank
assembly_bases_seq
assembly_bases_aln
for_remapping
matched_bases
matchable_bases
lxr_locAcc_currStat_120
matches
identity
splices
consensus_splices
product_coverage
exon_identity
idty
merge_aligner
map
lxr_locAcc_currStat_35
inversion_merge_aligner
country
isolation-source
note
tissue-type
codons
transl_table

Thanks so much!


PS: Here's a link to their README file. I didn't see this information in it, but maybe I missed something?

ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/README


Have a look at the INSDC documentation; the terms in common should have the same definition: http://www.insdc.org/files/feature_table.html

19 months ago
lacek • 10

There are descriptions for some of the unofficial fields in ftp://ftp.ncbi.nlm.nih.gov/genomes/README_GFF3.txt.
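
If it helps when cross-checking against that README, here is a minimal sketch of how the attribute tags in column 9 could be tallied. It assumes the file follows the GFF3 convention of semicolon-separated `tag=value` pairs with URL-escaped special characters. The function names and the two sample lines are made up for illustration and are not taken from the RefSeq file.

```python
import gzip
from collections import Counter
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of tag -> list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # Split multi-valued attributes on commas first, then URL-decode.
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


def count_attribute_tags(lines):
    """Count how many feature lines carry each attribute tag."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an optional embedded sequence section follows
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        counts.update(parse_attributes(fields[8]).keys())
    return counts


def count_tags_in_file(path):
    """Same tally, reading a gzip-compressed GFF3 file (path is up to you)."""
    with gzip.open(path, "rt") as handle:
        return count_attribute_tags(handle)


# Made-up sample lines, only to show the shape of the input.
SAMPLE_LINES = [
    "##gff-version 3\n",
    "chrTest\texample\tgene\t1\t100\t.\t+\t.\tID=gene0;gbkey=Gene;gene=ABC1\n",
    "chrTest\texample\tmRNA\t1\t100\t.\t+\t.\tID=rna0;Parent=gene0;gbkey=mRNA;gene_synonym=A,B\n",
]

if __name__ == "__main__":
    for tag, n in sorted(count_attribute_tags(SAMPLE_LINES).items()):
        print(tag, n, sep="\t")
```

Calling `count_tags_in_file` on the downloaded `.gff3.gz` should give a list of tags to look up. Note that values are split on every comma, so free-text values containing commas that were not escaped would be split as well.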
