What type of file contains chromosome sizes, and how do I get it?
5.0 years ago
kiomix106 ▴ 10

I am currently working with GFF, GFF3, and GTF files. I work from the terminal, mainly with awk and tools like bedtools.

genome assembly • 3.6k views

One of the solutions here should suffice: Easiest Way To Obtain Chromosome Length?

Are you referring to your own assembled genome or one of the pre-existing genomes out there?


My question is whether I can get the size of a chromosome from a notation file, or from a GFF, GFF3, or GTF file, or only from a page that has the information about a given genome?


You can't get the size of a chromosome from a GFF v.1 or 2 (amended based on @jrj.healey's point below) or GTF file. Neither file format has any provision for encoding chromosome size. You may be able to get an approximation (if you consider the chromosome to start at base 1 and use the end coordinate of the last feature encoded for that chromosome).
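
The approximation described above can be sketched with awk. This is only a rough sketch, assuming a tab-delimited GFF/GTF file with the sequence name in column 1 and the feature end in column 5; the function name `approx_chrom_sizes` is made up for illustration. The result is a lower bound, since any sequence past the last annotated feature is not counted.

```shell
# Approximate per-chromosome sizes from a GFF/GTF file by taking the largest
# end coordinate (column 5) seen for each sequence name (column 1).
# approx_chrom_sizes is a hypothetical helper name, not part of any toolkit.
approx_chrom_sizes() {
    awk -F'\t' '
        /^##FASTA/ { exit }                       # stop before any embedded sequence (GFF3)
        /^#/ || NF < 5 { next }                   # skip comment and malformed lines
        ($5 + 0) > max[$1] { max[$1] = $5 + 0 }   # keep the largest end per sequence
        END { for (chrom in max) print chrom "\t" max[chrom] }
    ' "$1" | sort -k1,1
}
```

Calling `approx_chrom_sizes annotation.gff > approx.chromsizes` (file names are placeholders) writes a two-column name/size table in the same layout as a chrom.sizes file.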

I don't know what you mean by a notation file. Can you clarify?

The <build>.chrom.sizes file available in the UCSC genome data download folders lists chromosome sizes. Example: the file for the GRCh38 (hg38) human build.


Not necessarily the case. GFF(3) can contain a (multi-)FASTA attached to the end of the file, after the ##FASTA line. If these were complete chromosomes, you could theoretically get that information from a GFF, but it wouldn’t be especially easy.


Thanks for clarifying that.

Even if the file is in GFF3 format containing full chromosome sequences, the information about chromosome sizes would not be readily available for direct parsing without additional processing.
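
As an illustration of the additional processing involved, here is a minimal awk sketch that sums sequence lengths from the embedded FASTA section. It assumes the GFF3 file marks that section with a `##FASTA` line and that each record there is a complete chromosome; the function name `gff3_fasta_lengths` is hypothetical.

```shell
# Report the length of every sequence in the FASTA section of a GFF3 file.
# gff3_fasta_lengths is a hypothetical helper name, not part of any toolkit.
gff3_fasta_lengths() {
    awk '
        /^##FASTA/ { in_fasta = 1; next }     # the sequence section starts here
        !in_fasta { next }                    # ignore the annotation lines
        /^>/ {                                # header line: flush the previous record
            if (name != "") print name "\t" len
            name = substr($1, 2); len = 0; next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # count bases, ignoring whitespace
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

Running `gff3_fasta_lengths annotation.gff3` (placeholder file name) prints one name/length pair per sequence, in file order. It prints nothing if the file has no FASTA section.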

4.9 years ago

Use the UCSC Kent Utilities toolkit. For example:

$ fetchChromSizes hg38 > hg38.chromsizes

Or to build a sorted BED file without the sequences that have an underscore in their name (unplaced, random, and alternate scaffolds):

$ fetchChromSizes hg38 \
    | awk -v OFS="\t" '{ print $1, "0", $2; }' \
    | grep -v '_' \
    | sort-bed - \
    > hg38.bed

Whether you use a chromsizes or BED or other formatted file depends on what you're doing with it, but a little taco-bell programming can get it into the form you need.
