What Is The GC-Content Across Different Human Chromosomes?
12.3 years ago
Dan12345 ▴ 160

Does anyone know what the GC content of the different human chromosomes is?

chromosome gc • 38k views

Funny that there are three different GC% answers for Chr1 ...


Depends on the genome build and version that you use - it's perfectly 'legit', as they say in Cockney London slang.

The truth of the matter is that we do not have an honest representation of the true GC content because the reference genome builds exclude / mask telomeric and centromeric regions, where GC content is high.

Thus, all values represented in this thread are based on the genome builds and are not reflective of the actual GC content, which would be larger and which would differ from individual to individual.

12.3 years ago

**EDIT**

OK, so I felt bad about not actually answering your question, so here you go (generated by the method outlined below):

#Sequence   GC content
chr1          0.43
chr2          0.40
chr3          0.40
chr4          0.38
chr5          0.40
chr6          0.40
chr7          0.41
chr8          0.40
chr9          0.43
chr10         0.42
chr11         0.42
chr12         0.41
chr13         0.40
chr14         0.43
chr15         0.44
chr16         0.45
chr17         0.46
chr18         0.40
chr19         0.48
chr20         0.44
chr21         0.43
chr22         0.49
chrX          0.40
chrY          0.46
chrM          0.44

**EDIT ENDS**

The GC content of human chromosomal DNA is very heterogeneous, rendering chromosome-wide statistics relatively meaningless. It has been shown that the human genome is a mosaic of GC-rich and GC-poor regions, of around 300kb in length, called isochores.
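To make the point about heterogeneity concrete, here is a minimal Python sketch that computes GC content in consecutive fixed-size windows rather than chromosome-wide. The function name `windowed_gc`, the toy sequence, and the choice of non-overlapping windows are assumptions made for illustration; the 300 kb default only echoes the isochore scale mentioned above, and this is not a reimplementation of any EMBOSS program.

```python
from typing import Iterator, Optional, Tuple


def windowed_gc(seq: str, window: int = 300_000) -> Iterator[Tuple[int, Optional[float]]]:
    """Yield (window_start, gc_fraction) for consecutive non-overlapping windows.

    The fraction is computed over A/C/G/T only. It is None when a window
    holds no unambiguous bases at all (for example a run of N).
    """
    seq = seq.upper()  # treat soft-masked (lowercase) bases like any other
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        yield start, (gc / (gc + at) if gc + at else None)


# Toy sequence: a GC-only stretch, an AT-only stretch, a gap, and a mixed tail.
toy = "GCGCGCGCGC" + "ATATATATAT" + "NNNNNNNNNN" + "GCATGCAT"
profile = list(windowed_gc(toy, window=10))
for start, fraction in profile:
    print(start, fraction)
```

On a real chromosome you would pass the concatenated sequence lines of the FASTA record and keep the default window, then plot the fractions against the window start positions.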

You can plot these regions of varying GC content using the EMBOSS program isochore. For example, for chromosome 1:

wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr1.fa.gz
gunzip chr1.fa.gz
isochore -sequence chr1.fa -outfile chr1.isochore -graph png

This gives the following result:

[Figure: Isochores of Chr 1]

You could also get the sequences of the individual chromosomes and work out their overall GC content, again using EMBOSS, this time with geecee:

   geecee -sequence chr1.fa

This gives us an answer of 43% for chromosome 1.


Well, the number I got for chr1 is 41.7% with my program. I guess EMBOSS is counting "N" as 50% GC, but it should not do that! We should send a bug report, I think.


Hmm, yes, agreed: the treatment of 'N' will affect the results considerably. From http://emboss.sourceforge.net/apps/cvs/emboss/apps/geecee.html - 'It sums the number of G and C bases in the input sequence(s) and writes the result to file as the fraction (in the interval 0.0 to 1.0) of the length of the whole sequence.'
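As a rough illustration of how much the denominator matters, the Python sketch below computes the GC fraction of one sequence under two conventions: over unambiguous A/C/G/T bases only, and over the whole sequence length including N. The function name `gc_fractions` and the toy sequence are made up for this example, and it makes no claim about what any particular tool does internally.

```python
from typing import Dict, Optional


def gc_fractions(seq: str) -> Dict[str, Optional[float]]:
    """Return the GC fraction of seq under two different denominators."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return {
        # Denominator: unambiguous bases only (N and other codes ignored).
        "acgt_only": gc / (gc + at) if gc + at else None,
        # Denominator: every position in the sequence, including N.
        "whole_length": gc / len(seq) if seq else None,
    }


# Toy case: 4 G/C out of 10 called bases, followed by a 10-base gap of N.
toy = "GGCCATATAT" + "N" * 10
result = gc_fractions(toy)
print(result)
```

Here the same sequence comes out as 0.4 or 0.2 depending only on whether the gap is counted, which is the kind of discrepancy to expect on chromosomes with large unsequenced regions.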

12.3 years ago
lh3 33k

GRCh37/hg19/b37:

1   0.417439
2   0.402438
3   0.396943
4   0.382479
5   0.395163
6   0.396109
7   0.407513
8   0.401757
9   0.413168
10  0.415849
11  0.415657
12  0.40812
13  0.385265
14  0.408872
15  0.42201
16  0.447894
17  0.455405
18  0.39785
19  0.483603
20  0.441257
21  0.408325
22  0.479881
X   0.394963
Y   0.391288
MT  0.443626

Done by:

seqtk comp hs37m.fa.gz | awk '/^[0-9MXY]/{x=$4+$5;y=x+$3+$6;print $1"\t"x/y}'

ChrY has lots of ambiguous bases and that is why my result differs most on chrY in comparison to the EMBOSS result. EMBOSS is wrong, IMHO.
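For readers without seqtk, the same calculation, (C+G)/(A+C+G+T) per sequence with ambiguous bases left out of the denominator, can be sketched in plain Python. This is only an illustrative equivalent of the one-liner above, not how seqtk is implemented; the function name `gc_per_sequence` and the toy FASTA records are assumptions, and unlike the awk filter it reports every sequence in the file.

```python
import io
from typing import Dict, Iterable, List


def gc_per_sequence(lines: Iterable[str]) -> Dict[str, float]:
    """Return {sequence_name: (C+G)/(A+C+G+T)} for FASTA-formatted lines."""
    counts: Dict[str, List[int]] = {}  # name -> [gc_count, acgt_count]
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = [0, 0]
        elif name is not None:
            s = line.upper()
            gc = s.count("G") + s.count("C")
            counts[name][0] += gc
            counts[name][1] += gc + s.count("A") + s.count("T")
    # Sequences made up entirely of ambiguous bases are skipped.
    return {n: gc / acgt for n, (gc, acgt) in counts.items() if acgt}


# Toy FASTA with two records; N is ignored in both numerator and denominator.
demo = io.StringIO(">1 toy\nGGCC\nATNN\n>MT\nGCGA\n")
result = gc_per_sequence(demo)
for seq_name, fraction in result.items():
    print(f"{seq_name}\t{fraction:.6f}")
```

For a compressed genome file, the lines could be supplied by `gzip.open(path, "rt")` from the standard library instead of the `io.StringIO` toy input.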

12.3 years ago

I was about to tell you, but then someone crashed the server. Here is how (using Biopieces):

read_fasta -i /home/DATA/downloads/Homo_sapiens/human_hg19.fasta.gz | analyze_gc | write_tab -ck SEQ_NAME,GC% -x
#SEQ_NAME       GC%
gi|89161184|ref|AC_000044.1| Homo sapiens chromosome 1, alternate assembly Celera, whole genome shotgun sequence        40.77

Well, AC_000044 is the Celera assembly, not hg19. In addition, Perl is notoriously inefficient for looping through each base.

7.9 years ago
sacha ★ 2.4k

Using bedtools nuc on hg19:

1 chr1 0.377295
2 chr2 0.394172
3 chr3 0.390478
4 chr4 0.375491
5 chr5 0.388130
6 chr6 0.387498
7 chr7 0.397821
8 chrX 0.384356
9 chr8 0.392218
10 chr9 0.351521
11 chr10 0.402901
12 chr11 0.403720
13 chr12 0.397843
14 chr13 0.319767
15 chr14 0.336276
16 chr15 0.336248
17 chr16 0.391037
18 chr17 0.436335
19 chr18 0.380423
20 chr20 0.416613
21 chrY 0.172677
22 chr19 0.456450
23 chr22 0.326388
24 chr21 0.297838


Ehm, your results are remarkably different from what was obtained earlier in this topic. Also, I'm not sure this topic was worth reviving after 4.4 years.


Someone else revived it just now... after 6 years! They up-voted lh3's answer. I then also added my own comment at the very top.
