Question

ChromHMM: chromatin states to genome % coverage?

7.1 years ago

Sy80

Hi all,

I'm trying to wrap my head around ChromHMM. Short version: I can't work out from the ChromHMM/ENCODE papers how one goes from chromatin states to the percentage of the genome covered by each state in different cell lines.

Long version: We have three conditions (wild-type, het, and KO cells), with ChIP-seq data for 6 histone marks per condition.

We are trying to learn a joint model from all the data (18 histone-mark datasets in total) using our peak calls, and then use the jointly learned chromatin states to determine the genome coverage of each state in each of our 3 conditions, to identify similarities and differences between the conditions.

First, we merged the peaks of each histone mark separately across all conditions, created a virtual chromosome per histone mark, and used this virtual genome to learn a 12-state model.

My question is: how can we use the joint chromatin states to get the genome coverage per state for each cell line separately?

I hope my question is clear. I appreciate any input. Thanks!

Tags: ChIP-Seq, ChromHMM

Posted by Sy80, 7.1 years ago; last updated by Yussuf, 3.1 years ago.

Hi @Sy80, I am looking for an answer to a similar question. Were you able to get this done? I'd appreciate it if you could share your experience and the strategy you followed.

Thanks

Reply by Researcher, 4.9 years ago

If you set up your cell mark file table so that each cell line is defined separately (the same marks listed under a different cell name for each condition), ChromHMM should have produced a separate segmentation file for each cell line. There should also be a file called CellLine_12_coverage.txt for each cell line, which has a genome coverage column.
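For reference, here is a minimal sketch of such a table for the three conditions in the question, generated from R. The condition labels, the six mark names and the BED file names are hypothetical placeholders. The layout follows ChromHMM's cellmarkfiletable (tab-delimited cell, mark, file, with no header). Each condition is its own cell and the mark names are shared across cells, which, as I understand ChromHMM's concatenated approach, gives one jointly learned model with one segmentation per cell.

```r
# Hypothetical condition labels and mark names: replace with your own.
conditions <- c("WT", "HET", "KO")
marks <- c("H3K4me3", "H3K4me1", "H3K27ac", "H3K36me3", "H3K27me3", "H3K9me3")

# One row per condition x mark. The same mark names are reused for every
# condition so that a single joint model is learned across all three.
cmft <- expand.grid(mark = marks, cell = conditions, stringsAsFactors = FALSE)

# Hypothetical BED file names, one per condition/mark combination
cmft$file <- paste0(cmft$cell, "_", cmft$mark, ".bed")
cmft <- cmft[, c("cell", "mark", "file")]

# Tab-delimited, no header, no row names, no quotes
write.table(cmft, "cellmarkfiletable.txt", sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The resulting file is what gets passed as the cellmarkfiletable argument to ChromHMM's BinarizeBed command.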

Reply by colin.kern, 4.8 years ago

Answer (2021-03-29)

You can do this in R (I used RStudio). Here is the code I wrote:

Read in the segmentation file (ChromHMM's segments BED files have no header line):

x <- read.table("segments.bed", sep = "\t", header = FALSE)

Keep only the rows for each state (I had 7 states; adjust this to your model). The segments BED file should have 4 columns: chromosome, start, end, and state label.

E1 <- x[x$V4 == "E1", ]
E2 <- x[x$V4 == "E2", ]
E3 <- x[x$V4 == "E3", ]
E4 <- x[x$V4 == "E4", ]
E5 <- x[x$V4 == "E5", ]
E6 <- x[x$V4 == "E6", ]
E7 <- x[x$V4 == "E7", ]

Subtract the start (V2) from the end (V3) to get each interval's length in bp:

E1$interval_size <- E1$V3 - E1$V2
E2$interval_size <- E2$V3 - E2$V2
E3$interval_size <- E3$V3 - E3$V2
E4$interval_size <- E4$V3 - E4$V2
E5$interval_size <- E5$V3 - E5$V2
E6$interval_size <- E6$V3 - E6$V2
E7$interval_size <- E7$V3 - E7$V2

Sum the new column for each state:

sum(E1$interval_size)
sum(E2$interval_size)
sum(E3$interval_size)
sum(E4$interval_size)
sum(E5$interval_size)
sum(E6$interval_size)
sum(E7$interval_size)
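With a different number of states, the hard-coded per-state subsets have to be rewritten. Here is a minimal sketch of the same per-state totals using base R's tapply(), which groups the interval lengths by the state column and works for any number of states. The small data frame is made-up toy data standing in for the real segments.bed.

```r
# Toy segmentation in the 4-column layout (chrom, start, end, state);
# with real data: x <- read.table("segments.bed", sep = "\t")
x <- data.frame(V1 = "chr1",
                V2 = c(0, 200, 1000, 1600),
                V3 = c(200, 1000, 1600, 2000),
                V4 = c("E1", "E2", "E1", "E3"))

# BED intervals are half-open, so the length in bp is end - start
x$interval_size <- x$V3 - x$V2

# Total bp per state in a single call
bp_per_state <- tapply(x$interval_size, x$V4, sum)
bp_per_state  # E1: 800, E2: 800, E3: 400
```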

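The sum of these totals over all states should equal the size in bp of the segmented genome. Divide each state's total by that sum and multiply by 100 to get the percentage of the genome covered by that state.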
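To get per-state coverage for each condition, as the original question asks, the same calculation can be repeated on each condition's segmentation and collected into one table. Here is a minimal sketch in base R. The toy segmentations and the commented-out file names (WT_12_segments.bed and so on) are hypothetical stand-ins for the per-cell segments files.

```r
# Percent of the segmented genome in each state for one segmentation.
# `states` fixes the row order so every condition reports the same states.
state_coverage <- function(seg, states) {
  bp <- tapply(seg$V3 - seg$V2, factor(seg$V4, levels = states), sum)
  bp[is.na(bp)] <- 0  # a state absent from this condition covers 0 bp
  100 * bp / sum(bp)
}

# Toy segmentations; with real data, e.g. (hypothetical file names):
# segs <- lapply(c(WT = "WT_12_segments.bed", HET = "HET_12_segments.bed",
#                  KO = "KO_12_segments.bed"), read.table, sep = "\t")
segs <- list(
  WT  = data.frame(V1 = "chr1", V2 = c(0, 500),  V3 = c(500, 2000),  V4 = c("E1", "E2")),
  HET = data.frame(V1 = "chr1", V2 = 0,          V3 = 2000,          V4 = "E1"),
  KO  = data.frame(V1 = "chr1", V2 = c(0, 1000), V3 = c(1000, 2000), V4 = c("E1", "E2"))
)

# Union of state labels across all conditions
all_states <- sort(unique(unlist(lapply(segs, function(s) as.character(s$V4)))))

# Rows = states, columns = conditions, values = % genome coverage
coverage <- sapply(segs, state_coverage, states = all_states)
round(coverage, 1)
```

Passing the union of state labels ensures that a state missing from one condition shows up as 0% rather than misaligning the table. Note that sort() orders labels lexically, so with 10 or more states E10 comes before E2. Pass a hand-written vector such as paste0("E", 1:12) instead if you want numeric order.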

I know it's an old question, but I'm posting the code here in case someone needs it.