Predictability of Regulatory States in New Cell Lines or Tissues from Existing ENCODE Segmentations?
12.1 years ago

Are there any tools for predicting the regulatory states of genomic regions in new cell lines or tissues from the existing ENCODE segmentation states? That is, can the current segmentations be used to predict the states in a cell line or tissue that has not been assayed by ENCODE?

For example, given that we understand how similar/different certain tissues are to each other in terms of their gene expression, can we use the existing information in ENCODE to predict the states of different regulatory regions in a given tissue?

Is there any predictive power?

encode prediction
12.1 years ago
Gjain 5.8k

ENCODE's ChromHMM and Segway segmentation data are available on the UCSC Genome Browser.

You might want to look at the recent paper published in Nature, "Mapping and analysis of chromatin state dynamics in nine human cell types".

Here is the UCSC Genome Browser link to the Chromatin State Segmentation by HMM from ENCODE/Broad track.

I hope this helps.


How about for new cell lines or tissues?

12.1 years ago

Maybe not, since chromatin states mark regulatory elements that are tissue- or cell-type-specific.

I would also add that many regions of the human genome are not covered by any of these 15 chromatin states, which were predicted from 8 histone marks and CTCF, and this may lower your predictive power.

If you are really serious about this question, I would suggest starting your prediction with the 50 chromatin states (which cover more histone modifications and more of the human genome), and then moving to the 15 chromatin states.

You can train a model on your own datasets using ChromHMM (the software Jason used to predict the chromatin states above in his two Nature papers).
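A minimal Python sketch of the two ChromHMM steps this would involve, assuming your aligned reads are already in BED files (one per cell/mark combination). The jar path, chromosome-sizes file, directory names, cell/mark labels and `hg19` below are hypothetical placeholders. `BinarizeBed` and `LearnModel` are ChromHMM commands, but please check the ChromHMM manual for the exact arguments and options your version supports. The helper only builds the command lines, so you can inspect them before running anything.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Tuple, Union

PathLike = Union[str, Path]


def format_cellmark_table(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Return ChromHMM's tab-delimited cell / mark / BED-file table as a string."""
    return "".join(f"{cell}\t{mark}\t{bed}\n" for cell, mark, bed in rows)


def chromhmm_commands(
    jar: PathLike,
    chrom_sizes: PathLike,
    bed_dir: PathLike,
    cellmark_table: PathLike,
    binary_dir: PathLike,
    model_dir: PathLike,
    num_states: int,
    assembly: str,
    memory: str = "4000M",
) -> List[List[str]]:
    """Build (without running) the two ChromHMM steps:
    BinarizeBed turns reads into presence/absence calls per bin,
    LearnModel fits a multivariate HMM with `num_states` states."""
    java = ["java", f"-mx{memory}", "-jar", str(jar)]
    binarize = java + [
        "BinarizeBed", str(chrom_sizes), str(bed_dir),
        str(cellmark_table), str(binary_dir),
    ]
    learn = java + [
        "LearnModel", str(binary_dir), str(model_dir),
        str(num_states), assembly,
    ]
    return [binarize, learn]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, raising on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical inputs: one BED file per cell/mark combination in beds/.
    print(format_cellmark_table([
        ("MyTissue", "H3K4me3", "mytissue_H3K4me3.bed"),
        ("MyTissue", "H3K27ac", "mytissue_H3K27ac.bed"),
    ]), end="")
    commands = chromhmm_commands(
        "ChromHMM.jar", "CHROMSIZES/hg19.txt", "beds",
        "cellmarktable.txt", "binarized", "model_15", 15, "hg19",
    )
    for cmd in commands:
        print(" ".join(cmd))
    # run_commands(commands)  # uncomment once ChromHMM and the inputs exist
```

Building the argument lists separately from running them keeps the sketch easy to check and to adapt (for example, changing `num_states` to compare a 15-state and a larger model).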

Good luck!

12.1 years ago

I would be very surprised if there were an existing tool that answers your question in a straightforward way. Why not try it yourself and see?

You could select a certain number of regions from the genome and represent each by a vector containing a segmentation value for each of the assayed cell lines or tissues (I don't know whether it would be a binary or a continuous value; I guess either would be OK). Each region would then have a kind of "tissue specificity profile." Then you could do leave-one-out cross-validation: leave out one tissue/cell line in turn, train some statistical learning model on the profiles of the remaining tissues/cell lines, and try to predict the values for the left-out tissue/cell line. Finally, average the prediction accuracy across tissues and regions.
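Here is a rough Python sketch of that leave-one-out scheme, assuming binary (active/inactive) segmentation values. As the "statistical learning model" it swaps in about the simplest baseline possible, which is my own assumption and not an established method: each region's state in the left-out tissue is predicted by a similarity-weighted majority vote over the other tissues. The tissue-to-tissue similarity matrix is something you would supply yourself (for example, correlations of gene expression profiles, as suggested in the question). The toy states and similarities in the usage part are made up.

```python
from typing import List, Sequence, Tuple


def predict_left_out(
    states: Sequence[Sequence[int]],
    similarity: Sequence[Sequence[float]],
    held_out: int,
) -> List[int]:
    """Predict the binary state of every region in tissue `held_out` using only
    the other tissues, via a similarity-weighted majority vote."""
    n_tissues = len(states[0])
    others = [t for t in range(n_tissues) if t != held_out]
    # Negative similarities are clipped to 0 so they cannot flip the vote.
    weights = [max(similarity[held_out][t], 0.0) for t in others]
    total = sum(weights)
    predictions = []
    for profile in states:  # one profile = one region's states across tissues
        if total > 0:
            score = sum(w * profile[t] for w, t in zip(weights, others)) / total
        else:
            # No informative similarities: fall back to an unweighted vote.
            score = sum(profile[t] for t in others) / len(others)
        predictions.append(1 if score >= 0.5 else 0)  # ties go to "active"
    return predictions


def leave_one_out_accuracy(
    states: Sequence[Sequence[int]],
    similarity: Sequence[Sequence[float]],
) -> Tuple[float, List[float]]:
    """Leave each tissue out in turn, predict it from the rest, and return
    the mean accuracy plus the per-tissue accuracies."""
    n_tissues = len(states[0])
    if n_tissues < 2:
        raise ValueError("need at least two tissues/cell lines")
    per_tissue = []
    for held_out in range(n_tissues):
        predicted = predict_left_out(states, similarity, held_out)
        observed = [profile[held_out] for profile in states]
        correct = sum(p == o for p, o in zip(predicted, observed))
        per_tissue.append(correct / len(states))
    return sum(per_tissue) / n_tissues, per_tissue


if __name__ == "__main__":
    # Toy data (made up): 4 regions x 3 tissues, 1 = region in an "active" state.
    states = [
        [1, 1, 0],
        [0, 0, 1],
        [1, 1, 1],
        [0, 1, 0],
    ]
    # Hypothetical tissue-tissue similarity (e.g. expression correlation).
    similarity = [
        [1.0, 0.9, 0.1],
        [0.9, 1.0, 0.2],
        [0.1, 0.2, 1.0],
    ]
    mean_acc, per_tissue = leave_one_out_accuracy(states, similarity)
    print(per_tissue)  # [0.75, 0.75, 0.25]
    print(round(mean_acc, 3))  # 0.583
```

In the toy run, the third tissue is poorly predicted because it is dissimilar to the other two, which is exactly the kind of effect the cross-validation is meant to expose. Any real classifier could replace `predict_left_out`, as long as it never sees the held-out tissue's values.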

Actually, I would consider taking a shot at this myself when I get the time; the question is just when that mythical day will come... :-)

12.1 years ago
ALchEmiXt ★ 1.9k

Coming from a molecular biology background: don't forget that many, many cell lines have adapted significantly to in vitro conditions, and a lot of chromosome reshuffling and duplication events have to be considered. (As my professor used to say about cell culture work: make sure the cells are happy and fresh. Dead cells in culture are like "french fries" to the happy cells; the happy ones (meals!) take up everything until they are fed up and burst... or develop some genome reshuffling to deal with it.)

So, depending on WHY you want to do this using cell lines... BE CAREFUL! Tissue sections might be a different story, though.

My 2 cents (or did I misunderstand your question?).


Thanks. How about tissues?


Tissues are more complex since, as far as I know, they usually do not consist of a single cell type... so you would need to consider each of the cell types within a tissue. Quite a task...
