Question

Encode Data For Dnase-Seq Peaks

1


11.0 years ago

J.F.Jiang ▴ 910

Hi all,

I have recently been looking into the DNase data from the ENCODE project. For one cell line, I think the DNaseI HS Peaks from ENCODE/Duke are the data I am looking for.

However, there are two kinds of such data:
1) DNaseI HS Peaks from ENCODE/Duke
2) DNaseI HS Uniform Peaks from ENCODE/Analysis

What is the difference between these two types of data? I cannot find a description anywhere.

Another question: some cell lines are processed with a treatment, so what is the difference between the data with and without that treatment?

Hope to get answers.

Thanks

encode peak-calling • 6.4k views

ADD COMMENT • link updated 10.4 years ago by Biostar 20 • written 11.0 years ago by J.F.Jiang ▴ 910

Anthony Mathelier · Answer 1 · 2013-04-10

1


11.0 years ago

Anthony Mathelier ▴ 910

You can find the descriptions of the data sets on the UCSC tracks.

For instance, if you go to http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hgt_tSearch=search&tsCurTab=advancedTab&hgt_tsPage=&tsName=&tsDescr=&tsGroup=Any&hgt_mdbVar1=dataType&hgt_mdbVal1=DnaseSeq&hgt_mdbVar2=cell&hgt_mdbVal2=HUVEC&hgt_mdbVar3=view&hgt_mdbVal3=Any&hgt_mdbVar4=[]&hgt_mdbVar5=[]&hgt_mdbVar6=[], you can click on the first two links (HUVEC DNase and HUVEC Pk) to get their respective descriptions.

Basically, the Uniform peaks were computed by the ENCODE Analysis Working Group, which applied uniform processing to datasets produced by multiple data production groups in the ENCODE Consortium.

ADD COMMENT • link updated 10.4 years ago by Alex Reynolds 35k • written 11.0 years ago by Anthony Mathelier ▴ 910

0


Yes, I know it is a kind of meta-analysis, but it does not include p-values, and the uniform set contains more DHS regions than the individual one.

ADD REPLY • link 11.0 years ago by J.F.Jiang ▴ 910

0


The p-values (FDR) that they used are listed in the link I gave you. You should have access to all the info.

ADD REPLY • link 11.0 years ago by Anthony Mathelier ▴ 910

0


chr1 10100 10250 . 0 . 5 -1 -1 -1
chr1 237740 237890 . 0 . 12 -1 -1 -1
chr1 565400 565550 . 0 . 39 -1 -1 -1
chr1 565840 565990 . 0 . 62 -1 -1 -1
chr1 566720 566870 . 0 . 60 -1 -1 -1
chr1 566980 567130 . 0 . 38 -1 -1 -1
chr1 567560 567710 . 0 . 75 -1 -1 -1
chr1 569780 569930 . 0 . 9279 -1 -1 -1

I have pasted part of the file here. Please take a look at the last three columns, which should be:

pValue (float, example value -1): Statistical significance of signal value (-log10). Set to -1 if not used.
qValue (float, example value -1): Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if not used.
peak (int(11), example value -1): Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-source called.

As you can see, the last three columns are all set to -1. Does that mean no significance test was done?

The content is taken from the file: HUVEC DNase HUVEC DNaseI HS Uniform Peaks from ENCODE/Analysis
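To check this across a whole file rather than by eye, the rows can be parsed into the ten narrowPeak fields and tested for the -1 placeholder. The following is only a minimal sketch: the names `NarrowPeak`, `parse_narrowpeak_line` and `has_significance` are made up for illustration, and the input is just the eight rows pasted above, not the full track.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    """One row of a narrowPeak (BED6+4) file."""
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float   # -log10 p-value, or -1 if not used
    q_value: float   # -log10 q-value (FDR), or -1 if not used
    peak: int        # summit offset from start, or -1 if not called


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Split one whitespace-separated row into typed fields."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}: {line!r}")
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))


def has_significance(p: NarrowPeak) -> bool:
    """True if the row reports a p-value or a q-value (anything but -1)."""
    return p.p_value != -1 or p.q_value != -1


# The eight rows quoted in this thread.
ROWS = """\
chr1 10100 10250 . 0 . 5 -1 -1 -1
chr1 237740 237890 . 0 . 12 -1 -1 -1
chr1 565400 565550 . 0 . 39 -1 -1 -1
chr1 565840 565990 . 0 . 62 -1 -1 -1
chr1 566720 566870 . 0 . 60 -1 -1 -1
chr1 566980 567130 . 0 . 38 -1 -1 -1
chr1 567560 567710 . 0 . 75 -1 -1 -1
chr1 569780 569930 . 0 . 9279 -1 -1 -1
"""

peaks = [parse_narrowpeak_line(line) for line in ROWS.strip().splitlines()]
n_with_stats = sum(has_significance(p) for p in peaks)
print(f"{n_with_stats} of {len(peaks)} rows report a p-value or q-value")
```

A count of zero only shows that the file does not report the statistics; it does not by itself say whether a test was run upstream.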

ADD REPLY • link 11.0 years ago by J.F.Jiang ▴ 910

0


I guess they did run statistical tests but unfortunately did not report the values. That is a shame, but you can try emailing them. I am not affiliated with ENCODE at all, so I do not know more.

ADD REPLY • link 11.0 years ago by Anthony Mathelier ▴ 910

0


Many thanks. I believe so too, since I would expect these uniform peaks to be the more confident ones, yet none of the data of this kind seems to provide p-values.

Anyway, thanks. I will email them for an answer.

ADD REPLY • link 11.0 years ago by J.F.Jiang ▴ 910

score 0 · Answer 2 · 2013-04-10

0


11.0 years ago

spacemorrissey ▴ 280

Unfortunately much of the "ENCODE" data on the browser is not the uniformly processed data that was used for the analysis in the papers. In many cases the bigwigs and peaks will be those produced by the individual labs. This does not necessarily mean that DUKE peaks in this case are less reliable, just that they weren't the ones used in the papers.

These peaks were called using Fseq, which does not assign a p-value to peaks. It does give a signal intensity though. I am not sure what you want the p-value for, but you can certainly sort the peaks by the signal value. Not having identical signal values for the DUKE/ENCODE peaks is not surprising as they may have been called with different software, but if you sort both sets by the signal, they should be pretty similar.
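As a rough illustration of ranking by signal, here is a minimal sketch. It assumes each peak is a `(chrom, start, end, signal)` tuple, uses a made-up helper name `rank_by_signal`, and takes its values from the rows pasted earlier in this thread (signal is the 7th narrowPeak column).

```python
def rank_by_signal(peaks, top_n=None):
    """Return peaks sorted by signal value, strongest first.

    peaks: iterable of (chrom, start, end, signal) tuples.
    top_n: if given, keep only the top_n strongest peaks.
    """
    ranked = sorted(peaks, key=lambda p: p[3], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Values copied from the rows quoted in this thread.
uniform_peaks = [
    ("chr1", 10100, 10250, 5.0),
    ("chr1", 237740, 237890, 12.0),
    ("chr1", 565400, 565550, 39.0),
    ("chr1", 565840, 565990, 62.0),
    ("chr1", 566720, 566870, 60.0),
    ("chr1", 566980, 567130, 38.0),
    ("chr1", 567560, 567710, 75.0),
    ("chr1", 569780, 569930, 9279.0),
]

top3 = rank_by_signal(uniform_peaks, top_n=3)
for chrom, start, end, signal in top3:
    print(chrom, start, end, signal)
```

Because the two peak sets may use different signal scales, comparing ranks (for example, how many of the top N regions are shared) is likely more meaningful than comparing the raw values.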

ADD COMMENT • link 11.0 years ago by spacemorrissey ▴ 280

0


Yes, the uniform results are supposed to be more reliable, but the description does not say which data were used to produce the uniform peaks. For example, for the DHSs from one cell line, do the uniform peaks indicate DHSs that overlap with those of other cell lines? That is not explained, so I am confused. I have written to UCSC and ENCODE and hope to get a response.

Again thanks for your reply

ADD REPLY • link 11.0 years ago by J.F.Jiang ▴ 910

0


Since you emailed them, please post the answers here if you get any. Thanks

ADD REPLY • link 11.0 years ago by Anthony Mathelier ▴ 910

0


I wrote to them but never got a response, which is a pity. Since you are an expert on this, I have another question: what is the difference between the bigWig (signal) files and the peak files, given that peaks are called from the signal? When we want to look at sites where DNase and ChIP-seq overlap, which should I use?

Generally, we can call peaks from ChIP-seq, and ENCODE also offers peaks for specific cell lines. In my opinion, if I only look at the cell lines in ENCODE, I can either overlap the ChIP-seq and DNase peaks to indicate the footprint region, or overlap the footprints from ENCODE with the ChIP-seq peaks.

So I am really confused.

ADD REPLY • link 11.0 years ago by J.F.Jiang ▴ 910

0


It is a pity that they did not reply. Unfortunately, I am not an expert on DNase data at all. To overlap TF ChIP-seq with DNase, I would overlap the peak regions from both data sets.
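A minimal sketch of such an overlap, assuming both peak sets are lists of `(chrom, start, end)` tuples with BED-style half-open coordinates. The helper name `overlapping_peaks` and the ChIP-seq intervals are made up for illustration; the DNase intervals come from the rows pasted earlier in this thread. In practice a dedicated tool such as `bedtools intersect` is commonly used for this.

```python
from collections import defaultdict


def overlapping_peaks(query, reference):
    """Return the query peaks that overlap at least one reference peak.

    Both inputs are iterables of (chrom, start, end) with half-open
    coordinates, so intervals that merely touch do not overlap.
    This is a simple all-against-all scan per chromosome, which is
    fine for small inputs but slow for genome-wide peak sets.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in query:
        candidates = by_chrom.get(chrom, [])
        if any(start < r_end and r_start < end for r_start, r_end in candidates):
            hits.append((chrom, start, end))
    return hits


# DNase peaks taken from the rows quoted in this thread.
dnase = [
    ("chr1", 10100, 10250),
    ("chr1", 565400, 565550),
    ("chr1", 569780, 569930),
]

# Hypothetical ChIP-seq peaks, invented for this example.
chip = [
    ("chr1", 10200, 10400),    # overlaps the first DNase peak
    ("chr1", 565550, 565700),  # only touches the second DNase peak
    ("chr2", 569800, 569900),  # same coordinates, different chromosome
]

shared = overlapping_peaks(dnase, chip)
print(shared)
```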

ADD REPLY • link 11.0 years ago by Anthony Mathelier ▴ 910