Question

Diffbind affinity based analysis

0

Entering edit mode

4.9 years ago

rapmic • 0

Dear all

I am analyzing ChIPseq data. Following code was used:

> samples <- read.csv2("sample4.csv")
> DBdata1 <- dba(sampleSheet=samples)
P359 SF  Hotair  1 bed
cP359 SF  Unstim  1 bed
P361 SF  Hotair  1 bed
cP361 SF  Unstim  1 bed
P395 SF  Hotair  1 bed
cP395 SF  Unstim  1 bed
> DBdata1
6 Samples, 5111 sites in matrix (7058 total):
     ID Tissue Condition Replicate Caller Intervals
1  P359     SF    Hotair         1    bed      4637
2 cP359     SF    Unstim         1    bed      4313
3  P361     SF    Hotair         1    bed      5100
4 cP361     SF    Unstim         1    bed      8470
5  P395     SF    Hotair         1    bed      5834
6 cP395     SF    Unstim         1    bed      6990
> DBdata2 <- dba.count(DBdata1)
Warnmeldung:
In deparse(object[[i]], nlines = 1L) :
  Neustart einer unterbrochenen promise evaluation
> DBdata2
6 Samples, 5111 sites in matrix:
     ID Tissue Condition Replicate Caller Intervals FRiP
1  P359     SF    Hotair         1 counts      5111 0.60
2 cP359     SF    Unstim         1 counts      5111 0.57
3  P361     SF    Hotair         1 counts      5111 0.62
4 cP361     SF    Unstim         1 counts      5111 0.62
5  P395     SF    Hotair         1 counts      5111 0.55
6 cP395     SF    Unstim         1 counts      5111 0.58
> DBdata3 <- dba.contrast(DBdata2, categories=DBA_CONDITION)
> DBdata4 <- dba.analyze(DBdata3)
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> DBdata4
6 Samples, 5111 sites in matrix:
     ID Tissue Condition Replicate Caller Intervals FRiP
1  P359     SF    Hotair         1 counts      5111 0.60
2 cP359     SF    Unstim         1 counts      5111 0.57
3  P361     SF    Hotair         1 counts      5111 0.62
4 cP361     SF    Unstim         1 counts      5111 0.62
5  P395     SF    Hotair         1 counts      5111 0.55
6 cP395     SF    Unstim         1 counts      5111 0.58

1 Contrast:
  Group1 Members1 Group2 Members2 DB.DESeq2
1 Hotair        3 Unstim        3         0
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.2

How is it possible that there are 0 DBSites? When i do occupancy based analysis i find some distinct differences.

Thank you for your anwers!

BW Raphael

ChIP-Seq Diffbind Affinity analysis • 797 views

ADD COMMENT • link updated 4.9 years ago by Rory Stark ★ 2.0k • written 4.9 years ago by rapmic • 0

score 0 · Answer 1 · 2019-05-22

It is most certainly possible for the quantitative (affinity) analysis to be unable to identify differentially bound sites, even when there are peaks called only in one condition.

There are a number of reasons why this can occur.

First, there may be a number of borderline peaks that are above the threshold in one condition but not the other, even though they look similar. The nature of peak calling means this is almost certain to happen in any real-world experiment.

The more important driver of having few or no differential peaks below a FDR threshold is that the replicates have high variance (and/or there are not enough replicates to capture the variance with high confidence).

You can check the distribution of counts for specific peaks by getting the report with the counts:

report <- dba.report(DBdata3, th=1, bCounts=TRUE)

This will show you the (normalized) read counts for each consensus peak ordered by FDR (ascending). You can also look at peaks of interest in a browser will all the replicate tracks to see if you really have a consistent signal.