Question

DiffBind - Erroneous Affinity Shift In One Direction

0

Entering edit mode

6.3 years ago

halner.andreas • 0

Hi all I am using DiffBind (R/3.4.1, DiffBind_2.4.8) to identify differentially accessible peaks from ATACseq (peaks called using MACS2 --no model, with broad region calling off) of a given cell type for three different conditions which I shall call A (2 samples), B (5 samples) and C (2 samples). I wish to compare ‘condition’ A to B as well as compare B to C. My code is as follows:

B_C<-dba(sampleSheet = " B_C.csv")

B_C <-dba.count(B_C)

B_C <-dba.contrast(B_C,categories = DBA_CONDITION, minMembers = 2)

B_C <-dba.analyze(B_C, method = DBA_ALL_METHODS)

And

A_B<-dba(sampleSheet = " A_B.csv")

A_B <-dba.count(A_B)

A_B <-dba.contrast(A_B,categories = DBA_CONDITION, minMembers = 2)

A_B <-dba.analyze(A_B, method = DBA_ALL_METHODS)

I have two questions/concerns:

1) The B-C comparison analysis yields sensible results but the results of the A-B comparison analysis seem spurious – the affinity has shifted exclusively in one direction, with all peaks differentially accessible in the condition A compared to B (regardless of whether EdgeR or DESeq2 is used). Even peaks which feature in 4 out of the 5 B samples but neither of the 2 A samples come up as differentially accessible in A compared to B. Do you have any suggestions as to what might be causing this discrepancy and how I can overcome this problem?

2) Is there a way in which to compare differential accessibility between the A condition and those peaks in the B group which are themselves differentially accessible compared to C i.e. A versus B[!C]? I haven’t been able to identify a way in which to do this from the DiffBind manual. It would be possible to take the DiffBind results of B versus C and then implement DiffBind a second time using these results in comparison to condition A which, however, would preclude the use of replicates as part of the second DiffBind run. An alternative solution I have thought about would be to carry out A-B and B-C separately (as outlined in ‘question 1’ above), combine the regions from both of these analyses into a single BED file and then perform clustering e.g. using the k means algorithm to identify clusters in which A is enriched compared to B or vice versa but B is enriched compared to C. Your thoughts would be welcome.

Thanks in advance.

ChIP-Seq diffbind • 2.6k views

ADD COMMENT • link updated 6.2 years ago by Rory Stark ★ 2.0k • written 6.3 years ago by halner.andreas • 0

score 1 · Answer 1 · 2018-01-25

Hello, I am the author of DiffBind.

1) There are a number of things that could account for the A-B differences. You say all the shifts are in one direction, which I assume means that the fold-changes of all the differentially bound (DB) sites have the same sign? It may be instructive to look at the read counts for the DB sites, by setting bCounts=TRUE in the call to dba.report().You can try this normalized or non-normalized (bNormalized=FALSE). This may shed some light as to what is going on. You can also look at the MA plots using dba.plotMA() to confirm the shift is visible in the data.

One possibility is that the accessible regions are longer in one conditions, leading to the merged peaks being very wide, with relatively few read counts in the other condition. In this case it would be wroth running an analysis that re-centers the sites and standardized their lengths by setting e.g. summits=250 in the call to dba.count(). You will then do analysis on 501bp regions centered around the point of maximal pileup which may prove a better comparison.

You can also send me the DBA object after analysis to have a look at what may be going on. rory.stark [@] cruk.cam.ac.uk

2) The way I would do the second comparison is as follows:

Run the B vs C analysis, then retrieve the DB sites by calling dba.report().
Reload all the peaks and retrieve the condition A consensus using dba() and dba.peakset(bRetrieve=TRUE).
Add the DB sites and the A consensus sites to a temporary DBA object and retrieve their union.
Reload all the peaks and re-run dba.count() with consensus peakset set to the union
Run A vs P analysis on this binding matrix

Here is a complete script using the sample data. In this case, the first contrast is between MCF7 Responsive and all Resistant, and the second contrast is between Resistant and Responsive samples using the union of those peaks. Note that the minOverlap parameter in the call to dba() determines how many of the "A" condition peaks you keep (minOverlap=1 will keep all of them):

# Run first contrast and retrieve DB sites (B vs C)
data(tamoxifen_counts)
tamoxifen <- dba.contrast(tamoxifen,group1=3:5,group2=tamoxifen$masks$Resistant)
tamoxifen <- dba.analyze(tamoxifen)
DBsites   <- dba.report(tamoxifen)

# Re-load peaks and retrieve non-MCF7 Responsive sites (A)
data(tamoxifen_peaks)
RespNonMCF7 <- dba(tamoxifen,
                   mask=tamoxifen$masks$Responsive & !tamoxifen$masks$MCF7, 
                   minOverlap=2)
RespNonMCF7Sites <- dba.peakset(RespNonMCF7,bRetrieve=TRUE)

# Add A sites and DB (B vs C) sites to temp DBA object and get union
tamDBA <- dba.peakset(tamoxifen,peaks=RespNonMCF7Sites,consensus=TRUE )
tamDBA <- dba.peakset(tamDBA,peaks=DBsites,consensus=TRUE)
tamDBA <- dba(tamDBA,mask=tamDBA$masks$Consensus, minOverlap=1)
consSites <- dba.peakset(tamDBA, bRetrieve=TRUE)

#  Re-load all original peaks and run analysis using union as consensus
data(tamoxifen_peaks)
tamoxifen <- dba.count(tamoxifen, peaks=consSites)
tamoxifen <- dba.contrast(tamoxifen, group1=tamoxifen$masks$Resistant,
                          group2=tamoxifen$masks$Responsive,
                          name1="Resistant", name2="Responsive")
tamoxifen <- dba.analyze(tamoxifen)