Question

Read Counts Summarisation For Deseq

10.0 years ago

aditi.qamra ▴ 270

Hi,

I have a fairly basic question about using DESeq downstream of CCAT.

After calling peaks with CCAT on my tumor and normal samples (I have 5 such pairs), DESeq requires read counts over a list of regions (common ones?) across the conditions in which you want to test for differential peaks. However, CCAT identifies different regions in different samples. So how do I get the read counts for, e.g., the peak regions identified in T1 from the CCAT output of N1?

Thanks!

chipseq • 1.9k views


The general idea is to combine the peak calls from all of your samples and then count reads over that combined set of regions. BTW, DESeq(2) will incorrectly perform library-size normalization for your use case, since its assumptions are unlikely to hold for ChIP-seq, so you'll need to either provide your own size factors or also count the off-peak regions and normalize to those. I've never used CCAT or seen its output, so I can't give specific advice on the exact steps.

10.0 years ago by Devon Ryan 104k
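The normalization idea in the reply above (count reads in off-peak regions and normalize to those) can be sketched roughly as follows. This is only an illustration in Python, assuming you already have a table of background-bin counts with one column per sample. The function name `size_factors` and the toy numbers are hypothetical, and the median-of-ratios calculation is used here as one reasonable choice, not something the reply prescribes.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per background (off-peak) region,
            each row holding one read count per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Skip regions with a zero count: the log geometric mean is undefined.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio to the per-region geometric mean is the size factor.
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical toy data: 3 background bins, 2 samples (second sequenced ~2x deeper).
background = [[100, 200], [50, 100], [10, 20]]
factors = size_factors(background)
```

Values computed this way from background regions could then be supplied as custom size factors instead of letting them be estimated from the peak counts.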

Thanks for pointing out the normalisation bit. But the question is how to combine them. CCAT outputs a file with the following header: <chromosome> <position of the peak> <start of region> <end of region> <read counts in ChIP library> <read counts in control library> <fold-change score> <local FDR>

The regions will differ slightly in the output from each sample (whether tumor or normal). I could take the union of the peaks from all the tumor biological replicates, and likewise for normal, but the regions would still not necessarily be the same between tumor and normal. In that case, I'm struggling to understand how, for DESeq, I would get read counts for common regions in each sample.

10.0 years ago by aditi.qamra ▴ 270

You would take the union of regions from all samples, regardless of treatment group. BTW, I should amend my earlier mention of library-size normalization with the word "may", since whether this will be an issue or not will depend on the dataset.
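To make the "union of regions from all samples" step concrete, here is a minimal sketch in Python. It assumes each sample's peaks have already been reduced to (chromosome, start, end) tuples, for example from the start and end columns of the CCAT output, and that reads are represented by a single position each. The helper names `merge_peaks` and `count_reads` and all the toy data are hypothetical. In practice a tool such as bedtools (`merge` and `multicov`) could do the same job on real files.

```python
from collections import defaultdict

def merge_peaks(peak_sets):
    """Pool peaks from all samples and merge overlapping intervals.

    peak_sets: list of per-sample lists of (chrom, start, end).
    Returns a sorted list of merged (chrom, start, end) regions.
    """
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current region.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged

def count_reads(regions, reads):
    """Count reads, given as (chrom, pos), in each half-open region [start, end)."""
    return [
        sum(1 for c, p in reads if c == chrom and start <= p < end)
        for chrom, start, end in regions
    ]

# Hypothetical toy data for one tumor and one normal sample.
t1_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
n1_peaks = [("chr1", 150, 250), ("chr2", 10, 50)]
regions = merge_peaks([t1_peaks, n1_peaks])

# Every sample is counted over the same merged regions.
t1_reads = [("chr1", 120), ("chr1", 240), ("chr1", 550), ("chr2", 60)]
t1_counts = count_reads(regions, t1_reads)
```

Because every sample is counted over the same merged region list, the per-sample count vectors line up row by row and can be assembled into the count matrix that DESeq expects.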