ChIP-seq RNA-seq overlap
7.5 years ago
Federico ▴ 10

Hello! I am trying to overlap my ChIP-seq peaks with the differentially expressed genes I obtained from an RNA-seq analysis. To be more specific: I have ChIP-seq peaks for my protein of interest, X, and I used HOMER to annotate them. Let's say 30% of my peaks fall in promoter regions. I also have RNA-seq data for wt versus X-knockout, and I used the DESeq2 package in R to obtain a list of differentially expressed genes. I would now like to see whether the promoter peaks overlap with that list, i.e. whether protein X is actually binding and regulating the expression of these genes. Is there a tool that can do this, ideally with a statistical test? A tool that directly overlaps ChIP-seq data with RNA-seq data would also be great :)

Does anyone have a suggestion for that?

Thank you!

ChIP-Seq RNA-Seq • 5.7k views
7.5 years ago

You can use a Fisher's test (fisher.test() in R) for the statistics.
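
To make the test concrete, here is a minimal sketch of a one-sided Fisher's exact test on the 2x2 table "gene has a promoter peak" versus "gene is differentially expressed". It is written in plain Python rather than R, and it should correspond to fisher.test(..., alternative = "greater"); note that fisher.test() is two-sided by default. The gene names and the choice of background (here assumed to be all genes tested by DESeq2) are hypothetical and only serve as illustration.

```python
from math import comb


def fisher_enrichment(bound, de, universe):
    """One-sided Fisher's exact test for enrichment of DE genes among bound genes.

    bound    -- genes with a promoter peak (e.g. taken from the HOMER annotation)
    de       -- differentially expressed genes (e.g. taken from DESeq2)
    universe -- background gene set (assumption: every gene that was tested)

    Returns the 2x2 table (a, b, c, d) and P(overlap >= observed).
    """
    bound = set(bound) & set(universe)
    de = set(de) & set(universe)
    a = len(bound & de)                # bound and DE
    b = len(bound - de)                # bound, not DE
    c = len(de - bound)                # DE, not bound
    d = len(set(universe)) - a - b - c  # neither
    n = a + b + c + d
    n_bound = a + b
    n_de = a + c
    # Hypergeometric tail: sum the probability of every overlap >= the observed one.
    upper = min(n_bound, n_de)
    numerator = sum(
        comb(n_bound, k) * comb(n - n_bound, n_de - k) for k in range(a, upper + 1)
    )
    return (a, b, c, d), numerator / comb(n, n_de)


# Toy example with hypothetical gene names.
universe = {f"gene{i}" for i in range(8)}
bound = {"gene0", "gene1", "gene2", "gene3"}
de = {"gene0", "gene1", "gene2", "gene4"}

table, p_value = fisher_enrichment(bound, de, universe)
print("table (a, b, c, d):", table)
print("one-sided p-value:", round(p_value, 4))
```

The background matters: restricting it to expressed or tested genes usually gives a more conservative answer than using every annotated gene.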

Regarding "overlapping" data, it depends on what you mean. I would personally make a combined heatmap of the ChIP and RNAseq data (at least for the DE genes). You can use deepTools for this, though it'd be easiest to use the develop branch from GitHub, since the computeMatrixOperations command won't otherwise be available until the next release (ETA November 1). The general steps would be:

  1. Use bamCoverage to generate bigWig files (possibly input-normalized in the case of ChIPseq)
  2. Use computeMatrix on the ChIPseq bigWig files, likely with reference-point and a reasonable setting for -b
  3. Use computeMatrix scale-regions on the RNAseq bigWig files, likely using the --metagene option.
  4. Use computeMatrixOperations cbind with the output of 2 and 3
  5. Make a heatmap with plotHeatmap.

This allows you to see the differences even in cases where there happened to not be a peak called.
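
The five steps above can be sketched as a small Python script that only assembles and prints the command lines, so they can be reviewed before being run. All file names and the 500 bp flanks are hypothetical placeholders, and --sortRegions keep is added because it is recommended further down in this thread. Using the same regions file for both computeMatrix calls is an assumption made here so that the rows of the two matrices line up.

```python
import shlex

# Hypothetical file names; replace them with your own.
CHIP_BAM = "chip.bam"
RNA_BAM = "rna.bam"
REGIONS = "transcripts.gtf"  # one regions file, shared by both matrices
FLANK = "500"                # arbitrary choice for -b and -a


def build_commands():
    """Return the deepTools command lines for steps 1-5 as argument lists."""
    return [
        # Step 1: coverage tracks for both experiments.
        ["bamCoverage", "-b", CHIP_BAM, "-o", "chip.bw"],
        ["bamCoverage", "-b", RNA_BAM, "-o", "rna.bw"],
        # Step 2: ChIP signal around a reference point.
        ["computeMatrix", "reference-point", "-S", "chip.bw", "-R", REGIONS,
         "-b", FLANK, "-a", FLANK, "--sortRegions", "keep", "-o", "chip.mat.gz"],
        # Step 3: RNA signal over scaled regions, exons only.
        ["computeMatrix", "scale-regions", "-S", "rna.bw", "-R", REGIONS,
         "--metagene", "--sortRegions", "keep", "-o", "rna.mat.gz"],
        # Step 4: bind the two matrices column-wise.
        ["computeMatrixOperations", "cbind", "-m", "chip.mat.gz", "rna.mat.gz",
         "-o", "combined.mat.gz"],
        # Step 5: draw the heatmap.
        ["plotHeatmap", "-m", "combined.mat.gz", "-o", "heatmap.pdf"],
    ]


for command in build_commands():
    print(" ".join(shlex.quote(part) for part in command))
```

Each list can be passed to subprocess.run(command, check=True) once the printed commands look right.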


Sorry for the naive question, but when running computeMatrix on the ChIP-seq file, what do you use for -R? I keep getting this error:

computeMatrix scale-regions: error: argument --regionsFileName/-R is required

I'm guessing it wants the BED file with the peaks, but isn't the point of this approach to avoid relying on the peaks, since that is limiting?

Thanks for your help.


We usually use transcripts.


By that, do you mean a GTF file like the one used for RNA-seq analysis?


GTF or BED, yes


Hello Devon. I followed steps 1-3, but now I'm stuck at step 4. Below is the command I ran:

computeMatrixOperations cbind -m peak_sorted_matrix rna_16hr_sorted_matrix -o output.mat.gz

Error :
Traceback (most recent call last):
  File "/home/anupriya/.local/bin/computeMatrixOperations", line 11, in <module>
    main(args)
  File "/home/anupriya/.local/lib/python2.7/site-packages/deeptools/computeMatrixOperations.py", line 677, in main
    cbindMatrices(hm, args)
  File "/home/anupriya/.local/lib/python2.7/site-packages/deeptools/computeMatrixOperations.py", line 408, in cbindMatrices
    hm.matrix.matrix = np.hstack((hm.matrix.matrix, np.empty(hm2.matrix.matrix.shape)))
  File "/home/anupriya/miniconda2/lib/python2.7/site-packages/numpy/core/shape_base.py", line 288, in hstack
    return _nx.concatenate(arrs, 1)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

I used the same -a and -b options in the computeMatrix commands for ChIP-seq and RNA-seq, but I still got this error. How can I fix this?


It appears you used a different GTF or BED file to produce the two matrices. Can you post the commands you used to create both?


Hi Devon, below are the commands and the BED files I used:

computeMatrix scale-regions -S rna_16hr_sorted.bw -R m.sme_exons.bed --metagene -a 500 -b 500 --outFileName rna_16hr_sorted_matrix


computeMatrix reference-point -S peak_sorted.bw -R m.sme_transcript.bed -a 500 -b 500 --outFileName peak_sorted_matrix



head -n 20 m.sme_transcript.bed

Chromosome  499 1692
Chromosome  1721    2614
Chromosome  2624    3778
Chromosome  3775    4359
Chromosome  4591    6618
Chromosome  6648    9176
Chromosome  9229    10011
Chromosome  10184   10276
Chromosome  10411   11211
Chromosome  11215   12246
Chromosome  12243   13301
Chromosome  13310   14140
Chromosome  14130   15089
Chromosome  15286   15522
Chromosome  15525   17252
Chromosome  17249   19018
Chromosome  19052   41623
Chromosome  41688   42635
Chromosome  42943   43353
Chromosome  43365   44687


head -n 20 m.sme_exons.bed

Chromosome  499 1692
Chromosome  1721    2614
Chromosome  2624    3778
Chromosome  3775    4359
Chromosome  4591    6618
Chromosome  6648    9176
Chromosome  9229    10011
Chromosome  10072   10148
Chromosome  10184   10276
Chromosome  10293   10368
Chromosome  10411   11211
Chromosome  11215   12246
Chromosome  12243   13301
Chromosome  13310   14140
Chromosome  14130   15089
Chromosome  15286   15522
Chromosome  15525   17252
Chromosome  17249   19018
Chromosome  19052   41623
Chromosome  41688   42635

You'll have to ensure that you do the following:

  1. Both BED files need to be of the same length and sorted so that row N in one file corresponds to row N in the other (computeMatrixOperations just merges them by row, since it has no other way to determine which rows belong together).
  2. Ensure that computeMatrix is keeping the input file order (--sortRegions keep).
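
A quick way to check the first requirement is to compare the two files row by row before running computeMatrix. The sketch below is a minimal example with hypothetical helper names (parse_bed3, first_mismatch). It assumes that corresponding rows have identical coordinates, which holds for the rows posted above; if exon and transcript intervals differ, a name column would have to be compared instead.

```python
def parse_bed3(lines):
    """Return (chrom, start, end) tuples, skipping blank, comment, track and browser lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        rows.append((fields[0], int(fields[1]), int(fields[2])))
    return rows


def first_mismatch(rows_a, rows_b):
    """Return the 0-based index of the first row that differs, or None if both lists match."""
    for index, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        if row_a != row_b:
            return index
    if len(rows_a) != len(rows_b):
        # One list is a prefix of the other: the extra rows start here.
        return min(len(rows_a), len(rows_b))
    return None


# A few rows copied from the two head outputs posted above.
transcripts = parse_bed3([
    "Chromosome  9229    10011",
    "Chromosome  10184   10276",
    "Chromosome  10411   11211",
])
exons = parse_bed3([
    "Chromosome  9229    10011",
    "Chromosome  10072   10148",
    "Chromosome  10184   10276",
    "Chromosome  10293   10368",
    "Chromosome  10411   11211",
])

mismatch = first_mismatch(transcripts, exons)
print("row counts:", len(transcripts), "vs", len(exons))
print("first mismatching row (0-based):", mismatch)
```

With real files, parse_bed3(open("m.sme_exons.bed")) works as well, because an open file handle yields lines.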

Hi Devon,

But what if I have different or extra rows in the exon BED file (like this one: Chromosome 10072 10148)? Should I discard them? Won't I be losing data then?

FYI, I am using the exon file with the RNA-seq data and the transcript file with the ChIP-seq data.


It's unclear what should be matched together if you have extra rows. In that case you must necessarily lose data (not that a few rows matter).
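
One way to drop the unmatched rows is to keep only the intervals present in both files and to use the resulting single file as -R for both computeMatrix runs. The following is a minimal sketch with hypothetical helper names (keep_shared, to_bed); it assumes that matching rows have identical coordinates, as in the rows posted earlier in this thread.

```python
def keep_shared(rows_a, rows_b):
    """Return the rows of rows_a that also occur in rows_b, preserving the order of rows_a."""
    in_b = set(rows_b)
    return [row for row in rows_a if row in in_b]


def to_bed(rows):
    """Format (chrom, start, end) tuples as tab-separated BED3 text."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in rows)


# A few rows copied from the exon and transcript files shown earlier.
exons = [
    ("Chromosome", 9229, 10011),
    ("Chromosome", 10072, 10148),
    ("Chromosome", 10184, 10276),
    ("Chromosome", 10293, 10368),
    ("Chromosome", 10411, 11211),
]
transcripts = [
    ("Chromosome", 9229, 10011),
    ("Chromosome", 10184, 10276),
    ("Chromosome", 10411, 11211),
]

shared = keep_shared(exons, transcripts)
dropped = [row for row in exons if row not in set(shared)]

print(to_bed(shared), end="")
print("dropped rows:", dropped)
```

Printing the dropped rows makes it easy to confirm that only a handful of intervals are lost.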


Hi Devon, I took the rows common to exon.bed and transcripts.bed and ran the remaining commands:

computeMatrixOperations cbind -m peaks_sorted_matrix1 rna_16hr_sorted_matrix1 -o output1.mat.gz 

plotProfile --matrixFile output1.mat.gz --outFileName trial.pdf --samplesLabel peaks rna_16hrs --startLabel GS --endLabel GE --dpi 500 --perGroup --plotHeight 15

and got this plot. Why are the peaks not covering the whole plot? Did I miss something? https://www.dropbox.com/s/ls6742nyqblt8k9/trial.pdf?dl=0


They don't cover the whole plot because the two datasets are different sizes. In general, using --perGroup with a dataset like that doesn't make sense, as the columns of data for each sample aren't comparable (only the rows are).
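
The size difference can be worked out from the computeMatrix settings. The sketch below assumes the deepTools defaults of a 10 bp bin size and a 1000 bp scaled region body, neither of which was overridden in the commands posted above; check computeMatrix --help for the defaults of your version. The function names are hypothetical.

```python
def reference_point_columns(before, after, bin_size=10):
    """Columns per sample in reference-point mode: only the two flanks are binned."""
    return (before + after) // bin_size


def scale_regions_columns(before, after, body_length=1000, bin_size=10):
    """Columns per sample in scale-regions mode: flanks plus the scaled region body."""
    return (before + body_length + after) // bin_size


# Settings from the commands in this thread: -a 500 -b 500 for both matrices.
chip_columns = reference_point_columns(before=500, after=500)
rna_columns = scale_regions_columns(before=500, after=500)

print("ChIP-seq (reference-point) columns:", chip_columns)
print("RNA-seq (scale-regions) columns:", rna_columns)
```

Under these assumptions the ChIP-seq sample has half as many columns as the RNA-seq sample, which would explain why its profile stops partway across a shared x-axis.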


Hi Devon, I was wondering: would using the same computeMatrix mode (either reference-point or scale-regions) for both the ChIP-seq and RNA-seq data work?


Or should I remove the --perGroup option and create the graph like this, so that the two can be compared? https://www.dropbox.com/s/gw0rc9jgsq09e69/trial1.pdf?dl=0


Yes, though it looks like you have an older version of deepTools, since I think I fixed the issue with the tick labels not being correct in more recent versions.


Thanks a lot Devon for solving the problem!

7.5 years ago
Mike ★ 1.9k

Have a look at the BETA tool.

Target analysis by integration of transcriptome and ChIP-seq data with BETA

http://www.nature.com/nprot/journal/v8/n12/full/nprot.2013.150.html

7.5 years ago
Federico ▴ 10

OK, thank you! I'll try them out! Cheers
