Sorting heatmaps by gene expression
1
0
Entering edit mode
7.4 years ago
mmmmcandrew ▴ 190

Hi all- I have coverage files from MNase-seq experiments in bigWig format which have been normalized to show nucleosome occupancy at each position in the genome. I have been very successful in using these coverage files in deepTools, along with bed files containing regions of interest (including TSS) to show occupancy at these regions and in flanking sequences. Now, I would like to add an extra layer to this analysis. Rather than simply aligning around the TSS, I would now like to sort the heatmaps by gene expression. So, I have coverage files, and I have access to some RNA-seq data in BAM format. What are the steps I would need to take in order to sort by gene expression? I would prefer to continue using deepTools if possible. I know that R is often used to generate heatmaps as well, but the R language seems far more complicated than the command line options in deepTools. Keep in mind that I have never performed RNA-seq analysis, and that I am relatively new to bioinformatics in general. Thanks!

RNA-Seq MNase-Seq • 4.8k views
ADD COMMENT
3
Entering edit mode
7.4 years ago

You have two methods of doing this within deepTools:

  1. Presort your BED file according to expression.
  2. Use that with computeMatrix, which now defaults to --sortRegions keep
  3. If you use --sortRegions no in plotHeatmap then the results will still be sorted according to expression.

Alternatively:

  1. Run computeMatrix as normal on the MNase samples, ensure that --sortRegions keep is used.
  2. Run computeMatrix with the RNAseq files, likely using the --metagene option. Again, use --sortRegions keep.
  3. Merge the output of 1 and 2 with computeMatrixOperations cbind
  4. When you use plotHeatmap, use the --sortUsingSamples option and specify the last sample (it's numeric, so if you have 6 MNase samples and one RNAseq sample then specify 7).
ADD COMMENT
0
Entering edit mode

Hi Ryan, thanks for your suggestion. I've got some questions in regard to the two methods.

1st method (bed file presorted by expression):

  • Is there any way that a tabular illustration of the gene sorted (by FPKM) can be put near to the deeptools generated heatmaps (e.g. occupancy data) Example: Fig. B here

2nd method (by computeMatrixOperations cbind):

  • If this method is used, isn't it the matrices should be generated by scale-regions mode (starting from TSS to TES for expression the data?). Can I set a different length for the regions for RNAseq and occupancy data (e.g. RNAseq data: only TSS to TTS, occupancy data: 3kb upstream of TSS to 3kb downstream of TTS)

  • In the RNAseq files, how does the value computed for each row (by –sortUsing in computeMatrix) compared/ related to FPKM?

Thanks!

ADD REPLY
1
Entering edit mode
  1. There's not automated way to do that. The output from computeMatrix is basically a gzipped BED file with a json header and a bunch of extra columns (the values per bin), so you could munge your data into that format easily enough.
  2. You can generate the matrices however you want. Just note that the tick placement in the resulting graphs will only be correct for the left-most matrix (maybe I'll change that in version 3.0, it'd be a good time to do so).
  3. The row value is the mean. There's not necessarily any relationship to FPKM.
ADD REPLY
0
Entering edit mode

Thanks Ryan. I'm now trying to construct the matrix for FPKM plotting. In parallel, I need to perform the computeMatrix for the ChIP-seq. files. I am now creating the bed file for that but want to clarify some concepts.

When performing computeMatrix,

  • If a bed6 format is used, isn't it no --metagene option is needed for processing the ChIP-seq. file? How to specify bed6 format is used in the command?

Thanks

ADD REPLY
0
Entering edit mode

For BED6 files --metagene won't do anything, since you don't have exon information. In general, there's little sense in using --metagene for ChIPseq data, since you generally care whether introns have signal.

ADD REPLY
0
Entering edit mode

Thanks Ryan. Do I need to save the BED6 file as .bed6 when inputting into computeMatrix ?

ADD REPLY
0
Entering edit mode

No, the file extension doesn't matter.

ADD REPLY
0
Entering edit mode

Hi Ryan, it's me again. When I performed computeMatrixOperation using the self-constructed matrices, a warning prompted out

chrom, start, end, name, score, strand = region[0:6] ValueError: not enough values to unpack (expected 6, got 1)

I think that there is something wrong during the matrix construction process. Please find the first few lines of the matrix:

@{"ref point":"TSS","upstream":5000,"sample_labels":["FPKM_Differentiation_Day0"],"sort regions":"keep","verbose":true,"downstream":5000,"max threshold":null,"nan after end":false,"bin avg type":"mean","unscaled 3 prime":0,"skip zeros":false,"sort using":"mean","sample_boundaries":[0,2],"bin size":5000,"proc number":4,"min threshold":null,"group_boundaries":[0,50844],"missing data as zero":false,"body":0,"group_labels":["genes"],"unscaled 5 prime":0,"scale":1}

1 3073253 3074322 ENSMUSG00000102693 0 + 0 0

1 3102016 3102125 ENSMUSG00000064842 0 + 0 0

1 3205901 3671498 ENSMUSG00000051951 0 - 0.075562 0.075562

1 3252757 3253236 ENSMUSG00000102851 0 + 0 0

To save the matrix & convert it into .gz, I did it in Ubuntu 1) I've save the matrix as .txt (Character Encoding: Current Locale (UTF-8; Line Ending: Unix/Linus) by "Text Editor"; 2) Then, I compressed them into .gz by right in "files" list

Do you have a standard workflow for doing this? (e.g. in which OS, format, extension, etc..?)

Thanks!

ADD REPLY
0
Entering edit mode

The OS won't matter, though I imagine this would be hard to do on Windows. File extensions are meaningless on Linux and MacOS X, you can change any file extension to anything you want and nothing will change. I would need to see the whole file to be able to figure out what's wrong. My guess is that there's a single line wrong somewhere.

ADD REPLY

Login before adding your answer.

Traffic: 1288 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6