Circos or circlize plot for overlapping values/colors
9.4 years ago
User 7754

Hi,

I have a file with many genes across the genome, each with a color that depends on the phenotype a variant within the gene has been associated with. I would like to create a plot with circos or circlize that shows stacked layers where genes overlap, colored by phenotype; a gene associated with only one phenotype would get a single layer (no stacking). The purpose is to see immediately which genes are associated with multiple phenotypes (from the stacking) and which phenotypes those are (from the colors), e.g. whether a gene is associated with one type of phenotype such as cancer, or another such as diabetes.

I was thinking of using the "tiles" plot in circos, with color-coded tiles. Is there an option to color the tiles based on another value? I have also tried 'highlights' and 'heatmap' (using the phenotype colors as the factor levels), but I don't think these are the way to go, because they do not show the overlaps, which is what I am most interested in.
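Stacking overlapping genes comes down to giving each gene a layer (row) such that genes overlapping on the same chromosome never share a layer. Below is a minimal base-R sketch of that step using greedy "first free layer" packing, which, when intervals are processed in start order, uses the fewest layers possible. The function name assign_layers is hypothetical and is not part of circos or circlize; the demo coordinates are CA11, SPHK2 and ANGPTL7 from the data below. The resulting layer could serve as the y position of each gene in a plotting track.

```r
# Hypothetical helper: give each interval a stacking layer so that intervals
# overlapping on the same chromosome never share a layer.
assign_layers <- function(chr, start, end) {
  layer <- integer(length(start))
  for (ch in unique(chr)) {
    idx <- which(chr == ch)
    idx <- idx[order(start[idx])]           # process intervals left to right
    layer_end <- numeric(0)                 # end of the last interval in each layer
    for (k in idx) {
      # a layer can be reused only if its last interval ended before this start
      free <- which(layer_end < start[k])
      if (length(free) == 0) {
        layer_end <- c(layer_end, end[k])   # open a new layer on top
        layer[k] <- length(layer_end)
      } else {
        layer[k] <- free[1]                 # reuse the lowest free layer
        layer_end[free[1]] <- end[k]
      }
    }
  }
  layer
}

demo <- data.frame(Chr = c("chr19", "chr19", "chr1"),
                   pos.start = c(48616438, 48616760, 10678425),
                   pos.end   = c(49616438, 49616760, 11678425))
demo$layer <- assign_layers(demo$Chr, demo$pos.start, demo$pos.end)
print(demo$layer)  # [1] 1 2 1
```

The two overlapping chr19 genes end up in layers 1 and 2, while the chr1 gene stays in layer 1. Applied to the real data frame, this would be df$layer <- assign_layers(df$Chr, df$pos.start, df$pos.end).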

If I use circlize, I would like to plot overlapping regions as in this example: http://jokergoo.github.io/circlize/example/gene_model.html, but using the colors already specified in the data file. If there is a way to do this, could you please point me to the right function?

This is an example of the data file in R:

df = structure(list(Chr = c("chr1", "chr1", "chr1", "chr1", "chr1",
"chr2", "chr2", "chr2", "chr3", "chr3", "chr4", "chr4", "chr6",
"chr6", "chr6", "chr7", "chr7", "chr7", "chr8", "chr8", "chr9",
"chr9", "chr10", "chr11", "chr12", "chr13", "chr13", "chr19",
"chr19", "chr20", "chr21", "chr22"), pos.start = c(10678425L,
159391160L, 109318306L, 154509258L, 229805966L, 26989551L, 202937054L,
16209774L, 142169092L, 8925911L, 113873068L, 78144140L, 29882328L,
31321038L, 2754229L, 91908370L, 149706362L, 4754575L, 105108497L,
81375712L, 107169073L, 95049590L, 117466805L, 125738394L, 123893076L,
73886275L, 29029377L, 48616438L, 48616760L, 16070165L, 18529136L,
19608500L), pos.end = c(11678425L, 160391160L, 110318306L, 155509258L,
230805966L, 27989551L, 203937054L, 17209774L, 143169092L, 9925911L,
114873068L, 79144140L, 30882328L, 32321038L, 3754229L, 92908370L,
150706362L, 5754575L, 106108497L, 82375712L, 108169073L, 96049590L,
118466805L, 126738394L, 124893076L, 74886275L, 30029377L, 49616438L,
49616760L, 17070165L, 19529136L, 20608500L), Gene = c("ANGPTL7",
"CCDC19", "CELSR2", "DCST1", "GALNT2", "ATRAID", "BMPR2", "FAM49A",
"PAQR9", "THUMPD3-AS1", "CAMK2D", "CNOT6L", "ABCF1", "RDBP",
"SLC22A23", "CDK6", "GIMAP7", "WIPI2", "LRP12", "ZNF704", "ABCA1",
"ASPN", "GFRA1", "ST3GAL4", "CCDC92", "KLF12", "MTUS2", "CA11",
"SPHK2", "KIF16B", "BTG3", "TRMT2A"), color = c("moccasin", "navy",
"moccasin", "yellow", "moccasin", "moccasin", "yellow", "cyan",
"yellow", "green", "goldenrod4", "magenta", "navy", "moccasin",
"moccasin", "yellow", "moccasin", "yellow", "moccasin", "yellow",
"moccasin", "cyan", "navy", "moccasin", "navy", "moccasin", "yellow",
"moccasin", "moccasin", "moccasin", "cyan", "moccasin")), .Names = c("Chr",
"pos.start", "pos.end", "Gene.name", "color"), row.names = c(917L,
953L, 956L, 1005L, 1087L, 1997L, 2003L, 2077L, 2534L, 2560L,
2937L, 2956L, 3495L, 5182L, 4625L, 6612L, 6642L, 6491L, 7060L,
7124L, 7487L, 7501L, 7991L, 8468L, 8897L, 9424L, 9471L, 11476L,
11226L, 11786L, 12117L, 12279L), class = "data.frame")
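Two things are worth checking in this data frame. First, because the .Names argument in the dput() output overrides the names given inside list(), the gene column is actually called Gene.name, not Gene. Second, the genes that should appear stacked are those carrying more than one phenotype color. A base-R sketch that lists them follows; multi_phenotype_genes is a hypothetical helper, and the demo rows are made up for illustration (in the real data, CA11 has a single color).

```r
# Hypothetical helper: count the distinct phenotype colors per gene and keep
# only the genes with more than one (the ones that should be stacked).
multi_phenotype_genes <- function(d) {
  counts <- tapply(d$color, d$Gene.name, function(x) length(unique(x)))
  counts <- setNames(as.integer(counts), names(counts))  # plain named vector
  counts[counts > 1]
}

# Made-up rows: one gene with two phenotype colors, one with a single color
demo <- data.frame(Gene.name = c("CA11", "CA11", "SPHK2"),
                   color = c("moccasin", "navy", "moccasin"),
                   stringsAsFactors = FALSE)
multi <- multi_phenotype_genes(demo)
print(multi)
```

On the data frame above, multi_phenotype_genes(df) would return an empty vector, since each gene appears only once in this excerpt.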

The relevant part of my circos configuration file is:

<plots>
  <plot>
    type             = tile
    file             = data/data1.txt
    r0               = 0.98r
    r1               = conf(.,r0)+0.03r
    orientation      = center
    layers           = 24
    margin           = 0.02u
    thickness        = 24
    padding          = 8
    stroke_thickness = 0.001
    stroke_color     = vlgrey
  </plot>
</plots>

However, the colors in the tiles plot do not come out correctly. Some colors do not appear at all (maybe because those tiles overlap with too many others? Is there a way to set which color is plotted first?), and there are black lines even though none of the phenotypes is colored black (again, maybe these are areas with too many overlaps?). I have tried adjusting layers and stroke_thickness, but the black lines remain and the correct colors still do not show. I am attaching the plot I am getting now.
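In a Circos tile track, each tile can carry its own options in the data file, written after the coordinates (for example color=navy), which is the usual way to color tiles by another value. A minimal sketch that writes such a file from the data frame follows. It assumes that your karyotype names chromosomes hs1, hs2, ... as Circos' example human karyotype does (pass prefix = "chr" otherwise) and that the R color names used here (moccasin, goldenrod4, ...) are also defined in your Circos color configuration. circos_tile_lines is a hypothetical helper name. Circos' tile documentation also describes a layers_overflow parameter (hide, grow or collapse) that controls tiles that do not fit within layers, which may be relevant to the missing colors.

```r
# Hypothetical helper: build one Circos tile line per gene,
#   <chromosome> <start> <end> color=<name>
# prefix is an assumption about the karyotype's chromosome IDs.
circos_tile_lines <- function(d, prefix = "hs") {
  chrom <- paste0(prefix, sub("^chr", "", d$Chr))
  # avoid scientific notation such as 1e+05 in the coordinates
  start <- format(d$pos.start, scientific = FALSE, trim = TRUE)
  end   <- format(d$pos.end,   scientific = FALSE, trim = TRUE)
  paste(chrom, start, end, paste0("color=", d$color))
}

# Two rows taken from the data frame above (ANGPTL7 and CCDC19)
demo <- data.frame(Chr = c("chr1", "chr1"),
                   pos.start = c(10678425L, 159391160L),
                   pos.end   = c(11678425L, 160391160L),
                   color = c("moccasin", "navy"),
                   stringsAsFactors = FALSE)
tile_lines <- circos_tile_lines(demo)
print(tile_lines)
# writeLines(circos_tile_lines(df), "data/data1.txt")  # file read by the <plot> block
```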

I really appreciate any suggestions!

Thank you in advance for your help!

Fra

Tags: circos, circlize

Pgibas kindly suggested using circlize in R. Can I get something like this?

http://jokergoo.github.io/circlize/example/gene_model.html

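I tried the code below, but it gave me an error (Error in n - i : non-numeric argument to binary operator). The error comes from for(i in tr) looping over gene names rather than row numbers, so n - i subtracts a character string; n and all.hm were also never defined. A version that loops over row indices within each chromosome instead:

library(circlize)
data.1 = df   # the data frame shown above
n = max(table(data.1$Chr))   # most genes on any chromosome: one row per gene
circos.initializeWithIdeogram()
circos.genomicTrackPlotRegion(data.1[, c("Chr", "pos.start", "pos.end", "color")],
    ylim = c(0.5, n + 0.5), panel.fun = function(region, value, ...) {
    # region: start/end of the genes on this chromosome; value$color: their colors
    for(i in seq_len(nrow(region))) {
        circos.lines(c(region[i, 1], region[i, 2]), c(n - i + 1, n - i + 1),
                     col = value$color[i], lwd = 2)
    }
}, bg.border = NA, track.height = 0.3)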
