RIblast output Circos plot
5.2 years ago
ta_awwad ▴ 340

Dear All, I have a data table in the following format:

Id  Query.name  Query.Length    Target.name Target.Length   Energy  BasePair
0   gene1   2154    gene2   982 -8.09237    (571-581:979-966) 
1   gene1   2154    gene2   982 -8.33018    (265-278:682-669) 
2   gene1   2154    gene3   982 -8.14695    (392-401:674-665) 
3   gene1   2154    gene3   982 -8.16419    (392-399:258-251) 
4   gene1   2154    gene4   2235    -8.00343    (257-268:2156-2145)
5   gene1   2154    gene4   2235    -8.57361    (427-441:2156-2140)

I would like to generate a Circos plot from it. Any idea how to reformat this table?

Many thanks,

TA

R RIblast circos • 1.6k views

Without knowing where the data came from, it is a bit difficult... What is the meaning of the last column?


Hi Bastien, the last column is just coordinates. For the first row, it means that positions 571-581 of gene1 interact with bases 979-966 of gene2.

5.2 years ago

You simply need the following format to plot arcs:

chr1 start1 end1 chr2 start2 end2 energy

Using your first line:

chr 571 581 chr 966 979 -8.09237

You should replace chr with the actual names of the chromosomes where gene1 and gene2 are located, using the chromosome names stored in the karyotype file referenced in the conf file.
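As a rough illustration of that reformatting, here is a minimal Python sketch that turns one row of the table from the question into a link line. It assumes the columns are whitespace-separated, that BasePair always holds a single `(a-b:c-d)` pair as in the sample rows, and that the Query.name and Target.name values are the names used in the karyotype file. The function name `row_to_link` is made up for this example.

```python
import re

# Matches the BasePair column as shown in the question, e.g. "(571-581:979-966)".
BASEPAIR_RE = re.compile(r"\((\d+)-(\d+):(\d+)-(\d+)\)")


def row_to_link(fields):
    """Turn one table row (a list of 7 columns) into a link line.

    Assumption: the columns are Id, Query.name, Query.Length, Target.name,
    Target.Length, Energy, BasePair, in that order.
    """
    _id, query, _qlen, target, _tlen, energy, basepair = fields
    match = BASEPAIR_RE.fullmatch(basepair.strip())
    if match is None:
        raise ValueError(f"unexpected BasePair value: {basepair!r}")
    q1, q2, t1, t2 = map(int, match.groups())
    # Write each interval as start <= end (the target side is listed high-to-low).
    q_start, q_end = sorted((q1, q2))
    t_start, t_end = sorted((t1, t2))
    return f"{query} {q_start} {q_end} {target} {t_start} {t_end} {energy}"


# First data row of the table in the question; split() handles tabs or spaces.
row = "0 gene1 2154 gene2 982 -8.09237 (571-581:979-966)".split()
print(row_to_link(row))  # gene1 571 581 gene2 966 979 -8.09237
```

Looping this over every data row (skipping the header) and writing one line per row gives the whole link file.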

For more information, I recommend the well-made Circos tutorials about links:

http://circos.ca/documentation/tutorials/links/basic_links/

Alternatively, you can plot a links heatmap using the "energy" information. For more details on the rules to use, see:

http://circos.ca/documentation/tutorials/recipes/heatmap_links/


Thanks Gautier. The problem is that the start and end are NOT genomic coordinates but positions within the transcript sequences. This is the output file from RIblast, which predicts RNA-RNA interactions: you provide the RNAs as FASTA, then the algorithm calculates the interactions and gives you the regions within the sequences that interact with each other, so there are no genomic coordinates here.


You should have included this information in your post. I don't think you will be able to transform your transcript coordinates into genomic ones, except maybe with an annotation file such as a GTF or GFF, using your gene names.


Actually, you can easily make your own karyotype file with transcripts instead of chromosomes.

For instance, you can make a karyotype file like this one (chr22, chrX, chr8 and chr2 are built-in colors), data/transcripts_karyotype.txt:

chr -   gene1   g1  0   2154    chr22 
chr -   gene2   g2  0   982 chrX
chr -   gene3   g3  0   982 chr8
chr -   gene4   g4  0   2235    chr2
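If there are many transcripts, the karyotype lines could be generated from the table rather than typed by hand. The sketch below is only an illustration under a few assumptions: rows are already split into columns in the order shown in the question, the transcript name is reused as the label (instead of g1, g2, ...), and colors are cycled from the four color names used above. The helper name `karyotype_lines` is hypothetical.

```python
def karyotype_lines(rows, colors=("chr22", "chrX", "chr8", "chr2")):
    """Build one 'chr - ID LABEL START END COLOR' line per transcript.

    Assumption: each row is a list of columns in the order
    Id, Query.name, Query.Length, Target.name, Target.Length, ...
    """
    lengths = {}  # transcript name -> length, in order of first appearance
    for fields in rows:
        _id, query, qlen, target, tlen = fields[:5]
        lengths.setdefault(query, int(qlen))
        lengths.setdefault(target, int(tlen))
    lines = []
    for i, (name, length) in enumerate(lengths.items()):
        color = colors[i % len(colors)]  # reuse colors if there are more transcripts
        lines.append(f"chr - {name} {name} 0 {length} {color}")
    return lines


# A few rows from the table in the question.
rows = [
    "0 gene1 2154 gene2 982 -8.09237 (571-581:979-966)".split(),
    "2 gene1 2154 gene3 982 -8.14695 (392-401:674-665)".split(),
    "4 gene1 2154 gene4 2235 -8.00343 (257-268:2156-2145)".split(),
]
print("\n".join(karyotype_lines(rows)))
```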

And then for the arc file data/RIblast_results.txt:

gene1 571 581 gene2 966 979 -8.09237

And in the conf file etc/transcripts.conf you should specify that you want to plot gene1, gene2, gene3, etc. Basically, each transcript will be treated as a chromosome, and your problem should be solved. In the conf file you should have:

karyotype = ../data/transcripts_karyotype.txt
chromosomes_units = 1
chromosomes = gene1;gene2;gene3;gene4

<links>
<link>
file             = ../data/RIblast_results.txt
ribbon           = yes
flat             = yes
stroke_color     = vdgrey
stroke_thickness = 2
color            = grey_a4
</link>
</links>

and the usual <ideogram> ... </ideogram> configuration block, which you can get from the tutorials.


Many thanks, Gautier.
