Question

Getting a more informative annotation file

8.8 years ago

zizigolu ★ 4.3k

Hello there,

I downloaded a CSV annotation file from Affymetrix, but when I used it with my citrus arrays, the resulting network contained only gene symbols: no gene names, no UniProt IDs, nothing else. I also downloaded a text annotation file from PLEXdb, but my workbench rejects that format. Do you know of a more informative CSV annotation file, or any way to reformat my text file as CSV?

Doesn't the Affymetrix annotation contain gene information? What does the output of your network analysis look like?

Written 8.8 years ago by PoGibas 5.1k

Thank you. Yes, it does, but I don't know how to merge the gene names with the annotation from Affymetrix.
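
One way to combine the two sources is to join them on the probe set ID. The Python sketch below is a minimal illustration, not a geWorkbench feature: it assumes the Affymetrix CSV has already been read into dictionaries (for example with csv.DictReader) and that its ID column is named "Probe Set ID"; the new column name "PLEXdb_Description" and the gene symbols GENE_A/GENE_B are hypothetical placeholders.

```python
import csv
import io


def add_column_by_probeset(rows, extra, key="Probe Set ID", new_col="PLEXdb_Description"):
    """Return copies of `rows` with one extra column looked up by probe set ID.

    rows  -- list of dicts, e.g. produced by csv.DictReader on the Affymetrix CSV
    extra -- dict mapping probe set ID -> annotation text (e.g. a PLEXdb description)
    Probe sets missing from `extra` get an empty string.
    """
    merged = []
    for row in rows:
        out = dict(row)  # copy, so the caller's rows are left unchanged
        out[new_col] = extra.get(row[key], "")
        merged.append(out)
    return merged


def rows_to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical two-row Affymetrix-style table; the gene symbols are placeholders.
affy = [
    {"Probe Set ID": "Cit.10074.1.S1_s_at", "Gene Symbol": "GENE_A"},
    {"Probe Set ID": "Cit.10084.1.S1_at", "Gene Symbol": "GENE_B"},
]
plexdb = {"Cit.10074.1.S1_s_at": "Sulfite reductase [Prunus armeniaca (Apricot)]"}

merged = add_column_by_probeset(affy, plexdb)
print(rows_to_csv(merged), end="")
```

Probe sets without a PLEXdb entry are kept with an empty description rather than dropped, so the row count of the original annotation file does not change.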

Written 8.8 years ago by zizigolu ★ 4.3k

Thank you, Alolex. I have an annotation text file that my workbench can't read; it only accepts CSV.

Here is an excerpt of the text annotation file I downloaded from PLEXdb:

Annotation for selected probe sets. downloaded from PLEXdb on Jul 12 2015

Probeset    Annotation_Date    Consensus_ID    GeneBank_Accession    Blast_Date    Blast_Program    Ref_Desc    E-value    Perc_Identity    
--    --    --    --    --    --    --    --    --    
--    --    --    --    --    --    --    --    --    
--    --    --    --    --    --    --    --    --    
Cit.10074.1.S1_s_at    Mar 11, 2009    CV884880    .    2010-12-10    blastx    | Symbols: SIR | sulfite reductase | chr5:1319404-1322298 FORWARD LENGTH=643    7e-17    66.1    
Cit.10074.1.S1_s_at    Mar 11, 2009    CV884880    .    2009-09-13    blastn        0    99.9    
Cit.10074.1.S1_s_at    Mar 11, 2009    CV884880    .    2008-09-02    blastx        2e-51    85.2    
Cit.10074.1.S1_s_at    Mar 11, 2009    CV884880    .    2010-12-10    blastx    PREDICTED: hypothetical protein [Vitis vinifera]    2e-51    85.2    
Cit.10074.1.S1_s_at    Mar 11, 2009    CV884880    .    2007-11-07    blastn    Sulfite reductase [Prunus armeniaca (Apricot)]    0    100    
Cit.10074.1.S1_s_at    Mar 11, 2009    CV884880    .    2012-01-23    blastx    Sulfite reductase [ferredoxin] n=2 Tax=Synechocystis sp. PCC 6803 RepID=SIR_SYNY3    4e-08    46.9    
Cit.10076.1.S1_s_at    Mar 11, 2009    CV709816    .    2010-12-10    blastx    | Symbols: SIR | sulfite reductase | chr5:1319404-1322298 FORWARD LENGTH=643    3e-27    70.1    
Cit.10076.1.S1_s_at    Mar 11, 2009    CV709816    .    2009-09-13    blastn    similar to UniRef100_A7NZP8 Cluster: Chromosome chr6 scaffold_3, whole genome shotgun sequence; n=1; Vitis vinifera|Rep: Chromosome chr6 scaffold_3, whole genome shotgun sequence - Vitis vinifera (Grape), partial (15%)    0    97.1    
Cit.10076.1.S1_s_at    Mar 11, 2009    CV709816    .    2008-09-02    blastx        2e-60    85.4    
Cit.10076.1.S1_s_at    Mar 11, 2009    CV709816    .    2010-12-10    blastx    PREDICTED: hypothetical protein [Vitis vinifera]    2e-60    85.4    
Cit.10076.1.S1_s_at    Mar 11, 2009    CV709816    .    2007-11-07    blastn    Sulfite reductase [Prunus armeniaca (Apricot)]    0    97.6    
Cit.10076.1.S1_s_at    Mar 11, 2009    CV709816    .    2012-01-23    blastx    Sulfite reductase [ferredoxin] n=2 Tax=Synechocystis sp. PCC 6803 RepID=SIR_SYNY3    1e-16    52.5    
Cit.10084.1.S1_at    Mar 11, 2009    CF838391    .    2010-12-10    blastx    | Symbols: ATRAB5A, ATRABF2A, RABF2A, RAB5A, RHA1, ATRAB-F2A, RAB-F2A | RAB homolog 1 | chr5:18244495-18246060 FORWARD LENGTH=201    1e-98    89.5    
Cit.10084.1.S1_at    Mar 11, 2009    CF838391    .    2009-09-13    blastn    homologue to UniRef100_Q40570 Cluster: Ras-related GTP-binding protein; n=1; Nicotiana tabacum|Rep:

I am performing a network analysis. When I upload the normalized arrays, I also need to upload an annotation file. I downloaded such a file (CSV) from Affymetrix, but after the network is created, the nodes have no UniProt ID, RefSeq ID, gene name or anything else; they are labeled only by the gene symbol corresponding to each probe set. I then have to run GO annotation on these symbols, and all the results are non-significant. I need a more informative annotation file. I found one, but it is in a text format that my workbench does not accept.
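
Because the PLEXdb export lists several BLAST hits per probe set, one row per probe set has to be chosen before the descriptions can be attached to another table. The sketch below assumes the file is tab-delimited (the spacing in the excerpt suggests so, but that is a guess) and applies one possible rule, keeping the described hit with the smallest E-value; the function names are hypothetical and the three sample rows are copied from the excerpt above.

```python
def parse_plexdb(text):
    """Parse a tab-delimited PLEXdb annotation export into a list of dicts.

    The free-text title line, blank lines and '--' placeholder rows are skipped.
    """
    rows, header = [], None
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if header is None:
            if fields[0] == "Probeset":
                header = fields  # first real row: the column names
            continue  # title line, blank line, or the header itself
        if fields[0] in ("", "--"):
            continue  # blank lines and '--' placeholder rows
        # Drop the unnamed column that a trailing tab would create.
        rows.append({k: v for k, v in zip(header, fields) if k})
    return rows


def best_hit_per_probeset(rows):
    """Map each probe set to the description of its lowest-E-value hit.

    Hits without a Ref_Desc or E-value are ignored; ties keep the first hit seen.
    """
    best = {}
    for row in rows:
        desc, evalue = row.get("Ref_Desc", ""), row.get("E-value", "")
        if not desc or not evalue:
            continue
        score = float(evalue)  # float() accepts '0', '7e-17', '2e-51'
        probe = row["Probeset"]
        if probe not in best or score < best[probe][0]:
            best[probe] = (score, desc)
    return {probe: desc for probe, (_, desc) in best.items()}


SAMPLE = "\n".join([
    "Annotation for selected probe sets. downloaded from PLEXdb on Jul 12 2015",
    "",
    "\t".join(["Probeset", "Annotation_Date", "Consensus_ID", "GeneBank_Accession",
               "Blast_Date", "Blast_Program", "Ref_Desc", "E-value", "Perc_Identity"]),
    "\t".join(["--"] * 9),
    "\t".join(["Cit.10074.1.S1_s_at", "Mar 11, 2009", "CV884880", ".", "2009-09-13",
               "blastn", "", "0", "99.9"]),
    "\t".join(["Cit.10074.1.S1_s_at", "Mar 11, 2009", "CV884880", ".", "2010-12-10",
               "blastx", "PREDICTED: hypothetical protein [Vitis vinifera]", "2e-51", "85.2"]),
    "\t".join(["Cit.10074.1.S1_s_at", "Mar 11, 2009", "CV884880", ".", "2007-11-07",
               "blastn", "Sulfite reductase [Prunus armeniaca (Apricot)]", "0", "100"]),
])

if __name__ == "__main__":
    for probe, desc in best_hit_per_probeset(parse_plexdb(SAMPLE)).items():
        print(probe, desc, sep="\t")
```

Other rules (highest Perc_Identity, most recent Blast_Date, blastx hits only) are equally defensible; which one suits a citrus array is a judgment call.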

Written 8.8 years ago by zizigolu ★ 4.3k

Of the columns shown, do you need all of them or just a select few? If only a few, which ones? Also, are you on Linux/Unix/Mac or on Windows? Finally, what program are you loading this file into? You have mentioned a workbench a few times, but I'm not clear on what program that is. The answers to these questions will help me figure out a solution that might work for you. Oh, and one more question: does the program you are using provide a sample input file? If so, can you post a few lines of it so I can see what the end result should look like?

Written 8.8 years ago by alolex ▴ 950

Thanks for your help, Alolex.

I am on Windows, working with geWorkbench and running ARACNe. I normalized GSE63706 as input, and the tool then asked me for a CSV annotation file, which I downloaded from Affymetrix. I then used some probe sets as hubs, and the tool created a network whose nodes are shown only by gene symbol, for example LOC102577933. With these labels I can't do promoter analysis or anything else, so I need to convert them. In addition, I don't think GO analysis with probe sets as nodes can be trusted, because each probe set maps to more than one gene. I then downloaded a text annotation file from PLEXdb (shown above, although I pasted only a few of its columns), which I think contains the essential information, such as gene name, Entrez ID, Consensus_ID, GeneBank_Accession and so on, but geWorkbench only accepts CSV. I have described some more doubts in https://www.biostars.org/t/myposts/

Written 8.8 years ago by zizigolu ★ 4.3k

I looked up the geWorkbench tool you are using, as I am not familiar with it and have not used it. From its documentation, the program seems to be doing what it is supposed to do. It requires the Affymetrix CSV file if you are using an Affymetrix array (follow the directions in the "Example of running ARACNe" section). The documentation also says the following about how the annotation file is used with Affymetrix arrays. From your explanation, I am guessing that you are selecting the "Merge multiple probesets" option. In that case you will get only one node per gene. If you need the probe sets kept intact, deselect this option (see the last part below). I think displaying only gene symbols is what the program is designed to do. If you need it to provide more information and can't find what you need in the help documents, I would suggest posting this question on the geWorkbench end-user forum, as I'm not familiar with the application. Hopefully this helps you somewhat.

Load the microarray dataset into the Workspace. If available, associate a gene annotation file with the dataset. This will allow the results to be displayed in consolidated fashion in Cytoscape by gene rather than by marker (individual probeset) name.

......

"Merge multiple probesets"

Checking this box will cause interactions to be summarized at the gene level for each hub marker. The links to individual probesets will not be retained. Thus when this option is selected, the adjacency matrix will contain a single line per hub gene. This option depends on an annotation file being loaded along with the microarray dataset.

....

On a microarray analysis platform, genes may be represented by more than one marker (probeset). The mapping between markers and genes is specified in the annotation file, if it is read in at the time that the data is loaded. The ARACNe analysis in geWorkbench is performed at the level of probesets. In some cases, an interaction between two genes may be represented by more than one edge, each such edge involving an alternate probeset for at least one of the genes.

When the "Merge multiple probesets" option is not chosen, the full ARACNe adjacency matrix, as calculated at the probeset level, will be retained and placed as a data node in the Workspace.
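
The "Merge multiple probesets" behavior quoted above amounts to collapsing probe-set-level edges onto genes. As a rough illustration only (this is not geWorkbench's code), the sketch below maps each probe set to a gene, drops edges between two probe sets of the same gene and keeps the largest weight when several probe-set edges connect the same gene pair; the max rule, the IDs p1 to p3 and GENE_A/GENE_B are assumptions made for the example.

```python
def merge_probeset_edges(edges, probe_to_gene):
    """Collapse probe-set-level edges into gene-level edges.

    edges         -- iterable of (probe_a, probe_b, weight) tuples,
                     e.g. weight = mutual information
    probe_to_gene -- dict mapping probe set ID -> gene identifier
    When several probe-set edges connect the same gene pair, this sketch keeps
    the largest weight (an assumption). Probe sets without a gene mapping keep
    their own ID as the node name.
    """
    merged = {}
    for a, b, w in edges:
        ga = probe_to_gene.get(a, a)
        gb = probe_to_gene.get(b, b)
        if ga == gb:
            continue  # two probe sets of the same gene: no gene-gene edge
        key = tuple(sorted((ga, gb)))  # treat edges as undirected
        merged[key] = max(w, merged.get(key, w))
    return merged


if __name__ == "__main__":
    edges = [("p1", "p3", 0.4), ("p2", "p3", 0.6), ("p1", "p2", 0.9)]
    mapping = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
    print(merge_probeset_edges(edges, mapping))
```

Here two probe-set edges (p1 to p3 and p2 to p3) become a single GENE_A to GENE_B edge, which is why the merged network has one node per gene and loses the probe-set detail.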

Written 8.8 years ago by alolex ▴ 950

Thank you very much, Alolex.

Written 8.8 years ago by zizigolu ★ 4.3k

Ram · Answer 1 · 2015-07-21

8.8 years ago

alolex ▴ 950

CSV just means "comma-separated values", so you can take your text file (delimited by tabs, maybe?), replace all the column separators with commas, and then change the file extension to .csv. Also, if you have your annotations in Excel, I think you can just use "Save As" and select the CSV format for the current workbook.
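
A plain tab-to-comma replacement breaks on this particular file, because some fields contain commas themselves (for example the date "Mar 11, 2009" and several Ref_Desc entries). A minimal Python sketch that uses the standard csv module instead, so such fields are quoted, might look like this; it assumes the input is tab-delimited UTF-8 text whose header row starts with "Probeset", and the script name tsv_to_csv.py is hypothetical.

```python
import csv
import io
import sys


def tsv_to_csv(src, dst, header_start="Probeset"):
    """Copy tab-delimited rows from the file object `src` to `dst` as CSV.

    Lines before the header row (the row whose first field equals
    `header_start`), blank lines and '--' placeholder rows are skipped.
    csv.writer quotes any field that contains a comma, e.g. "Mar 11, 2009".
    """
    # QUOTE_NONE: treat any double quotes in the input as ordinary characters.
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst)
    seen_header = False
    for fields in reader:
        fields = [f.strip() for f in fields]
        if not seen_header:
            if fields and fields[0] == header_start:
                seen_header = True
                writer.writerow(fields)
            continue  # title text or blank lines before the header
        if not fields or fields[0] in ("", "--"):
            continue  # blank lines and '--' placeholder rows
        writer.writerow(fields)


def convert_text(text, header_start="Probeset"):
    """Convenience wrapper: convert a tab-delimited string, return CSV text."""
    out = io.StringIO()
    tsv_to_csv(io.StringIO(text), out, header_start)
    return out.getvalue()


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python tsv_to_csv.py INPUT.txt OUTPUT.csv")
    else:
        # newline="" is what the csv module expects for files it reads/writes.
        with open(sys.argv[1], newline="", encoding="utf-8") as src, \
                open(sys.argv[2], "w", newline="", encoding="utf-8") as dst:
            tsv_to_csv(src, dst)
```

On Windows this would be run as, for example, python tsv_to_csv.py plexdb_annotation.txt plexdb_annotation.csv. Whether geWorkbench then accepts the result also depends on which columns it expects, which this sketch does not check.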

Thank you, Alolex. The file is too dense for me to reformat by hand, and it isn't in Excel.

Written 8.8 years ago by zizigolu ★ 4.3k

Can you post a sample of the file you are working with and explain what you want extracted or reformatted? If you can't open it as text and you have a Mac or Linux machine, you can just copy and paste the output of "head myfile.txt" here.
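
Since the original poster is on Windows, where head is not available in the default command prompt, a small Python equivalent is one option. This is a sketch; the script name head.py is hypothetical, and reading the file as UTF-8 (with undecodable bytes replaced) is an assumption.

```python
import itertools
import sys


def head(path, n=10, encoding="utf-8"):
    """Return the first `n` lines of a text file, like the Unix `head` command."""
    with open(path, encoding=encoding, errors="replace") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, n)]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python head.py FILE [N]")
    else:
        count = int(sys.argv[2]) if len(sys.argv) > 2 else 10
        for line in head(sys.argv[1], count):
            print(line)
```

itertools.islice stops after n lines, so even a very large annotation file is not read in full.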

Written 8.8 years ago by alolex ▴ 950