InterPro ParentChildTreeFile -> .obo File
12.2 years ago
boczniak767 ▴ 850

I need an .obo file with InterPro annotations. From the OBO Foundry I know that such a file was available some time ago, but the link to it is now dead.

I've seen that OBO-Edit can create .obo files even from information extracted from articles (article here), but I haven't found any information on how to transform a text file into OBO.

I wonder if anyone knows how to transform the InterPro ParentChildTreeFile, a representative part of which is:

IPR000971::Globin, subset::
--IPR001032::Leghaemoglobin::
--IPR002335::Myoglobin::
--IPR002336::Erythrocruorin::
----IPR011367::Haemoglobin, polymeric::
--IPR002337::Haemoglobin, beta::
--IPR002338::Haemoglobin, alpha::
----IPR002339::Haemoglobin, pi::
------IPR018331::Haemoglobin alpha chain::

into either an .obo-type file or just a two-column text file in which each term (IPR ids only) is paired with its higher-level term, e.g. for the excerpt above:

IPR001032 is_a: IPR000971
IPR002335 is_a: IPR000971
IPR002336 is_a: IPR000971
IPR011367 is_a: IPR002336
IPR002337 is_a: IPR000971
IPR002338 is_a: IPR000971
IPR002339 is_a: IPR002338
IPR018331 is_a: IPR002339

I've tried to make such a file by transforming the ParentChild... file so that each full branch is on one line, extracting the lines containing ----, swapping fields so that the last becomes the first and the first becomes the second, and adding "is_a:" between them. I did the same for lines without ---- but with --. The result looks OK (after filtering out lines where the first field equals the second), but then I realized that there are lower-level terms (i.e. starting with more hyphens, up to seven), so it would require further division of the file and repeated grep and awk commands. All in all, it is a very error-prone procedure.
I've written to EBI asking about the interpro.obo file; if I get any information, I'll share it.
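For reference, the full-branch-per-line idea itself does not need one grep/awk pass per depth: if the current branch is kept as a list and truncated to the depth given by the number of leading "--" pairs, every entry's branch comes out in a single pass, and the last two fields of each branch are the child and its parent. Root entries produce a one-field branch, which replaces the "first field equals second" filtering. This is a minimal sketch assuming well-formed lines of the form --...IPRxxxxxx::name::; the class name BranchLines is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: one full branch (root ... entry, tab-separated) per input line,
// for any depth, in a single pass. Assumes lines like "----IPR011367::Haemoglobin, polymeric::".
public class BranchLines {

    public static List<String> branches(List<String> lines) {
        List<String> path = new ArrayList<>(); // current root-to-entry branch of IPR ids
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            int depth = 0;
            while (line.startsWith("--", 2 * depth)) depth++; // count leading "--" pairs
            int colon = line.indexOf("::");
            if (colon < 0) continue; // skip blank or malformed lines
            String id = line.substring(2 * depth, colon);
            // drop siblings and deeper entries left over from previous lines
            while (path.size() > depth) path.remove(path.size() - 1);
            path.add(id);
            out.add(String.join("\t", path));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> demo = Arrays.asList(
            "IPR000971::Globin, subset::",
            "--IPR002338::Haemoglobin, alpha::",
            "----IPR002339::Haemoglobin, pi::",
            "------IPR018331::Haemoglobin alpha chain::");
        for (String branch : branches(demo)) {
            String[] f = branch.split("\t");
            // the last two fields of a branch are the entry and its parent; roots have one field
            if (f.length > 1) System.out.println(f[f.length - 1] + " is_a: " + f[f.length - 2]);
        }
    }
}
```

Run on the demo lines, main prints the three is_a pairs for IPR002338, IPR002339 and IPR018331, matching the corresponding lines of the expected output in the question.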

12.2 years ago

The following Java program should convert your input:

import java.io.*;
import java.util.*;

public class Biostar16745
    {
    public static void main(String args[]) throws Exception
        {
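        Stack<String> stack=new Stack<String>(); // stack.get(d) = IPR id of the current ancestor at depth d
        BufferedReader in=new BufferedReader(new InputStreamReader(System.in));
        String line;
        while((line=in.readLine())!=null)
            {
            // a root entry has expectsize 1; each leading "--" adds one level
            int expectsize=1;
            while(line.startsWith("--"))
                {
                expectsize++;
                line=line.substring(2);
                }
            // keep only the IPR id before the first "::"; skip blank or malformed lines
            int colon=line.indexOf("::");
            if(colon==-1) continue;
            line=line.substring(0,colon);
            // pop siblings and deeper entries, so the top of the stack becomes the parent
            while(expectsize<=stack.size())
                {
                stack.pop();
                }
            stack.add(line);

            // for every non-root entry print: child <TAB> is_a <TAB> parent
            if(stack.size()>1)
                {
                System.out.println(
                    stack.get(stack.size()-1)+
                    "\tis_a\t"+
                    stack.get(stack.size()-2)
                    );
                }

            }
        in.close();
        }
    }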

Example:

javac Biostar16745.java && curl -s "ftp://ftp.ebi.ac.uk/pub/databases/interpro/ParentChildTreeFile.txt" | java Biostar16745

Result:

IPR013655   is_a    IPR000014
IPR013656   is_a    IPR000014
IPR013767   is_a    IPR000014
IPR018081   is_a    IPR000020
IPR001840   is_a    IPR018081
IPR001887   is_a    IPR000026
IPR005698   is_a    IPR000032
IPR018031   is_a    IPR000034
IPR016491   is_a    IPR000038
IPR008113   is_a    IPR016491
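The program above prints the two-column variant. For the .obo variant the question also asks about, the same depth tracking can emit OBO [Term] stanzas directly. This is a minimal sketch, not the retired official interpro.obo: it assumes OBO 1.2 flat-file syntax with bare IPR ids as term ids (no "InterPro:" prefix), keeps the entry names from the ParentChildTreeFile for the name: tag and the "! name" comment after is_a:, and the class name TreeToObo is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical converter: ParentChildTreeFile lines -> OBO [Term] stanzas.
// Assumptions: OBO 1.2 header, bare IPR ids as term ids, names taken from the tree file.
public class TreeToObo {

    public static String convert(List<String> lines) {
        Map<String, String> names = new LinkedHashMap<>();         // id -> name, in file order
        Map<String, List<String>> parents = new LinkedHashMap<>(); // id -> parent ids
        List<String> path = new ArrayList<>();                     // path.get(d) = ancestor at depth d
        for (String raw : lines) {
            int depth = 0;
            String line = raw;
            while (line.startsWith("--")) { // each leading "--" is one level deeper
                depth++;
                line = line.substring(2);
            }
            String[] fields = line.split("::", -1); // id, name, trailing empty field
            if (fields.length < 2 || fields[0].isEmpty()) continue; // skip blank/malformed lines
            String id = fields[0];
            // truncate the path to the current depth; its last element is then the parent
            while (path.size() > depth) path.remove(path.size() - 1);
            names.putIfAbsent(id, fields[1]);
            List<String> ps = parents.computeIfAbsent(id, k -> new ArrayList<>());
            if (depth > 0 && !path.isEmpty()) {
                String parent = path.get(path.size() - 1);
                if (!ps.contains(parent)) ps.add(parent);
            }
            path.add(id);
        }
        StringBuilder sb = new StringBuilder("format-version: 1.2\n");
        for (Map.Entry<String, String> e : names.entrySet()) {
            sb.append("\n[Term]\n")
              .append("id: ").append(e.getKey()).append('\n')
              .append("name: ").append(e.getValue()).append('\n');
            for (String p : parents.get(e.getKey())) {
                sb.append("is_a: ").append(p).append(" ! ").append(names.get(p)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) lines.add(line);
        }
        System.out.print(convert(lines));
    }
}
```

Usage would mirror the example above, e.g. javac TreeToObo.java && java TreeToObo < ParentChildTreeFile.txt > interpro.obo. Parents are stored as a list so an entry listed under more than one parent would get several is_a: lines rather than being silently overwritten.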

Thanks, it works. I just had to install "gcj-4.5-jdk"; without it there was an error: "Syntax error, parameterized types are only available if source level is 1.5", pointing to <String> on the eighth line.


I use the Oracle/Sun Java compiler; the current release is 7.0. Generics such as Stack<String> need source level 1.5 or later, so I suppose your Java compiler was very old.
