EC file parsing
6.4 years ago

Dear all

I have an issue with the data below:

ENTRY       EC 1.1.1.35                 Enzyme
NAME        3-hydroxyacyl-CoA dehydrogenase;
            beta-hydroxyacyl dehydrogenase;
            beta-keto-reductase;
            3-keto reductase;
            3-hydroxyacyl coenzyme A dehydrogenase;
            beta-hydroxyacyl-coenzyme A synthetase;
            beta-hydroxyacylcoenzyme A dehydrogenase;
            beta-hydroxybutyrylcoenzyme A dehydrogenase;
            3-hydroxyacetyl-coenzyme A dehydrogenase;
            L-3-hydroxyacyl coenzyme A dehydrogenase;
            L-3-hydroxyacyl CoA dehydrogenase;
            beta-hydroxyacyl CoA dehydrogenase;
            3beta-hydroxyacyl coenzyme A dehydrogenase;
            3-hydroxybutyryl-CoA dehydrogenase;
            beta-ketoacyl-CoA reductase;
            beta-hydroxy acid dehydrogenase;
            3-L-hydroxyacyl-CoA dehydrogenase;
            3-hydroxyisobutyryl-CoA dehydrogenase;
            1-specific DPN-linked beta-hydroxybutyric dehydrogenase
CLASS       Oxidoreductases;
            Acting on the CH-OH group of donors;
            With NAD+ or NADP+ as acceptor

How can I parse the fields of this file, such as ENTRY, NAME, and CLASS, together with the multi-line values shown above?

bash shell • 1.4k views
 grep  -E '^(ENTRY|NAME|CLASS)'  input.txt | cut -c 10- | paste - - -

It prints only single-line output. I need multi-line output.


I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

Please give an example of the desired output.


This does not work. I need all the corresponding entries.


Eh? I didn't propose a solution. I think you replied to the wrong comment.


@OP: Please post expected output.

ENTRY            NAME             CLASS
EC 1.1.1.35   3-hydroxyacyl-CoA dehydrogenase;          Oxidoreductases;
            Acting on the CH-OH group of donors;
            With NAD+ or NADP+ as acceptor
            beta-hydroxyacyl dehydrogenase;
            beta-keto-reductase;
            3-keto reductase;
            3-hydroxyacyl coenzyme A dehydrogenase;
            beta-hydroxyacyl-coenzyme A synthetase;
            beta-hydroxyacylcoenzyme A dehydrogenase;
            beta-hydroxybutyrylcoenzyme A dehydrogenase;
            3-hydroxyacetyl-coenzyme A dehydrogenase;
            L-3-hydroxyacyl coenzyme A dehydrogenase;
            L-3-hydroxyacyl CoA dehydrogenase;
            beta-hydroxyacyl CoA dehydrogenase;
            3beta-hydroxyacyl coenzyme A dehydrogenase;
            3-hydroxybutyryl-CoA dehydrogenase;
            beta-ketoacyl-CoA reductase;
            beta-hydroxy acid dehydrogenase;
            3-L-hydroxyacyl-CoA dehydrogenase;
            3-hydroxyisobutyryl-CoA dehydrogenase;
            1-specific DPN-linked beta-hydroxybutyric d

Again, I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

ENTRY            NAME             CLASS
EC 1.1.1.35   3-hydroxyacyl-CoA dehydrogenase;          Oxidoreductases;Acting on the CH-OH group of donors;
        With NAD+ or NADP+ as acceptor
            Acting on the CH-OH group of donors;
            With NAD+ or NADP+ as acceptor
            beta-hydroxyacyl dehydrogenase;
            beta-keto-reductase;
            3-keto reductase;
            3-hydroxyacyl coenzyme A dehydrogenase;
            beta-hydroxyacyl-coenzyme A synthetase;
            beta-hydroxyacylcoenzyme A dehydrogenase;
            beta-hydroxybutyrylcoenzyme A dehydrogenase;
            3-hydroxyacetyl-coenzyme A dehydrogenase;
            L-3-hydroxyacyl coenzyme A dehydrogenase;
            L-3-hydroxyacyl CoA dehydrogenase;
            beta-hydroxyacyl CoA dehydrogenase;
            3beta-hydroxyacyl coenzyme A dehydrogenase;
            3-hydroxybutyryl-CoA dehydrogenase;
            beta-ketoacyl-CoA reductase;
            beta-hydroxy acid dehydrogenase;
            3-L-hydroxyacyl-CoA dehydrogenase;
            3-hydroxyisobutyryl-CoA dehydrogenase;
            1-specific DPN-linked beta-hydroxybutyric d
6.4 years ago

Code (this version does not print the word "Enzyme" in the first column, i.e. the ENTRY column):

$  awk -F "  " '{if ($1=="") {$1=prev} prev=$1};{gsub("   +","\t")}1' test.txt |  datamash -g 1 collapse 2| datamash transpose  > test2.txt

output:

ENTRY   NAME    CLASS
EC 1.1.1.35 3-hydroxyacyl-CoA dehydrogenase;,beta-hydroxyacyl dehydrogenase;,beta-keto-reductase;,3-keto reductase;,3-hydroxyacyl coenzyme A dehydrogenase;,beta-hydroxyacyl-coenzyme A synthetase;,beta-hydroxyacylcoenzyme A dehydrogenase;,beta-hydroxybutyrylcoenzyme A dehydrogenase;,3-hydroxyacetyl-coenzyme A dehydrogenase;,L-3-hydroxyacyl coenzyme A dehydrogenase;,L-3-hydroxyacyl CoA dehydrogenase;,beta-hydroxyacyl CoA dehydrogenase;,3beta-hydroxyacyl coenzyme A dehydrogenase;,3-hydroxybutyryl-CoA dehydrogenase;,beta-ketoacyl-CoA reductase;,beta-hydroxy acid dehydrogenase;,3-L-hydroxyacyl-CoA dehydrogenase;,3-hydroxyisobutyryl-CoA dehydrogenase;,1-specific DPN-linked beta-hydroxybutyric dehydrogenase   Oxidoreductases;,Acting on the CH-OH group of donors;,With NAD+ or NADP+ as acceptor

If you don't mind the word "Enzyme" appearing in the first column, then the code is:

$ awk -F "  " '{if ($1=="") {$1=prev} prev=$1}1' test.txt   | awk '{$1=$1; sub(" ","\t")}1' |  datamash -g1 collapse 2 | datamash transpose

input:

$ cat test.txt 
ENTRY       EC 1.1.1.35                 Enzyme
NAME        3-hydroxyacyl-CoA dehydrogenase;
            beta-hydroxyacyl dehydrogenase;
            beta-keto-reductase;
            3-keto reductase;
            3-hydroxyacyl coenzyme A dehydrogenase;
            beta-hydroxyacyl-coenzyme A synthetase;
            beta-hydroxyacylcoenzyme A dehydrogenase;
            beta-hydroxybutyrylcoenzyme A dehydrogenase;
            3-hydroxyacetyl-coenzyme A dehydrogenase;
            L-3-hydroxyacyl coenzyme A dehydrogenase;
            L-3-hydroxyacyl CoA dehydrogenase;
            beta-hydroxyacyl CoA dehydrogenase;
            3beta-hydroxyacyl coenzyme A dehydrogenase;
            3-hydroxybutyryl-CoA dehydrogenase;
            beta-ketoacyl-CoA reductase;
            beta-hydroxy acid dehydrogenase;
            3-L-hydroxyacyl-CoA dehydrogenase;
            3-hydroxyisobutyryl-CoA dehydrogenase;
            1-specific DPN-linked beta-hydroxybutyric dehydrogenase
CLASS       Oxidoreductases;
            Acting on the CH-OH group of donors;
            With NAD+ or NADP+ as acceptor
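
If datamash is not available, the same collapsing can be sketched in awk alone. This is only a sketch under a few assumptions: the file holds a single entry, a field name always starts in the first column, and every indented line continues the previous field. The function name parse_ec is made up for illustration. It prints one "KEY<TAB>values" line per field and joins continuation lines with a single space instead of a comma.

```shell
# parse_ec (hypothetical helper): collapse multi-line fields of a
# KEGG-style flat file into one "KEY<TAB>joined values" line per field.
# Assumes a single entry per input and field names starting in column 1.
parse_ec() {
  awk '
    {
      val = $0
      if ($0 ~ /^[^ \t]/) {
        # A non-blank first column starts a new field: remember its name
        # and strip the name plus the padding that follows it.
        current = $1
        sub(/^[^ \t]+[ \t]*/, "", val)
      }
      gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the value
      gsub(/  +/, " ", val)              # squeeze runs of spaces inside it
      if (current == "" || val == "") next
      if (current in joined) {
        joined[current] = joined[current] " " val
      } else {
        order[++n] = current             # remember first-seen order
        joined[current] = val
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i] "\t" joined[order[i]]
    }
  ' "$@"
}
```

Called as `parse_ec test.txt`, this keeps the word "Enzyme" in the ENTRY value, like the second datamash command above. Piping the result through `datamash transpose` gives the header-row layout.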

Thanks for your help.

6.4 years ago

Is there a way to retrieve the substrate, product, enzyme class, and interacting pathway with a script and the KEGG API when you have multiple EC numbers and reaction numbers from KEGG? For example, EC 1.1.1.35, EC 1.1.1.100, and so on, and rn:R1135, rn:R05233.
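
One possible approach, offered as a rough sketch rather than a tested pipeline: the KEGG REST service returns an entry as a flat file from `https://rest.kegg.jp/get/<id>`, in the same layout as the record at the top of this thread, so the fields of interest can be filtered out with awk. The function name kegg_fields is made up for illustration. The field names SUBSTRATE, PRODUCT, CLASS and PATHWAY are assumed to be present in enzyme records; reaction records may use different field names (for example EQUATION), so check one record by hand first.

```shell
# kegg_fields (hypothetical helper): read a KEGG flat-file record on stdin
# and print only the requested fields, including their continuation lines.
# Usage: kegg_fields FIELD1,FIELD2,...
kegg_fields() {
  awk -v want="$1" '
    BEGIN {
      n = split(want, w, ",")
      for (i = 1; i <= n; i++) keep[w[i]] = 1
    }
    /^\/\/\//  { current = ""; next }   # "///" terminates a record
    /^[^ \t]/  { current = $1 }         # a field name starts in column 1
    (current in keep) { print }         # field line or one of its continuations
  '
}

# Example loop (needs network access, so it is left commented out here):
# for id in ec:1.1.1.35 ec:1.1.1.100 rn:R05233; do
#   curl -s "https://rest.kegg.jp/get/$id" |
#     kegg_fields SUBSTRATE,PRODUCT,CLASS,PATHWAY
# done
```

The output keeps the original indented layout, so it can be piped into any of the collapsing commands from the answer above.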
