awk syntax error
5.4 years ago
bgold04 • 0

I am just not seeing what my problem is here:

FILE

Y   555381  555382  256191  .   A   AGCCCCCCGCGC    .   ALLELEID    257892  CLNDISDB    MedGen  CN169374    CLNDN   not_specified   CLNHGVS NC_000024.9 !g.555387_555397dupCCCGCGCGCCC  CLNREVSTAT  criteria_provided,_single_submitter CLINSIG %Likely_benign  CLNVC   Duplication CLNVCSO SO  1000035 GENEINFO    @SHOX
X   585120  585121  265992  .   C   G   .   ALLELEID    260809  CLNDISDB    MedGen  C1845118,OMIM   300582,Orphanet ORPHA314795 CLNDN   Short_stature,_idiopathic,_X-linked CLNHGVS NC_000023.10    !g.585121C>G    CLNREVSTAT  no_assertion_criteria_provided  CLINSIG %Likely_benign  CLNVC   single_nucleotide_variant   CLNVCSO SO  0001483 GENEINFO    @SHOX

COMMAND & SYNTAX ERROR

awk -F '\t' '{for(i=1 i < NF;i++ && j=1 j < NF;j++ && k=1 k < NF;k++) if($i ~ /@/ && $j ~ /!/ && $k ~ /%/) {print $i"\011"$j"\011"$k}}' FILE > OUTPUT
awk: cmd. line:1: {for(i=1 i < NF;i++ && j=1 j < NF;j++ && k=1 k < NF;k++) if($i ~ /@/ && $j ~ /!/ && $k ~ /%/) {print $i"\011"$j"\011"$k}}
awk: cmd. line:1:                                                    ^ syntax error
awk: cmd. line:1: {for(i=1 i < NF;i++ && j=1 j < NF;j++ && k=1 k < NF;k++) if($i ~ /@/ && $j ~ /!/ && $k ~ /%/) {print $i"\011"$j"\011"$k}}
awk: cmd. line:1:                                                    ^ syntax error

The interpreter does not like the semicolon between NF and k++, or the closing parenthesis ) that follows it. Any help is appreciated.

Hello bgold04,

In the past, you have edited your posts and removed content after you got help. Please do not repeat that behavior. If such behavior is seen this time, your account will be suspended.

for(i=1 i < NF;i++ && j=1 j < NF;j++ && k=1 k < NF;k++)

The for loop in awk is defined in the manual ( https://www.gnu.org/software/gawk/manual/html_node/For-Statement.html ) as:

for (initialization; condition; increment)

Where are those three parts in your code?
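
For comparison, here is a minimal sketch of a well-formed loop over the fields of a line. The three-field input is a made-up example, not the poster's FILE; the point is only that the parentheses hold exactly three parts separated by two semicolons.

```shell
# Hypothetical input: one line with three tab-separated fields.
# initialization: i = 1   condition: i <= NF   increment: i++
fields=$(printf 'a\tb\tc\n' | awk -F'\t' '{ for (i = 1; i <= NF; i++) print i, $i }')
printf '%s\n' "$fields"
```

Note that the condition is i <= NF rather than i < NF, otherwise the last field is never visited.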

Initialization: set the counters i, j and k, one for each of the desired word prefixes. Condition: select the words prefixed with @, ! and % and print them out. Then go to the next line until the file is exhausted. No?

initialization: set i, j, k

No?

No. And awk doesn't support multiple initializations in the for loop: https://www.gnu.org/software/gawk/manual/html_node/For-Statement.html

It isn’t possible to set more than one variable in the initialization part without using a multiple assignment statement such as ‘x = y = 0’.
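
Three counters are not actually needed here. As a rough sketch, one counter can walk the fields while three ordinary variables remember the matches. The names bang, pct and at are made up for illustration, and the input line is a shortened, hypothetical stand-in for a line of FILE.

```shell
# Hypothetical sample line (tab-separated), modelled on the FILE above.
line=$(printf 'Y\t555381\t!g.1A>G\tCLINSIG\t%%Likely_benign\tGENEINFO\t@SHOX')

result=$(printf '%s\n' "$line" | awk -F'\t' -v OFS='\t' '{
    bang = pct = at = ""                 # multiple assignment, reset for each line
    for (i = 1; i <= NF; i++) {          # one loop, one counter
        if      ($i ~ /^!/) bang = $i    # field starting with !
        else if ($i ~ /^%/) pct  = $i    # field starting with %
        else if ($i ~ /^@/) at   = $i    # field starting with @
    }
    print bang, pct, at                  # joined with OFS, i.e. a tab
}')
printf '%s\n' "$result"
```

If a line has several fields with the same prefix, this sketch keeps only the last one; a missing prefix leaves an empty column.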

OK, back to drawing board. Thanks.

You have four subsections in your for loop definition, which only allows three. You are getting a syntax error on the third semi-colon. Remove the unnecessary section.

Alex, it just hangs when I do this. Pierre is right, I need another approach.

awk -F '\t' '{for(i=1 i < NF;i++ && j=1 j < NF;j++) if($i ~ /@/ && $j ~ /!/) {print $i"\011"$j}}' FILE  > OUTPUT

I am having difficulty understanding what this is trying to do.

Can you show what your expected output should look like, in comparison with the input?

Pairing a single line of input with the expected output would be useful.

Hi! Here is the hoped-for output from the FILE given above:

!g.555387_555397dupCCCGCGCGCCC  %Likely_benign  @SHOX
!g.585121C>G    %Likely_benign  @SHOX

Thanks, this is a beautiful thing. I would be happy to discuss the one time I did this with you privately.

5.4 years ago

Given the tab-delimited file in.txt:

Y       555381  555382  256191  .       A       AGCCCCCCGCGC    .       ALLELEID        257892  CLNDISDB        MedGen  CN169374        CLNDN   not_specified   CLNHGVS NC_000024.9     !g.555387_555397dupCCCGCGCGCCC  CLNREVSTAT      criteria_provided,_single_submitter     CLINSIG %Likely_benign  CLNVC   Duplication     CLNVCSO SO      1000035 GENEINFO        @SHOX
X       585120  585121  265992  .       C       G       .       ALLELEID        260809  CLNDISDB        MedGen  C1845118,OMIM   300582,Orphanet ORPHA314795     CLNDN   Short_stature,_idiopathic,_X-linked     CLNHGVS NC_000023.10    !g.585121C>G    CLNREVSTAT      no_assertion_criteria_provided  CLINSIG %Likely_benign  CLNVC   single_nucleotide_variant       CLNVCSO SO      0001483 GENEINFO        @SHOX

You could just look for fields that start with your characters of interest (!, %, and @), print out a tab-delimited line of those fields, and then strip the trailing tab:

$ awk -F'\t' '{ for(i = 1; i <= NF; i++) { if ($i ~ /^[!%@]/) { printf("%s\t", $i); } } printf("\n"); }' in.txt | awk '{ sub(/\t$/, "", $0); print $0; }' > out.txt

The output file out.txt:

$ cat out.txt
!g.555387_555397dupCCCGCGCGCCC  %Likely_benign  @SHOX
!g.585121C>G    %Likely_benign  @SHOX
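
If the second awk call feels like overkill, a possible variant is to print the separator before each match instead of after it, so that no trailing tab is produced in the first place. This is only a sketch: the variable sep is a made-up name, and the two sample lines are shortened, hypothetical stand-ins for in.txt.

```shell
# Hypothetical two-line sample, a shortened stand-in for in.txt.
sample=$(printf 'Y\tCLNHGVS\t!g.555387_555397dupCCCGCGCGCCC\tCLINSIG\t%%Likely_benign\tGENEINFO\t@SHOX\nX\tCLNHGVS\t!g.585121C>G\tCLINSIG\t%%Likely_benign\tGENEINFO\t@SHOX')

joined=$(printf '%s\n' "$sample" | awk -F'\t' '{
    sep = ""                              # nothing before the first match
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^[!%@]/) {
            printf("%s%s", sep, $i)
            sep = "\t"                    # tab before every later match
        }
    }
    printf("\n")                          # end the output line
}')
printf '%s\n' "$joined"
```

Matches are printed in the order they occur on the line, which for this data is !, then %, then @.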

Use cat -te to sanity-check output, so that you can be sure that it has the delimiters and line-ending characters that you need.
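
As a quick illustration of that check, assuming a GNU or BSD cat, tabs show up as ^I and each line end as $. The input line here is typed in by hand rather than read from out.txt.

```shell
# -t makes tabs visible as ^I, -e marks each line end with $
visible=$(printf '!g.585121C>G\t%%Likely_benign\t@SHOX\n' | cat -te)
printf '%s\n' "$visible"
# prints: !g.585121C>G^I%Likely_benign^I@SHOX$
```

A leftover trailing tab would be easy to spot this way, since the line would end in ^I$ instead of just $.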

Note: It has come to my attention through moderator channels that you delete questions once answered. On a forum where we give our time and expertise freely, it is important that we have a history of questions and answers we can point users to in the future. Please do not do this again.
