compare two tables in files
2
0
Entering edit mode
5.5 years ago
erickfqqa ▴ 20

I have two files in txt format, and I want to compare the first column in the file 1 and the file_2 and see which rows are different in the file_1 and add a column in the file_1 wiht 0 and 1, 0 for different and 1 for equals

file_1

Organism_1
Organism_2
Organism_3
Organism_4
Organism_5
Organism_6
Organism_7
Organism_8
Organism_9
Organism_10
Organism_11

file_2

    Organism_1
    Organism_2
    Organism_3
    Organism_4
    Organism_5
    Organism_6
    Organism_7
    Organism_8
    Organism_9
    Organism_10
    Organism_11
    Organism_12
    Organism_13
    Organism_14
    Organism_15
    Organism_16
    Organism_17
    Organism_18
    Organism_19
    Organism_20
    Organism_21
    Organism_22
    Organism_23
    Organism_24

I want a output like this file_3.txt

Organism_1  1
Organism_2  1
Organism_3  1
Organism_4  1
Organism_5  1
Organism_6  1
Organism_7  1
Organism_8  1
Organism_9  1
Organism_10 1
Organism_11 1
Organism_12 0
Organism_13 0
Organism_14 0
Organism_15 0
Organism_16 0
Organism_17 0
Organism_18 0
Organism_19 0
Organism_20 0
Organism_21 0
Organism_22 0
Organism_23 0
Organism_24 0

I've been trying with this to see which rows are different, but I don't know how to add the binary column

comm <(sort file_1.txt) <(sort file_2.txt)

I hope somebody can help me

perl linux bash • 794 views
ADD COMMENT
1
Entering edit mode
5.5 years ago
comm <(sort file_1.txt ) <(sort file_2.txt) |\
awk -F '\t' '($1!="") {print $1,0;next;}($2!="") {print $2,0;next;} ($3!="") {print $3,1;next;}' |\
sort -V


Organism_1 1
Organism_2 1
Organism_3 1
Organism_4 1
Organism_5 1
Organism_6 1
Organism_7 1
Organism_8 1
Organism_9 1
Organism_10 1
Organism_11 1
Organism_12 0
Organism_13 0
Organism_14 0
Organism_15 0
Organism_16 0
Organism_17 0
Organism_18 0
Organism_19 0
Organism_20 0
Organism_21 0
Organism_22 0
Organism_23 0
Organism_24 0
ADD COMMENT
0
Entering edit mode

I have tried to with your awk code, but I'm getting repeated rows an row with 0 and one row with 1 for example

Organim_12 0 Organism_12 1 Organism_13 0 Organism_13 1

ADD REPLY
1
Entering edit mode
5.5 years ago
$ sort -V test1.txt test2.txt | uniq -ci | sed 's/^\s\+//g'| awk -v OFS="\t" '{print $2,$1-1}'

Organism_1  1
Organism_2  1
Organism_3  1
Organism_4  1
Organism_5  1
Organism_6  1
Organism_7  1
Organism_8  1
Organism_9  1
Organism_10 1
Organism_11 1
Organism_12 0
Organism_13 0
Organism_14 0
Organism_15 0
Organism_16 0
Organism_17 0
Organism_18 0
Organism_19 0
Organism_20 0
Organism_21 0
Organism_22 0
Organism_23 0
Organism_24 0

Using datamash / tsv-utils:

 $ sort -V test1.txt test2.txt |  datamash -s -g 1 count 1 | sort -V | awk -v OFS="\t" '{print $1,$2-1}'
 $ sort -V test1.txt test2.txt |  tsv-summarize --group-by 1 --count  | awk -v OFS="\t" '{print $1,$2-1}'
ADD COMMENT

Login before adding your answer.

Traffic: 2591 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6