Question

compare two columns of two files

0

Entering edit mode

5.0 years ago

Sam ▴ 150

Dear All

I have two files of gene list in tab format, I want to compare 2nd column if 1st file file with 1st column of 2nd file and report all columns in both files for each match

any help ?

Thanks

file 1 
1st column        2nd column
chr10_17026 VIT_202s0033g00190
chr18_37402 VIT_210s0003g03400
chr19_39195 VIT_200s0313g00080


files 2
1st column                                                  2nd column 3rd column
VIT_200s0313g00072 VIT_200s0313g00080 VIT_200s0313g00066    7.86            6.22
VIT_202s0033g00190  8.48        6.49
VIT_210s0003g03400  8.18        6.05
VIT_210s0116g01020  8.67        5.82
VIT_213s0064g01330  9.02        5.84

output :
chr10_17026 VIT_202s0033g00190     8.48        6.49
chr18_37402 VIT_210s0003g03400     8.18        6.05
chr19_39195 VIT_200s0313g00080     7.86         6.22

awk bash • 1.4k views

ADD COMMENT • link updated 5.0 years ago by curious ▴ 750 • written 5.0 years ago by Sam ▴ 150

0

Entering edit mode

Hello Sam ,

what have you tried so far?

fin swimmer

ADD REPLY • link 5.0 years ago by finswimmer 16k

1

Entering edit mode

another tip is to use sort and join commands.

ADD REPLY • link 5.0 years ago by husensofteng ▴ 410

0

Entering edit mode

One tip, use R. Import the files into R and use subsetting. There are many tutorials, for example here, so try to do it yourself.

ADD REPLY • link 5.0 years ago by Benn 8.3k

score 2 · Answer 1 · 2019-04-23

The most simple way to do it with awk is like this:

awk 'FNR==NR {lookup[$2]; next} {if ($1 in lookup) print}' file1.tsv file2.tsv 
VIT_202s0033g00190  8.48    6.49
VIT_210s0003g03400  8.18    6.05

You may want to tidy up this type of line, though, as its formatting is inconsistent:

VIT_200s0313g00072 VIT_200s0313g00080 VIT_200s0313g00066    7.86            6.22

score 1 · Answer 2 · 2019-04-23

1

Entering edit mode

5.0 years ago

bioguy24 ▴ 230

Another awk:

awk 'NR==FNR{c[$2]++;next};c[$1] > 0' file1 file2

ADD COMMENT • link 5.0 years ago by bioguy24 ▴ 230

score 1 · Answer 3 · 2019-04-23

If you know python you can look at the pandas package. You probably want to perform an inner merge. Learning SQL-like processes can take a long time to get used to, but after a while become second nature. It might not help you now depending on your situation, but you eventually might want to look into pandas. It is my daily driver for working with tab separated data like this. It is incredibly powerful

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html