Biostar Beta. Not for public use.
Question: compare two columns of two files
0
Entering edit mode

Dear All

I have two files of gene list in tab format, I want to compare 2nd column if 1st file file with 1st column of 2nd file and report all columns in both files for each match

any help ?

Thanks

file 1 
1st column        2nd column
chr10_17026 VIT_202s0033g00190
chr18_37402 VIT_210s0003g03400
chr19_39195 VIT_200s0313g00080


files 2
1st column                                                  2nd column 3rd column
VIT_200s0313g00072 VIT_200s0313g00080 VIT_200s0313g00066    7.86            6.22
VIT_202s0033g00190  8.48        6.49
VIT_210s0003g03400  8.18        6.05
VIT_210s0116g01020  8.67        5.82
VIT_213s0064g01330  9.02        5.84

output :
chr10_17026 VIT_202s0033g00190     8.48        6.49
chr18_37402 VIT_210s0003g03400     8.18        6.05
chr19_39195 VIT_200s0313g00080     7.86         6.22
ADD COMMENTlink 10 months ago Sam • 110 • updated 10 months ago brett.vanderwerff • 100
Entering edit mode
0

Hello Sam ,

what have you tried so far?

fin swimmer

ADD REPLYlink 10 months ago
finswimmer
11k
Entering edit mode
1

another tip is to use sort and join commands.

ADD REPLYlink 10 months ago
husensofteng
• 80
Entering edit mode
0

One tip, use R. Import the files into R and use subsetting. There are many tutorials, for example here, so try to do it yourself.

ADD REPLYlink 10 months ago
Benn
6.9k
2
Entering edit mode

The most simple way to do it with awk is like this:

awk 'FNR==NR {lookup[$2]; next} {if ($1 in lookup) print}' file1.tsv file2.tsv 
VIT_202s0033g00190  8.48    6.49
VIT_210s0003g03400  8.18    6.05

You may want to tidy up this type of line, though, as its formatting is inconsistent:

VIT_200s0313g00072 VIT_200s0313g00080 VIT_200s0313g00066    7.86            6.22
ADD COMMENTlink 10 months ago Kevin Blighe 43k
1
Entering edit mode

Another awk:

awk 'NR==FNR{c[$2]++;next};c[$1] > 0' file1 file2
ADD COMMENTlink 10 months ago bioguy24 • 190
1
Entering edit mode

If you know python you can look at the pandas package. You probably want to perform an inner merge. Learning SQL-like processes can take a long time to get used to, but after a while become second nature. It might not help you now depending on your situation, but you eventually might want to look into pandas. It is my daily driver for working with tab separated data like this. It is incredibly powerful

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

ADD COMMENTlink 10 months ago brett.vanderwerff • 100

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0