arranging columns and rows
6.0 years ago
AP ▴ 80

Hello everyone,

I have File 1 like this with 2 columns:

g4989   2.70224323450382
g4650   2.71483380183318
g11701  2.83907744860811
g11701  2.83907744860811
g3807   2.83912968405616
g17931  2.84821618321646

and File 2 like this, with 4 columns:

g4989
g4650  Pfam    PF00172 FungalZn(2)-Cys(6)binuclearclusterdomain
g11701  Pfam    PF04082 Fungalspecifictranscriptionfactordomain
g17931  Pfam    PF04082 Fungalspecifictranscriptionfactordomain

Both files are tab-delimited. File 2 contains only selected genes from File 1. I want to add the second column from File 1 to File 2, but only for the genes in File 2, like this:

    g4989     2.70 
    g4650     2.71         Pfam    PF00172 FungalZn(2)-Cys(6)binuclearclusterdomain
    g11701   2.83         Pfam    PF04082 Fungalspecifictranscriptionfactordomain
    g17931   2.84         Pfam    PF04082 Fungalspecifictranscriptionfactordomain

Could you please help me sort this out in Linux?

Thank you, Ambika

awk grep bash • 1.2k views

g11701 is present twice in file1. How should this be handled?


Yes, it's present twice, and this is just a sample. Some genes might be present more often than that, because a single gene might have several Pfam domains.

6.0 years ago

Use join:

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.txt ) <(sort -t $'\t' -k1,1 file2.txt )

g11701  2.83907744860811    Pfam    PF04082 Fungalspecifictranscriptionfactordomain
g11701  2.83907744860811    Pfam    PF04082 Fungalspecifictranscriptionfactordomain
g17931  2.84821618321646    Pfam    PF04082 Fungalspecifictranscriptionfactordomain
g4650   2.71483380183318    Pfam    PF00172 FungalZn(2)-Cys(6)binuclearclusterdomain
g4989   2.70224323450382
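
If File 2's row order and row count need to be preserved, a lookup in awk is one possible alternative to join. The following is only a minimal sketch under some assumptions: File 1 fits in memory, the first value seen wins when a gene ID is duplicated in File 1, and the `NA` placeholder, the scratch directory, and the output file name are illustrative choices rather than anything from the question.

```shell
#!/bin/sh
# Sketch: add File 1's value as column 2 of File 2, keeping File 2's order.
set -eu
tmp=$(mktemp -d)    # hypothetical scratch directory

# Sample data copied from the question (tab-delimited).
printf '%s\t%s\n' \
    g4989  2.70224323450382 \
    g4650  2.71483380183318 \
    g11701 2.83907744860811 \
    g11701 2.83907744860811 \
    g3807  2.83912968405616 \
    g17931 2.84821618321646 > "$tmp/file1.txt"
{
    printf '%s\n' g4989
    printf '%s\t%s\t%s\t%s\n' \
        g4650  Pfam PF00172 'FungalZn(2)-Cys(6)binuclearclusterdomain' \
        g11701 Pfam PF04082 Fungalspecifictranscriptionfactordomain \
        g17931 Pfam PF04082 Fungalspecifictranscriptionfactordomain
} > "$tmp/file2.txt"

awk -F '\t' -v OFS='\t' '
    # First file (File 1): remember the first value seen for each gene ID.
    NR == FNR { if (!($1 in val)) val[$1] = $2; next }
    # Second file (File 2): print the ID, the looked-up value, then the rest.
    {
        v = ($1 in val) ? val[$1] : "NA"   # NA marks genes missing from File 1
        rest = ""
        for (i = 2; i <= NF; i++) rest = rest OFS $i
        print $1, v rest
    }
' "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/merged.txt"

cat "$tmp/merged.txt"
```

Because every File 2 line produces exactly one output line, the result has the same number of rows as File 2, and no sorting is required.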

Hi Pierre, thank you. I got the output file, but its number of rows is not the same as in File 2.


That happens if, as in your example, there is a duplicated key (e.g. g11701). See also the -v and -a options of join.
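
A minimal sketch of one way to deal with the duplicated key, assuming each gene ID should carry a single value in File 1: collapse File 1 to one line per key with `sort -u`, then join with `-a 2` so that File 2 lines without a partner are still printed. The scratch directory and file names are hypothetical, and temporary files are used instead of process substitution so the script also runs under plain sh.

```shell
#!/bin/sh
# Sketch: de-duplicate File 1 by key, then join while keeping every File 2 row.
set -eu
LC_ALL=C; export LC_ALL    # make sort and join agree on the ordering
tab=$(printf '\t')
tmp=$(mktemp -d)           # hypothetical scratch directory

# Sample data copied from the question (tab-delimited).
printf '%s\t%s\n' \
    g4989  2.70224323450382 \
    g4650  2.71483380183318 \
    g11701 2.83907744860811 \
    g11701 2.83907744860811 \
    g3807  2.83912968405616 \
    g17931 2.84821618321646 > "$tmp/file1.txt"
{
    printf '%s\n' g4989
    printf '%s\t%s\t%s\t%s\n' \
        g4650  Pfam PF00172 'FungalZn(2)-Cys(6)binuclearclusterdomain' \
        g11701 Pfam PF04082 Fungalspecifictranscriptionfactordomain \
        g17931 Pfam PF04082 Fungalspecifictranscriptionfactordomain
} > "$tmp/file2.txt"

# -u together with -k1,1 keeps a single line per gene ID in File 1.
sort -t "$tab" -k1,1 -u "$tmp/file1.txt" > "$tmp/file1.sorted"
sort -t "$tab" -k1,1    "$tmp/file2.txt" > "$tmp/file2.sorted"

# -a 2 additionally prints File 2 lines that have no match in File 1.
join -t "$tab" -a 2 "$tmp/file1.sorted" "$tmp/file2.sorted" > "$tmp/joined.txt"

cat "$tmp/joined.txt"
```

With unique keys in File 1, each File 2 line yields exactly one output line, so the row counts match. Note that a File 2 line printed only because of `-a 2` has no value column, so its remaining fields sit one column to the left.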


Thank you so much for your help.


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

