5.3 years ago
emilyc
SOLVED:
join -a 1 <file1> <file2> > <resulting_file_3>
Note: the "-a 1" option tells join to also print unpairable lines from File 1, which was the file with the "extra" data.
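As a minimal sketch of the command above, assuming both files are whitespace-delimited with the ID in the first column: `join` expects both inputs to be sorted on the join field, so this example sorts them first. The file names (`file1.txt`, `file2.txt`, `file3.txt`, and the `.sorted` intermediates) are placeholders, and the sample rows are copied from the excerpts in this post.

```shell
# Placeholder file names; sample rows copied from the excerpts in this post.
printf '%s\n' \
  'AWK77888 NODE_197_length_111_cov_148.160714' \
  'ASM94017 NODE_17_length_1282_cov_7.995925' \
  'AWK77888 NODE_249_length_110_cov_160.890909' > file1.txt
printf '%s\n' \
  'AWK77888 2201303' \
  'ASM94017 2021904' > file2.txt

# join requires both inputs to be sorted on the join field (column 1 by default).
# LC_ALL=C keeps the sort order consistent with the order join expects.
LC_ALL=C sort -k1,1 file1.txt > file1.sorted
LC_ALL=C sort -k1,1 file2.txt > file2.sorted

# -a 1 also prints lines of file 1 that have no match in file 2.
LC_ALL=C join -a 1 file1.sorted file2.sorted > file3.txt
cat file3.txt
```

With sorted input, every File 1 line that shares an ID with File 2 gets its own output line, so the repeated AWK77888 rows are all kept.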
Hello.
I have two files that need to be joined together; excerpts are below.
File 1
AAO24320 NODE_15_length_1466_cov_5.512403
AAX28387 NODE_56_length_561_cov_2.735178
ABZ84886 NODE_98_length_332_cov_2.086643
ADI18656 NODE_656_length_63_cov_4.750000
ADI19064 NODE_34_length_769_cov_31.782913
AFR11835 NODE_8_length_2031_cov_5.296559
AGC04691 NODE_19_length_1204_cov_12.818973
AGO87813 NODE_74_length_440_cov_2.514286
AGO87862 NODE_106_length_312_cov_5.049046
AJA38639 NODE_41_length_682_cov_9.496013
AOE07606 NODE_48_length_595_cov_62.800000
AOE12508 NODE_121_length_244_cov_26.328042
AOY34458 NODE_26_length_958_cov_3.727575
APG76165 NODE_62_length_513_cov_2.323144
APM23345 NODE_122_length_225_cov_9.600000
APX07692 NODE_107_length_312_cov_1.194553
ASG92535 NODE_18_length_1259_cov_40429.044850
ASM94017 NODE_17_length_1282_cov_7.995925
ASM94017 NODE_51_length_585_cov_1.620755
ASM94072 NODE_4_length_3725_cov_131.130245
AUL77352 NODE_38_length_710_cov_2.054962
AWK77888 NODE_197_length_111_cov_148.160714
AWK77888 NODE_249_length_110_cov_160.890909
AWK77888 NODE_340_length_108_cov_21.528302
AWK77888 NODE_394_length_106_cov_38.470588
AWK77888 NODE_506_length_96_cov_40151.170732
AWK77888 NODE_631_length_65_cov_86.300000
AWK77888 NODE_703_length_58_cov_40284.000000
File 2
AAO24320 218923
AAX28387 6182
ABZ84886 498761
ADI18656 710731
ADI19064 710825
AFR11835 1224515
AGC04691 11987
AGO87813 1343840
AGO87862 1343844
AJA38639 1587550
AOE07606 77133
AOE12508 77133
AOY34458 1911103
APG76165 1922488
APM23345 573
APX07692 680
ASG92535 2016027
ASM94017 2021904
ASM94072 2021869
AUL77352 2067994
AWK77888 2201303
I need to join these two files so that repeated IDs are not removed. For example, AWK77888 occurs 7 times in File 1 but only once in File 2. When I join the files I lose many entries because the duplicates are automatically removed. I am unsure how to do this.
I am happy to use something other than "join".
I need my resulting file to look something like:
Resulting File 3
AWK77888 NODE_197_length_111_cov_148.160714 2201303
AWK77888 NODE_249_length_110_cov_160.890909 2201303
AWK77888 NODE_340_length_108_cov_21.528302 2201303
AWK77888 NODE_394_length_106_cov_38.470588 2201303
AWK77888 NODE_506_length_96_cov_40151.170732 2201303
AWK77888 NODE_631_length_65_cov_86.300000 2201303
AWK77888 NODE_703_length_58_cov_40284.000000 2201303
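Since tools other than `join` are acceptable, one possible alternative is an `awk` lookup. The sketch below assumes both files are whitespace-delimited and that each ID appears only once in File 2. It does not need sorted input and keeps File 1's original line order. The file names are placeholders, and the sample rows are copied from the excerpts above.

```shell
# Placeholder file names; sample rows copied from the excerpts above.
printf '%s\n' \
  'AWK77888 NODE_197_length_111_cov_148.160714' \
  'AWK77888 NODE_249_length_110_cov_160.890909' \
  'AWK77888 NODE_340_length_108_cov_21.528302' > file1.txt
printf '%s\n' 'AWK77888 2201303' > file2.txt

# First pass (NR == FNR): read file2.txt and store column 2 keyed by the ID.
# Second pass: for each file1.txt line whose ID was stored, append that value.
awk 'NR == FNR { value[$1] = $2; next }
     $1 in value { print $0, value[$1] }' file2.txt file1.txt > file3.txt
cat file3.txt
```

In this sketch, lines of File 1 whose ID is absent from File 2 are skipped rather than printed.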
Thanks in advance for any help and/or suggestions.