Closed: Joining two files and KEEPING duplicates
5.3 years ago
emilyc

SOLVED:

join -a 1 file1 file2 > file3

Note: File 1 had the "extra" data, hence "-a 1", which tells join to also print lines from File 1 that have no match in File 2.
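A minimal sketch of the same fix as a reusable POSIX shell function, assuming both files are whitespace-separated with the accession ID in column 1. join only pairs lines correctly when both inputs are sorted on the join field under the same collation, so the sketch sorts both files first and runs sort and join with LC_ALL=C. The function name join_keep_dups is hypothetical.

```sh
# join_keep_dups is a hypothetical helper; $1 = File 1, $2 = File 2.
# Assumes whitespace-separated input with the ID in column 1.
join_keep_dups() {
    # join needs both inputs sorted on the join field with the same collation,
    # so sort and join all run under LC_ALL=C.
    sorted2=$(mktemp) || return 1
    LC_ALL=C sort -k1,1 "$2" > "$sorted2"
    # "-" makes join read the sorted File 1 from the pipe.
    # -a 1 also prints File 1 lines whose ID has no match in File 2;
    # an ID repeated in File 1 yields one output line per repeat.
    LC_ALL=C sort -k1,1 "$1" | LC_ALL=C join -a 1 - "$sorted2"
    rc=$?
    rm -f "$sorted2"
    return "$rc"
}

# Usage (placeholder file names):
# join_keep_dups file1 file2 > file3
```

When an ID appears several times in File 1 and once in File 2, join prints every combination, so each File 1 line gets its own output line. Output fields are separated by single spaces, which is join's default.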

Hello.

I have two files that need to be joined together. Excerpts are below.

File 1

AAO24320    NODE_15_length_1466_cov_5.512403
AAX28387    NODE_56_length_561_cov_2.735178
ABZ84886    NODE_98_length_332_cov_2.086643
ADI18656    NODE_656_length_63_cov_4.750000
ADI19064    NODE_34_length_769_cov_31.782913
AFR11835    NODE_8_length_2031_cov_5.296559
AGC04691    NODE_19_length_1204_cov_12.818973
AGO87813    NODE_74_length_440_cov_2.514286
AGO87862    NODE_106_length_312_cov_5.049046
AJA38639    NODE_41_length_682_cov_9.496013
AOE07606    NODE_48_length_595_cov_62.800000
AOE12508    NODE_121_length_244_cov_26.328042
AOY34458    NODE_26_length_958_cov_3.727575
APG76165    NODE_62_length_513_cov_2.323144
APM23345    NODE_122_length_225_cov_9.600000
APX07692    NODE_107_length_312_cov_1.194553
ASG92535    NODE_18_length_1259_cov_40429.044850
ASM94017    NODE_17_length_1282_cov_7.995925
ASM94017    NODE_51_length_585_cov_1.620755
ASM94072    NODE_4_length_3725_cov_131.130245
AUL77352    NODE_38_length_710_cov_2.054962
AWK77888    NODE_197_length_111_cov_148.160714
AWK77888    NODE_249_length_110_cov_160.890909
AWK77888    NODE_340_length_108_cov_21.528302
AWK77888    NODE_394_length_106_cov_38.470588
AWK77888    NODE_506_length_96_cov_40151.170732
AWK77888    NODE_631_length_65_cov_86.300000
AWK77888    NODE_703_length_58_cov_40284.000000

File 2

AAO24320    218923
AAX28387    6182
ABZ84886    498761
ADI18656    710731
ADI19064    710825
AFR11835    1224515
AGC04691    11987
AGO87813    1343840
AGO87862    1343844
AJA38639    1587550
AOE07606    77133
AOE12508    77133
AOY34458    1911103
APG76165    1922488
APM23345    573
APX07692    680
ASG92535    2016027
ASM94017    2021904
ASM94072    2021869
AUL77352    2067994
AWK77888    2201303

I need to join these two files together so that repeats are not removed. For example, AWK77888 occurs 7 times in File 1 but only once in File 2. When I join the files I lose many entries, because the duplicates seem to be removed automatically. I am unsure how to do this.

I am happy to use something other than "join".
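A common alternative to join is a two-pass awk lookup: read File 2 into an associative array keyed on the ID, then print every File 1 line with the matching value appended. This is a sketch, assuming whitespace-separated columns and that each ID appears at most once in File 2 (a later duplicate in File 2 would overwrite an earlier one). The function name awk_keep_dups is hypothetical. Unlike join, it needs no sorting and keeps File 1 in its original line order.

```sh
# awk_keep_dups is a hypothetical helper; $1 = File 1, $2 = File 2.
# Assumes whitespace-separated input with the ID in column 1.
awk_keep_dups() {
    awk 'BEGIN { OFS = "\t" }
         # First file read (File 2): remember ID -> value.
         NR == FNR { value[$1] = $2; next }
         # Second file read (File 1): every line is printed, so repeated IDs survive.
         {
             if ($1 in value) print $1, $2, value[$1]
             else print $1, $2    # no match in File 2, like join -a 1
         }' "$2" "$1"
}

# Usage (placeholder file names):
# awk_keep_dups file1 file2 > file3
```

File 2 is passed to awk first so it is loaded before File 1 is read. The NR == FNR test assumes File 2 is not empty; output columns are tab-separated.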

I need my resulting file to look something like:

Resulting File 3

    AWK77888    NODE_197_length_111_cov_148.160714  2201303
    AWK77888    NODE_249_length_110_cov_160.890909  2201303
    AWK77888    NODE_340_length_108_cov_21.528302   2201303
    AWK77888    NODE_394_length_106_cov_38.470588   2201303
    AWK77888    NODE_506_length_96_cov_40151.170732 2201303
    AWK77888    NODE_631_length_65_cov_86.300000    2201303
    AWK77888    NODE_703_length_58_cov_40284.000000 2201303

Thanks in advance for any help and/or suggestions.
