Question

Merging the same column into multiple different files

0

3.3 years ago

mel22 ▴ 100

Hello, I would like to merge the same column into multiple different files. The files have the same structure but come from different samples, and I want to merge each file separately with the SNP position column.
Is there any kind of loop (bash, R, Python, ...) that could do this?

Input files:

RS   1-51.Log R Ratio   1-51.B Allele Freq
A28  -0.1656            1

Column to add:

RS   Position
A28  5555

Output:

RS   1-51.Log R Ratio   1-51.B Allele Freq   Position
A28  -0.1656            1                    5555

Thank you very much

merge bash • 842 views

ADD COMMENT • link updated 3.3 years ago by Pierre Lindenbaum 161k • written 3.3 years ago by mel22 ▴ 100

1

With tsv-utils:

input:

$ cat first_file.txt second_file.txt 

RS  Position
A28 5555
RS  1-51.Log_R_Ratio    1-51.B_Allele_Freq
A28 -0.1656 1

output:

$ tsv-join -H -f first_file.txt -k RS   --write-all  -1 -a Position second_file.txt

RS  1-51.Log_R_Ratio    1-51.B_Allele_Freq  Position
A28 -0.1656 1   5555

ADD REPLY • link 3.3 years ago by cpad0112 21k

0

You should provide the first few lines in each file and an example of the desired output. It would be difficult to answer the question without this information.

ADD REPLY • link 3.3 years ago by rpolicastro 13k

score 2 · Answer 1 · 2020-12-14

2

3.3 years ago

Carlo Yague 8.6k

Looks like a task for the merge function in R. See ?merge

ADD COMMENT • link 3.3 years ago by Carlo Yague 8.6k

1

Adding to Carlo's point:

1. Use list.files to get a list of file names/locations. This will be the list of files to read.
2. Use lapply with read.table on the above list to get a list of data.frame objects with the content of each file. You will want to make the column names unique in the individual data.frames so that merging does not create suffixes on similarly named columns.
3. Use Reduce with merge to combine the list from step 2 into a single data frame.
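
For readers who would rather not use R, here is a rough Python sketch of the same list, read, merge idea, adapted to what the question asks for: each sample file gets the Position column appended and is written out separately. It is an illustration only, not code from this thread. It assumes tab-separated files with a header line, a key column named RS, and a lookup file with RS and Position columns; the function names, the "NA" placeholder for unmatched IDs, and the paths in the usage note are all hypothetical.

```python
import csv
import glob
import os


def load_positions(position_file, key="RS", value="Position"):
    """Read the lookup table into a dict mapping key -> position."""
    with open(position_file, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[key]: row[value] for row in reader}


def add_position(sample_file, out_file, positions,
                 key="RS", value="Position", missing="NA"):
    """Append the position column to one sample file, keeping row order."""
    with open(sample_file, newline="") as src, \
            open(out_file, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = list(reader.fieldnames) + [value]
        writer = csv.DictWriter(dst, fieldnames=fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            # IDs absent from the lookup table get the placeholder (assumption).
            row[value] = positions.get(row[key], missing)
            writer.writerow(row)


def add_position_to_all(pattern, position_file, out_dir):
    """Process every file matching the glob pattern, one output per input."""
    positions = load_positions(position_file)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for sample_file in sorted(glob.glob(pattern)):
        # Skip the lookup file itself if the pattern happens to match it.
        if os.path.abspath(sample_file) == os.path.abspath(position_file):
            continue
        out_file = os.path.join(out_dir, os.path.basename(sample_file))
        add_position(sample_file, out_file, positions)
        written.append(out_file)
    return written
```

With hypothetical paths, a call such as add_position_to_all("samples/*.txt", "positions.txt", "merged") would write one merged file per sample into the merged directory. Unlike a sort-based join, this keeps the original row order of each sample file, and the key column is named explicitly rather than inferred.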

ADD REPLY • link 3.3 years ago by Ram 43k

0

Thanks for elaborating on my rather blunt answer! I just wanted to add that, by default, the merge is performed on the columns with identical names, so depending on the case, it might not be necessary to uniquify the column names.

ADD REPLY • link 3.3 years ago by Carlo Yague 8.6k

0

That is both a pro and a con. When merging multiple output files from RSEM, for example, such a merge would be faulty. IMO a merge should always be done with explicit column name specification.
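
One way to see the risk is a toy example, sketched here in Python rather than R. It assumes (this is my reading, not something stated above) that the problem arises when two per-sample tables share a value column name as well as the ID column, so a join on all shared names also tries to match on the values. The inner_join helper, the column names gene_id and expected_count, and the numbers are hypothetical, and the ".y" suffix only loosely imitates how R labels clashing columns.

```python
def inner_join(left, right, keys):
    """Inner-join two lists of dict rows on the given key columns."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            merged = dict(row)
            for name, value in match.items():
                if name in keys:
                    continue
                # Simplification: only the right-hand copy of a clashing
                # column name gets a suffix.
                target = name + ".y" if name in row else name
                merged[target] = value
            joined.append(merged)
    return joined


def shared_columns(left, right):
    """Column names present in both tables, in left-table order."""
    return [name for name in left[0] if name in right[0]]


# Hypothetical per-sample tables that share a value column name.
sample_a = [{"gene_id": "g1", "expected_count": "10"},
            {"gene_id": "g2", "expected_count": "7"}]
sample_b = [{"gene_id": "g1", "expected_count": "12"},
            {"gene_id": "g2", "expected_count": "7"}]

# Joining on every shared name also matches on expected_count,
# so g1 (10 vs 12) silently drops out.
implicit = inner_join(sample_a, sample_b, shared_columns(sample_a, sample_b))

# Joining on the explicit key keeps both genes and both count columns.
explicit = inner_join(sample_a, sample_b, ["gene_id"])

print(len(implicit), len(explicit))  # prints "1 2"
```

The implicit join raises no error, it just returns fewer rows, which is why naming the key column explicitly is the safer habit.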

ADD REPLY • link 3.3 years ago by Ram 43k

score 1 · Answer 2 · 2020-12-14

1

3.3 years ago

Pierre Lindenbaum 161k

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.tsv) <(sort -t $'\t' -k1,1 file2.tsv)

ADD COMMENT • link 3.3 years ago by Pierre Lindenbaum 161k