Merging two files based on the identifier column (gene symbols)
4.6 years ago

Hi,

I have two *.csv files that share only one column, the gene symbols. The first file contains the gene symbols and expression data (samples); the second contains the gene symbols and phenotypic data/attributes. I would like to merge the two files by matching on the gene symbol column and save all the data in one file for further analysis. How can this be done?

Thank you,

Toufiq

gene-annotation R • 2.2k views

Have you read the help page of the merge function?

?merge

Thank you so much, @Benn.

4.6 years ago

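This can be done in the terminal with the join utility. Sort both files on the gene symbol column first (join matches on the first field by default), and because the files are comma-separated, set the field separator with -t, e.g. join -t, -a1 -a2 file1.csv file2.csv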

The -a option keeps unpairable lines from the given file (-a1 for the first file, -a2 for the second), i.e. lines whose gene symbol appears in one file but not the other.
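To make this concrete, here is a minimal sketch of the whole sort-then-join workflow. The file names (expr.csv, pheno.csv, merged.csv), the column names and the values are all made up for illustration, and it assumes the gene symbol is the first column of both files and that no field contains a quoted comma. One thing to watch: with -a alone, unpaired lines are printed with fewer columns, so the sketch also uses -e and -o to pad the missing fields with NA.

```shell
#!/bin/sh
# Hypothetical example inputs, written to the current directory.
printf 'gene,sample1,sample2\nTP53,5.1,6.2\nBRCA1,3.3,2.9\nEGFR,7.0,7.4\n' > expr.csv
printf 'gene,pathway\nBRCA1,repair\nMYC,proliferation\nTP53,apoptosis\n' > pheno.csv

# Use the same collation for sort and join so they agree on the ordering.
LC_ALL=C
export LC_ALL

# join expects both inputs sorted on the join field.
# Drop the header line (tail -n +2) and sort the rest on column 1.
for f in expr pheno; do
  tail -n +2 "$f.csv" | sort -t, -k1,1 > "$f.sorted"
done

# -t,      use a comma as the field separator
# -a1 -a2  also print lines that have no match in the other file
# -o ...   fix the output columns: join field, then fields from each file
# -e NA    fill fields that are missing for unpaired lines with NA
{
  echo 'gene,sample1,sample2,pathway'   # header written by hand
  join -t, -a1 -a2 -e NA -o 0,1.2,1.3,2.2 expr.sorted pheno.sorted
} > merged.csv

cat merged.csv
```

Here EGFR is only in the expression file and MYC only in the phenotype file, so both end up in merged.csv with NA in the columns they lack. The -o list has to be adapted to the real number of columns in each file.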
