5.9 years ago
3335098459
As I am new to R, this question may seem like a piece of cake to you. I have data in txt format. The first column has the cluster number and the second column has the names of different organisms. For example:
- 0 org4|gene759
- 1 org1|gene992
- 2 org1|gene1101
- 3 org4|gene757
- 4 org1|gene1702
- 5 org1|gene989
- 6 org1|gene990
- 7 org1|gene1699
- 9 org1|gene1102
- 10 org4|gene2439
- 10 org1|gene1374
I need to rearrange/reshape the data into the following format:
Cluster No. Org 1 Org 2 org3 org4
- 0 0 0 1
- 1 0 0 0
I could not figure out how to do it in R. Thanks
The example input has 12 clusters (0 and 11 - 10 repeated), but the expected output has only two clusters. Likewise, the input has 2 organisms and the expected output has 4. Can you post matching input and output?
This can be done outside R. I replaced | with tabs using sed, renumbered the rows because there were duplicates, and added a few more clusters to the OP's example data (given below).
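The sed step mentioned above might look like the following minimal sketch. The three sample rows are taken from the question; the variable names `input`, `tab` and `tabbed` are only illustrative, and the tab is produced with `printf` on the assumption that not every sed understands `\t`.

```shell
# Three rows from the question: cluster number, then organism|gene.
input='0 org4|gene759
1 org1|gene992
10 org1|gene1374'

# A literal tab character, produced portably.
tab=$(printf '\t')

# Replace the | separator with a tab so organism and gene
# become separate columns.
tabbed=$(printf '%s\n' "$input" | sed "s/|/$tab/")

printf '%s\n' "$tabbed"
```

After this step each row has the cluster number and organism separated by a space, and the gene separated by a tab, which most column-oriented tools can split on whitespace.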
in R: output with modified OP data:
outside R (with datamash and miller), output with modified OP data:
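If datamash or miller is not available, a similar cluster-by-organism table can be sketched with plain awk. This is a substitute technique, not the commands used in this answer. The sample rows come from the question (with | already replaced by a tab), and the organism columns are just those that occur in the input, in first-seen order.

```shell
# Sample rows from the question, with | already replaced by a tab.
tabbed=$(printf '0 org4\tgene759\n1 org1\tgene992\n10 org4\tgene2439\n10 org1\tgene1374\n')

# Count genes per (cluster, organism) pair, then print one row per
# cluster and one column per organism, with 0 for absent pairs.
table=$(printf '%s\n' "$tabbed" | awk '
{
    count[$1, $2]++                          # $1 = cluster, $2 = organism
    # Remember clusters and organisms in first-seen order.
    if (!($1 in seen_c)) { seen_c[$1]; clusters[++nc] = $1 }
    if (!($2 in seen_o)) { seen_o[$2]; orgs[++no] = $2 }
}
END {
    header = "cluster"
    for (j = 1; j <= no; j++) header = header "\t" orgs[j]
    print header
    for (i = 1; i <= nc; i++) {
        row = clusters[i]
        for (j = 1; j <= no; j++)
            row = row "\t" (count[clusters[i], orgs[j]] + 0)
        print row
    }
}')

printf '%s\n' "$table"
```

The cells hold gene counts, so a value above 1 would mean several genes from one organism fall in the same cluster. Organisms that never appear in the input (such as org2 and org3 in the question's expected output) get no column in this sketch.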
modified data: