8.2 years ago
mts89
Hi everyone,
I have a large file of orthologous gene trees in Newick format, one tree per line. I want to keep only the single-copy gene trees (exactly one gene copy per species). I tried selecting by species ID with awk, but every attempt ended up modifying and corrupting the trees.
The simplest tree has the form "(", <ID of species 1>, ".", <name of the gene>, ":", <distance value>, ",", <ID of species 2>, ".", <name of the gene>, ":", <distance value>, ");", for example:
(ID1_XXXX.geneXXX:0.XX,ID2_XXXX.geneXXX:0.XX);
Any ideas for a script that selects only the single-copy gene trees?
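One possible approach, as a minimal Python sketch rather than a tested pipeline: read each tree as plain text, pull out the leaf labels with a regular expression, and print the line unchanged only if no species ID occurs more than once. Because the tree string is never rewritten, it cannot be damaged. The sketch assumes that the species ID is everything before the first "." in a leaf label, that labels are unquoted, and that "single-copy" means no species is repeated (it does not check that every species is present). The function names are made up for illustration.

```python
import re
import sys
from collections import Counter

# A leaf label starts right after "(" or "," and stops at ":", ",", ")" or ";".
# Internal node labels and support values follow ")" and are therefore skipped.
# Assumption: labels are unquoted and contain no parentheses, commas or colons.
LEAF_RE = re.compile(r"(?<=[(,])\s*([^():,;\s]+)")


def species_ids(newick):
    """Return the species ID of every leaf, in order of appearance.

    Assumption: the species ID is the part of the label before the first ".".
    """
    return [label.split(".", 1)[0] for label in LEAF_RE.findall(newick)]


def is_single_copy(newick):
    """True if the tree has at least one leaf and no species occurs twice."""
    counts = Counter(species_ids(newick))
    return bool(counts) and all(n == 1 for n in counts.values())


def filter_trees(lines):
    """Yield the single-copy trees unchanged (only surrounding whitespace is stripped)."""
    for line in lines:
        tree = line.strip()
        if tree and is_single_copy(tree):
            yield tree


if __name__ == "__main__":
    # Reads one Newick tree per line from standard input.
    for tree in filter_trees(sys.stdin):
        print(tree)
```

It could be run as `python filter_single_copy.py < trees.nwk > single_copy.nwk`, where both file names and the script name are placeholders. If the labels turn out to be quoted or to contain unusual characters, a proper Newick parser would be a safer choice than a regular expression.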