How to rename tree nodes with Python
3.4 years ago
mumdooh • 0

Hi, I am trying to build a phylogenetic tree with IQ-TREE. However, after building the tree, the nodes contain only the subject IDs. So I wrote some Python code, hoping to rename the tree nodes with the species names. I used a Python dictionary to match each subject ID and replace it with its species name. Unfortunately, the code returned the same subject IDs, not the species names. Can someone please help?

tree file output is:

(XM_625857.1_3361-3407:0.0000000000,(U65981.1_3455-3501:0.0000010000,(M01601_61_000000000-AK68L_1_21:0.0679337469,CP044419.1_717961-718007:0.9567695962):0.0000022960):0.0000023664,XM_662287.1_3361-3407:0.0000000000);

written code:

replace_strings = {'XM 662287.1:3361-3407':'Cryptosporidium hominis TU502 ATPase (Chro.40306) partial mRNA',
                   'XM 625857.1:3361-3407':'Cryptosporidium parvum Iowa II P-type ATpase involved in cation transport (cgd4 2720) partial mRNA'
                ,'U65981.1:3455-3501':'Cryptosporidium parvum P-ATPase gene (CppA-E1) gene complete cds',
                   'M01601:61:000000000-AK68L:1:2108:7181:8437':'M01601:61:000000000-AK68L:1:2108:7181:8437',
                   'CP044419.1:717961-718007':'Cryptosporidium parvum strain IOWA-ATCC chromosome 4'}


with open("/Users/sabir/Desktop/TMP017787read2_resultsFromNCBI_clustalwd.fas.treefile", "r+") as infile:
    # Read each line of file 
    content = infile.readlines()
    new_content = []

    for line in content:
        new_line = line

        for word in replace_strings.items():
        new_line = new_line.replace(str(word), str(replace_strings[word]))
        new_content.append(new_line)

    test = re.match(replace_strings, new_content)

    with open("/Users/sabir/Desktop/testoutfile.treefile", "w") as outfile:
        for line in new_content:
            outfile.write(line)
python tree • 1.5k views
3.4 years ago
Mensur Dlakic ★ 27k

There are a couple of problems here, and I am not sure whether they come from sloppiness on your part or from inconsistencies introduced during code formatting, which was done by someone else.

First, the keys in your dictionary do not match anything in the tree file, so there is nothing to be replaced. There is no XM 662287.1:3361-3407 in your file, but there is XM_662287.1_3361-3407 (note the _ after XM and before 3361). Second, your replace line is wrong with regard to the target and replacement strings: replace_strings.items() yields (key, value) pairs, so the key is the target and the value is the replacement. That line should be indented as well, so that it sits inside the for loop.
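Both points can be illustrated with a minimal sketch. It assumes that spaces and colons in the original IDs ended up as underscores in the tree file, which is what the tree above suggests. The helper name `to_tree_label` is hypothetical, and the species names are shortened placeholders.

```python
# Hypothetical helper: turn an original sequence ID into the form seen in the tree file.
def to_tree_label(seq_id):
    # Assumption: spaces and colons became underscores, as the tree file above suggests.
    return seq_id.replace(" ", "_").replace(":", "_")

# Shortened placeholder names, keyed by the original IDs.
names = {
    "XM 662287.1:3361-3407": "Cryptosporidium hominis",
    "CP044419.1:717961-718007": "Cryptosporidium parvum",
}

# .items() yields (key, value) tuples, so unpack them instead of using the tuple itself.
for seq_id, species in names.items():
    print(seq_id, "->", species)

# Re-key the dictionary so that its keys match the labels in the tree file.
tree_names = {to_tree_label(seq_id): species for seq_id, species in names.items()}
print(tree_names)
```

With the re-keyed dictionary, each key is a string that can actually be found in the tree file.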

Here is my modification of your code, and you will have to replace file names with yours:

# Keys are the labels exactly as they appear in the tree file
replace_strings = {'XM_662287.1_3361-3407':'Cryptosporidium hominis TU502 ATPase (Chro.40306) partial mRNA',
                   'XM_625857.1_3361-3407':'Cryptosporidium parvum Iowa II P-type ATpase involved in cation transport (cgd4 2720) partial mRNA',
                   'U65981.1_3455-3501':'Cryptosporidium parvum P-ATPase gene (CppA-E1) gene complete cds',
                   # This entry maps to itself, so the read label stays unchanged
                   'M01601:61:000000000-AK68L:1:2108:7181:8437':'M01601:61:000000000-AK68L:1:2108:7181:8437',
                   'CP044419.1_717961-718007':'Cryptosporidium parvum strain IOWA-ATCC chromosome 4'}

# Read the original tree file
with open('original.nwk', "r") as infile:
    content = infile.readlines()

new_content = []
for line in content:
    new_line = line
    # .items() yields (key, value) pairs: the tree label and its new name
    for old, new in replace_strings.items():
        new_line = new_line.replace(old, new)
    new_content.append(new_line)

# Write the renamed tree to a new file
with open('new.nwk', "w") as outfile:
    for line in new_content:
        outfile.write(line)

When I run it, it makes this file:

(Cryptosporidium parvum Iowa II P-type ATpase involved in cation transport (cgd4 2720) partial mRNA:0.0000000000,(Cryptosporidium parvum P-ATPase gene (CppA-E1) gene complete cds:0.0000010000,(M01601_61_000000000-AK68L_1_21:0.0679337469,Cryptosporidium parvum strain IOWA-ATCC chromosome 4:0.9567695962):0.0000022960):0.0000023664,Cryptosporidium hominis TU502 ATPase (Chro.40306) partial mRNA:0.0000000000);

By the way, there should be no colons or semicolons in your species names, because those characters have special meaning in tree files. The same applies to parentheses, which appear in several of the names above. I would also replace space characters with underscores, as most tree-displaying programs convert underscores into space characters.
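One way to apply this advice is to clean each species name before putting it into the dictionary. The sketch below is only an illustration: the function name `newick_safe` is hypothetical, and the set of characters it removes (parentheses, square brackets, colons, semicolons, commas) is an assumption about what is safest to drop.

```python
import re

def newick_safe(name):
    """Return a label without characters that Newick treats as syntax."""
    # Drop parentheses, square brackets, colons, semicolons and commas.
    cleaned = re.sub(r"[()\[\]:;,]", "", name)
    # Turn each run of whitespace into a single underscore.
    return re.sub(r"\s+", "_", cleaned.strip())

print(newick_safe("Cryptosporidium hominis TU502 ATPase (Chro.40306) partial mRNA"))
# prints: Cryptosporidium_hominis_TU502_ATPase_Chro.40306_partial_mRNA
```

Running the dictionary values through such a function before the replacement keeps the resulting tree file parseable.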

Thank you, Mensur. I will keep those notes in mind.
