Posted by Mensur Dlakic

There are a couple of problems here, and I am not sure whether they come from sloppiness on your part or from inconsistencies introduced during code formatting, which was done by someone else.

First, the items in your list do not match anything in the tree file, so there is nothing to replace. There is no XM 662287.1:3361-3407 in your file, but there is XM_625857.1_3361-3407 (note the _ after XM and before 3361). Second, your replace line uses the wrong target and replacement strings, and it should be indented as well.
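Before running any replacement, it can help to check which of your IDs actually occur as leaf labels in the tree. The sketch below assumes unquoted labels without reserved characters (quoted labels are not handled); `leaf_labels` and `missing_keys` are hypothetical helper names, not part of any library.

```python
import re

def leaf_labels(newick: str) -> list[str]:
    """Return the leaf labels of an unquoted Newick string.

    A leaf label starts right after '(' or ',' and runs until the next
    reserved character (':', ',', '(', ')' or ';').
    """
    return [label.strip() for label in re.findall(r"[(,]([^:,();]+)", newick)]

def missing_keys(newick: str, mapping: dict[str, str]) -> list[str]:
    """Return the keys of `mapping` that do not occur as a leaf label."""
    labels = set(leaf_labels(newick))
    return [key for key in mapping if key not in labels]
```

For example, a key written as XM 662287.1:3361-3407 (space and colon) would be reported as missing from a tree whose label is XM_662287.1_3361-3407.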

Here is my modification of your code; you will have to replace the file names with yours:

replace_strings = {'XM_625857.1_3361-3407':'Cryptosporidium hominis TU502 ATPase (Chro.40306) partial mRNA',
                   'U65981.1_3455-3501':'Cryptosporidium parvum P-ATPase gene (CppA-E1) gene complete cds',
                   'CP044419.1_717961-718007':'Cryptosporidium parvum strain IOWA-ATCC chromosome 4'}

# Read the original tree file
with open('original.nwk', "r") as infile:
    content = infile.readlines()

# Apply every replacement to each line and keep the modified lines
new_content = []
for line in content:
    new_line = line
    for old, new in replace_strings.items():
        new_line = new_line.replace(old, new)
    new_content.append(new_line)

# Write the modified lines to a new tree file
with open('new.nwk', "w") as outfile:
    for line in new_content:
        outfile.write(line)

When I run it, it makes this file:

(Cryptosporidium hominis TU502 ATPase (Chro.40306) partial mRNA:0.0000000000,(Cryptosporidium parvum P-ATPase gene (CppA-E1) gene complete cds:0.0000010000,(M01601_61_000000000-AK68L_1_21:0.0679337469,Cryptosporidium parvum strain IOWA-ATCC chromosome 4:0.9567695962):0.0000022960):0.0000023664,XM_662287.1_3361-3407:0.0000000000);
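Chained `str.replace` matches substrings anywhere in the line, so an ID that is a prefix of another ID (say XM_1 and XM_12) would also rewrite part of the longer one. A minimal sketch of a stricter alternative, assuming unquoted labels with no surrounding whitespace, replaces a leaf label only when the whole label matches a key; `relabel` is a hypothetical name.

```python
import re

# A leaf label: preceded by '(' or ',' and followed by ':', ',' or ')'.
# Internal node labels (which follow ')') and branch lengths are not matched.
LEAF = re.compile(r"(?<=[(,])([^:,();]+)(?=[:,)])")

def relabel(newick: str, mapping: dict[str, str]) -> str:
    """Replace leaf labels that exactly match a key in `mapping`."""
    return LEAF.sub(lambda m: mapping.get(m.group(1), m.group(1)), newick)
```

With mapping {"XM_1": "Alpha"}, the tree (XM_1:0.1,XM_12:0.2); becomes (Alpha:0.1,XM_12:0.2);, whereas chained `str.replace` gives (Alpha:0.1,Alpha2:0.2);.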

By the way, there should be no colons or semicolons in your species names, because those characters have special meaning in tree files (parentheses and commas are reserved in Newick too, and your names above contain parentheses). I would also replace spaces with underscores, as most tree-displaying programs show underscores as spaces.
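One way to clean the names before inserting them is sketched below. It assumes you want unquoted Newick labels and simply drops the reserved characters ( ) , : ; [ ] and ' rather than escaping them; `newick_safe` is a hypothetical helper.

```python
import re

def newick_safe(name: str) -> str:
    """Make a species name usable as an unquoted Newick label.

    Reserved Newick characters are removed, then each run of whitespace
    is replaced with a single underscore.
    """
    name = re.sub(r"[(),:;\[\]']", "", name)
    return re.sub(r"\s+", "_", name.strip())
```

Applied to the dictionary values, e.g. `{k: newick_safe(v) for k, v in replace_strings.items()}`, the first name becomes Cryptosporidium_hominis_TU502_ATPase_Chro.40306_partial_mRNA.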
