Timestamp:
05/06/23 20:04:18 (12 months ago)
Author:
Maciej Komosinski
Message:

Don't remove trailing '>' from genotypes

File:
1 edited

  • cpp/frams/genetics/f4/f4_general.cpp

--- cpp/frams/genetics/f4/f4_general.cpp (r1234)
+++ cpp/frams/genetics/f4/f4_general.cpp (r1235)
@@ -4,5 +4,5 @@
 
 // Copyright (C) 1999,2000  Adam Rotaru-Varga (adam_rotaru@yahoo.com), GNU LGPL
-// 2018, Grzegorz Latosinski, added support for new API for neuron types and their properties
+// 2018, Grzegorz Latosinski, added support for new API for neuron types and development checkpoints
 
 #include "f4_general.h"
@@ -1212,9 +1212,14 @@
 
         sprint(out);
+        len = out.length();
 
         // very last '>' can be omitted
-        len = out.length();
-        if (len > 1)
-                if (out[len - 1] == '>') { (out.directWrite())[len - 1] = 0; out.endWrite(); }; //Macko 2023-04 "can be omitted", but it is removed as a rule even in generated genotypes :)
+        // MacKo 2023-05: after tightening parsing and removing a silent repair for missing '>' after '#', this is no longer always the case.
+        // For genotypes using '#', removing trailing >'s makes them invalid: /*4*/<X><N:N>X#1>> or /*4*/<X><N:N>X#1#2>>> or /*4*/<X><N:N>X#1#2#3>>>> etc.
+        // Such invalid genotypes with missing >'s would then require silently adding >'s, but now stricter parsing and clear information about invalid syntax is preferred.
+        // See also comments in f4_processRecur() case '#'.
+        //if (len > 1)
+        //      if (out[len - 1] == '>') { (out.directWrite())[len - 1] = 0; out.endWrite(); }; //Macko 2023-04 "can be omitted" => was always removed in generated genotypes.
+
         // copy back to string
         // if new is longer, reallocate buf
@@ -1312,5 +1317,5 @@
                         if (end == NULL)
                                 return pos_inout + 1; //error
-                        f4_Node *node = new f4_Node("#", par, pos_inout);
+                        f4_Node *node = new f4_Node("#", par, pos_inout); //TODO here or elsewhere: gene mapping seems to map '#' but not the following number
                         node->reps = reps.getInt();
                         // skip number
@@ -1326,6 +1331,7 @@
                         {
                                 return genot_len + 1; //MacKo 2023-04: report an error, better to be more strict instead of a silent repair (genotype stays invalid but is interpreted and reported as valid) with non-obvious consequences?
-                                //earlier apporach - silently treating this problem (we don't ever see where the error is because it gets corrected in some way here, while parsing the genotype, and error location in the genotype is never reported):
-                                //node = new f4_Node(">", par, genot_len - 1); // check if needed and if this is really the best repair operation; seemed to happen too many times in succession for some genotypes even though they were only a result of f4 operators, not manually created... and the operators should not generate invalid genotypes, right? Or maybe crossover does? Seems like too many #N's for closing >'s; removing #N or adding > helped. Operators somehow don't do it properly sometimes? But F4_ADD_REP adds '>'... (TODO)
+                                //earlier approach - silently treating this problem (we don't ever see where the error is because it gets corrected in some way here, while parsing the genotype, and error location in the genotype is never reported):
+                                //node = new f4_Node(">", par, genot_len - 1); // Maybe TODO: check if this was needed and if this was really the best repair operation; could happen many times in succession for some genotypes even though they were only a result of f4 operators, not manually created... and the operators should not generate invalid genotypes, right? Or maybe crossover does? Seemed like too many #n's for closing >'s; removing #n or adding > helped. Examples (remove trailing >'s to make invalid): /*4*/<X><N:N>X#1>> or /*4*/<X><N:N>X#1#2>>> or /*4*/<X><N:N>X#1#2#3>>>> etc.
+                                // So operators somehow don't do it properly sometimes? But F4_ADD_REP adds '>'... Maybe the rule to always remove final trailing '>' was responsible? (now commented out). Since the proper syntax for # is #n ...repcode... > ...endcode..., perhaps endcode also needs '>' as the final delimiter. If we have many #'s in the genotype and the final >'s are missing, in the earlier approach we would keep adding them here as needed to ensure the syntax is valid. If we don't add '>' here silently, they must be explicitly added or else the genotype is invalid. BUT this earlier approach here only handled the situation where the genotype ended prematurely; what about cases where '>' may be needed as delimiters for # in the middle of the genotype? Or does # always concern all genes until the end, unless explicitly delimited earlier? Perhaps, if the '>' endcode delimiters are not present in the middle of the genotype, we don't know where they should be so the earlier approach would always add them only at the end of the genotype?
                         }
                         return 0;  // OK
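
To illustrate the rule that this changeset comments out, here is a minimal sketch of what dropping the final '>' does to the example genotypes quoted in the new comments. It assumes std::string in place of the project's SString, and stripLastGt is a hypothetical name that does not exist in the Framsticks sources. It only reproduces the string manipulation, not the f4 parser.

```cpp
#include <string>

// Hypothetical helper, not part of Framsticks: approximates the rule removed in r1235
// ("if the serialized genotype is longer than 1 character and ends with '>', drop that '>'").
// The original code wrote a terminating 0 into an SString buffer; pop_back() is used here
// as an assumed equivalent for std::string.
std::string stripLastGt(std::string genotype)
{
    if (genotype.length() > 1 && genotype.back() == '>')
        genotype.pop_back(); // removes exactly one trailing '>'
    return genotype;
}
```

For instance, stripLastGt("/*4*/<X><N:N>X#1>>") yields /*4*/<X><N:N>X#1>, which is one '>' short. According to the comments in the changeset, the stricter parser now reports such a genotype as invalid instead of silently appending the missing '>'.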