 Messy genetic representation (f7)

#### Section-structured, BFS approach.

Patryk Aleksander

### 1. Introduction

The main goal of this project was to come up with a representation of genome that is a string made of A to Z capital

letters only, and can always be converted into a correct genome in the f0 original Framstick representation. Since

the string has no restrictions on syntax the representation is called "messy" and will be denoted as f7.

### 2. Sections

In the process of conversion, input string is treated as a repeating sequence of four sections responsible for:

parts, joints, neurons and connections respectively. Each section starts and ends with a letter which takes the role

of a SectionTag at that particular moment of conversion. First letter of the string is the first

SectionTag. Following letters form the first parts' section (in general a consecutive section), up to the

second occurrence of the SectionTag which terminates the current section (exceptions from this rule are

explained later). Next letter takes the role of a new SectionTag (in general it can be a different letter

every time) and so on... (see example below)

Example 2.1:
Which letters are SectionTags? (NOTE: spaces are added for clarity only)

A QWERTY... AB QWERTY... BC QWERTY... CB QWERTY... BH QWERTY... H...

SectionTags: A, B, C, B, H

Information inside each section is coded in 5-letter long 'chunks' called tokens. Tokens are treated either as

labels or values depending on the situation. In the latter case, value in the base-26 numeral system (26 capital

letters) is recalculated to base-10 numeral system (digits 0 9). A 5 letter word from 26 letter alphabet can take

over 11 million states. The assumed precision is 3 digits for the integer part and 3 digits for the fractional part

(further domain restrictions may apply) which requires 1 million states (-500.000 499.999). Since there are over

11 times more states than required, modulo function is used (see example below).

Example 2.2:
What coordinate does token ZABCD stand for?

Z = 25 (position in the alphabet),
A = 0,
B = 1,
C = 2,
D = 3.

ZABCD = ( (25*26^4 + 0*26^3 + 1*26^2 + 2*26^1 + 3*26^0) % 1000000) / 1000) - 500.0 = (11425131 % 1000000) / 1000 -

500.0 = 425131 / 1000 - 500.0 = -74.869

Stage I. base-26 to base-10 conversion
ZABCD = 25*26^4 + 0*26^3 + 1*26^2 + 2*26^1 + 3*26^0 = 11425131,

Stage II. range wrapping
11425131 % 1000000 = 425131,

Stage III. precision and range adjustment
425131 / 1000 - 500.0 = -74.869

### 3. Parts' section

There are two types of definitions for parts: simplified and extended. Simplified definition consists of 4 tokens:

label and x, y, z coordinates. Label serves as part's identifier. Tokens for coordinates are recalculated to

numbers. If the distance between the next token and current part's label (after recalculating into numbers) is not

exceeding a predefined distance, we are dealing with an extended definition. Three consecutive tokens are

recalculated to numbers and serve as values for properties: friction, ingestion and assimilation respectively. To

protect part definitions from being torn apart too often, only label tokens are checked for section tags. Every part

is stored right after decoding and is created only before the creation of a joint linking it.

### 4. Joints' section

To ensure model consistency this section is decoded in the same way Breadth First Search algorithm goes through a

graph. Each token is recalculated to a number. Each number corresponds with an index of a stored part (if it's too

big it's wrapped). First occurrence (o1) of a specific part starts the branching process until second occurrence

(o2) of that part is reached. Joints link o1 with all parts that are between o1 and o2. If the distance between o2

and the next token is not exceeding a predefined distance, it's an extended definition. Three consecutive tokens

provide the values of: stiffness, rotation stiffness and stamina joint properties. Each part between o1 and o2 is

added to a list and becomes the new o1 after o2 is reached (see example below)

Example 4.1:
What joints will the following sequence create? (NOTE: spaces are added for clarity only)

AAAAA AAAAB AAAAC AAAAD AAAAA AAAAB AAAAE AAAAF AAAAC AAAAG AAAAD AAAAE AAAAF AAAAG

Tokens are recalculated to indices of stored parts:
AAAAA = 0,
AAAAB = 1,
AAAAC = 2,
AAAAE = 4,
AAAAF = 5,
AAAAG = 6.

Following joints in a (part1, part2) notation are added to the model:
(0, 1), (0, 2), (0, 3),
(2, 4), (2, 5),
(3, 6).

### 5. Neurons' section

There are two types of definitions for neurons: simplified and extended. Simplified definition consists of 3 tokens:

label, neuron type and attachment type. Label serves as neuron's identifier. Neuron type is recalculated to a number

corresponding with an index of active (option set in simulator) neurons (if it's too big it's wrapped). If neuron

type token matches label token, neuron of default type "N" is created. Attachment type is recalculated to a

number.
Three cases may occur:

1. neuron of current type must be attached to a part,
2. neuron of current type must be attached to a joint,
3. neuron of current type doesn't need any assignment to body element.

Attachment type corresponds with an index of a part (in the 1st case), of a joint (in the 2nd case), of either a

part or a joint or a dummy element (in the 3rd case). The latter are added to index range to ensure a situation with

no attachment is possible. If attachment type token matches label token, neuron with no assignment to body element

is created by default (assuming 3rd case occurred). If the distance between the next token and current part's label

(after recalculating into numbers) is not exceeding a predefined distance, we are dealing with an extended

definition. Number of consecutive tokens equal to number of properties of current neuron type, are recalculated to

new values for these properties (domain restrictions apply). To ensure correctness of the model, neuron type and

attachment type tokens are not checked for section tags.

Example 5.1:
What neurons will the following sequence create? (NOTE: spaces are added for clarity only)

Additional information: model.getPartCount() = 4, model.getJointCount() = 2, active neurons: N, G, T, S, *, | and @

Tokens are recalculated to numbers:
AAAAA = 0,
AAAAB = 1,
AAAAC = 2,
AAAAE = 4,
AAAAF = 5,
AAAAQ = 16.

Active neurons' indices:
N = 0, G = 1, T = 2, S = 3, * = 4, | = 5, @ = 6.

AAAAA AAAAA AAAAA
label = AAAAA,
neuron type == label => N, (NOTE: default, doesn't need attachment)
attachment type == label => not attached. (NOTE: default)
Created: neuron N, not attached.

label = AAAAB,
neuron type = AAAAD = 3 => S, (NOTE: needs to be attached to body part)
attachment type = AAAAF = 6 => 6 % getPartCount() = 6 % 4 = 2 => part(2).
Created: neuron S, attached to part(2).

AAAAC AAAAE AAAAQ
label = AAAAC,
neuron type = AAAAE = 4 => *, (NOTE: doesn't need attachment)
attachment type = AAAAQ = 16 => 16 % (1.5*(getPartCount() + getJointCount())) = 16 % 9 = 7 => not attached. (NOTE: 0-3 => to part, 4-5 => to joint, 6-8 => not attached)
Created: neuron *, not attached.

### 6. Connections' section

Three consecutive tokens form a connection. Each token is recalculated to a number. First token corresponds with an

index of a neuron (if it's too big it's wrapped), from list of neurons with at least one free input. This neuron

serves as a parent neuron. Second token corresponds with an index of a neuron (if it's too big it's wrapped), from

list of neurons with at least one free output. This neuron is connected as parent neuron's input. Third token serves

as weight (after normalization).

### 7. Square example

Convert the following genotype from f7 into f0
ZAAAAABCLQUBCLQUBCLQUYYYYBBCNDGBCLQUBCLQUAAAACBCNDGBCNDGBCLQUYYYYDBCLQUBCNDGBCLQUZZAAAAAYYYYBAAAAAAAAACYYYYBYYYYDAAAACAAAAAYYYYDAAAAAZ

Z AAAAA BCLQU BCLQU BCLQU YYYYB BCNDG BCLQU BCLQU AAAAC BCNDG BCNDG BCLQU YYYYD BCLQU BCNDG BCLQU Z Z AAAAA YYYYB AAAAA AAAAC YYYYB YYYYD AAAAC AAAAA YYYYD AAAAA Z

SectionTag Z, opens parts' section:
AAAAA - label, BCLQU BCLQU BCLQU - coordinates,
YYYYB - label, BCNDG BCLQU BCLQU - coordinates,
AAAAC - label, BCNDG BCNDG BCLQU - coordinates,
YYYYD - label, BCLQU BCNDG BCLQU - coordinates,

Labels are recalculated as follows:
AAAAA = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 0*26^0 = 0,
YYYYB = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 1*26^0 = 11406097,
AAAAC = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 2*26^0 = 2,
YYYYD = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 3*26^0 = 11406099,
Threshold for extended part definition is 5940688 (50% of the number of 5 letter permutations)
Distances between consequent labels exceed the threshold, which implies the use of simple part definitions.

There are two values of tokens used: BCLQU and BCNDG. They are recalculated as follows:
Stage I. base-26 to base-10 conversion

BCLQU = 1*26^4 + 2*26^3 + 11*26^2 + 16*26^1 + 20*26^0 = 500000,
BCNDG = 1*26^4 + 2*26^3 + 13*26^2 + 3*26^1 + 6*26^0 = 501000,

Stage II. range wrapping
500000 % 1000000 = 500000,
501000 % 1000000 = 501000,

Stage III. precision and range adjustment
500000 / 1000 - 500.0 = 0.0,
501000 / 1000 - 500.0 = 1.0.

Parts defined in the first section:
0: 0, 0, 0,
1: 1, 0, 0,
2: 1, 1, 0,
3: 0, 1, 0.

SectionTag Z, closes parts' section.

SectionTag Z, opens joints' section:
Labels are recalculated as follows:
AAAAA = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 0*26^0 = 0,
AAAAB = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 1*26^0 = 1,
AAAAC = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 2*26^0 = 2,
AAAAD = 0*26^4 + 0*26^3 + 0*26^2 + 0*26^1 + 3*26^0 = 3,
YYYYB = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 1*26^0 = 11406097,
YYYYC = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 2*26^0 = 11406098,
YYYYD = 24*26^4 + 24*26^3 + 24*26^2 + 24*26^1 + 3*26^0 = 11406099,
Threshold for extended joint definition is 5940688 (50% of the number of 5 letter permutations)
Distances between consequent labels exceed the threshold, which implies the use of simple joint definitions.

Range of the labels is wrapped modulo 4 (the number of existing parts -> parts' section):
AAAAA: 0 % 4 = 0,
AAAAB: 1 % 4 = 1,
AAAAC: 2 % 4 = 2,
AAAAD: 3 % 4 = 3,
YYYYB: 11406097 % 4 = 1,
YYYYC: 11406098 % 4 = 2,
YYYYD: 11406099 % 4 = 3.

Joints' section can be rewritten using parts' indices as follows:
0 1 0 2 1 3 2 0 3 0

BFS algorithm turn this sequence into a graph as follows:

empty part list is created []
0 added to the list , created, and processed

1 added to the list [0, 1], created, joint (0, 1) created
0 deleted from list 

1 processed (being next on the list)

2 added to the list [1, 2], created, joint (1, 2) created
1 deleted from list 

2 processed

3 added to the list [2, 3], created, joint (2, 3) created
2 deleted from list 

3 processed

0 added to the list [3, 0], not created (part 0 already exists),
joint (3, 0) created
3 deleted from list 

0 processed

0 deleted from list []

SectionTag Z, closes joints' section.

These four parts and four joints form a square.

Previous realization - description in Polish:

doc