Simulating Weasel Evolution

The following content does not necessarily represent the opinion of my employer. All posts on this website are solely my opinion.

I mentioned in my last post a rule of biology and computer science: simple rules lead to complex behavior. I demonstrated this with Conway’s game of life, but there are numerous other solid examples, including the weasel program, a means of simulating how minor mutations in genetic code can lead to major evolutionary strides.

Consider the following example: a frog has 100 offspring. Each of those offsprings’s genes have some mutations. Some of those mutations have no significant effects, some are deleterious, and some actually increase the individual’s reproductive fitness. Those 100 offspring will eventually compete to mate, and those with higher reproductive fitness are more likely to have more offspring (that is what fitness measures).

Let’s say we have some means of measuring fitness numerically; mutations that make an individual more likely to survive and reproduce result in a higher fitness score, while those that are actively harmful to the individual’s chances of reproduction result in a lower fitness score.

Now, let’s say we are trying to model this process of reproduction and competition with artificial organisms. In real life, evolution has no clear goal–it simply occurs, open-ended, and can lead to adaptations. Chaos causes order. For our model, though, let’s say we have a target, and we want to measure fitness by how close a given individual is to that target. In this case, the target will be a simple sentence, METHINKS IT IS LIKE A WEASEL (Shakespeare buffs will recognize this line from Hamlet), for our purposes in all capital letters and without punctuation.

We will start with our mock organism: a string made of random letters, the same length as our target. We’ll assign this string a score, f, where f is increased by 1 for each letter that matches the target. The string “ZZZZZZZZZZZZZZZZZZZZZZZZZZZZ” would have a score of 0, “METHRNOS HXUESULIHE L NEALEH” would have a score of 16, and “METHINKS IT IS LIKE A WEASEL” would have the maximum score of 28.

Now, let’s simulate reproduction. We’ll assume our mock organisms reproduce asexually: each one simply splits into a number of offspring, o. The parent doesn’t ‘die’ upon reproducing, either; it will also compete with its offspring. For every letter in the organism–every character of its DNA–we’ll have a random chance of the letter mutating into another random letter. Sometimes this increases fitness, sometimes it decreases it, sometimes it is neutral. Correct letters are not locked in, after all; mutations can affect any letter.

Now, let’s say we have an extremely hostile environment for our organisms: only the single organism with the highest fitness will survive to reproduce. We can model this by choosing the individual with the highest fitness, discarding all others, and having that fittest organism reproduce. This will repeat over some number of generations, g, until, eventually, it results in an organism with a maximum possible fitness of 28. Note that a larger value of o results in a smaller average number of generations necessary to reach f = 28.

Messing with the mutation rate–the chance a letter will change–can also have a significant impact on the number of generations needed to reach out target phrase. With a 5% mutation rate, a 5-child version of the program will reach its goal in 1000 generations (on average). A mutation rate as low as 1% results in an average of 2500 generations. Having too high a mutation rate is a problem as well; a mutation rate of 10% results in a similar average number of generations to that of 1%. Something around 5% seems ideal.

For an example execution, see this 5-child, 5% mutation run, where each line is of the format GENERATION CHILD SCORE:

0 BLIWFYZRGNVWQYCTYWA ELJEZCJF 2

166 MZQHPNKS YTFIS BIKJ AGWEASEN 19

332 METHTNKS ITNIS UIKE ANWEASEL 24

498 METHINKS IT IS LIKE AAWEASEL 27

664 METHINKS IT IS LIKE AJWEASEL 27

830 METHINKS IT IS LIKE AFWEASEL 27

996 METHINKS IT IS LIKE ALWEASEL 27

1037 METHINKS IT IS LIKE A WEASEL 28

5-child program successful on generation 1037.

This is obviously an extremely simplistic, reductive model of evolution; and yet, it is helpful as a tool for understanding a fundamental truth about the process. Critics of evolution have often argued that it is impossible for evolution by natural selection to result in organisms as complex as humans, arising by random chance alone; and yet, simulations of evolution by natural selection for advantageous mutations tend to work extremely well. Simple rules lead to complex behaviors. Point mutations–single-base changes in an organism’s DNA–can stack up to the point where they result in major changes.

If you want to try my implementation of the Weasel program, you can run it in the browser here, download the source code here, and read more about the thought behind the Weasel program here.

As a postscript, you might notice that runs with different numbers of children report their status on different generations. The 5-child run reports every 166 generations, 1-child reports every 815, 50-child every 20, and 100-child every 12. This is because I wanted each run, regardless of number of children, to report roughly the same number of times. I also wanted it to report in real time, not after the fact. To do this, I needed some function of the number of children in order to determine when to print, preferably one that would scale logarithmically with the number of children. After some testing, I ended up with the following, where i is the number of the current generation, c is the number of children, and m is the maximum number of generations the program will simulate:

(i mod floor(4 * (1 + (log base(1.05^c) of 2m)))) == 0

If this seems fairly arbitrary, that’s because it is. It’s just what I found worked for me. Using m is not ideal; I should scale instead off some function of the number of letters in the goal string. For a 28-letter goal, though, this works fine.

Posted February 10, 2023