|
The "Genetic Algorithm" (GA) is the rough evolution algorithm above more completely specified. We need to answer the questions:
See my page on Common ancestors of all humans and random mating models.
In nature, DNA is a "string" written in an alphabet of 4 "characters" A, T, G and C (the 4 nucleotides). This is the mechanism of inheritance for all life, suggesting all life is from the same family tree (DNA too complex to have evolved multiple times independently. DNA is regarded as the end-product of a long process of pre-DNA evolution).
Simplest chromosome for machine implementation would be string of bits. If chromosome is binary string of length n, this can encode an integer from 0 to 2n-1.
We can divide every decoded number by 2n-1 to normalise this to a real number x in the interval 0 to 1. We can further normalise x to a real number in any interval a to b by taking a + x (b-a).
Can decode it to any real number in some range REAL_MIN to REAL_MAX.
y = 1 / (1 + exp(-x/5))and solving for x:
An actual straight line is no good for this. Why?
Say we encode numbers as binary:
0 000 1 001 2 010 3 011 4 100 5 101 6 110 7 111Note that to get from 3 to 4 we need a massive mutation (of all 3 bits). This can lead to a "barrier" that it is difficult for the GA to pass over.
Gray codes are an attempt to encode numbers such that to go to the next number always entails only 1 bit change. This rule works: "Start with all 0's for 0. For each successive number, successively flip the rightmost bit that produces a new string."
0 000 1 001 2 011 3 010 4 110 5 111 6 101 7 100You could say now, a single bit change will change from 7 to 0. But we had before anyway a single bit change from 4 to 0. And indeed 7 to 3, 8 to 0, etc. Anyway, a small mutation leading to a large (and probably bad) change is not such a problem as a small (desirable) continuous change needing a large mutation.
Or does it? What if 7 is really unfit? i.e. Fit is 3 and 4 only. Then the 7 mutations die immediately. It could get stuck on local minimum, but often you have to be unlucky not to be able to jump barrier somehow.
Instead of an all-or-nothing scheme, we could set up a scheme where fitter individuals were more probable to reproduce, and less fit less probable, as follows. Let the probability of an individual reproducing be:
p = my fitness
---------------------------------
sum of fitnesses of all creatures
With this scheme, fitness must be positive. Q. Why?
In fact, fitnesses must all be greater than zero (not equal). Q. Why?
If fitness is positive, then p is a number between 0 and 1, and the sum of all these probabilities is 1. A separate page shows how to make a choice according to probabilities.
One problem with the above is if there is a large population. Let there be one individual with fitness 1, and 100 individuals with fitness 0.2. Then the probability of the fit individual being chosen is only 1/21. One way around this is the following:
We can choose who reproduces according to a Boltzmann or "soft max" distribution.
Here we use the following. Let individual i have fitness f(i). Then its probability of reproducing, p(i), is:
By varying the temperature T, we can change it from only the very fittest reproduce (perhaps towards the end of run) to almost everybody has a chance, probabilities are almost equal (start of run). To be precise, T large at start of run, small towards end of run. The fittest will always be strictly more probable.
Matters are further complicated by whether we only allow a small percentage of the population reproduce, with multiple children each (and hence probably a varying population size), or whether we take a fixed population size approach like the following:
repeat (population size / 2) times
{
pick individual from whole population according to probabilities
pick individual from whole population according to probabilities
(may be the same individual!)
construct 2 new individuals from these 2 parents
}
entire old population is now replaced by the new population
(keep constant population size)
In this case, in our population of 101 above, with the simple summation fitness function, the fit individual has odds of 1/21 of being picked each time. In total, therefore, it will appear about 5 times in the 100 (rounded to nearest 2) parents of the next generation. So it's not as bad as just a single decision of who reproduces, with odds 1/21. Here there are lots of reproducers, so the fit one will probably reproduce at least once, but the proportion is still 1/21.
This is probably still unsatisfactory. If we want it to be chosen more times, we use a Boltzmann distribution with low temperature T. As T -> 0 the probability of it being picked -> 1 (exercise).
Note: If we adopt the above approach, which is common (because fixed-size structures are usually easier to program that variable-size ones), the population size might have to be even.
On Internet since 1987.