Dr. Mark Humphrys

School of Computing. Dublin City University.

Home      Blog      Teaching      Research      Contact

Search:

CA249      CA318      CA425      CA651

w2mind.computing.dcu.ie      w2mind.org


The Genetic Algorithm (GA)

The "Genetic Algorithm" (GA) is the rough evolution algorithm above more completely specified. We need to answer the questions:

  1. How is the DNA encoded? Binary bitstrings? Program code?
  2. What is the definition of a random genotype? A random program?
  3. What does sex do to the genotype? (how do you combine 2 DNA sequences)

  4. What do mutations do to it? e.g. With binary string, mutation means flip bit.
  5. What is the mutation rate? e.g. Binary DNA, each bit has probability pm of mutating during reproduction. What happens if pm = 1? What happens if pm = 0?

  6. Do only the top n percent get to reproduce? Or is it just a declining probability of reproduction as you go down?
  7. Is there a variable population size? Or do we replace the old unfit with the new and keep a constant population size?

  8. Is there geography (you mate with greater probability with fit creatures nearby in the world) or not (you mate with the fittest in the whole population, wherever they are)?

    See my page on Common ancestors of all humans and random mating models.

  9. Is there sexual selection? (i.e. the individuals choose their own mates according to preferences that can be encoded in the genes)




Encoding the Genotype


Binary Chromosome

In nature, DNA is a "string" written in an alphabet of 4 "characters" A, T, G and C (the 4 nucleotides). This is the mechanism of inheritance for all life, suggesting all life is from the same family tree (DNA too complex to have evolved multiple times independently. DNA is regarded as the end-product of a long process of pre-DNA evolution).

Simplest chromosome for machine implementation would be string of bits. If chromosome is binary string of length n, this can encode an integer from 0 to 2n-1.

We can divide every decoded number by 2n-1 to normalise this to a real number x in the interval 0 to 1. We can further normalise x to a real number in any interval a to b by taking a + x (b-a).

Can decode it to any real number in some range REAL_MIN to REAL_MAX.

Remember, it is 1-way: We decode a chromosome to a number (or something else). We never encode a number as a chromosome.


Decoding to minus infinity to plus infinity

Given a real number y from 0 to 1, we can normalise this to any real number x from minus infinity to plus infinity by taking the inverse of a very linear sigmoid function, for example setting:
  y = 1 / (1 + exp(-x/5))
and solving for x:


An actual straight line is no good for this. Why?




Gray codes

Say we encode numbers as binary:

0 000
1 001
2 010
3 011
4 100
5 101
6 110
7 111
Note that to get from 3 to 4 we need a massive mutation (of all 3 bits). This can lead to a "barrier" that it is difficult for the GA to pass over.

Gray codes are an attempt to encode numbers such that to go to the next number always entails only 1 bit change. This rule works: "Start with all 0's for 0. For each successive number, successively flip the rightmost bit that produces a new string."

0 000
1 001
2 011
3 010
4 110
5 111
6 101
7 100
You could say now, a single bit change will change from 7 to 0. But we had before anyway a single bit change from 4 to 0. And indeed 7 to 3, 8 to 0, etc. Anyway, a small mutation leading to a large (and probably bad) change is not such a problem as a small (desirable) continuous change needing a large mutation.


Jumping barriers

Evolution has a way of "jumping" barriers anyway. e.g. In original coding, 3 can't get to 4, but it jumps to 7, and then declines to 4.

Or does it? What if 7 is really unfit? i.e. Fit is 3 and 4 only. Then the 7 mutations die immediately. It could get stuck on local minimum, but often you have to be unlucky not to be able to jump barrier somehow.




Fitness - Who reproduces?

Instead of an all-or-nothing scheme, we could set up a scheme where fitter individuals were more probable to reproduce, and less fit less probable, as follows. Let the probability of an individual reproducing be:

p = my fitness
    ---------------------------------
    sum of fitnesses of all creatures

With this scheme, fitness must be positive. Q. Why?

In fact, fitnesses must all be greater than zero (not equal). Q. Why?


If fitness is positive, then p is a number between 0 and 1, and the sum of all these probabilities is 1. A separate page shows how to make a choice according to probabilities.

One problem with the above is if there is a large population. Let there be one individual with fitness 1, and 100 individuals with fitness 0.2. Then the probability of the fit individual being chosen is only 1/21. One way around this is the following:




Boltzmann "soft max" probability distribution

We can choose who reproduces according to a Boltzmann or "soft max" distribution.

Here we use the following. Let individual i have fitness f(i). Then its probability of reproducing, p(i), is:

By varying the temperature T, we can change it from only the very fittest reproduce (perhaps towards the end of run) to almost everybody has a chance, probabilities are almost equal (start of run). To be precise, T large at start of run, small towards end of run. The fittest will always be strictly more probable.




How is the new population built?

Matters are further complicated by whether we only allow a small percentage of the population reproduce, with multiple children each (and hence probably a varying population size), or whether we take a fixed population size approach like the following:


repeat (population size / 2) times 
{
  pick individual from whole population according to probabilities
  pick individual from whole population according to probabilities
   (may be the same individual!)
  construct 2 new individuals from these 2 parents 
}

entire old population is now replaced by the new population
 (keep constant population size)

In this case, in our population of 101 above, with the simple summation fitness function, the fit individual has odds of 1/21 of being picked each time. In total, therefore, it will appear about 5 times in the 100 (rounded to nearest 2) parents of the next generation. So it's not as bad as just a single decision of who reproduces, with odds 1/21. Here there are lots of reproducers, so the fit one will probably reproduce at least once, but the proportion is still 1/21.

This is probably still unsatisfactory. If we want it to be chosen more times, we use a Boltzmann distribution with low temperature T. As   T -> 0   the probability of it being picked   -> 1   (exercise).

Note: If we adopt the above approach, which is common (because fixed-size structures are usually easier to program that variable-size ones), the population size might have to be even.



Feeds      HumphrysFamilyTree.com

Bookmark and Share           On Internet since 1987.