Genetic Algoritms
The effects of poolsize and domainsize on the time to solve.
This article is a work in progress. If you
would like to comment or have a question
you can contact me at
(a bit smudged to prevent
automatic harvesting of mail adresses).
Introduction
Genetic algorithms are able to solve many optimization problems and excel in areas where the optimization criteria are many or complex. In the last decade of the 20th century they were quite populair but currently it is difficult to find the sort of practical information that is of interest to programmers. This article aims to provide some insight into the behaviour of a genetic algoritm and specifically to help the programmer to make an educated guess for parameters like e.g. poolsize, retention fraction, etc.
This not a rigorously checked scientific work, but were possible relevant references are given. Furthermore as explained below, this not a toy problem and the source code (in python) is available.
Definition of terms
Like any field of expertise genetic algoritms come with a lingo of their own. A short definition of commonly used terms is therefor usefull.
- Poolsize
- The number of possible solutions simultaneously under consideration
- Domainsize
- The number of variables in a single solution
- Rentention fraction
- The fraction of the best solutions that is used to generate new solutions.
The problem domain
The problem domain chosen is the rostering of classes in a fair sized highschool. With typically fifty or sixty classes each doing up to ten subjects teached by different teachers and with not all classroons suitable for every subject, this is not a task easily undertaken 'by hand'. The task is even more involved if we want to take into account that classes should not have the same subject two hours in a row or have a an 'empty' hour.
Poolsize
If the size of the pool of possible solutions increases the time-to-solve is espected to decrease. However, the time-to-solve is defined as the number of generations neccessary to find a perfect solution and although this number may decrease as the poolsize increases, the time to compute a single generation will certainly increase with the poolsize so to find an optimal size we must know the exact relation, e.g. is it linear.
It is clear that increasing the poolsize does decrease the time to solve but the effect is less than linear. The effect is more pronounced for lower retention fractions.
Domainsize
If the number of classes increases, how will the time-to-solve change if we keep the poolsize constant. We conjecture that this relation will not be linear. We expect that uith an increasing number of classes the time to solve will depend less than linear since changes might easier benefit a class. We think this analogous to the behaviour of sorting lists: a list two times longer does generally not take twice as mwch time to sort (*weak argument*)
Retention fraction
Does it matter what fraction of of the population is retained from generation to generation?
From the illustration emerges an almost linear relation on the retention fraction. This might suggest that any positive contribution of the crossover operator is minimal.
When we produce new schemas from the fraction we retained by crossover or mutation, does it make a difference what the selection probability is from within this set? (e.g. uniform or proportional to the fitness of an individual)
Discussion
Problem size
Contrary to our expectations the time to solve shows an almost linear dependency on the problemsize. The question now is whether our generative algorithm is able to profit from an increasing problemsize.
There are two important parts to the algorithm: mixing (crossover) of offspring of two fit parents and generating new offspring from a single fit parent by mutations.
