Genetic Algoritms

The effects of poolsize and domainsize on the time to solve.

This article is a work in progress. If you would like to comment or have a question you can contact me at mail address
(a bit smudged to prevent automatic harvesting of mail adresses).

Introduction

Genetic algorithms are able to solve many optimization problems and excel in areas where the optimization criteria are many or complex. In the last decade of the 20th century they were quite populair but currently it is difficult to find the sort of practical information that is of interest to programmers. This article aims to provide some insight into the behaviour of a genetic algoritm and specifically to help the programmer to make an educated guess for parameters like e.g. poolsize, retention fraction, etc.

This not a rigorously checked scientific work, but were possible relevant references are given. Furthermore as explained below, this not a toy problem and the source code (in python) is available.

Definition of terms

Like any field of expertise genetic algoritms come with a lingo of their own. A short definition of commonly used terms is therefor usefull.

Poolsize
The number of possible solutions simultaneously under consideration
Domainsize
The number of variables in a single solution
Rentention fraction
The fraction of the best solutions that is used to generate new solutions.

The problem domain

The problem domain chosen is the rostering of classes in a fair sized highschool. With typically fifty or sixty classes each doing up to ten subjects teached by different teachers and with not all classroons suitable for every subject, this is not a task easily undertaken 'by hand'. The task is even more involved if we want to take into account that classes should not have the same subject two hours in a row or have a an 'empty' hour.

Poolsize

If the size of the pool of possible solutions increases the time-to-solve is espected to decrease. However, the time-to-solve is defined as the number of generations neccessary to find a perfect solution and although this number may decrease as the poolsize increases, the time to compute a single generation will certainly increase with the poolsize so to find an optimal size we must know the exact relation, e.g. is it linear.

pool size

It is clear that increasing the poolsize does decrease the time to solve but the effect is less than linear. The effect is more pronounced for lower retention fractions.

Domainsize

If the number of classes increases, how will the time-to-solve change if we keep the poolsize constant. We conjecture that this relation will not be linear. We expect that uith an increasing number of classes the time to solve will depend less than linear since changes might easier benefit a class. We think this analogous to the behaviour of sorting lists: a list two times longer does generally not take twice as mwch time to sort (*weak argument*)

domain size

Retention fraction

Does it matter what fraction of of the population is retained from generation to generation?

retention fraction

From the illustration emerges an almost linear relation on the retention fraction. This might suggest that any positive contribution of the crossover operator is minimal.

When we produce new schemas from the fraction we retained by crossover or mutation, does it make a difference what the selection probability is from within this set? (e.g. uniform or proportional to the fitness of an individual)

Discussion

Problem size

Contrary to our expectations the time to solve shows an almost linear dependency on the problemsize. The question now is whether our generative algorithm is able to profit from an increasing problemsize.

There are two important parts to the algorithm: mixing (crossover) of offspring of two fit parents and generating new offspring from a single fit parent by mutations.

Conclusion

References

Wikipedia is a good starting point, more references will follow.

Source code

The source code for the experiments will be available under a Creative Commons attribution by/noncommercial license. It is a fairly simple program written in Python. The graphics were mostly created with IPython/Scipy.

Extenal links

Sections

Recommended