ArrayMiner uses the advanced
optimization technique of Grouping Genetic Algorithm (GGA) to supply clustering
solutions of high quality. The general GGA technique has been described
in the book "Genetic Algorithms and Grouping Problems" (Wiley,
1998) by Dr. Falkenauer.
The GGA is a gradual improvement technique. In practice, this means that
- as soon as the algorithm has been initialized, your data are shown
in the graphic windows. However, no clustering has been performed yet.
- when you click the "Optimize"
button (red worker icon), the optimization process starts. During the
optimization, ArrayMiner uses a semi-stochastic process to iteratively
construct new clustering solutions, using the mechanism of the GGA enhanced
by special heuristics targeted specifically to the clustering problem.
This process can be shown to generate high-quality solutions with high
probability. The clustering
criterion selected by the user is used to evaluate each newly constructed
solution and identify the best ones. Each time a solution is generated
that is better than any of the solutions found up to that point, it
becomes ArrayMiner's current solution.
At any point of the optimization process,
ArrayMiner can generate the optimal solution to the clustering problem,
given the clustering criterion selected by the user. Since it is the optimal
solution, it is the best one possible, i.e., it cannot be improved upon.
However, as observed in the previous section, it is impossible to know
with certainty whether a given solution is the optimal one, so the optimization
continues trying to find a better one still.
Obviously, since it is impossible to identify the best possible solution
when it is found, the optimization process could run forever. In practice,
it is observed that the probability of improving upon any generated solution
decreases with the number of solutions examined by the GGA, i.e. with
the successive iterations of the algorithm.
ArrayMiner takes advantage of this observation to signal
when it is appropriate to stop the algorithm. When a solution is not improved
upon over a number of iterations of the optimization process, the user
is advised that the probability of further improvement is sufficiently
low and the best solution so far can be taken for the final one, and the
algorithm is stopped.
Given the computational resources at his or her disposal,
the user may decide to stop the algorithm at this point or to let it continue
searching. In the latter case, the probability of finding the optimal
clustering solution will be increased.
|