EDAs dispense with the crossover and mutation operators. In each generation, a subset of the population is selected and the distribution of the individuals in this subset is estimated. The population for the next generation is obtained by sampling the estimated distribution. Although any selection method could be applied, the most common one is the selection of the best individuals of the population.
The first EDAs were developed for discrete spaces and were later adapted to continuous domains. We can distinguish two types of EDAs, depending on whether or not they take dependencies between the variables into account. One of the most widely used EDAs that does not consider dependencies is UMDA_c (Univariate Marginal Distribution Algorithm for continuous domains) [LELP00]. In every generation and for every variable, UMDA_c carries out statistical tests in order to find the density function that best fits the variable. Once the densities have been identified, their parameters are estimated by maximum likelihood. If all the distributions are normal, the two parameters are the mean and the standard deviation. This particular case will be denoted UMDA_c^G (Univariate Marginal Distribution Algorithm for Gaussian models).
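The Gaussian special case just described can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `umda_g`, the search bounds, the reduced population sizes, and the sphere-like fitness are all assumptions chosen for a quick runnable example.

```python
import numpy as np

def umda_g(fitness, dim, pop_size=200, sel_size=100, generations=60, seed=0):
    """Minimal UMDA_c^G sketch: truncation selection of the best individuals,
    per-variable normal maximum-likelihood estimation (mean, std), then
    sampling of the next population, with one elite individual kept."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))   # uniform initialization
    for _ in range(generations):
        scores = np.apply_along_axis(fitness, 1, pop)
        best = pop[np.argsort(scores)[:sel_size]]        # select the best subset
        mu = best.mean(axis=0)                           # ML estimate of the mean
        sigma = best.std(axis=0) + 1e-12                 # ML estimate of the std
        elite = best[0]                                  # elitism: keep the best individual
        pop = rng.normal(mu, sigma, size=(pop_size, dim))
        pop[0] = elite
    scores = np.apply_along_axis(fitness, 1, pop)
    return pop[np.argmin(scores)]

# Toy usage: minimize a sphere function (a stand-in test problem).
sol = umda_g(lambda x: float(np.sum(x * x)), dim=5)
```

With normal marginals, the estimation step is just the sample mean and standard deviation of each variable over the selected individuals, which is what makes this special case so cheap per generation.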
Among the EDAs that do consider dependencies, we can consider EGNA (Estimation of Gaussian Network Algorithm) [LELP00], whose good results in function optimization are reported by Bengoetxea and Miquélez [BM02]. In each generation, EGNA learns the Gaussian network structure by using a Bayesian score that assigns the same value to Gaussian networks reflecting the same conditional dependencies. Next, it estimates the parameters of the Gaussian network structure.
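EGNA proper first learns a network structure with a Bayesian score; the fully connected special case, however, is equivalent to fitting a single multivariate normal, which already shows the contrast with UMDA's independent marginals. The sketch below (function name and toy data are assumptions) fits that dense model to the selected individuals and samples a new population from it.

```python
import numpy as np

def estimate_and_sample(selected, pop_size, seed=0):
    """Sketch of the dense special case of a Gaussian-network EDA: a fully
    connected Gaussian network over the variables is one multivariate normal,
    so estimation reduces to the sample mean vector and covariance matrix.
    (EGNA itself first learns a sparser structure with a Bayesian score.)"""
    rng = np.random.default_rng(seed)
    mu = selected.mean(axis=0)
    cov = np.cov(selected, rowvar=False)   # off-diagonal terms capture dependencies
    return rng.multivariate_normal(mu, cov, size=pop_size)

# Usage with correlated toy data: the sampled population preserves the
# correlation between the two variables, which a univariate model would lose.
rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 1))
sel = np.hstack([x, 0.9 * x + 0.1 * rng.normal(size=(1000, 1))])
new_pop = estimate_and_sample(sel, pop_size=2000)
```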
In the experiments we have used the parameters reported by Bengoetxea and Miquélez [BM02]: a population of 2000 individuals, initialized using a uniform distribution, from which the best 1000 individuals are selected to estimate the density function; an elitist approach was chosen (the best individual is included in the next population and 1999 individuals are sampled). Each algorithm has been run 30 times with a stopping criterion of 300,000 evaluations of the fitness function.
The results of the EDAs are compared with the results of a RCGA with CIXL2 using the parameter values selected previously. We performed an ANOVA I analysis where the three levels of the factor are the different algorithms: RCGA with CIXL2, UMDA_c and EGNA. We also carried out a multiple comparison test.
Table 4 shows the average values and standard deviations over 30 runs for each algorithm. Table 11 in Appendix A shows how, for all the functions except one, the type of algorithm has a significant effect on the linear model and the variances of the results are unequal (Levene test). So, we have used the Tamhane test for these functions and the Bonferroni test for the remaining one. Table 17 (Appendix A) shows the results of the multiple comparison test and the ranking established by the test.
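The test-selection logic above (one-way ANOVA over the algorithm factor, then Levene's test to decide between a post-hoc test that assumes equal variances and one that does not) can be sketched with SciPy. The three samples below are synthetic stand-ins for the 30 runs of each algorithm, and the function name is an assumption; Tamhane's T2 itself is not in SciPy, so it appears only as the name of the chosen branch.

```python
import numpy as np
from scipy import stats

def choose_posthoc(*samples, alpha=0.05):
    """One-way ANOVA over the algorithm factor, then Levene's test:
    if the variances are unequal, the Tamhane test is appropriate;
    otherwise the Bonferroni test can be used."""
    f_stat, anova_p = stats.f_oneway(*samples)   # significant effect of the algorithm?
    lev_stat, lev_p = stats.levene(*samples)     # equality of variances across groups
    posthoc = "Tamhane" if lev_p < alpha else "Bonferroni"
    return bool(anova_p < alpha), posthoc

# Synthetic stand-in for 30 runs of three algorithms with unequal variances.
rng = np.random.default_rng(0)
runs = [rng.normal(m, s, 30) for m, s in [(0.0, 0.1), (1.0, 1.0), (3.0, 5.0)]]
significant, test = choose_posthoc(*runs)
```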
For one of the functions the results are very similar: the fitness behaves as a singular random variable with sample variance near 0, so the statistical tests are not feasible.
For one function, the results of CIXL2 are significantly better than the results of UMDA_c and EGNA. The same situation occurs for four further functions, with the exception that in these four functions there are no significant differences between the two EDAs. For one function, UMDA_c and EGNA achieve the best results, significantly better than CIXL2. For another, one of the EDAs is significantly better than both the other EDA and CIXL2, between which there are no significant differences. For the last function, CIXL2 obtains the best results, but there are no significant differences among the three algorithms.
The estimation of the distribution of the best individuals of the population performed by EDAs is an advantage for the function that is unimodal and separable, and for the two functions whose optima are regularly distributed. The results of the EDAs are better than those of CIXL2 for one of these functions, worse for another, and for the third all the algorithms obtain very similar results. For the two non-separable unimodal functions, the interdependence among their variables should favor the performance of EGNA over UMDA_c and CIXL2. Nevertheless, CIXL2 achieves the best results for these two functions. For the multimodal separable functions, it is difficult to identify the distribution of the best individuals, and the performance of the EDAs is below that of CIXL2.
For extremely complex functions, the results are less conclusive. For one of them, the best results are obtained with one of the EDAs, and there are no differences between the other EDA and CIXL2. For the other, CIXL2 achieves the best results, but the differences among the three algorithms are not statistically significant.
Domingo 2005-07-11