Evolutionary Computation Just Accepted MS. doi:10.1162/EVCO_a_00160  by the Massachusetts Institute of Technology

On the Statistical Mechanics Approximation of Biogeography-Based Optimization

Haiping Ma

[email protected] Department of Electrical Engineering, Shaoxing University, Shaoxing, Zhejiang, 312000, China; Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, 200444, China

Dan Simon

[email protected] Department of Electrical and Computer Engineering, Cleveland State University, Cleveland, Ohio, 44115, USA

Minrui Fei

[email protected] Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, 200444, China

Abstract

Biogeography-based optimization (BBO) is an evolutionary algorithm inspired by biogeography, which is the study of the migration of species between habitats. This paper derives a mathematical description of the dynamics of BBO based on ideas from statistical mechanics. Rather than trying to exactly predict the evolution of the population, statistical mechanics methods describe the evolution of statistical properties of the population fitness. This paper uses the one-max problem, which has only one optimum and whose fitness function is the number of ones in a binary string, to derive equations that predict the statistical properties of BBO each generation in terms of those at the previous generation. These equations reveal the effect of migration and mutation on the population fitness dynamics of BBO. The results obtained in this paper are similar to those for the simple genetic algorithm with selection and mutation. The paper also derives equations for the population fitness dynamics of general separable functions, and we find that the results obtained for separable functions are the same as those for the one-max problem. The statistical mechanics theory of BBO is shown to be in good agreement with simulation.

Keywords: biogeography-based optimization, evolutionary algorithms, statistical mechanics, genetic algorithm, dynamics

1 Introduction

Mathematical models of biogeography describe the migration of species between habitats. Biogeography-based optimization (BBO) (Simon, 2008) is an evolutionary algorithm for global optimization, introduced in 2008, that extends biogeography theory to evolutionary algorithms (EAs). BBO has demonstrated good performance on various unconstrained and constrained benchmark functions (Boussaid et al., 2012; Du et al., 2009; Ergezer et al., 2009; Gong et al., 2010; Gong et al., 2011; Li and Yin, 2012; Ma et al., 2009; Ma and Simon, 2011). It has also been applied to real-world optimization problems, including economic load dispatch (Bhattacharya and Chattopadhyay, 2010), wireless network power allocation (Boussaid et al., 2011a), satellite image classification (Panchal et al., 2009), flexible job shop scheduling (Rahmati and Zandieh, 2012), power system optimization (Jamuna and Swarup, 2012; Kankanala et al., 2012; Lohokare et al., 2012; Rarick et al., 2009), antenna design (Singh et al., 2010), and others (Boussaid et al., 2011b; Chatterjee et al., 2012; Hadidi and Nazari, 2013; Van Laarhoven and Aarts, 1987).

BBO probabilistically shares information between candidate solutions, like other EAs (Ahn, 2006; Vose, 1999; Yao et al., 1999). In BBO, each candidate solution is comprised of a set of independent variables, also called features. Each candidate solution immigrates features from other candidate solutions based on its immigration rate, and emigrates features to other candidate solutions based on its emigration rate. BBO has a close relationship with other EAs such as genetic algorithms (GAs), differential evolution (DE), and particle swarm optimization (PSO). This has been highlighted in earlier work (Simon et al., 2011), which also showed that in spite of its commonality with other EAs, BBO has distinct features that give it unique properties. Because of these unique properties, it is useful to retain the distinction between BBO and other EAs. Another reason that it is useful to retain BBO as a distinct EA is that it easily allows the incorporation of behaviors from natural biogeography into BBO, such as the effect of geographical proximity on migration rates, nonlinear migration curves, the effect of species populations (including mortality and reproduction), predator/prey relationships, species mobilities, directional migration momentum, habitat area and isolation, and others (Ma, 2010; Costa e Silva et al., 2012; Goel et al., 2012).

Although BBO has shown good performance on various problems, it is not well understood for what problems it is effective and why. The efficiency of BBO depends on the problem representation and the BBO tuning parameters. These parameters include the population size, the immigration curve shape, the emigration curve shape, and the mutation rate. For many problems, BBO can be very effective when a good representation is chosen and the parameters are set to appropriate values for the particular problem. When poor choices are made, BBO can perform like random search. If a mathematical model as a function of the BBO parameters could predict the improvement in fitness from one generation to the next, it could be used to find optimal values of the tuning parameters. For example, consider a problem with very expensive fitness function evaluations; we may even need to do physical experiments to evaluate fitness. If we can find a model that predicts fitness improvement, we can use the model during early generations to tune BBO parameters, or to predict whether BBO will perform better than other EAs. So a mathematical model of the evolution of BBO would be useful for developing effective BBO modifications. More generally, a model of BBO dynamics could produce insights as to how the algorithm behaves and when it is likely to be effective.

There have been some attempts to describe the dynamics of BBO. The most well-developed method is the Markov chain model (Nix and Vose, 1992; Suzuki, 1995), which is an exact model of the evolution of BBO (Simon et al., 2009, 2011a and 2011b). It gives the probability of arriving at any population given the starting population, as the generation count approaches infinity.
But it is of little practical use for real-world problems, because transition probabilities are never known in real applications. In this paper, we present a formalism based on ideas from statistical mechanics to study the dynamics of BBO. This method could lead to a better understanding of the evolution of BBO in realistic, high-dimensional optimization problems.

Statistical mechanics is a branch of physics that applies probability theory to the
study of the thermodynamic behavior of systems consisting of a huge number of interacting entities. It provides methods to calculate the average properties of these systems by treating the small-scale motions as random. It is usually used to compute thermodynamic properties of particles interacting with a heat bath, but has recently been applied to other problems, for example, mathematical modeling of the genetic algorithm (GA) (Prügel-Bennett and Shapiro, 1994; Rattray, 1995 and 1996; Rattray and Shapiro, 1996; Shapiro, 1994) and simulated annealing (Van Laarhoven and Aarts, 1987). However, the modeling of GAs and simulated annealing using statistical mechanics has been limited to specific problems. This paper extends that work to BBO, and also, for the first time, uses statistical mechanics ideas to model the population dynamics for a general but separable fitness function.

The advantage of statistical mechanics for describing the dynamics of BBO is that we only assume knowledge of the cumulants of the fitness distribution and the average fitness correlation within the population. The cumulants are defined by the generating function, which is given by the natural logarithm of the Fourier transform of the sample fitness probability distribution function. The cumulants contain the same information as the distribution function. The first and second cumulants are respectively the mean and variance of the distribution, and the third cumulant is the skewness, which measures the asymmetry of the distribution. All cumulants except the first two are zero for a Gaussian distribution, so the higher-order cumulants measure the non-Gaussian content of the distribution. In this paper, the dynamics of BBO are described in terms of the change of the cumulants due to migration and mutation. The cumulants are used for a number of reasons: they can readily be measured, they are more stable than other statistical parameters because they tend to self-average, and they provide interesting results concerning how BBO works.

Section 2 gives a description of the BBO algorithm. Section 3 introduces statistical mechanics and derives the statistical mechanics approximation of BBO for the one-max problem. Section 4 uses simulations to study the effect of migration and mutation, and then compares BBO with the GA based on their statistical mechanics models. Section 5 derives the statistical mechanics approximation of BBO for separable functions, to generalize the model and to further reveal how BBO works. Section 6 provides some concluding remarks and suggestions for future work.

2 Biogeography-Based Optimization

BBO is an evolutionary optimization algorithm that is inspired by biogeography (Simon, 2008). In BBO, a biogeography habitat denotes a candidate solution to an optimization problem. Each candidate solution is comprised of a set of features, which are also called decision variables or independent variables, and which are similar to genes in GAs. A set of biogeography habitats denotes a population of candidate solutions. Like other EAs, each candidate solution in BBO probabilistically shares decision variables with other candidate solutions to improve candidate solution fitness. This sharing process is analogous to migration in biogeography. That is, each candidate solution immigrates decision variables from other candidate solutions based on its immigration rate, and emigrates decision variables to other candidate solutions based on its emigration rate. BBO consists of two main steps: migration and mutation.

Migration is a probabilistic operator that is intended to improve a candidate solution yk. For each decision variable of a given candidate solution yk, the candidate solution's immigration rate λk is used to probabilistically decide whether or not to immigrate. If immigration is selected, then the emigrating candidate solution yj is probabilistically chosen based on the emigration rate μj. Migration is written as

$$y_k(s) \leftarrow y_j(s) \tag{1}$$

where s is a decision variable. In BBO, each candidate solution yk has its own immigration rate λk and emigration rate μk. A good candidate solution has a relatively high emigration rate and a low immigration rate, while the converse is true for a poor candidate solution. Immigration and emigration rates are based on particular migration curves, such as the linear migration curves presented in Figure 1, where the maximum immigration rate and maximum emigration rate are both assumed equal to 1. Nonlinear immigration and emigration rates have been discussed in Ma (2010).


Figure 1: Linear migration curves for BBO. λk is the immigration rate and μk is the emigration rate, n is the best fitness, and we assume that the maximum immigration rate and maximum emigration rate are both equal to 1.

The probability of immigrating to yk and the probability of emigrating from yk, for k ∈ [1, N], are calculated as

$$\Pr(\text{immigration to } y_k) = \lambda_k, \qquad \Pr(\text{emigration from } y_k) = \frac{\mu_k}{\sum_{j=1}^{N} \mu_j} \tag{2}$$

where N is the population size.

Mutation is a probabilistic operator that randomly modifies a decision variable of a candidate solution. The purpose of mutation is to increase diversity among the population, just as in other EAs. At the end of each generation, mutation of the kth candidate solution is implemented as shown in Algorithm 1. In the mutation logic of Algorithm 1, rand(a, b) is a uniformly distributed random number between a and b, pm is the mutation rate, and Ls and Us are the lower and upper search bounds of decision variable s. This logic mutates each independent variable with probability pm. If mutation occurs for a given independent variable, then that independent variable is replaced with a random number in its search domain.

Algorithm 1: BBO mutation logic.
For each decision variable s in candidate solution yk
    If rand(0, 1) < pm
        yk(s) ← rand(Ls, Us)
    End if
Next decision variable

A description of one generation of BBO is given in Algorithm 2. Note that Algorithm 2 includes the use of a temporary population so that migration completes before the original population is replaced with the new population at the end of each generation.

Algorithm 2: One generation of the BBO algorithm, where N is the population size, y and z are the entire populations of candidate solutions, yk is the kth candidate solution, and yk(s) is the sth decision variable of yk.
For each candidate solution yk, define emigration rate μk ∝ fitness of yk, with μk ∈ [0, 1]
For each candidate solution yk, define immigration rate λk = 1 − μk
z ← y
For each candidate solution zk (k = 1 to N)
    For each decision variable s
        Use λk to probabilistically decide whether to immigrate to zk (see Eq. 2)
        If immigrating then
            Use {μ} to probabilistically select the emigrating candidate solution yj (see Eq. 2)
            zk(s) ← yj(s)
        End if
    Next decision variable
    Probabilistically decide whether to mutate zk
Next candidate solution
y ← z

Now we make several clarifications about Algorithm 2. First, a solution can emigrate to itself, which means that j might be chosen to be equal to k in the statement "zk(s) ← yj(s)". That is, when an independent variable in a candidate solution zk is replaced via migration, the new independent variable might be chosen to come from zk itself. In this case, the independent variable is not actually changed in zk. However, the probabilistic choice of the emigrating candidate solution allows this to happen on occasion.

Second, in this paper we assume that the migration rates λ and μ are independent of the population distribution. That is, the best fitness that is used to define μ = 1 in Figure 1 is the best fitness for the problem, not the best fitness in the current population. Similarly, the worst fitness that is used to define μ = 0 is the worst fitness for the problem. Using the worst fitness to define μ = 0 is not much of a limitation because, in practice, we can assign an arbitrary small fitness value to poor candidate solutions. Alternatives to these assumptions are commonly used in practice, but these assumptions are needed for the statistical mechanics model development of the following section.
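To make the algorithm concrete, the following Python sketch (our own illustration, not part of the original BBO description) implements one generation of Algorithm 2, with the mutation logic of Algorithm 1, for a population of binary strings. The function names and the use of NumPy are our own choices; the emigration rates are normalized by the best fitness of the problem, consistent with the assumption above.

import numpy as np

def bbo_generation(y, fitness, max_fitness, p_m, rng):
    """One BBO generation (Algorithm 2) for a population of binary strings.

    y           : (N, n) array of bits, one candidate solution per row
    fitness     : function mapping a bit string to a scalar fitness
    max_fitness : best fitness of the problem (used to normalize mu)
    p_m         : per-variable mutation probability (Algorithm 1)
    rng         : NumPy random generator
    """
    N, n = y.shape
    f = np.array([fitness(ind) for ind in y], dtype=float)
    mu = f / max_fitness        # emigration rate, proportional to fitness
    lam = 1.0 - mu              # immigration rate, lambda_k = 1 - mu_k
    z = y.copy()                # temporary population
    for k in range(N):
        for s in range(n):
            if rng.random() < lam[k]:                # immigrate? (Eq. 2)
                j = rng.choice(N, p=mu / mu.sum())   # emigrant, prop. to mu (Eq. 2)
                z[k, s] = y[j, s]                    # j == k is allowed
        flip = rng.random(n) < p_m                   # Algorithm 1: mutate each
        z[k, flip] = rng.integers(0, 2, size=flip.sum())  # variable with prob p_m
    return z

# Example: one-max with n = 5 bits; Eq. (15) below normalizes mu by n + 1.
rng = np.random.default_rng(0)
pop = rng.integers(0, 2, size=(24, 5))
pop = bbo_generation(pop, np.sum, max_fitness=5 + 1, p_m=0.01, rng=rng)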

3 Statistical Mechanics Model of BBO

This section provides a preliminary foundation for the definition of the cumulants of a distribution function (Section 3.1), and then derives the statistical mechanics model of BBO under migration (Section 3.2) and mutation (Section 3.3).

3.1 Preliminary Foundation

The cumulants of a probability distribution describe the shape of the distribution. Cumulant dynamics have already been developed for some evolutionary algorithms, such as the simple genetic algorithm (Reeves and Rowe, 2003). Cumulants are natural statistics for describing distributions that are close to Gaussian, since the higher-order cumulants are a measure of deviation from a Gaussian distribution. The first two cumulants are the mean and variance of the distribution, while the third and fourth cumulants are related to skewness and kurtosis, respectively. Since populations in evolutionary algorithms are often initialized randomly, fitness is a function of many variables, and the central limit theorem states that the distribution of a sum of random variables becomes more Gaussian as the number of variables increases, many real-world problems have an approximately Gaussian fitness distribution during the early generations.

Consider a scalar random variable X which can take non-negative integer values; for each k ≥ 0, there is a certain probability Pk that X takes the value k. The characteristic function of the probability distribution of X and its Taylor series can be written as

$$\varphi(z) = \sum_{k \ge 0} \Pr(X = k)\, e^{kz} = \sum_{k \ge 0} P_k e^{kz} = 1 + \frac{\omega_1}{1!} z + \frac{\omega_2}{2!} z^2 + \frac{\omega_3}{3!} z^3 + \cdots \tag{3}$$

where the coefficients ωk are called the moments of the probability distribution, given by

$$\omega_k = E\left[X^k\right] \tag{4}$$

Expanding the natural logarithm of the characteristic function φ(z) as a Taylor series gives

$$H(z) = \log \varphi(z) = \log\left(P_0 + P_1 e^z + P_2 e^{2z} + P_3 e^{3z} + \cdots\right) = \frac{\kappa_1}{1!} z + \frac{\kappa_2}{2!} z^2 + \frac{\kappa_3}{3!} z^3 + \cdots \tag{5}$$

where the coefficients κk for k = 1, 2, 3, ... are called the cumulants of the probability distribution. In particular, the relationships between the first three cumulants and the moments of the probability distribution have simple expressions:

$$\begin{aligned}
\kappa_1 &= \omega_1 = E[X] \\
\kappa_2 &= \omega_2 - \omega_1^2 = E\left[X^2\right] - (E[X])^2 \\
\kappa_3 &= \omega_3 - 3\omega_1\omega_2 + 2\omega_1^3 = E\left[X^3\right] - 3\,E[X]\,E\left[X^2\right] + 2\,(E[X])^3
\end{aligned} \tag{6}$$

From the above equations, it is seen that the first cumulant κ1 is the mean of the distribution, and the second cumulant κ2 is the variance. Since the Gaussian distribution is completely defined by its mean and variance, and all of its higher cumulants are zero, any distribution with very small values for its higher cumulants will look like a Gaussian.

Namely, as long as the population fitness values are distributed approximately Gaussian, we can truncate the series expansion shown in (5) and use just the first few cumulants to approximate fitness distributions.

The cumulants of a probability distribution have a linearity property which can be formulated as follows (Grinstead and Snell, 1997). If the Xi are independent random variables for i = 1, 2, 3, ..., n, then

$$\kappa_k^{X_1 + X_2 + \cdots + X_n} = \kappa_k^{X_1} + \kappa_k^{X_2} + \cdots + \kappa_k^{X_n} = \sum_{i=1}^{n} \kappa_k^{X_i} \tag{7}$$
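The moment-to-cumulant relations (6) are easy to evaluate numerically. The helper below is our own illustrative sketch (the function name is ours); it computes κ1, κ2, and κ3 of a sample of fitness values directly from the relations in (6).

import numpy as np

def first_three_cumulants(samples):
    """kappa_1, kappa_2, kappa_3 of a sample, via the moment relations in Eq. (6)."""
    x = np.asarray(samples, dtype=float)
    w1, w2, w3 = x.mean(), (x**2).mean(), (x**3).mean()   # moments omega_k
    k1 = w1                              # mean
    k2 = w2 - w1**2                      # variance
    k3 = w3 - 3*w1*w2 + 2*w1**3          # third cumulant (skewness measure)
    return k1, k2, k3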

An example. To clarify the notation, the one-max problem is presented, which has only one optimum. The fitness of an individual is the number of ones in its binary string. Given n as the length of a binary string, and X as the possible fitness value, the problem is defined as follows:

$$\text{Maximize } X, \quad \text{where } X = \sum_{j=1}^{n} x_j \tag{8}$$

where xj ∈ I = {0, 1} is bit j of a binary string. In this example, the length of the binary string is n = 5, which is therefore the maximum fitness, and the population size is N = 24. An initial population is randomly generated with each bit independent of the others. It might be the following:

00011 01001 10000 01110 10101 11110 11010 00101
00001 01000 11011 00111 01100 11111 00000 01000
10111 10100 11010 00101 01110 01001 01101 00101

Before calculating the cumulants, the independence of the bits is checked to confirm the linearity property in (7). Considering bit 1 and bit 2, for example, we obtain

$$E[x_1] \cdot E[x_2] = \left(\sum_{i=0,1} i \Pr[x_1 = i]\right) \cdot \left(\sum_{i=0,1} i \Pr[x_2 = i]\right) = \Pr(x_1 = 1) \cdot \Pr(x_2 = 1) = (9/24) \cdot (13/24) = 0.204$$

$$E[x_1 \cdot x_2] = \sum_{i,j=0,1} i \cdot j \cdot \Pr[x_1 = i, x_2 = j] = \Pr(x_1 = 1, x_2 = 1) = 5/24 = 0.208$$

We see that E[x1 · x2] ≈ E[x1] · E[x2], which is consistent with the fact that bit 1 and bit 2 are independent. Similarly, we can numerically confirm that all of the other bit pairs in the population are independent, which confirms that the cumulants have the linearity property shown in (7) at the first generation. After migration at the end of the first generation, each bit in a given individual will still be independent, because migration to each individual bit is independent of migration to the other bits (see Algorithm 2). Similarly, after mutation in the first generation, each bit in a given individual will be independent, because mutation of each individual bit is independent (see Algorithm 1).
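The independence check above is easy to reproduce; the sketch below (our own illustration) recomputes E[x1]·E[x2] and E[x1·x2] from the example population.

import numpy as np

# The 24 individuals of the example population, one binary string per entry.
strings = ("00011 01001 10000 01110 10101 11110 11010 00101 "
           "00001 01000 11011 00111 01100 11111 00000 01000 "
           "10111 10100 11010 00101 01110 01001 01101 00101").split()
pop = np.array([[int(b) for b in s] for s in strings])

e1, e2 = pop[:, 0].mean(), pop[:, 1].mean()   # Pr(x1=1) = 9/24, Pr(x2=1) = 13/24
e12 = (pop[:, 0] * pop[:, 1]).mean()          # Pr(x1=1, x2=1) = 5/24
print(e1 * e2, e12)                           # 0.203..., 0.208... (nearly equal)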

In this example, the population distribution over the set of possible fitness values is

Pr[X = 0] = 0.041, Pr[X = 1] = 0.167, Pr[X = 2] = 0.333,
Pr[X = 3] = 0.292, Pr[X = 4] = 0.125, Pr[X = 5] = 0.041

Based on (3), the characteristic function of the probability distribution and its Taylor series are

$$\varphi(z) = 0.041 + 0.167 e^z + 0.333 e^{2z} + 0.292 e^{3z} + 0.125 e^{4z} + 0.041 e^{5z} = 1 + \frac{2.414}{1!} z + \frac{7.152}{2!} z^2 + \frac{23.840}{3!} z^3 + \frac{86.772}{4!} z^4 + \cdots$$

Based on (6), the cumulants of the probability distribution are obtained as

$$\begin{aligned}
\kappa_1 &= \omega_1 = 2.414 \\
\kappa_2 &= \omega_2 - \omega_1^2 = 7.152 - (2.414)^2 = 1.324 \\
\kappa_3 &= \omega_3 - 3\omega_1\omega_2 + 2\omega_1^3 = 23.840 - 3 \times 2.414 \times 7.152 + 2 \times (2.414)^3 = 0.180
\end{aligned}$$

We find that the mean is 2.414 and the variance is 1.324. This distribution is approximately Gaussian, as shown in Figure 2. From this example, it is seen that the cumulants can approximate the population distribution by truncating its series expansion.
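The same numbers follow directly from the raw fitness counts of the example population; a minimal sketch (ours) is below. The small discrepancies with the values in the text arise because the text works with probabilities rounded to three decimal places.

import numpy as np

# Fitness counts in the example population: one string with fitness 0,
# four with 1, eight with 2, seven with 3, three with 4, one with 5.
X = np.repeat([0., 1., 2., 3., 4., 5.], [1, 4, 8, 7, 3, 1])
w1, w2, w3 = X.mean(), (X**2).mean(), (X**3).mean()
print(w1)                        # kappa_1 ~ 2.417  (2.414 in the text)
print(w2 - w1**2)                # kappa_2 ~ 1.326  (1.324 in the text)
print(w3 - 3*w1*w2 + 2*w1**3)    # kappa_3 ~ 0.190  (0.180 in the text)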


Figure 2: Probability distribution of the fitness X. The solid line denotes a Gaussian distribution, and the dots denote the actual probabilities in the population.

The argument above is that after migration, each bit in a given individual will remain independent because migration to each bit is independent of migration to the other bits, as shown in Algorithm 2. However, this is a simplistic assumption. Because of selection, BBO will introduce correlations between bits, and they will not remain independent for very long. Our hypothesis here is that the variables will remain mostly independent, at least during the early generations, so that statistical mechanics will provide a good fitness approximation. This hypothesis will be verified in Section 4.

3.2 BBO Migration

Cumulants of a BBO fitness probability distribution provide certain average properties. These cumulants describe the fitness of the population as a whole at each generation. The cumulant
dynamics tell us the trajectory of the distribution of the population fitness during the evolutionary process. In the following subsections, the equations of the dynamic evolution of BBO are derived based on the cumulants of its fitness distribution, and the specific one-max problem described in (8) is used to study the properties of BBO. Although the one-max problem is a simple optimization problem, the dynamic equations of BBO can provide predictive power and insight into BBO dynamics. In Section 5 we extend this model to general but separable fitness functions.

Consider a single bit position xj, suppose that it has a randomly distributed population, and consider the effect of migration on the values in this bit position. There are two ways in which a bit can be equal to 1 after an immigration trial (that is, after one iteration of the "for each decision variable s" loop in Algorithm 2). First, it can immigrate a 1 from another individual in the population (that is, another individual emigrates a 1 to this bit). Second, it can start as 1 and not immigrate. So the expected value of this bit after an immigration trial is computed by

$$\begin{aligned}
E[x_j] = \sum_{i=0,1} i \Pr[x_j = i] = \Pr(x_j = 1) &= \Pr(x_j = 1 \mid \text{immigration}) \Pr(\text{immigration}) + \Pr(x_j = 1 \mid \text{no immigration}) \Pr(\text{no immigration}) \\
&= \Pr(\text{emigrating bit} = 1) \Pr(\text{immigration}) + \Pr(x_j = 1 \mid \text{no immigration}) \Pr(\text{no immigration})
\end{aligned} \tag{9}$$

Consider the first probability on the right side, that is, the probability that an emigrating bit is 1, which is proportional to the sum of the emigration rates of all individuals whose jth bit is equal to 1:

$$\Pr(\text{emigrating bit} = 1) = \frac{\sum_{k \in \rho_j(1)} \mu_k}{\sum_{k=1}^{N} \mu_k} \tag{10}$$

where ρj(1) = {population indices of individuals whose jth bit is equal to 1}, and N denotes the size of the population. Now consider the third probability on the right side of (9): if immigration does not occur, the probability that the value of bit j is 1 after an immigration trial is equal to the probability that the value of bit j is 1 before the immigration trial, which can be written as

$$\Pr(x_j = 1 \mid \text{no immigration}) = \Pr(x_j^0 = 1) \tag{11}$$

where $x_j^0$ denotes the value of bit j before an immigration trial. Now consider the probability that the value of bit j is 1 before the immigration trial:

$$\Pr(x_j^0 = 1) = E\left[x_j^0\right] \tag{12}$$

Note that the mean value of bit j before the immigration trial is

$$E\left[x_j^0\right] = \frac{|\rho_j(1)|}{N} \tag{13}$$

where |ρj(1)| denotes the number of individuals in the population whose jth bit is equal to 1. Combining this with (6), the first cumulant of the jth bit before the immigration trial is

$$\kappa_1^{(j)} = \omega_1^{(j)} = \frac{|\rho_j(1)|}{N} \tag{14}$$
Next we consider the probabilities Pr(immigration) and Pr(no immigration) in (9), which are given by the average value of the immigration curve; that is, they are independent of any additional information about the fitness of any particular individual. Assuming the migration curves shown in Figure 1, the emigration rate μk is normalized as

$$\mu_k = \frac{\text{fitness}}{n+1} = \frac{\text{number of 1 bits}}{n+1} \tag{15}$$

So the probability Pr(immigration) can be written as

$$\Pr(\text{immigration}) = 1 - \Pr(\text{emigration}) = 1 - \bar{\mu} = 1 - E[\mu_k] = 1 - \frac{E[\text{number of 1 bits}]}{n+1} \tag{16}$$

Furthermore, the average number of 1 bits in any given individual is E[X], which is the mean of the fitness and is equal to the cumulant κ1. The probabilities Pr(immigration) and Pr(no immigration) can thus be written as

$$\Pr(\text{immigration}) = 1 - \frac{\kappa_1}{n+1}, \qquad \Pr(\text{no immigration}) = 1 - \Pr(\text{immigration}) = \frac{\kappa_1}{n+1} \tag{17}$$

Note that if there were a different immigration curve than the one shown in Figure 1, we would need to re-compute the average of the immigration curve to find Pr(immigration) and Pr(no immigration). Substituting (10), (11), and (17) into (9), the expected bit value after an immigration trial is written as

$$E[x_j] = \frac{\sum_{k \in \rho_j(1)} \mu_k}{\sum_{k=1}^{N} \mu_k} \left(1 - \frac{\kappa_1}{n+1}\right) + \Pr(x_j^0 = 1)\left(\frac{\kappa_1}{n+1}\right) \tag{18}$$

Note that we have normalized the emigration rate so that it has a minimum of 0 and a maximum of n/(n+1). Other normalizations are possible, as long as the emigration rate is between 0 and 1, but this is a typical approach. Then we use (15) to obtain

$$\sum_{k=1}^{N} \mu_k = \sum_{k=1}^{N} \frac{\text{number of 1 bits in the } k\text{th individual}}{n+1} = \frac{\text{number of 1 bits in the entire population}}{n+1} = \frac{N \cdot E[X]}{n+1} = \frac{N \kappa_1}{n+1} \tag{19}$$

Next we note that

$$\sum_{k \in \rho_j(1)} \mu_k = \sum_{k \in \rho_j(1)} \frac{\text{number of 1 bits in the } k\text{th individual, whose } j\text{th bit is 1}}{n+1} = \frac{1}{n+1} |\rho_j(1)| \cdot E[\text{number of 1 bits in any individual whose } j\text{th bit is 1}] \tag{20}$$

Further notice that

$$\begin{aligned}
E[\text{number of 1 bits in any individual whose } j\text{th bit is 1}] &= 1 + E[\text{number of 1 bits in the remaining } (n-1) \text{ bits except the } j\text{th bit}] \\
&= 1 + \sum_{\substack{k=1 \\ k \neq j}}^{n} \frac{|\rho_k(1)|}{N} = 1 + \sum_{k=1}^{n} \frac{|\rho_k(1)|}{N} - \frac{|\rho_j(1)|}{N} = 1 + \sum_{k=1}^{n} \kappa_1^{(k)} - \kappa_1^{(j)}
\end{aligned} \tag{21}$$

Assuming that the cumulants have the linearity property described in (7), the cumulant of the binary string is equal to the sum of the cumulants of each bit of the binary string:

$$\kappa_1 = \sum_{k=1}^{n} \kappa_1^{(k)} \tag{22}$$

Combining (14), (20), (21), and (22) gives

$$\sum_{k \in \rho_j(1)} \mu_k = \frac{1}{n+1} |\rho_j(1)| \left(1 + \kappa_1 - \kappa_1^{(j)}\right) = \frac{N \kappa_1^{(j)}}{n+1} \left(1 + \kappa_1 - \kappa_1^{(j)}\right) \tag{23}$$

Now the expected value of bit xj after an immigration trial can be written as

$$\begin{aligned}
\kappa_{1,t+1}^{(j)} = E[x_j] &= \frac{\sum_{k \in \rho_j(1)} \mu_k}{\sum_{k=1}^{N} \mu_k} \left(1 - \frac{\kappa_{1,t}}{n+1}\right) + \Pr(x_j^0 = 1)\left(\frac{\kappa_{1,t}}{n+1}\right) \\
&= \frac{\kappa_{1,t}^{(j)}}{\kappa_{1,t}} \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\left(1 + \kappa_{1,t} - \kappa_{1,t}^{(j)}\right) + \kappa_{1,t}^{(j)} \frac{\kappa_{1,t}}{n+1} \\
&= \kappa_{1,t}^{(j)} + \left(1 - \frac{\kappa_{1,t}}{n+1}\right) \frac{\kappa_{1,t}^{(j)}}{\kappa_{1,t}} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right) \frac{\left(\kappa_{1,t}^{(j)}\right)^2}{\kappa_{1,t}}
\end{aligned} \tag{24}$$

Note that the subscripts t+1 and t denote quantities after and before migration, respectively. For the other cumulants of bit xj, we note that bits can only take the value 0 or 1, so

$$x_j^k = x_j \tag{25}$$

for all values of k. So the moments of all orders are equal, namely,

$$\omega_k^{(j)} = E\left[x_j^k\right] = E[x_j] \tag{26}$$

Based on the relationships between moments and cumulants described in (6), we obtain the following equations for the cumulants $\kappa_2^{(j)}$ and $\kappa_3^{(j)}$:

$$\begin{aligned}
\kappa_{2,t+1}^{(j)} &= \kappa_{2,t}^{(j)} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \left(\frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}}\right)^2 + \left(1 - \frac{\kappa_{1,t}}{n+1}\right) \frac{\kappa_{3,t}^{(j)}}{\kappa_{1,t}} \\
\kappa_{3,t+1}^{(j)} &= \kappa_{3,t}^{(j)} + 2\left(1 - \frac{\kappa_{1,t}}{n+1}\right)^3 \left(\frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}}\right)^3 - 3\left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \frac{\kappa_{2,t}^{(j)} \kappa_{3,t}^{(j)}}{(\kappa_{1,t})^2} + \left(1 - \frac{\kappa_{1,t}}{n+1}\right) \frac{\kappa_{4,t}^{(j)}}{\kappa_{1,t}}
\end{aligned} \tag{27}$$

The cumulant approximations of the population after migration are

$$\begin{aligned}
\kappa_{1,t+1} &= \sum_{j=1}^{n} \kappa_{1,t+1}^{(j)} = \kappa_{1,t} + \left(1 - \frac{\kappa_{1,t}}{n+1}\right) \frac{\kappa_{2,t}}{\kappa_{1,t}} \\
\kappa_{2,t+1} &= \sum_{j=1}^{n} \kappa_{2,t+1}^{(j)} \approx \kappa_{2,t} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^2 + \left(1 - \frac{\kappa_{1,t}}{n+1}\right) \frac{\kappa_{3,t}}{\kappa_{1,t}} \\
\kappa_{3,t+1} &= \sum_{j=1}^{n} \kappa_{3,t+1}^{(j)} \approx \kappa_{3,t} + 2\left(1 - \frac{\kappa_{1,t}}{n+1}\right)^3 \left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^3 - 3\left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \frac{\kappa_{2,t} \kappa_{3,t}}{(\kappa_{1,t})^2} + \left(1 - \frac{\kappa_{1,t}}{n+1}\right) \frac{\kappa_{4,t}}{\kappa_{1,t}}
\end{aligned} \tag{28}$$

The derivations of equations (27) and (28) are in the appendix; note that κ2,t+1 and κ3,t+1 are approximations. Cumulants κ1 and κ2 denote the mean and the variance of
the population fitness distribution, and the cumulant κ3 denotes the skewness, which measures the degree of asymmetry of the distribution. Cumulant κ4 and other higher-order cumulants are not discussed in this section. κ4 denotes the kurtosis, which measures the peakedness of the distribution. EA researchers are typically more interested in the mean and the variance of fitness distributions, namely κ1 and κ2, and κ4 only affects κ3 in (28). In fact, it is apparent from (28) that the influence of κ4 on κ3 is relatively small. So we make the approximation here that κ4 and other higher-order cumulants are zero, which gives us an approximation of the migration dynamics using just three variables.

Equation (28) gives the migration dynamics of the statistical properties of BBO. Figure 3 shows the evolution of the BBO fitness distribution due to migration for the one-max problem, where the probability distributions of the population fitness X are shown every 4 generations, up to 20 generations. From the figure we see that, as expected, the mean fitness becomes larger as the generation count increases.
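The migration-only dynamics (28) can be iterated directly; below is a minimal sketch (our own, with κ4,t set to zero as discussed above). This is the kind of iteration used to produce Figure 3.

def migrate(k1, k2, k3, n, k4=0.0):
    """One migration-only update of the population cumulants, per Eq. (28)."""
    a = 1.0 - k1 / (n + 1.0)
    return (k1 + a * k2 / k1,
            k2 - a**2 * (k2 / k1)**2 + a * k3 / k1,
            k3 + 2 * a**3 * (k2 / k1)**3 - 3 * a**2 * k2 * k3 / k1**2 + a * k4 / k1)

# One-max with n = 5: a uniformly random initial population has
# kappa_1 = n/2, kappa_2 = n/4, kappa_3 = 0.
k = (2.5, 1.25, 0.0)
for gen in range(20):
    k = migrate(*k, 5)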


Figure 3: Evolution of BBO fitness due to migration only. Probability distributions of the fitness X of the one-max problem are shown every 4 generations, up to generation 20.

3.3 BBO Mutation

The previous subsection considered only migration. In this subsection mutation is added to the statistical mechanics model. Mutation in BBO is the operator that modifies a decision variable of a candidate solution with a small probability. It is the same as mutation in a GA (see Algorithm 1), so the theory of GA mutation presented in Reeves and Rowe (2003) applies directly to BBO. The results for the first three cumulants for the one-max problem, considering only mutation, are as follows:

$$\begin{aligned}
\kappa_{1,t+1} &= mn + (1 - 2m)\,\kappa_{1,t} \\
\kappa_{2,t+1} &= m(1 - m)n + (1 - 2m)^2\,\kappa_{2,t} \\
\kappa_{3,t+1} &= m(1 - m)(1 - 2m)(n - 2\kappa_{1,t}) + (1 - 2m)^3\,\kappa_{3,t}
\end{aligned} \tag{29}$$

where m is the mutation rate, and t+1 and t denote the quantities after and before mutation, respectively. Note that these mutation-only equations were previously obtained
for the one-max problem in the case that the variables were taken to be ±1 (Prügel-Bennett, 1997, Section 3). We combine these mutation equations with the BBO migration equations derived in the previous subsection. This gives the following model for BBO with both migration and mutation:

$$\begin{aligned}
\kappa_{1,t+1} ={}& mn + (1 - 2m)\left(\kappa_{1,t} + \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\frac{\kappa_{2,t}}{\kappa_{1,t}}\right) \\
\kappa_{2,t+1} ={}& m(1 - m)n + (1 - 2m)^2\left(\kappa_{2,t} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^2 + \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\frac{\kappa_{3,t}}{\kappa_{1,t}}\right) \\
\kappa_{3,t+1} ={}& m(1 - m)(1 - 2m)\left(n - 2\left(\kappa_{1,t} + \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)\right) \\
& + (1 - 2m)^3\left(\kappa_{3,t} + 2\left(1 - \frac{\kappa_{1,t}}{n+1}\right)^3 \left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^3 - 3\left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \frac{\kappa_{2,t}\kappa_{3,t}}{(\kappa_{1,t})^2} + \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\frac{\kappa_{4,t}}{\kappa_{1,t}}\right)
\end{aligned} \tag{30}$$

These equations give the statistical properties of BBO fitness values after migration and mutation in terms of those at the previous generation, so it is possible to iterate these equations to obtain predictions for the evolution of BBO populations.
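The full model (30) can be iterated in a few lines; the sketch below (ours) composes the migration update (28) with the mutation update (29), which reproduces (30) with κ4,t = 0.

def bbo_model_step(k1, k2, k3, n, m):
    """One generation of the BBO cumulant model, Eq. (30) with kappa_4 = 0."""
    a = 1.0 - k1 / (n + 1.0)
    # Migration, Eq. (28):
    g1 = k1 + a * k2 / k1
    g2 = k2 - a**2 * (k2 / k1)**2 + a * k3 / k1
    g3 = k3 + 2 * a**3 * (k2 / k1)**3 - 3 * a**2 * k2 * k3 / k1**2
    # Mutation, Eq. (29), applied to the post-migration cumulants:
    return (m * n + (1 - 2*m) * g1,
            m * (1 - m) * n + (1 - 2*m)**2 * g2,
            m * (1 - m) * (1 - 2*m) * (n - 2 * g1) + (1 - 2*m)**3 * g3)

# 100-bit one-max, mutation rate m = 0.01, uniformly random initial population:
n, m = 100, 0.01
k = (n / 2, n / 4, 0.0)
for gen in range(100):
    k = bbo_model_step(*k, n, m)
# k[0] now predicts the mean population fitness after 100 generations.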

4 Simulation Results

This section first confirms the BBO statistical mechanics approximation model derived in Section 3 with simulation (Section 4.1), and then compares the approximation results of BBO with those of the GA (Section 4.2).

4.1 Comparisons between Theory and Simulation

This subsection uses the 100-bit one-max problem described in (8) to verify the statistical model of BBO with simulation. The statistical approximations of BBO are obtained by iterating (28) when considering only migration, and by iterating (30) when considering both migration and mutation. We use κ4,t = 0 in the iterations of (28) and (30). The simulation parameters of BBO are as follows: population size of 50, maximum immigration rate and maximum emigration rate of 1, and 200 Monte Carlo runs. Figures 4-6 show comparisons between theoretical (statistical approximation) and simulated BBO results.

Several things are notable about Figures 4-6. First, the approximation results of BBO show that as the generation count increases, the first cumulant κ1 increases and the second cumulant κ2 decreases; namely, the mean increases and the variance decreases with the generation count. This is consistent with the expected behavior of BBO: as the generation count increases, the mean fitness of the population increases, and the population becomes more uniform.

Second, the cumulant approximations and the simulation results match well. This indicates that the statistical mechanics approximation model of BBO is reasonable, and that the statistical mechanics approximation is valid for the one-max problem. This fact allows us to make quantitative conclusions from statistical mechanics approximations, and to use those approximations to obtain valid conclusions regarding the performance of BBO with various tuning parameters.

Third, as the mutation rate decreases, the approximation for the mean becomes larger; namely, BBO achieves better fitness values. This indicates that low mutation rates give better performance for the one-max problem. A high mutation rate of 0.1 results in too much exploration, and the population remains too widely distributed across the search space. However, a low mutation rate of 0.01 or 0.001 allows BBO to exploit better fitness values in the population. These results are similar to those obtained by Markov model analysis in Simon et al. (2011a).

Figure 4: BBO approximation and simulation results of cumulant κ1 with mutation rate m = 0.1, 0.01 and 0.001 for the 100-bit one-max problem.


Figure 5: BBO approximation and simulation results of cumulant κ2 with mutation rate m = 0.1, 0.01, and 0.001 for the 100-bit one-max problem.

4.2 Comparisons between BBO and GA

To further investigate the statistical properties of BBO, we compare the approximation model of BBO derived above with that of a simple GA using proportional selection and mutation, as presented in Prügel-Bennett and Shapiro (1994) and Reeves and Rowe (2003).

Figure 6: BBO approximation and simulation results of cumulant κ3 with mutation rate m = 0.1, 0.01, and 0.001 for the 100-bit one-max problem.

The statistical mechanics model of a GA with selection only is

$$\begin{aligned}
\kappa_{1,t+1} &= \kappa_{1,t} + \frac{\kappa_{2,t}}{\kappa_{1,t}} \\
\kappa_{2,t+1} &= \kappa_{2,t} - \left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^2 + \frac{\kappa_{3,t}}{\kappa_{1,t}} \\
\kappa_{3,t+1} &= \kappa_{3,t} + 2\left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^3 - 3\frac{\kappa_{2,t}\kappa_{3,t}}{(\kappa_{1,t})^2} + \frac{\kappa_{4,t}}{\kappa_{1,t}}
\end{aligned} \tag{31}$$

Recall that the mutation model is given in (29). So the three-parameter approximation equations for a simple GA with selection and mutation are

$$\begin{aligned}
\kappa_{1,t+1} ={}& mn + (1 - 2m)\left(\kappa_{1,t} + \frac{\kappa_{2,t}}{\kappa_{1,t}}\right) \\
\kappa_{2,t+1} ={}& m(1 - m)n + (1 - 2m)^2\left(\kappa_{2,t} - \left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^2 + \frac{\kappa_{3,t}}{\kappa_{1,t}}\right) \\
\kappa_{3,t+1} ={}& m(1 - m)(1 - 2m)\left(n - 2\left(\kappa_{1,t} + \frac{\kappa_{2,t}}{\kappa_{1,t}}\right)\right) + (1 - 2m)^3\left(\kappa_{3,t} + 2\left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^3 - 3\frac{\kappa_{2,t}\kappa_{3,t}}{(\kappa_{1,t})^2} + \frac{\kappa_{4,t}}{\kappa_{1,t}}\right)
\end{aligned} \tag{32}$$

We see that the approximation equations of BBO in (30) are similar to those of the GA in (32); the difference is the additional coefficient $1 - \kappa_{1,t}/(n+1)$ in some terms of the BBO model. As the problem size n becomes large, the BBO approximation approaches the GA approximation. These equations can be used to compare the statistical approximations of BBO and GA for the 100-bit one-max problem, as shown in Figures 7-9.

From Figures 7-9, several things are notable about the approximation comparisons between BBO and GA. First, the trends of the cumulants are very similar for BBO and GA. As the generation count increases, the mean (the first cumulant κ1) increases, and the variance (the second cumulant κ2) decreases. The only qualitative difference we see is that the skewness (the third cumulant κ3) of the GA suddenly changes slope during the first few generations, while the skewness of BBO changes more smoothly.
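For comparison, the GA model (32) follows from the BBO sketch given earlier simply by setting the coefficient $1 - \kappa_{1,t}/(n+1)$ to 1 in the selection part; a minimal sketch (ours):

def ga_model_step(k1, k2, k3, n, m):
    """One generation of the GA cumulant model, Eq. (32) with kappa_4 = 0."""
    # Proportional selection, Eq. (31):
    g1 = k1 + k2 / k1
    g2 = k2 - (k2 / k1)**2 + k3 / k1
    g3 = k3 + 2 * (k2 / k1)**3 - 3 * k2 * k3 / k1**2
    # Mutation, Eq. (29):
    return (m * n + (1 - 2*m) * g1,
            m * (1 - m) * n + (1 - 2*m)**2 * g2,
            m * (1 - m) * (1 - 2*m) * (n - 2 * g1) + (1 - 2*m)**3 * g3)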


Figure 7: Comparison of the approximation values of cumulant κ1 between BBO and GA with mutation rate m = 0.1, 0.01 and 0.001 for the 100-bit one-max problem.


Figure 8: Comparison of the approximation values of cumulant κ2 between BBO and GA with mutation rate m = 0.1, 0.01 and 0.001 for the 100-bit one-max problem.

These comparisons indicate that the expected behavior of BBO is similar to that of the GA. This point has already been implied by equations (30) and (32). It is not too surprising that BBO is similar to the GA, because many EAs can be expressed in terms of each other. A previous paper has compared the optimization performance of BBO and GA and reported similar results (Simon et al., 2011). On the other hand, the figures show that differences do exist between BBO and GA. This indicates that BBO has its own particular characteristics that are distinct from the GA. A BBO individual uses its own fitness before deciding how likely it is to accept features from other candidate solutions. This simple and intuitive idea does not have an analogy in genetics, but is motivated by biogeography. This implies that it is useful to retain the distinction between BBO and GA, and that the biogeographical foundation of BBO can motivate many extensions.


Figure 9: Comparison of the approximation values of cumulant κ3 between BBO and GA with mutation rate m = 0.1, 0.01 and 0.001 for the 100-bit one-max problem.

Second, Figure 7 shows that the mean fitness of BBO is smaller than that of the GA, and Figure 8 shows that the fitness variance of BBO is larger than that of the GA. This indicates that the GA might have better convergence properties than BBO for high mutation rates on the one-max problem. We should be wary of drawing general conclusions about BBO vs. GA comparisons from these results, because the BBO and GA versions used here are basic versions, and the results in this section are limited to the 100-bit one-max problem.

5 Separable Fitness Functions

In this section we extend the analysis of Section 3, which was limited to the one-max problem, to a general separable function f(x), which is written as

$$f(x) = f_1(x_1) + \cdots + f_n(x_n) \tag{33}$$

where

$$x = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \tag{34}$$

where xj ∈ I = {0, 1} is bit j of a binary string, and n is the length of the binary string and also the number of features in each candidate solution. Now we consider the effect of migration on the fitness in a single bit position xj.

The expected fitness of this bit after an immigration trial is computed by

$$\kappa_1^{(j)} = E[f_j(x_j)] = \sum_{i=0,1} f_j(x_j = i) \Pr[x_j = i] = f_j(x_j = 1)\Pr(x_j = 1) + f_j(x_j = 0)\left(1 - \Pr(x_j = 1)\right) \tag{35}$$

First, we analyze the probability that xj = 1 as follows:

$$\begin{aligned}
\Pr(x_j = 1) &= \Pr(x_j = 1 \mid \text{immigration})\Pr(\text{immigration}) + \Pr(x_j = 1 \mid \text{no immigration})\Pr(\text{no immigration}) \\
&= \Pr(\text{emigrating bit} = 1)\Pr(\text{immigration}) + \Pr(x_j = 1 \mid \text{no immigration})\Pr(\text{no immigration})
\end{aligned} \tag{36}$$

Consider the probability that an emigrating bit is 1, which is proportional to the sum of the emigration rates of all individuals whose jth bit is equal to 1:

$$\Pr(\text{emigrating bit} = 1) = \frac{\sum_{k \in \rho_j(1)} \mu_k}{\sum_{k=1}^{N} \mu_k} \tag{37}$$

where ρj(1) = {population indices of individuals whose jth bit is equal to 1}, and N denotes the size of the population. We write

$$\Pr(x_j = 1 \mid \text{no immigration}) = \Pr(x_j^0 = 1) \tag{38}$$

where $x_j^0$ denotes the value of bit j before an immigration trial. Assuming the immigration curve shown in Figure 1, the probabilities Pr(immigration) and Pr(no immigration) are the same as in (17) of Section 3.2:

$$\Pr(\text{immigration}) = 1 - \frac{\kappa_1}{n+1}, \qquad \Pr(\text{no immigration}) = \frac{\kappa_1}{n+1} \tag{39}$$

Substituting (37), (38), and (39) into (36), we obtain

$$\Pr(x_j = 1) = \frac{\sum_{k \in \rho_j(1)} \mu_k}{\sum_{k=1}^{N} \mu_k} \left(1 - \frac{\kappa_1}{n+1}\right) + \Pr(x_j^0 = 1)\left(\frac{\kappa_1}{n+1}\right) \tag{40}$$

Now consider the probability that the value of bit j is 1 before the immigration trial:

$$\Pr(x_j^0 = 1) = \frac{|\rho_j(1)|}{N} \tag{41}$$

where |ρj(1)| denotes the number of individuals in the population whose jth bit is equal to 1. The emigration rate μk, based on the linear migration rate function of Figure 1, is normalized as

$$\mu_k = \frac{\text{fitness of the } k\text{th individual}}{\text{maximum fitness}} = \frac{f(x_k)}{f_{\max}} \tag{42}$$

Then we can write

$$\sum_{k=1}^{N} \mu_k = \sum_{k=1}^{N} \frac{f(x_k)}{f_{\max}} = \frac{N \cdot E[f(x)]}{f_{\max}} = \frac{N \kappa_1}{f_{\max}} \tag{43}$$

and

$$\begin{aligned}
\sum_{k \in \rho_j(1)} \mu_k &= \sum_{k \in \rho_j(1)} \frac{\text{fitness of the } k\text{th individual, whose } j\text{th bit is 1}}{\text{maximum fitness}} \\
&= \frac{1}{f_{\max}} |\rho_j(1)| \cdot E[\text{fitness of an individual whose } j\text{th bit is 1}] \\
&= \frac{1}{f_{\max}} |\rho_j(1)| \cdot \left(f_j(x_j = 1) + E[\text{fitness of the remaining } (n-1) \text{ bits except the } j\text{th bit}]\right) \\
&= \frac{1}{f_{\max}} |\rho_j(1)| \cdot \left(f_j(x_j = 1) + \left(E[f(x)] - E[f_j(x_j)]\right)\right) \\
&= \frac{1}{f_{\max}} |\rho_j(1)| \cdot \left(f_j(x_j = 1) + \left(\kappa_1 - \kappa_1^{(j)}\right)\right)
\end{aligned} \tag{44}$$

Now we substitute (41), (43), and (44) into (40) to obtain

$$\begin{aligned}
\Pr(x_j = 1) &= \frac{\sum_{k \in \rho_j(1)} \mu_k}{\sum_{k=1}^{N} \mu_k} \left(1 - \frac{\kappa_1}{n+1}\right) + \Pr(x_j^0 = 1)\left(\frac{\kappa_1}{n+1}\right) \\
&= \frac{\frac{1}{f_{\max}} |\rho_j(1)| \left(f_j(x_j = 1) + \kappa_1 - \kappa_1^{(j)}\right)}{\frac{N \kappa_1}{f_{\max}}} \left(1 - \frac{\kappa_1}{n+1}\right) + \frac{|\rho_j(1)|}{N}\left(\frac{\kappa_1}{n+1}\right) \\
&= \frac{|\rho_j(1)|}{N} \cdot \frac{\left(1 - \frac{\kappa_1}{n+1}\right)\left(f_j(x_j = 1) + \kappa_1 - \kappa_1^{(j)}\right) + \frac{\kappa_1^2}{n+1}}{\kappa_1} \\
&= \frac{|\rho_j(1)|}{N} \cdot \frac{\left(1 - \frac{\kappa_1}{n+1}\right) f_j(x_j = 1) + \kappa_1 - \left(1 - \frac{\kappa_1}{n+1}\right)\kappa_1^{(j)}}{\kappa_1}
\end{aligned} \tag{45}$$

Combining (35) and (41), the expected fitness of bit xj before an immigration trial can be written as

$$\begin{aligned}
\kappa_{1,t}^{(j)} = E[f_j(x_j)] &= f_j(x_j = 1)\Pr(x_j = 1)\big|_t + f_j(x_j = 0)\left(1 - \Pr(x_j = 1)\right)\big|_t \\
&= f_j(x_j = 1)\Pr(x_j^0 = 1) + f_j(x_j = 0)\left(1 - \Pr(x_j^0 = 1)\right) \\
&= f_j(1)\frac{|\rho_j(1)|}{N} + f_j(0)\left(1 - \frac{|\rho_j(1)|}{N}\right)
\end{aligned} \tag{46}$$

Note that the subscript t denotes a quantity before migration. For ease of notation, we use fj(1) and fj(0) instead of fj(xj = 1) and fj(xj = 0), respectively. Equation (46) can then be written as

$$\frac{|\rho_j(1)|}{N} = \frac{\kappa_{1,t}^{(j)} - f_j(0)}{f_j(1) - f_j(0)} \tag{47}$$

The expected fitness of bit xj after an immigration trial can be written as follows, which we obtain by substituting (45) into (35):

$$\begin{aligned}
\kappa_{1,t+1}^{(j)} = E[f_j(x_j)] &= f_j(1)\Pr(x_j = 1)\big|_{t+1} + f_j(0)\left(1 - \Pr(x_j = 1)\right)\big|_{t+1} \\
&= f_j(0) + \left(f_j(1) - f_j(0)\right) \frac{|\rho_j(1)|}{N} \cdot \frac{\left(1 - \frac{\kappa_{1,t}}{n+1}\right) f_j(1) + \kappa_{1,t} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\kappa_{1,t}^{(j)}}{\kappa_{1,t}}
\end{aligned} \tag{48}$$

Note that the subscript t+1 denotes a quantity after migration. Now we can use (47) to obtain

$$\kappa_{1,t+1}^{(j)} = f_j(0) + \frac{\left(\kappa_{1,t}^{(j)} - f_j(0)\right)\left(\left(1 - \frac{\kappa_{1,t}}{n+1}\right) f_j(1) + \kappa_{1,t} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\kappa_{1,t}^{(j)}\right)}{\kappa_{1,t}} \tag{49}$$

Next, we use (7) to obtain

$$\begin{aligned}
\kappa_{1,t+1} &= \kappa_{1,t+1}^{(1)} + \kappa_{1,t+1}^{(2)} + \cdots + \kappa_{1,t+1}^{(n)} = \sum_{j=1}^{n} \kappa_{1,t+1}^{(j)} \\
&= \sum_{j=1}^{n} f_j(0) + \frac{1}{\kappa_{1,t}} \sum_{j=1}^{n} \left(\kappa_{1,t}^{(j)} - f_j(0)\right)\left(\left(1 - \frac{\kappa_{1,t}}{n+1}\right) f_j(1) + \kappa_{1,t} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\kappa_{1,t}^{(j)}\right) \\
&= \kappa_{1,t} + \frac{1}{\kappa_{1,t}}\left(1 - \frac{\kappa_{1,t}}{n+1}\right) \sum_{j=1}^{n} \left(f_j(1)\kappa_{1,t}^{(j)} - \left(\kappa_{1,t}^{(j)}\right)^2 - f_j(0)f_j(1) + f_j(0)\kappa_{1,t}^{(j)}\right)
\end{aligned} \tag{50}$$

Furthermore, using (41) and (47), we obtain

$$\begin{aligned}
\kappa_{2,t}^{(j)} &= E\left[(f_j(x_j))^2\right]\Big|_t - \left(E[f_j(x_j)]\Big|_t\right)^2 = \sum_{i=0,1} (f_j(i))^2 \Pr[x_j = i]\Big|_t - \left(\kappa_{1,t}^{(j)}\right)^2 \\
&= (f_j(1))^2 \frac{|\rho_j(1)|}{N} + (f_j(0))^2 \left(1 - \frac{|\rho_j(1)|}{N}\right) - \left(\kappa_{1,t}^{(j)}\right)^2 \\
&= (f_j(0))^2 + \left((f_j(1))^2 - (f_j(0))^2\right) \frac{\kappa_{1,t}^{(j)} - f_j(0)}{f_j(1) - f_j(0)} - \left(\kappa_{1,t}^{(j)}\right)^2 \\
&= f_j(1)\kappa_{1,t}^{(j)} + f_j(0)\kappa_{1,t}^{(j)} - f_j(1)f_j(0) - \left(\kappa_{1,t}^{(j)}\right)^2
\end{aligned} \tag{51}$$

Substituting (51) into (50), we obtain

$$\kappa_{1,t+1} = \kappa_{1,t} + \frac{1}{\kappa_{1,t}}\left(1 - \frac{\kappa_{1,t}}{n+1}\right) \sum_{j=1}^{n} \kappa_{2,t}^{(j)} = \kappa_{1,t} + \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\frac{\kappa_{2,t}}{\kappa_{1,t}} \tag{52}$$


Next, we again use (41) and (47) to compute

$$\begin{aligned}
\kappa_{3,t}^{(j)} &= E\left[(f_j(x_j))^3\right]\Big|_t - 3\,E\left[(f_j(x_j))^2\right]\Big|_t E\left[f_j(x_j)\right]\Big|_t + 2\left(E[f_j(x_j)]\Big|_t\right)^3 \\
&= (f_j(1))^2 \kappa_{1,t}^{(j)} - (f_j(1))^2 f_j(0) + 4 f_j(1) f_j(0) \kappa_{1,t}^{(j)} - (f_j(0))^2 f_j(1) + (f_j(0))^2 \kappa_{1,t}^{(j)} \\
&\quad - 3 f_j(1) \left(\kappa_{1,t}^{(j)}\right)^2 - 3 f_j(0) \left(\kappa_{1,t}^{(j)}\right)^2 + 2\left(\kappa_{1,t}^{(j)}\right)^3
\end{aligned} \tag{53}$$

We use (45), (47), and (49) to obtain

$$\begin{aligned}
\kappa_{2,t+1}^{(j)} &= E\left[(f_j(x_j))^2\right]\Big|_{t+1} - \left(E[f_j(x_j)]\Big|_{t+1}\right)^2 \\
&= f_j(1)\kappa_{1,t}^{(j)} + f_j(0)\kappa_{1,t}^{(j)} - f_j(1)f_j(0) - \left(\kappa_{1,t}^{(j)}\right)^2 \\
&\quad - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \frac{\left(f_j(1)\kappa_{1,t}^{(j)} + f_j(0)\kappa_{1,t}^{(j)} - f_j(1)f_j(0) - \left(\kappa_{1,t}^{(j)}\right)^2\right)^2}{(\kappa_{1,t})^2} \\
&\quad + \left(1 - \frac{\kappa_{1,t}}{n+1}\right) \frac{1}{\kappa_{1,t}} \bigg( (f_j(1))^2 \kappa_{1,t}^{(j)} - (f_j(1))^2 f_j(0) + 4 f_j(1) f_j(0) \kappa_{1,t}^{(j)} - f_j(1)(f_j(0))^2 \\
&\qquad\qquad + (f_j(0))^2 \kappa_{1,t}^{(j)} - 3 f_j(1)\left(\kappa_{1,t}^{(j)}\right)^2 - 3 f_j(0)\left(\kappa_{1,t}^{(j)}\right)^2 + 2\left(\kappa_{1,t}^{(j)}\right)^3 \bigg)
\end{aligned} \tag{54}$$

The derivations of equations (53) and (54) are in the appendix. Substituting (51) and (53) into (54), we obtain

$$\kappa_{2,t+1}^{(j)} = \kappa_{2,t}^{(j)} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \left(\frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}}\right)^2 + \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\frac{\kappa_{3,t}^{(j)}}{\kappa_{1,t}} \tag{55}$$

which gives

$$\kappa_{2,t+1} = \sum_{j=1}^{n} \kappa_{2,t+1}^{(j)} \approx \kappa_{2,t} - \left(1 - \frac{\kappa_{1,t}}{n+1}\right)^2 \left(\frac{\kappa_{2,t}}{\kappa_{1,t}}\right)^2 + \left(1 - \frac{\kappa_{1,t}}{n+1}\right)\frac{\kappa_{3,t}}{\kappa_{1,t}} \tag{56}$$

Note that the derivation of (56) is the same as that of (28), as shown in the appendix. Comparing (52) and (56) with the first two equations of (28) in Section 3.2, we find that the cumulants κ1,t+1 and κ2,t+1 are the same. Namely, the dynamics of the mean and the variance are identical for all separable functions. The skewness (the third cumulant κ3,t+1) of separable functions is not derived here, but is left to future work. Note that higher-order cumulants quantify the non-Gaussian content of the distribution. These results indicate that formulations of the dynamics of BBO based on statistical mechanics can model how BBO works not only for the simple one-max problem, but also for general separable fitness functions. Many complex, non-separable functions have been approximated as separable functions (Simon, 2011).

Figures 10 and 11 show comparisons between approximation results based on (52) and (56), and experimental results, for the 30-dimensional sphere function where each independent variable is restricted to {0, 1}. We use κ3,t = 0 when iterating (56). Figure 10 shows that the mean approximation based on (52) matches the empirical results quite well. Figure 11 shows that the variance approximation based on (56) matches very well for high mutation rates; when the mutation rate is low, the variance approximation still matches the empirical results well, but not as well as for high mutation rates.


Figure 10: BBO approximation and simulation results of cumulant κ1 with mutation rate m = 0.1, 0.01, and 0.001 for the 30-bit sphere function.


Figure 11: BBO approximation and simulation results of cumulant κ2 with mutation rate m = 0.1, 0.01 and 0.001 for the 30-bit sphere function.

6 Conclusion

This paper presents formalisms for studying the dynamics of BBO for the one-max problem and for general separable functions, based on ideas from statistical mechanics. The derived model describes the evolution of the statistical properties of the BBO population fitness. The approximation ability of the statistical mechanics model has been explored, and it describes the effect of the BBO migration and mutation operators. These new theoretical results are confirmed with simulation results. In addition, the
theoretical results for BBO are compared with those of a simple GA. This paper has obtained an approximate statistical mechanics model of BBO for separable problems, which can be used to study the effect of BBO tuning parameters on optimization performance. The numerical results in this paper are limited to a 100-bit one-max problem, for which the model implies that the GA outperforms BBO, and a 30-bit sphere problem. However, the results in this paper could conceivably be useful as a general tool for BBO tuning parameter tradeoff studies.

For future work we suggest several important directions. First, the analysis of this paper is based on the one-max problem and general but separable functions. Future work could explore how well the model predicts performance on separable benchmark functions; how well it predicts performance on nonseparable benchmark functions; and how it could be used to estimate cumulants for more general and complicated optimization problems, including continuous optimization problems and the traveling salesman problem. Second, this paper uses linear migration curves. It is also of interest to combine the statistical mechanics approximation with nonlinear migration curves (Ma, 2010) to explore the evolution of BBO populations in that case. This could provide insight into optimal BBO migration curves. The third important direction for future work is to examine BBO performance variations with population size, immigration and emigration curves, and other BBO parameters, based on the statistical mechanics model of BBO obtained in this paper. The fourth direction is to generalize the statistical mechanics models of BBO and GA into a unified framework, and to use the methods of this paper to model other evolutionary algorithms. The fifth direction is to further develop the optimization ability of BBO by using principles from natural biogeography theory to modify the BBO algorithm.

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant Nos. 0826124 and 1344954, and the National Natural Science Foundation of China under Grant Nos. 61305078, 61074032, and 61179041. The comments of the anonymous reviewers were instrumental in improving this paper from its original version.


Appendix

First we derive the cumulants $\kappa_{2,t}^{(j)}$ and $\kappa_{3,t}^{(j)}$ of bit $j$ for the one-max problem in equation (27). Based on (6) and (26), and using the fact that $x_{j,t+1} \in \{0, 1\}$ implies $E[x_{j,t+1}^m] = E[x_{j,t+1}]$ for all $m \geq 1$, they are obtained as follows:

\begin{align*}
\kappa_{2,t+1}^{(j)} &= \omega_{2,t+1}^{(j)} - \left( \omega_{1,t+1}^{(j)} \right)^2
 = E\!\left[ x_{j,t+1}^2 \right] - \left( E[x_{j,t+1}] \right)^2 \\
&= E[x_{j,t+1}] - \left( E[x_{j,t+1}] \right)^2
 = \kappa_{1,t+1}^{(j)} \left( 1 - \kappa_{1,t+1}^{(j)} \right) \\
&= \left[ \kappa_{1,t}^{(j)} + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{1,t}^{(j)} - \left( \kappa_{1,t}^{(j)} \right)^2}{\kappa_{1,t}} \right]
   \left[ 1 - \kappa_{1,t}^{(j)} - \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{1,t}^{(j)} - \left( \kappa_{1,t}^{(j)} \right)^2}{\kappa_{1,t}} \right]
\end{align*}

where the third line substitutes equation (26) for $\kappa_{1,t+1}^{(j)}$. Expanding the product and using the Bernoulli cumulant identities $\kappa_{2,t}^{(j)} = \kappa_{1,t}^{(j)} ( 1 - \kappa_{1,t}^{(j)} )$ and $\kappa_{3,t}^{(j)} = \kappa_{2,t}^{(j)} ( 1 - 2 \kappa_{1,t}^{(j)} )$ gives

\begin{align*}
\kappa_{2,t+1}^{(j)} = \kappa_{2,t}^{(j)} - \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \left( \frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}} \right)^2 + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{3,t}^{(j)}}{\kappa_{1,t}}
\end{align*}

Similarly, for the third cumulant,

\begin{align*}
\kappa_{3,t+1}^{(j)} &= \omega_{3,t+1}^{(j)} - 3 \, \omega_{1,t+1}^{(j)} \omega_{2,t+1}^{(j)} + 2 \left( \omega_{1,t+1}^{(j)} \right)^3 \\
&= E\!\left[ x_{j,t+1}^3 \right] - 3 \, E[x_{j,t+1}] \, E\!\left[ x_{j,t+1}^2 \right] + 2 \left( E[x_{j,t+1}] \right)^3 \\
&= \kappa_{1,t+1}^{(j)} - 3 \left( \kappa_{1,t+1}^{(j)} \right)^2 + 2 \left( \kappa_{1,t+1}^{(j)} \right)^3
\end{align*}

Substituting equation (26) for $\kappa_{1,t+1}^{(j)}$, expanding, and using the identities above together with $\kappa_{4,t}^{(j)} = \kappa_{2,t}^{(j)} ( 1 - 6 \kappa_{2,t}^{(j)} )$ gives

\begin{align*}
\kappa_{3,t+1}^{(j)} = \kappa_{3,t}^{(j)}
 + 2 \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^3 \left( \frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}} \right)^3
 - 3 \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \frac{\kappa_{2,t}^{(j)} \kappa_{3,t}^{(j)}}{\left( \kappa_{1,t} \right)^2}
 + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{4,t}^{(j)}}{\kappa_{1,t}}
\end{align*}
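The two closed forms above can be checked numerically. The sketch below is an illustration added for clarity, not code from the paper: it applies the mean update of equation (26) to an arbitrary bit, computes the updated Bernoulli cumulants directly, and confirms that they match the recursions just derived. The values of p, k1, and n are arbitrary test inputs.

# Numerical sanity check of the bit-level cumulant recursions (illustrative).
def bernoulli_cumulants(p):
    """First four cumulants of a Bernoulli(p) bit."""
    q = p * (1.0 - p)                                        # kappa_2
    return p, q, q * (1.0 - 2.0 * p), q * (1.0 - 6.0 * q)    # kappa_1..kappa_4

def check(p=0.3, k1=55.0, n=100):                            # arbitrary test inputs
    a = 1.0 - k1 / (n + 1.0)                                 # migration factor
    _, k2j, k3j, k4j = bernoulli_cumulants(p)
    p_next = p + a * k2j / k1                                # equation (26)
    _, k2_direct, k3_direct, _ = bernoulli_cumulants(p_next)
    k2_formula = k2j - a**2 * (k2j / k1)**2 + a * k3j / k1
    k3_formula = (k3j + 2 * a**3 * (k2j / k1)**3
                  - 3 * a**2 * k2j * k3j / k1**2 + a * k4j / k1)
    print(abs(k2_direct - k2_formula), abs(k3_direct - k3_formula))  # ~1e-17

check()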

Next we derive the cumulants $\kappa_1$, $\kappa_2$, and $\kappa_3$ of the population after migration in equation (28). Because the bits are treated as independent, the population cumulants are the sums of the bit cumulants:

\begin{align*}
\kappa_{1,t+1} &= \sum_{j=1}^{n} \kappa_{1,t+1}^{(j)}
 = \sum_{j=1}^{n} \kappa_{1,t}^{(j)} + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \sum_{j=1}^{n} \frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}} \\
&= \kappa_{1,t} + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{2,t}}{\kappa_{1,t}}
\end{align*}

\begin{align*}
\kappa_{2,t+1} &= \sum_{j=1}^{n} \kappa_{2,t+1}^{(j)}
 = \sum_{j=1}^{n} \left[ \kappa_{2,t}^{(j)} + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{3,t}^{(j)}}{\kappa_{1,t}} - \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \left( \frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}} \right)^2 \right] \\
&= \kappa_{2,t} + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{3,t}}{\kappa_{1,t}} - \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \sum_{j=1}^{n} \frac{\left( \kappa_{2,t}^{(j)} \right)^2}{\left( \kappa_{1,t} \right)^2} \\
&\approx \kappa_{2,t} + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{3,t}}{\kappa_{1,t}} - \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \left( \frac{\kappa_{2,t}}{\kappa_{1,t}} \right)^2
\end{align*}

where the approximation is valid if $\kappa_{2,t} < \kappa_{1,t}$ (that is, the variance is less than the mean), in which case the square term has only a small effect on the final result.

\begin{align*}
\kappa_{3,t+1} &= \sum_{j=1}^{n} \kappa_{3,t+1}^{(j)} \\
&= \sum_{j=1}^{n} \left[ \kappa_{3,t}^{(j)}
 + 2 \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^3 \left( \frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}} \right)^3
 - 3 \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \frac{\kappa_{2,t}^{(j)} \kappa_{3,t}^{(j)}}{\left( \kappa_{1,t} \right)^2}
 + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{4,t}^{(j)}}{\kappa_{1,t}} \right] \\
&\approx \kappa_{3,t}
 + 2 \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^3 \left( \frac{\kappa_{2,t}}{\kappa_{1,t}} \right)^3
 - 3 \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \frac{\kappa_{2,t} \kappa_{3,t}}{\left( \kappa_{1,t} \right)^2}
 + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{4,t}}{\kappa_{1,t}}
\end{align*}

where the approximation is valid if $\kappa_{2,t} \kappa_{3,t} < \kappa_{1,t}^2$ (that is, the variance and skewness are both less than the mean).
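The approximation step for $\kappa_{2,t+1}$ replaces the exact sum $\sum_j (\kappa_{2,t}^{(j)})^2$ by $(\kappa_{2,t})^2$, which generally overestimates it. The short sketch below is an illustration added here, not code from the paper: it compares the exact and approximated squared correction terms with the leading $\kappa_3$ term for an arbitrary partly converged population, showing that when $\kappa_{2,t} < \kappa_{1,t}$ both squared terms are small, so the overestimate has little effect on the final result.

# Illustrative check of the approximation step above (not from the paper).
import numpy as np

rng = np.random.default_rng(0)
n = 100
p = rng.uniform(0.5, 0.95, size=n)   # per-bit means of a partly converged population
q = p * (1 - p)                      # per-bit variances kappa_2^(j)

k1, k2 = p.sum(), q.sum()            # population cumulants kappa_1, kappa_2
k3 = np.sum(q * (1 - 2 * p))         # population kappa_3 (sum of bit cumulants)
a = 1 - k1 / (n + 1)                 # migration factor

exact_sq = a**2 * np.sum(q**2) / k1**2    # exact squared correction term
approx_sq = a**2 * (k2 / k1)**2           # approximation used in equation (28)
linear = a * k3 / k1                      # leading kappa_3 term, for scale

print(exact_sq, approx_sq, abs(linear))   # both squared terms are small vs. linear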

Finally we derive the cumulants $\kappa_{3,t}^{(j)}$ and $\kappa_{2,t+1}^{(j)}$ of bit $j$ for the separable function in equations (53) and (54), which are obtained as follows:

\begin{align*}
\kappa_{3,t}^{(j)} &= E\!\left[ (f_j(x_j))^3 \right]_t - 3 \, E\!\left[ (f_j(x_j))^2 \right]_t E\!\left[ f_j(x_j) \right]_t + 2 \left( E\!\left[ f_j(x_j) \right]_t \right)^3 \\
&= \sum_{i=0,1} (f_j(i))^3 \Pr[x_j = i] \big|_t - 3 \sum_{i=0,1} (f_j(i))^2 \Pr[x_j = i] \big|_t \, \kappa_{1,t}^{(j)} + 2 \left( \kappa_{1,t}^{(j)} \right)^3 \\
&= (f_j(0))^3 + \left( (f_j(1))^3 - (f_j(0))^3 \right) \Pr[x_j = 1] \big|_t \\
&\quad - 3 \left[ (f_j(0))^2 + \left( (f_j(1))^2 - (f_j(0))^2 \right) \Pr[x_j = 1] \big|_t \right] \kappa_{1,t}^{(j)} + 2 \left( \kappa_{1,t}^{(j)} \right)^3
\end{align*}

Substituting $\Pr[x_j = 1] \big|_t = |\rho_j(1)| / N = \left( \kappa_{1,t}^{(j)} - f_j(0) \right) / \left( f_j(1) - f_j(0) \right)$ and factoring the differences of cubes and squares gives

\begin{align*}
\kappa_{3,t}^{(j)} &= (f_j(0))^3 + \left( (f_j(1))^2 + f_j(1) f_j(0) + (f_j(0))^2 \right) \left( \kappa_{1,t}^{(j)} - f_j(0) \right) \\
&\quad - 3 \left[ (f_j(0))^2 + \left( f_j(1) + f_j(0) \right) \left( \kappa_{1,t}^{(j)} - f_j(0) \right) \right] \kappa_{1,t}^{(j)} + 2 \left( \kappa_{1,t}^{(j)} \right)^3 \\
&= (f_j(1))^2 \kappa_{1,t}^{(j)} - (f_j(1))^2 f_j(0) + 4 f_j(1) f_j(0) \kappa_{1,t}^{(j)} - f_j(1) (f_j(0))^2 + (f_j(0))^2 \kappa_{1,t}^{(j)} \\
&\quad - 3 f_j(1) \left( \kappa_{1,t}^{(j)} \right)^2 - 3 f_j(0) \left( \kappa_{1,t}^{(j)} \right)^2 + 2 \left( \kappa_{1,t}^{(j)} \right)^3
\end{align*}

which is the third central moment of the two-point distribution of $f_j(x_j)$; equivalently, $\kappa_{3,t}^{(j)} = ( f_j(1) - \kappa_{1,t}^{(j)} ) ( \kappa_{1,t}^{(j)} - f_j(0) ) ( f_j(1) + f_j(0) - 2 \kappa_{1,t}^{(j)} )$.


\begin{align*}
\kappa_{2,t+1}^{(j)} &= E\!\left[ (f_j(x_j))^2 \right]_{t+1} - \left( E\!\left[ f_j(x_j) \right]_{t+1} \right)^2
 = E\!\left[ (f_j(x_j))^2 \right]_{t+1} - \left( \kappa_{1,t+1}^{(j)} \right)^2 \\
&= \sum_{i=0,1} (f_j(i))^2 \Pr[x_j = i]_{t+1} - \left( \kappa_{1,t+1}^{(j)} \right)^2 \\
&= (f_j(0))^2 + \left( (f_j(1))^2 - (f_j(0))^2 \right) \Pr[x_j = 1]_{t+1} - \left( \kappa_{1,t+1}^{(j)} \right)^2
\end{align*}

Substituting

\begin{align*}
\Pr[x_j = 1]_{t+1} = \frac{\left( \kappa_{1,t}^{(j)} - f_j(0) \right) \left[ \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) f_j(1) + \kappa_{1,t} - \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \kappa_{1,t}^{(j)} \right]}{\left( f_j(1) - f_j(0) \right) \kappa_{1,t}}
\end{align*}

and the corresponding expression for $\kappa_{1,t+1}^{(j)}$, then expanding and collecting terms, gives

\begin{align*}
\kappa_{2,t+1}^{(j)} &= f_j(1) \kappa_{1,t}^{(j)} + f_j(0) \kappa_{1,t}^{(j)} - f_j(1) f_j(0) - \left( \kappa_{1,t}^{(j)} \right)^2 \\
&\quad - \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \frac{\left( f_j(1) \kappa_{1,t}^{(j)} + f_j(0) \kappa_{1,t}^{(j)} - f_j(1) f_j(0) - \left( \kappa_{1,t}^{(j)} \right)^2 \right)^2}{\left( \kappa_{1,t} \right)^2} \\
&\quad + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{1}{\kappa_{1,t}} \Big( (f_j(1))^2 \kappa_{1,t}^{(j)} - (f_j(1))^2 f_j(0) + 4 f_j(1) f_j(0) \kappa_{1,t}^{(j)} - f_j(1) (f_j(0))^2 \\
&\qquad + (f_j(0))^2 \kappa_{1,t}^{(j)} - 3 f_j(1) \left( \kappa_{1,t}^{(j)} \right)^2 - 3 f_j(0) \left( \kappa_{1,t}^{(j)} \right)^2 + 2 \left( \kappa_{1,t}^{(j)} \right)^3 \Big)
\end{align*}

Since the first group of terms is $\kappa_{2,t}^{(j)}$ (the variance of the two-point distribution of $f_j(x_j)$) and the last group is $\kappa_{3,t}^{(j)}$ as derived above, this is

\begin{align*}
\kappa_{2,t+1}^{(j)} = \kappa_{2,t}^{(j)} - \left( 1 - \frac{\kappa_{1,t}}{n+1} \right)^2 \left( \frac{\kappa_{2,t}^{(j)}}{\kappa_{1,t}} \right)^2 + \left( 1 - \frac{\kappa_{1,t}}{n+1} \right) \frac{\kappa_{3,t}^{(j)}}{\kappa_{1,t}}
\end{align*}

which has the same form as the one-max result.
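As a small consistency check on the separable-function result, the sketch below (an illustration added here, not code from the paper) verifies symbolically that the expanded $\kappa_{3,t}^{(j)}$ equals the compact two-point form quoted above, and that the one-max case $f_j(0) = 0$, $f_j(1) = 1$ recovers the Bernoulli cumulant used earlier in the appendix.

# Symbolic consistency check (illustrative addition, not from the paper).
import sympy as sp

A, B, k = sp.symbols('A B k')   # A = f_j(1), B = f_j(0), k = kappa_{1,t}^{(j)}

# Expanded kappa_3 for the separable function, as derived above:
k3_expanded = (A**2*k - A**2*B + 4*A*B*k - A*B**2 + B**2*k
               - 3*A*k**2 - 3*B*k**2 + 2*k**3)

# Compact third central moment of the two-point distribution of f_j:
k3_compact = (A - k) * (k - B) * (A + B - 2*k)
assert sp.expand(k3_expanded - k3_compact) == 0

# One-max special case f_j(0) = 0, f_j(1) = 1:
assert sp.expand(k3_expanded.subs({A: 1, B: 0}) - k*(1 - k)*(1 - 2*k)) == 0
print("separable kappa_3 is consistent with the one-max form")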


References

Ahn, C. (2006). Advances in Evolutionary Algorithms: Theory, Design and Practice. New York: Springer.

Bhattacharya, A. and Chattopadhyay, P. (2010). Hybrid differential evolution with biogeography-based optimization for solution of economic load dispatch. IEEE Transactions on Power Systems, 25: 1955-1964.

Boussaid, I., Chatterjee, A., Siarry, P., and Ahmed-Nacer, M. (2011a). Hybridizing biogeography-based optimization with differential evolution for optimal power allocation in wireless sensor networks. IEEE Transactions on Vehicular Technology, 60: 2347-2353.

Boussaid, I., Chatterjee, A., Siarry, P., and Ahmed-Nacer, M. (2011b). Two-stage update biogeography-based optimization using differential evolution algorithm (DBBO). Computers & Operations Research, 38: 1188-1198.

Boussaid, I., Chatterjee, A., Siarry, P., and Ahmed-Nacer, M. (2012). Biogeography-based optimization for constrained optimization problems. Computers & Operations Research, 39: 3293-3304.

Chatterjee, A., Siarry, P., Nakib, A., and Blanc, R. (2012). An improved biogeography based optimization approach for segmentation of human head CT-scan images employing fuzzy entropy. Engineering Applications of Artificial Intelligence, 25: 1698-1709.

Costa e Silva, M., Coelho, L., and Lebensztajn, L. (2012). Multiobjective biogeography-based optimization based on predator-prey approach. IEEE Transactions on Magnetics, 48: 951-954.

Du, D., Simon, D., and Ergezer, M. (2009). Biogeography-based optimization combined with evolutionary strategy and immigration refusal. In IEEE Conference on Systems, Man, and Cybernetics, pages 1023-1028, San Antonio, Texas.

Ergezer, M., Simon, D., and Du, D. (2009). Oppositional biogeography-based optimization. In IEEE Conference on Systems, Man, and Cybernetics, pages 1035-1040, San Antonio, Texas.

Goel, L., Gupta, D., and Panchal, V. (2012). Extended species abundance models of biogeography based optimization. In International Conference on Computational Intelligence, Modelling and Simulation, pages 7-12, Kuantan, Malaysia.

Gong, W., Cai, Z., and Ling, C. X. (2011). DE/BBO: A hybrid differential evolution with biogeography-based optimization for global numerical optimization. Soft Computing, 15: 645-665.

Gong, W., Cai, Z., Ling, C. X., and Li, H. (2010). A real-coded biogeography-based optimization with mutation. Applied Mathematics and Computation, 216: 2749-2758.

Grinstead, C. and Snell, J. (1997). Introduction to Probability. Providence, Rhode Island: American Mathematical Society.

Hadidi, A. and Nazari, A. (2013). Design and economic optimization of shell-and-tube heat exchangers using biogeography-based (BBO) algorithm. Applied Thermal Engineering, 51: 1263-1272.

Jamuna, K. and Swarup, K. (2012). Multi-objective biogeography based optimization for optimal PMU placement. Applied Soft Computing, 12: 1503-1510.

Kankanala, P., Srivastava, S., Srivastava, A., and Schulz, N. (2012). Optimal control of voltage and power in a multi-zonal MVDC shipboard power system. IEEE Transactions on Power Systems, 27: 642-650.

Li, X. and Yin, M. (2012). Multi-operator based biogeography based optimization with mutation for global numerical optimization. Computers & Mathematics with Applications, 64: 2833-2844.

Lohokare, M., Panigrahi, B., Pattnaik, S., Devi, S., and Mohapatra, A. (2012). Neighborhood search-driven accelerated biogeography-based optimization for optimal load dispatch. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42: 641-652.

Ma, H. (2010). An analysis of the equilibrium of migration models for biogeography-based optimization. Information Sciences, 180: 3444-3464.

Ma, H., Ni, S., and Sun, M. (2009). Equilibrium species counts and migration model tradeoffs for biogeography-based optimization. In IEEE Conference on Decision and Control, pages 3306-3310, Shanghai, China.

Ma, H. and Simon, D. (2011). Blended biogeography-based optimization for constrained optimization. Engineering Applications of Artificial Intelligence, 24: 517-525.

Nix, A. and Vose, M. (1992). Modeling genetic algorithms with Markov chains. Annals of Mathematics and Artificial Intelligence, 5: 79-88.

Panchal, V., Singh, P., Kaur, N., and Kundra, H. (2009). Biogeography based satellite image classification. International Journal of Computer Science and Information Security, 6: 269-274.

Prügel-Bennett, A. and Shapiro, J. L. (1994). An analysis of genetic algorithms using statistical mechanics. Physical Review Letters, 72: 1305-1309.

Prügel-Bennett, A. (1997). Modelling evolving populations. Journal of Theoretical Biology, 185: 81-95.

Rahmati, S. and Zandieh, M. (2012). A new biogeography-based optimization (BBO) algorithm for the flexible job shop scheduling problem. The International Journal of Advanced Manufacturing Technology, 58: 1115-1129.

Rarick, R., Simon, D., Villaseca, F. E., and Vyakaranam, B. (2009). Biogeography-based optimization and the solution of the power flow problem. In IEEE Conference on Systems, Man, and Cybernetics, pages 1029-1034, San Antonio, Texas.

Rattray, M. (1996). Modeling the dynamics of genetic algorithms using statistical mechanics. PhD thesis, University of Manchester.

Rattray, M. (1995). The dynamics of a genetic algorithm under stabilizing selection. Complex Systems, 9: 213-234.

Rattray, M. and Shapiro, J. L. (1996). The dynamics of a genetic algorithm for a simple learning problem. Journal of Physics A, 29: 7451-7473.

Reeves, C. and Rowe, J. (2003). Genetic Algorithms: Principles and Perspectives, pages 173-200. Norwell, Massachusetts: Kluwer Academic Publishers.

Shapiro, J. L., Prügel-Bennett, A., and Rattray, M. (1994). A statistical mechanical formulation of the dynamics of genetic algorithms. Lecture Notes in Computer Science, 865: 17-27.

Simon, D. (2008). Biogeography-based optimization. IEEE Transactions on Evolutionary Computation, 12: 702-713.

Simon, D., Ergezer, M., and Du, D. (2009). Population distributions in biogeography-based optimization algorithms with elitism. In IEEE Conference on Systems, Man, and Cybernetics, pages 1017-1022, San Antonio, Texas.

Simon, D., Ergezer, M., Du, D., and Rarick, R. (2011a). Markov models for biogeography-based optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41: 299-306.

Simon, D., Rarick, R., Ergezer, M., and Du, D. (2011b). Analytical and numerical comparisons of biogeography-based optimization and genetic algorithms. Information Sciences, 181: 1224-1248.

Simon, D. (2011). A probabilistic analysis of a simplified biogeography-based optimization algorithm. Evolutionary Computation, 19: 167-188.

Singh, U., Singla, H., and Kamal, T. (2010). Design of Yagi-Uda antenna using biogeography based optimization. IEEE Transactions on Antennas and Propagation, 58: 3375-3379.

Suzuki, J. (1995). A Markov chain analysis on simple genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 25: 655-659.

Van Laarhoven, P. J. and Aarts, E. H. (1987). Simulated Annealing: Theory and Applications. New York: Springer.

Vose, M. D. (1999). The Simple Genetic Algorithm: Foundations and Theory. Cambridge, Massachusetts: The MIT Press.

Wang, L. and Xu, Y. (2011). An effective hybrid biogeography-based optimization algorithm for parameter estimation of chaotic systems. Expert Systems with Applications, 38: 15103-15109.

Yao, X., Liu, Y., and Lin, G. (1999). Evolutionary programming made faster. IEEE Transactions on Evolutionary Computation, 3: 82-102.
