This post assumes the reader has prerequisite knowledge of Bayesian optimization. Recall that Bayesian optimization is a zeroth-order optimization method which aims to find the optimum of a black-box function $f$. Bayesian optimization requires two important components: the surrogate model and the acquisition function. Due to the black-box nature of $f$, we introduce a surrogate function and instead perform the optimization w.r.t. this new function. In common settings, we utilize a Gaussian process (GP) as our surrogate model. The motivation comes from the Bayesian philosophy: we start with a prior belief and iteratively update it as we observe data, which yields the posterior distribution.
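As a concrete (hypothetical) illustration, the sketch below fits a GP surrogate with scikit-learn and queries its posterior mean and standard deviation; the toy objective, kernel, and observed points are my own assumptions for demonstration only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy black-box objective (assumed for illustration only).
def f(x):
    return np.sin(3.0 * x) - x ** 2 + 0.7 * x

# A handful of observations of the black-box function.
X_train = np.array([[-0.9], [0.1], [0.8]])
y_train = f(X_train).ravel()

# GP surrogate: the kernel encodes our prior belief; fitting
# conditions the GP on the data, yielding the posterior.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
gp.fit(X_train, y_train)

# Posterior mean and standard deviation at candidate inputs.
X_cand = np.linspace(-1.5, 1.5, 100).reshape(-1, 1)
mu, sigma = gp.predict(X_cand, return_std=True)
```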
This post focuses on the second component, the acquisition function. This function governs the next input $\mathbf{x}_t$ to be evaluated, where $t$ denotes the particular time step of the acquisition. The acquired point is optimal with respect to some statistical criterion under our surrogate model, e.g. an expectation or an entropy reduction. Now, let us narrow our scope to two specific acquisition functions: the probability of improvement and the expected improvement.
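To make the role of the acquisition function concrete, here is a minimal sketch of a single acquisition step over a grid of candidates, assuming a fitted surrogate `gp` (as above) and some acquisition function `acquisition(mu, sigma, f_best)`; both names are placeholders of mine, not part of any particular library.

```python
import numpy as np

def propose_next(gp, X_cand, y_train, acquisition):
    """Return the candidate x_t that maximizes the acquisition function.

    gp          -- fitted surrogate exposing predict(X, return_std=True)
    X_cand      -- candidate inputs, shape (n, d)
    y_train     -- objective values observed so far
    acquisition -- callable (mu, sigma, f_best) -> per-candidate scores
    """
    mu, sigma = gp.predict(X_cand, return_std=True)
    f_best = np.max(y_train)              # incumbent best value
    scores = acquisition(mu, sigma, f_best)
    return X_cand[np.argmax(scores)]      # next input to evaluate
```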
Probability of Improvement
We first give the definition of improvement. Given the incumbent best value $f(\mathbf{x}^+)$ and an arbitrary input $\mathbf{x}$, we define the improvement as

$$I(\mathbf{x}) = \max\big(f(\mathbf{x}) - f(\mathbf{x}^+),\, 0\big).$$
Note that we abuse the notation for a while: $\mathbf{x}^+$ here denotes the best input observed so far, whereas previously it denoted the global optimum of the function $f$. It is obvious that $I(\mathbf{x}) \geq 0$ and that $I(\mathbf{x}) = 0$ whenever $f(\mathbf{x}) \leq f(\mathbf{x}^+)$. Recall that we employ a GP as the proxy of $f$. For a particular input $\mathbf{x}$, the function value $f(\mathbf{x})$ follows a Gaussian distribution

$$f(\mathbf{x}) \sim \mathcal{N}\big(\mu(\mathbf{x}),\, \sigma^2(\mathbf{x})\big).$$
Commonly we perform the reparameterization trick to draw samples from $\mathcal{N}(\mu(\mathbf{x}), \sigma^2(\mathbf{x}))$. First, we introduce a random variable $\epsilon$ drawn from a standard normal distribution $\mathcal{N}(0, 1)$. It is known that drawing samples from such a distribution is relatively easy. Leveraging this random variable, we obtain a new sample $f(\mathbf{x}) = \mu(\mathbf{x}) + \sigma(\mathbf{x})\,\epsilon$. Substituting the new definition gives us

$$I(\mathbf{x}) = \max\big(\mu(\mathbf{x}) + \sigma(\mathbf{x})\,\epsilon - f(\mathbf{x}^+),\, 0\big).$$
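A minimal sketch of this reparameterization at a single candidate $\mathbf{x}$; the numerical values of $\mu(\mathbf{x})$, $\sigma(\mathbf{x})$, and $f(\mathbf{x}^+)$ below are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.3, 0.5   # GP posterior mean/std at a candidate x (toy values)
f_best = 0.2           # incumbent best value f(x+) (toy value)

eps = rng.standard_normal(100_000)                   # eps ~ N(0, 1)
f_samples = mu + sigma * eps                         # f(x) = mu(x) + sigma(x) * eps
improvement = np.maximum(f_samples - f_best, 0.0)    # I(x) = max(f(x) - f(x+), 0)
```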
The probability of improvement (PI) evaluates how likely the candidate $\mathbf{x}$ is to give us a positive improvement. Recall that we evaluate the probability w.r.t. $f(\mathbf{x}) \sim \mathcal{N}(\mu(\mathbf{x}), \sigma^2(\mathbf{x}))$. Mathematically, we can write $\mathrm{PI}(\mathbf{x})$ as

$$\mathrm{PI}(\mathbf{x}) = P\big(I(\mathbf{x}) > 0\big) = P\big(f(\mathbf{x}) > f(\mathbf{x}^+)\big).$$
By applying the additive and constant scaling properties of the normal distribution, we obtain the following analytical form

$$\mathrm{PI}(\mathbf{x}) = \Phi\!\left(\frac{\mu(\mathbf{x}) - f(\mathbf{x}^+)}{\sigma(\mathbf{x})}\right) = \Phi(z),$$
with $z = \dfrac{\mu(\mathbf{x}) - f(\mathbf{x}^+)}{\sigma(\mathbf{x})}$ and $\Phi(\cdot)$ the cumulative distribution function (CDF) of the standard normal distribution.
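A minimal sketch of this closed form using `scipy.stats.norm.cdf` for $\Phi$; the small jitter added to $\sigma(\mathbf{x})$ is my own safeguard against division by zero.

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best):
    """PI(x) = Phi((mu(x) - f(x+)) / sigma(x))."""
    sigma = np.maximum(sigma, 1e-12)   # assumed jitter to avoid dividing by zero
    z = (mu - f_best) / sigma
    return norm.cdf(z)
```

Plugging this function into the `propose_next` sketch above would give a PI-based acquisition step.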
Expected Improvement
Unlike the probability of improvement, the expected improvement (EI), as the name suggests, aims to evaluate the expected value of $I(\mathbf{x})$ over $f(\mathbf{x}) \sim \mathcal{N}(\mu(\mathbf{x}), \sigma^2(\mathbf{x}))$, i.e. $\mathrm{EI}(\mathbf{x}) = \mathbb{E}[I(\mathbf{x})]$. Intuitively, this criterion evaluates the average magnitude of the improvement.
Substituting the definition of the improvement, we then obtain

$$\mathrm{EI}(\mathbf{x}) = \mathbb{E}\big[\max\big(f(\mathbf{x}) - f(\mathbf{x}^+),\, 0\big)\big] = \int_{-\infty}^{\infty} \max\big(\mu(\mathbf{x}) + \sigma(\mathbf{x})\,\epsilon - f(\mathbf{x}^+),\, 0\big)\, \phi(\epsilon)\, \mathrm{d}\epsilon,$$

where $\phi(\cdot)$ denotes the standard normal density.
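Before deriving the closed form, this expectation can be sanity-checked by Monte Carlo using the reparameterized samples from earlier; the values of $\mu$, $\sigma$, and $f(\mathbf{x}^+)$ are again assumed toy numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, f_best = 0.3, 0.5, 0.2           # toy values, as before

eps = rng.standard_normal(1_000_000)         # eps ~ N(0, 1)
ei_mc = np.mean(np.maximum(mu + sigma * eps - f_best, 0.0))
# ei_mc approximates EI(x); it should match the closed form derived next.
```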
In order to compute the integral, we need to get rid of the $\max$ operator. First, we decompose the integral into two parts: the first part is where $\mu(\mathbf{x}) + \sigma(\mathbf{x})\,\epsilon \leq f(\mathbf{x}^+)$ (no improvement) and the latter part is where $\mu(\mathbf{x}) + \sigma(\mathbf{x})\,\epsilon > f(\mathbf{x}^+)$ (positive improvement). To set the bound for each integral, recall that we can perform the reparameterization trick to rewrite $f(\mathbf{x})$, that is $f(\mathbf{x}) = \mu(\mathbf{x}) + \sigma(\mathbf{x})\,\epsilon$; the boundary between the two regions is therefore $\epsilon = \frac{f(\mathbf{x}^+) - \mu(\mathbf{x})}{\sigma(\mathbf{x})} = -z$. Thus, we can write $\mathrm{EI}(\mathbf{x})$ as

$$\mathrm{EI}(\mathbf{x}) = \int_{-\infty}^{-z} 0 \cdot \phi(\epsilon)\, \mathrm{d}\epsilon + \int_{-z}^{\infty} \big(\mu(\mathbf{x}) + \sigma(\mathbf{x})\,\epsilon - f(\mathbf{x}^+)\big)\, \phi(\epsilon)\, \mathrm{d}\epsilon.$$
Observe that the first term vanishes to $0$ since the improvement is identically zero on that region. Therefore, we only need to evaluate the second part of the integral:

$$\begin{aligned}
\mathrm{EI}(\mathbf{x}) &= \big(\mu(\mathbf{x}) - f(\mathbf{x}^+)\big) \int_{-z}^{\infty} \phi(\epsilon)\, \mathrm{d}\epsilon + \sigma(\mathbf{x}) \int_{-z}^{\infty} \epsilon\, \phi(\epsilon)\, \mathrm{d}\epsilon \\
&= \big(\mu(\mathbf{x}) - f(\mathbf{x}^+)\big)\big(1 - \Phi(-z)\big) + \sigma(\mathbf{x})\, \phi(-z) \\
&= \big(\mu(\mathbf{x}) - f(\mathbf{x}^+)\big)\, \Phi(z) + \sigma(\mathbf{x})\, \phi(z).
\end{aligned}$$
The last row comes from the fact that the normal density is symmetric, i.e. $\phi(-z) = \phi(z)$ and $1 - \Phi(-z) = \Phi(z)$. $\mathrm{EI}(\mathbf{x})$ takes a high value when the predictive mean $\mu(\mathbf{x})$ is large relative to $f(\mathbf{x}^+)$ or when the predictive uncertainty $\sigma(\mathbf{x})$ is large. As a side note, $\mathrm{EI}(\mathbf{x})$ requires the uncertainty $\sigma(\mathbf{x}) > 0$ since $z$ involves dividing by $\sigma(\mathbf{x})$. Finally, we introduce a hyperparameter $\xi \geq 0$ which controls the degree of exploration:

$$\mathrm{EI}(\mathbf{x}) = \big(\mu(\mathbf{x}) - f(\mathbf{x}^+) - \xi\big)\, \Phi(z) + \sigma(\mathbf{x})\, \phi(z), \qquad z = \frac{\mu(\mathbf{x}) - f(\mathbf{x}^+) - \xi}{\sigma(\mathbf{x})}.$$
Note that setting $\xi = 0$ recovers the original expected improvement, while larger values of $\xi$ down-weight the exploitation term $\mu(\mathbf{x}) - f(\mathbf{x}^+)$ and thus encourage exploration.
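Putting everything together, here is a minimal sketch of EI with the exploration hyperparameter (called `xi` below); as before, the jitter on $\sigma(\mathbf{x})$ and the default `xi=0.01` are my own choices, not prescribed by the derivation.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI(x) = (mu(x) - f(x+) - xi) * Phi(z) + sigma(x) * phi(z),
    where z = (mu(x) - f(x+) - xi) / sigma(x)."""
    sigma = np.maximum(sigma, 1e-12)   # assumed jitter to avoid dividing by zero
    diff = mu - f_best - xi
    z = diff / sigma
    return diff * norm.cdf(z) + sigma * norm.pdf(z)
```

With `xi=0` and the toy values used earlier (`mu=0.3`, `sigma=0.5`, `f_best=0.2`), this closed form agrees with the Monte Carlo estimate `ei_mc` up to sampling noise.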