## Abstract

Bayesian optimization (BO) is an efiective surrogate-based method that has been widely used to optimize simulation-based applications. While the traditional Bayesian optimization approach only applies to single-fidelity models, many realistic applications provide multiple levels of fidelity with various computational complexity and predictive capability. In this work, we propose a multi-fidelity Bayesian optimization method for design applications with both known and unknown constraints. The proposed framework, called sMF-BO-2CoGP, is built on a multi-level CoKriging method to predict the objective function. An external binary classifier, which we approximate using a separate CoKriging model, is used to distinguish between feasible and infeasible regions. The sMF-BO-2CoGP method is demonstrated using a series of analytical examples, and a fiip-chip application for design optimization to minimize the deformation due to warping under thermal loading conditions.

## 1 Introduction

High-fidelity engineering models are frequently utilized to predict quantities of interests, such as properties or performances, with respect to a specific design. These predictions are then fed back into the design process to find a better design that outperforms the previous ones by changing the design parameters. This process, often called design optimization, is ubiquitous in industrial settings. Simulation-based optimization is challenging due to the tremendous computational cost associated with high-fidelity models. However, for many practical applications, multiple models of various fidelity can be developed and a multi-fidelity optimization framework, such as Bayesian optimization (BO), can be then applied to optimize the objective function at the highest level of fidelity, but at a reduced computational cost by leveraging lower-fidelity data.

Multi-fidelity methods provide an efiective framework to reduce the computational cost in optimization and uncertainty quantification by leveraging the correlations with lower-fidelity models to reduce the computational burden on high-fidelity models. Multi-fidelity frameworks are particularly practical for engineering simulation-based applications, such as computational fiuid dynamics and solid mechanics problems, because most of these involve discretizations (spatial and/or temporal), where a finer discretization typically corresponds to a higher level of fidelity and the coarser discretization corresponds to a lower level of fidelity.

Incorporating physical and practical constraints into the optimization formulation is also a critical task. Digabel and Wild [1] proposed the QRAK taxonomy to classify constrained optimization problems. In engineering settings, constraints arise from multiple sources and many problems require both known and unknown constraints to be incorporated into the formulation. Constraints are known if the feasibility of the input can be determined directly from the input sampling location, i.e., without actually evaluating the model. Such known constraints are often formulated as a set of inequalities. On the other hand, constraints are unknown if the feasibility of the input must be evaluated indirectly by evaluating the model. Common examples of unknown constraints are ill-conditioning induced by the parameters, singularity in the design, and mesh generation problems. These constraints are implicitly imposed and feasibility cannot be determined without evaluating the computational model.

Gaussian process (GP) methods provide an eficient framework to model a response surface that approximates the objective function for single-fidelity formulations. In the traditional BO approach, an acquisition function a(x) is constructed based on a utility function, which rewards the BO method if the new sampling location outperforms the rest. The acquisition function is typically constructed based on the posterior mean and posterior variance of the GP. Because of its fiexibility, many extensions based on the traditional BO framework have been proposed to solve other optimization problems, including both constrained and multi-fidelity problems. Incorporating constraints is a well-studied subject in the context of BO methods and typically involves adopting a penalty scheme to penalize the infeasible sampling locations that do not satisfy all of the constraints. Multi-fidelity BO problems are more complicated to deal with. To generalize to multiple levels of fidelity, one needs to consider the correlation between levels of fidelity from the objective function and fuse the predictions across all levels of fidelity. For example, Kennedy and O'Hagan [2] proposed an auto-regressive approach to form a link between lower-fidelity to the next higher-fidelity by a linear regression between two levels of fidelity. The terms CoGP and CoKriging are used interchangeably in this work to describe the recursive auto-regressive GP model. Because the constrained problems have been relatively well studied, we will focus the literature review in Sec. 2 on multi-fidelity GP.

In this work, we develop a sequential constrained multi-fidelity method sMF-BO-2CoGP, as an extension of sBF-BO-2CoGP [3], using a CoKriging approach to approximate the objective function at the highest level of fidelity. The known constraints are implemented by penalizing the acquisition function directly for infeasible input sampling locations. The unknown constraints are adaptively learned using another CoKriging model, which acts as a probabilistic binary classifier. The unknown-constrained acquisition function is conditioned on this predicted probability mass function, in addition to the penalty scheme for known constraints. The optimal location for the next sample is determined by maximizing the constrained acquisition function. Next, an uncertainty reduction scheme, where uncertainty is measured by the integrated mean-square error, is proposed to determine the appropriate level of fidelity to evaluate. Compared to the maximum mean square error criteria, the integrated mean-square error is demonstrated to be more robust and eficient.

The content of this paper is invited following the conference paper presented at the ASME IDTEC CIE 2020 (August 18{21, 2019) at Anaheim, CA [3]. The main difierence is that this paper generalizes our previous work [3] from bi-fidelity to multi-fidelity problems. The remainder of this paper is organized as follows. Section 2 provides a brief introduction to the BO method. Section 3 describes the multi-fidelity sMF-BO-2CoGP method proposed in this paper, including the constrained acquisition function, the fidelity selection criteria. Section 4 demonstrates the application of the proposed sMF-BO-2CoGP methodology using several analytical examples and an engineering application in designing a fiip-chip package based on the finite element model. Section 5 discusses and Sec. 6 concludes the paper.

## 2 Related Works

Let f denote a function of x, where $x∈X$ is a d-dimensional input and y is the observation. The optimization considered in this paper is formulated as
$argmaxx∈Xf(x)$
(1)
subjected to a set of inequality constraints
$gj(x)≤0,j=1,…,J$
(2)
where J is the number of inequality constraints.

We briefiy review the classical BO method, the CoKriging method, and the most common acquisition functions in BO in Secs. 2.1, 2.2, and 2.3, respectively. Readers are referred to other comprehensive reviews and tutorials [47] for rigorous literature reviews on GP and BO methods and its variants.

### 2.1 Gaussian Process.

In this section, we followed the notation of Shahriari et al. [5] in the GP formulation. Let $D=(xi,yi)i=1n$ denote the dataset, where n is the number of observations and $x∈X$ is the d-dimensional input. A GP regression approach assumes that f = f1:n is jointly Gaussian, and the observation y is normally distributed given f,
$f|X∼N(m,K)$
(3)
$y|f,σ2∼N(f,σ2I)$
(4)
where miμ(xi) and Ki,jk(xi, xj).
The choice of the kernel K depends on the covariance between inputs. At an unknown sampling location x, the predicted response is described by a posterior Gaussian distribution, where the posterior mean is
$μn(x)=μ0(x)+k(x)T(K+σ2I)−1(y−m)$
(5)
and the posterior variance is
$σn2=k(x,x)−k(x)T(K+σ2I)−1k(x)$
(6)
where $μ0(x):x∈X↦R$ is the prior mean function and $k:X×X↦R$ is the covariance function between the query point x and x1:n. The classical GP formulation assumes a stationary covariance matrix, which only depends on the distance $r=||x−x′||$, where $||⋅||$ is usually the classical L2-norm, but other choices, e.g., the L1-norm and weighted variants, have also been explored. Common kernels for GP include [5]
$kMate´rn1(x,x′)=θ02exp(−r)kMate´rn3(x,x′)=θ02exp(−3r)(1+3r)kMate´rn5(x,x′)=θ02exp(−5r)(1+5r+53r2)ksq-exp(x,x′)=θ02exp(−12r2)$
The log-likelihood function can be written as
$logp(y|x1:n,θ)=−n2log(2π)−12log|Kθ+σ2I|−12(y−mθ)T(Kθ+σ2I)−1(y−mθ)$
(7)
Optimizing the log-likelihood function yields the optimal hyper-parameter θ at the computational cost of $O(n3)$ due to the inversion of the covariance matrix.

### 2.2 Multi-Fidelity CoKriging.

One of the advantages of CoKriging is that it can exploit the correlation between low- and high-fidelity and improve the prediction at high-fidelity level by adding more low-fidelity training data points. If the computational costs of evaluation between the high- and low-fidelity difier significantly, this advantage ofiers a reduction in the number of training data points, thus increasing the eficiency of the optimization problem. Kennedy and O'Hagan [2] proposed an auto-regressive model that couples all levels of fidelity together. Le Gratiet and Garnier [8] proposed a nested scheme $D1⊂D2⊆⋯⊆Ds$ to decouple s levels of fidelity into s standard levels of GP regression, where the GP is used to model the discrepancy between two consecutive levels of fidelity. Karniadakis et al. [911] employed the same method to approximate the highest level of fidelity and extended for noisy evaluations using the same method. Perdikaris et al. [12] proposed a generalized framework that can model nonlinear and space-dependent cross-correlations between models of variable fidelity. The multi-fidelity Bayesian optimization approach is sometimes known as multi-information source optimization [13] or multi-task Bayesian optimization [14]; they are all closely related to each other. For example, Ghoreishi and Allaire have proposed several approaches to solve the multi-information source optimization problem in the context of constraints [15], knowledge-gradient acquisition function [16], Monte Carlo-based approach [17], and applications to computational micromechanics [18]. In this paper, we follow the formulation of Xiao et al. [19] in developing a multi-fidelity CoKriging framework due to its simplicity and the relaxation of the nested requirement, compared to Le Gratiet and Garnier [8] and Perdikaris et al. [12].

Assuming that the prediction at highest level of fidelity, i.e., level s, can be written as an auto-regressive model [19],
$fs(x)=∑t=1s−1ρtft(x)+δ(x)$
(8)
where s is the high-fidelity level, the remaining (s − 1) levels correspond to the low-fidelity levels, and ρt's are the scaling factors. Two important assumptions are typically made. First, δ(x) is assumed to be independent of ft(x), i.e.,
$Cov[ft(x),δ(x)]=0,t=1,…,s−1$
(9)
Second, we assume that (s − 1) low-fidelity levels are uncorrelated, i.e.,
$Cov[fi(x),fj(x)]=0,1≤i≠j≤s−1$
(10)
Then, the covariance matrix for s levels of fidelity is given by
$K=(σ12K1(X1,X1)0⋯ρ1σ12K1(X1,Xe)0σ22K2(X2,X2)⋯ρ2σ22K2(X2,Xe)⋮⋮⋱⋮ρ1σ12K1(Xe,X1)ρ2σ22K2(Xe,X2)∑t=1sρt2σt2Kt(Xe,Xe)+σd2Ke(Xe,Xe))$
(11)
where σt is the intrinsic variance of noisy observations at the tth level of fidelity. The hyper-parameters ${θt}t=1s$ are obtained by optimizing the maximum likelihood function,
$logp(yt|x1:nt,θt)=−nt2log(2π)−12log|Ktθt+σ2I|$
(12)
whereas the hyper-parameters θδ that corresponds to the discrepancy are obtained by maximizing the likelihood function
$logp(yδ|x1:ns,θδ)=−ns2log(2π)−12log|Kδθδ(Xs,Xs)+σ2I|−12(δ−mθδ)T(Kθδ+σ2I)−1(δ−mθδ)$
(13)
The coeficients, ${ρt}t=1s−1$, are obtained by maximizing
$−ns2log(2π)−12log|Kδθδ(Xs,Xs)+σ2I|$
(14)

If the number of fidelity levels are two (s = 2), then the conventional bi-fidelity CoKriging framework is conveniently recovered from the multi-fidelity CoKriging. The dataset $D$ is divided into $Dc$ and $De$ corresponding to cheap and expensive datasets, respectively. The bi-fidelity formulation is closely related to Couckuyt et al. [2022] and Forrester et al. [23]. Following the auto-regressive scheme described above, the first GP models the low-fidelity response {xc, yc}, whereas the second GP models the discrepancy between the high- and low-fidelity model δ(x).

The correlation vector k(x) and the covariance matrix K(x) are then updated [19,21] as
$k(x)=(ρ⋅σc2⋅kc(x)ρ⋅σ2⋅kc(x,X))$
(15)
$K=[σc2⋅Kcρ⋅σc2⋅Kc(Xc,Xe)ρ⋅σc2⋅Kc(Xe,Xc)ρ2⋅σc2⋅Kc(Xe,Xe)+σd2⋅Ke(Xe,Xe)]$
(16)
The hyper-parameters for the low-fidelity level θc are obtained by maximizing the likelihood function at the lower-fidelity level,
$logp(yc|xnc,θc)=−n2log(2π)−12log|Kcθc+σc2I|.$
(17)
The hyper-parameters for the high-fidelity level θe are obtained along with ρ, again by maximizing the likelihood function,
$logp(ye|xne,θe)=−n2log(2π)−12log|Keθ+σe2I|−12(y−mθe)T(Keθe+σ2I)−1(y−mθe)$
(18)
The predicted distribution of CoKriging is also characterized by a Gaussian distribution, where the posterior mean and posterior variance are given by Eqs. (5) and (6), respectively.

### 2.3 Acquisition Function.

In the traditional BO method, the next sampling location is determined by maximizing an acquisition function, i.e.,
$x*=argmaxx∈Xa(x)$
(19)
where a(x) denotes the acquisition function and x* is the next sampling location. The acquisition function is deeply connected to the utility function, which corresponds to the rewarding scheme for BO methods, if the next sampling point outperforms the other sampling locations.

Three acquisition functions are widely used: the probability of improvement (PI), the expected improvement (EI), and the upper-confident bounds (UCB), but other forms also exist, for example, entropy-based approaches, GP-PES [2426], GP-ES [27], GP-EST [28], and GP-EPS [29].

The PI acquisition function [30] is defined as
$aPI(x;{xi,yi}i=1n,θ)=Φ(γ(x))$
(20)
where
$γ(x)=μ(x;{xi,yi}i=1n,θ)−f(xbest)σ(x;{xi,yi}i=1n,θ)$
(21)
indicates the deviation away from the best sample. The PI acquisition function is constructed based on the idea of binary utility function, where a unit reward is received if a new best-so-far sample is found and zero otherwise.
The EI acquisition function [3134] is defined as
$aEI(x;{xi,yi}i=1n,θ)=σ(x;{xi,yi}i=1n,θ)⋅(γ(x)Φ(γ(x))+Φ(γ(x))$
(22)
The EI acquisition is constructed based on an improvement utility function, where the reward is the relative difierence if a new best-so-far sample is found and zero otherwise.
The UCB acquisition function [3537] is defined as
$aUCB(x;{xi,yi}i=1n,θ)=μ(x;{xi,yi}i=1n,θ)+κσ(x;{xi,yi}i=1n,θ)$
(23)
where κ is a hyper-parameter describing the acquisition exploitation-exploration balance. We adopt the κ computation from Daniel et al. [37], which is based on Srinivas et al. [35,36], instead of fixing κ as a constant.

## 3 Methodology

In this section, we describe the sMF-BO-2CoGP method solving the multi-fidelity optimization problem in Sec. 2.

### 3.1 Constraints.

We adopt the method from our previous work [3840] to handle the known and unknown constraints. For known constraints, where the sampling location is known to be infeasible without running any functional evaluation, the acquisition function is penalized by setting it to zero. The penalization scheme is equivalent with multiplying the acquisition function a(x) with another indicator function $I(x)$, where
$I(x)={1,if∀j(1≤j≤J):gj(x)≤00,if∃j(1≤j≤J):gj(x)>0$
(24)
The indicator function can be easily implemented by iterating through the known constraints.

To handle the unknown-constrained problem, an external binary probabilistic classifier is employed to predict the probability of feasibility. Practically speaking, the approach employed to approximate the binary classifier for feasibility is up to users. Some examples are k-NN [41], AdaBoost [42], RandomForest [43], support vector machine [44], least squares support vector machine (LSSVM) [45], GP [46], and convolutional neural network [47]. One notable choice for the binary classifier is the GP classifier, which performs relatively well when compared with other binary classifiers. In sMF-BO-2CoGP, another CoGP is adopted as a binary classifier to predict the probability of feasibility of the sampling location considered.

At an unknown sampling location x, the coupled binary classifier predicts a probability of feasibility based on the trained dataset, where the probability of being feasible is Pr(clf(x) = 1), whereas the probability of being infeasible is Pr(clf(x) = 0) = 1 − Pr(clf(x) = 1). Again, we condition the sampling point on this predicted probability mass function by assigning zero value to the probability of being infeasible. Taking the expectation of the acquisition function conditioned on this probability mass function results in a new acquisition function, which can be rewritten in a product form as
$a*(x)=a(x)⋅I(x)⋅Pr(clf(x)=1)$
(25)
Maximizing the new acquisition function a*(x) yields the next sampling location of sMF-BO-2CoGP:
$x*=argmaxx∈Xa*(x)=argmaxx∈Xa(x)⋅I(x)⋅Pr(clf(x)=1)$
(26)
In practice, we adopt the covariance matrix adaptation evolution strategy (CMA-ES) from Hansen et al. [48,49] to maximize the new acquisition function a*(x).

### 3.2 Fidelity Selection Criteria.

In this section, we propose the fidelity selection criteria for the multi-fidelity frameworks. The computational cost as well as the reduction of uncertainty are used as the two factors to determine the fidelity level at which the evaluation will be performed.

To determine the level of fidelity in evaluating the new sampling location, a fidelity selection criteria balancing the computational cost and integrated mean-squared error (IMSE) reduction is utilized based on a one-step hallucination. The process of hallucination is adopted from our previous work [38]. For the sake of completeness, the process is summarized here.

The CoKriging is hallucinated at a point x* if the observation, i.e., training data, is assumed to be exactly the same with the GP posterior mean prediction temporarily. The CoKriging posterior distribution is then updated accordingly based on the assumption. The posterior variance σ2(x*) at x* is σ2 for the posterior prediction (in particular, σ2(x*) at x* is 0 for deterministic functional evaluation). Then, if the sampling point x* is feasible, the GP is updated with the true observation, instead of the posterior GP prediction. If the sampling point x* is infeasible with respect to unknown constraints, then the hallucination process will take place for every iteration at x*.

If t = 1, …, s are the s levels of fidelity, then the optimal fidelity level t* to perform the functional evaluation is determined by
$t*=argmint(IMSEt,hallucinated⋅Ct)=argmint(∫Xσ2(x)dx⋅Ct)$
(27)
where IMSE is the integrated mean-square error and the computational cost at level t is denoted as Ct. The term $(IMSEt,hallucinated⋅Ct)$ quantifies the performance of querying at level t of fidelity, which is measured as a product between the estimated IMSE and the computational cost. The integrated mean-square error, IMSE, is calculated as
$IMSE=∫Xσ2(x)dx$
(28)
where the posterior variance field σ2(x) is hallucinated at the sampling location x*, i.e., assuming that y(x*) = μ(x*). The optimal level t* corresponding with the optimal product $(IMSEt,hallucinated⋅Ct)$ as a measure of cost and efiectiveness is selected to query the model.

Additionally, to promote the functional evaluation at highest fidelity level, i.e., level t = s, we choose the highest fidelity data (instead of t = t*) if the ratio of number of training data points is larger than the computational cost ratio since the goal is to optimize at the highest level of fidelity t = s. In particular, let t* be the optimal level of fidelity to query for the next sampling point, and $|D(t)|$ be the cardinality of the training dataset at level t of fidelity (i.e., the number of training data points at level t) and Ct be the computational cost at level t. We compare two quantities, $(Ct*⋅|D(t*)|)$ and $(Cs⋅|D(s)|)$. If $Ct*⋅|D(t*)|≥Cs⋅|D(s)|$, which means some of the computational cost building $D(t*)$ could be traded for building $D(s)$ (which is consistent with the policy of promoting evaluation at highest fidelity level s), then level s, i.e., the highest level of fidelity is chosen instead of t*.

For the case of bi-fidelity, the criteria selection is obtained by restricting the multi-fidelity in Eq. (27) to the bi-fidelity settings. We compare the measure of the high-fidelity level $(IMSEh,hallucinated⋅Ch)$ and that of the low-fidelity level $(IMSEl,hallucinated⋅Cl)$. For the sake of convenience, we define afidelity ratio of measure at the high-fidelity level to that of low-fidelity as
$afidelity:=IMSEh,hallucinatedIMSEl,hallucinated⋅ChCl$
(29)
where Ch and Cl are the computational costs at the high- and low-fidelity levels, respectively. If afidelity ≤ 1, then the function evaluator is called at the high-fidelity level, whereas if afidelity > 1, then the function is evaluated at the low-fidelity level. The proposed fidelity selection criteria defined in Eq. (29) determines the trade-ofi between running at low-fidelity and high-fidelity levels. If the high-fidelity return is higher than the low-fidelity, then the high-fidelity level is chosen, and vice versa.

In practice, to promote the high-fidelity evaluations, a hard condition is proposed to prevent the imbalance between low- and high-fidelity data sets based on the comparison between the number of data points available and the relative computational cost between high- and low-fidelity data. If the ratio of low-to-high-fidelity data points is higher, the relative computational cost, then the high-fidelity level will be chosen to evaluate the sampling locations. The IMSE is computed by Monte Carlo sampling in high-dimensional space. It is noted that if the relative computational cost between the high- and low-fidelity is 1, then fidelity criteria selection always promotes evaluating the sampling data point at the high-fidelity level.

## 4 Applications

In this section, we demonstrate the proposed sMF-BO-2CoGP through several analytical functions in Sec. 4.1, including 1d Forrester function [23] and a subset of benchmark functions from Kandasamy et al. [50] and Xiong et al. [51], including 2d Currin exponential function (two levels of fidelity), 8d borehole function (two and three levels of fidelity), welded beam design problem (four levels of fidelity), and an 11d real-world engineering application (two levels of fidelity) in designing fiip-chip package (Sec. 4.6). Some implementations are adopted from Surjanovic and Bingham [52]. In all the benchmark of difierent acquisition functions, the computational cost between the high- and low-fidelity model is fixed at 2.50.

### 4.1 Forrester Function (1d) With Two Fidelity Levels.

In this section, we consider the Forrester function [23], where x ∈ [0, 1], the low-fidelity function is
$fL(x)=0.5(6x−2)2sin(12x−4)+10(x−0.5)−5$
(30)
and the high-fidelity function is
$fH(x)=(6x−2)2sin(12x−4)$
(31)

First, consider a baseline set of four low-fidelity and two high-fidelity data points. We compare the efiects of adding low- and high-fidelity observations on the prediction of CoKriging. Figure 2 shows the comparison between the posterior mean μ(x) and posterior variance σ2(x) between adding four more low-fidelity and two more high-fidelity data points. The common low-fidelity data points are denoted as blue squares, the common high-fidelity data points are denoted as black diamonds, and the added data points are denoted as red circles (low-fidelity for Figs. 1(a) and 2(a); high-fidelity for Figs. 1(b) and 2(b)).

Fig. 1
Fig. 1
Close modal

For the low-fidelity level, Figs. 1(a) and 2(a) show the updated posterior mean μ(x) and posterior variance σ2(x) after four more low-fidelity data points are added, respectively. The posterior mean μ(x) prediction slightly improves near the end of the domain x = 1, but does not improve significantly near the other end of the domain x = 0 (Fig. 1(a)). The posterior variance σ2(x) slightly reduces at the location where the low-fidelity data points are added.

Fig. 2
Fig. 2
Close modal

For the high-fidelity level, Figs. 1(b) and 2(b) show the updated posterior mean μ(x) and posterior variance σ2(x) after two more high-fidelity data points are added, respectively. The posterior mean μ(x) improves as expected, as shown in Fig. 1(b). The posterior variance σ2(x) reduces to zero for noiseless evaluations at the two added sampling locations.

Next, we test the numerical implementation of the sMF-BO-2CoGP method by considering the minimization problem $argminfH(x)$ with no constraint and various computational relative cost ratio between the high- and low-fidelity levels. Figures 3(a) and 3(b) show the convergence plot with respect to iterations and total computation cost, respectively. The case where the relative cost ratio is 1.0 serves as a benchmark for traditional sequential BO using only high-fidelity. We verified that when the relative cost ratio is 1.0, all the evaluations are evaluated only at high-fidelity level. When the relative cost ratio is higher than 1.0, the sMF-BO-2CoGP selects the fidelity criteria on-the-fiy, using the fidelity criteria selection described above. It is worth noting that Fig. 3(a) only shows the convergence plot at high-fidelity level. That means, the convergence plot only updates when a better high-fidelity result is available. The numerical performance at high-fidelity level of the multi-fidelity sMF-BO-2CoGP framework degrades when the computational cost ratio increases, because more low-fidelity points are selected at high computational cost ratio, according to Eq. (29).

Fig. 3
Fig. 3
Close modal

As shown in Fig. 3(a), when the computational cost ratio is 1.0, the sMF-BO-2CoGP converges to a sequential BO with high-fidelity and is the fastest with respect to the number of iterations. Figure 3(b) shows comparable performances between the cases of ratio 1.0 and 2.5, where the performance degrades when the computational cost ratio increases. However, they all converge after approximately seven iterations.

In this example, we consider an initial sampling data set comprised of four low-fidelity and two high-fidelity data points. The numerical performances are expected to change with difierent initial samples, as well as the behavior of high- and low-fidelity models.

Figure 4 shows the convergence plot of 1d Forrester function. Five trial runs are performed with difierent initial samples. Bands are covered by the lower and upper bounds at a fixed iteration with respect to difierent trial runs. Solid lines denote the mean objective function at a fixed iteration. The EI acquisition function is denoted with blue circles and blue band. The UCB acquisition function is denoted with red crosses and red band. The PI acquisition function is denoted with green squares and green band. The UCB acquisition function converges slowly at the beginning, but outperforms the EI acquisition function later on. The PI acquisition function does not perform very well. In a case of UCB, five high-fidelity and three low-fidelity evaluations are performed to achieve convergence.

Fig. 4
Fig. 4
Close modal

### 4.2 Currin Exponential Function (2d) With Two Fidelity Levels.

We adopted the multi-fidelity Currin functions, where the high- and low-fidelity functions are written as, respectively,
$fH(x)=[1−exp(−12x2)]2300x13+1900x12+2092x1+60100x13+500x12+4x1+20$
(32)
$fL(x)=14[fH(x1+0.05,x2+0.05)+fH(x1+0.05,max(0,x2−0.05))]+14[fH(x1−0.05,x2+0.05)+fH(x1−0.05,max(0,x2−0.05))]$
(33)
where the domain is $X=[0,1]×[0,1]$.

Figure 5 shows the convergence plot of 2d Currin function. The explanation of the plots is similar to the case of 1d Forrester function. In this example, the UCB acquisition function outperforms both the EI and PI acquisition functions. The PI acquisition function performs poorly. In a case of EI, 16 high-fidelity and 14 low-fidelity evaluations are performed to achieve convergence.

Fig. 5
Fig. 5
Close modal

### 4.3 Welded Beam Design Problem (4d) With Four Fidelity Levels.

In this example, we adopted the welded beam design problem from one of our previous work [39] illustrated in Fig. 6. The objective function f is calculated as
$f(w,m,h,l,t,b)=(1+C1)(wt+l)h2+C2tb(L+l)$
(34)
subject to five known inequality constraints,
$shearstress(τ):g1=0.577σd−τ(x)≥0$
(35)
$bendingstressinthebeam(σ):g2=σd−σ(x)≥0$
(36)
$bucklingloadonthebar(Pc):g3=b−h≥0$
(37)
$deflectionofthebeam:g4=Pc(x)−F≥0$
(38)
$sideconstraints:g5=δmax−δ(x)≥0$
(39)
where
$σ(x)=6FLt2b$
(40)
$δ(x)=4FL3Et3b$
(41)
$Pc(x)=4.013tb3EG6L2(1−t4LEG)$
(42)
$τ=(τ′)2+(τ′′)2+2τ′τ′′cosθ$
(43)
$τ′=FA$
(44)
$τ′′=F(L+0.5l)RJ$
(45)
$w=0:{A=2hlJ=2hl[(h+t)24+l212]R=12l2+(h+t)2cosθ=l2R$
(46)
$w=1:{A=2h(t+l)J=2hl[(h+t)24+l212]+2ht[(h+l)24+t212]R=max{12l2+(h+t)2,12t2+(h+l)2}cosθ=l2R$
(47)
C1(m), C2(m), σd(m), E(m), and G(m) are parameters that depend on the bulk materials m, as listed in Table 1. The lower and upper bounds of the problem are 0.0625 ≤ h ≤ 2, 0.1 ≤ l ≤ 10, 2.0 ≤ t ≤ 20.0, and 0.0625 ≤ b ≤ 2.0. m ∈ {1, 2, 3, 4} encodes the bulk materials as steel, cast iron, aluminum, and brass, respectively.
Fig. 6
Fig. 6
Close modal
Table 1

Material-dependent parameters and constants in the welded beam design problem

ConstantsDescriptionSteelCast ironAluminumBrass
C1Cost per volume of the welded material ($/in3)0.10470.04890.52350.5584 C2Cost per volume of the bar stock ($/in3)0.04810.02240.24050.2566
σdDesign normal stress of the bar material (psi)30 · 1038 · 1035 · 1038 · 103
EYoung's modulus of bar stock (psi)30 · 10614 · 10610 · 10616 · 106
GShear modulus of bar stock (psi)12 · 1066 · 1064 · 1066 · 106
ConstantsDescriptionSteelCast ironAluminumBrass
C1Cost per volume of the welded material ($/in3)0.10470.04890.52350.5584 C2Cost per volume of the bar stock ($/in3)0.04810.02240.24050.2566
σdDesign normal stress of the bar material (psi)30 · 1038 · 1035 · 1038 · 103
EYoung's modulus of bar stock (psi)30 · 10614 · 10610 · 10616 · 106
GShear modulus of bar stock (psi)12 · 1066 · 1064 · 1066 · 106

The goal is to minimize the objective function f(w, m, h, l, t, b), which is the estimated cost. The physical meaning of the input parameters is as follows. h is the thickness of the weld; l is the length of the welded join, t is the width of the beam; b is the thickness of the beam. m is a discrete variable enumerating the bulk material of the beam, which can be steel, cast iron, aluminum, or brass; and w is a binary variable to model the type of weld: w = 0 for two-sided welding and w = 1 for four-sided welding. Difierent bulk materials are associated with difierent materials property, as described in Table 1. The fixed design parameters of the beam are L = 14 inch, δmax = 0.25 in., and F = 6,000 lb.

Compared to the example [39], here, we fix w = 0, and consider four levels of fidelity. The high-fidelity function and three low-fidelity functions are
$fH(x)=f(w=0,m=1,h,l,t,b)$
(48)
$fL1(x)=f(w=0,m=2,h,l,t,b)$
(49)
$fL2(x)=f(w=0,m=3,h,l,t,b)$
(50)
$fL3(x)=f(w=0,m=4,h,l,t,b)$
(51)
We wish to minimize the cost of using steel as the bulk material at the high-fidelity level, where the low-fidelity functions $fL1(x)$, $fL2(x)$, and $fL3(x)$ are used to estimate the cost of using cast iron, aluminum, and brass, respectively. The computational cost of the high-fidelity level is 2.5 times higher than that of low-fidelity levels.

Figure 7 shows the convergence plot of the welded beam design problem with four levels of fidelity. In this example, the UCB and EI acquisition functions perform on par with each other and outperform the PI acquisition functions. In a case of EI, six high-fidelity and 14 low-fidelity evaluations are performed to achieve convergence.

Fig. 7
Fig. 7
Close modal

### 4.4 Borehole Function (8d) With Two Fidelity Levels.

In this example, we adopted the multi-fidelity borehole function from Xiong et al. [51], where two fidelity levels are considered. The high- and low-fidelity functions are, respectively,
$fH(x)=2πx3(x4−x6)log(x2/x1)(1+2x7x3log(x2/x1)x12x8+x3x5)$
(52)
$fL(x)=5x3(x4−x6)log(x2/x1)(1.5+2x7x3log(x2/x1)x12x8+x3x5)$
(53)
The domain of the 8d borehole function is x1 ∈ [0.05, 0.15], x2 ∈ [100, 50,000], x3 ∈ [63,070, 115, 600], x4 ∈ [990, 1110], x5 ∈ [63.1, 116], x6 ∈ [700, 820], x7 ∈ [1120, 1680], and x8 ∈ [9855, 12,045].

Figure 8 shows the convergence plot of 8d borehole function. The explanation of the plots is similar to the case of 1d Forrester function. In this example, the EI acquisition function outperforms both the UCB and PI acquisition functions, where the PI and UCB acquisition functions perform on-par with each other. In the case of EI, 22 high-fidelity and 53 low-fidelity evaluations are performed to achieve convergence.

Fig. 8
Fig. 8
Close modal

### 4.5 Borehole Function (8d) With Three Fidelity Levels.

We further modify and extend the previous 8d borehole function in the previous example with three levels of fidelity. The high-fidelity and low-fidelity functions are described as
$fH(x)=2πx3(x4−x6)log(x2/x1)(1+2x7x3log(x2/x1)x12x8+x3x5)$
(54)
$fL1(x)=5x3(x4−x6)log(x2/x1)(1.5+2x7x3log(x2/x1)x12x8+x3x5)$
(55)
$fL2(x)=7x3(x4−x6)log(x2/x1)(0.5+2x7x3log(x2/x1)x12x8+x3x5)$
(56)

It is obvious to see that $fL1(x)≤fH(x)≤fL2(x)$; thus, the high-fidelity function is bounded between two low-fidelity functions. Figure 9 shows the convergence plot of the 8d borehole function with three levels of fidelity. In this example, all the acquisition functions, including EI, UCB, and PI perform on par with each other, almost throughout the optimization process. In the case of PI, 28 high-fidelity and 82 low-fidelity evaluations are performed to achieve convergence.

Fig. 9
Fig. 9
Close modal

### 4.6 Flip-Chip Package Design (11d) With Two Fidelity Levels.

In this section, we demonstrate the design application of a fiip-chip package using the proposed sMF-BO-2CoGP, where the details of development and implementation are fully described in our previous work [54]. A lidless fiip-chip package with a monolithic silicon die (FCBGA) mounted on a printed circuit board (PCB) with a stifiener ring is considered in this example. The computational model is constructed based on a 2.5D, half symmetry to reduce the computational time.

Figure 10 shows the geometric model of the thermomechanical finite element model (FEM), where the mesh density varies for difierent levels of fidelity. Two design variables are associated with the die, three are associated with the substrate, three more are associated with the stifiener ring, two are with the underfill, and the last one is with the PCB board. Only two levels of fidelity are considered in this example. Table 2 show the design variables, the physical meaning of the design variables, as well as their lower and upper bounds in this case study.

Fig. 10
Fig. 10
Close modal
Table 2

Design variables for the FCBGA design optimization

VariableDesign partLower boundUpper boundOptimal value
x1Die20,00030,00020,702
x2Die300750320
x3Substrate30,00040,00035,539
x4Substrate10018001614
x5Substrate10 · 10−617 · 10−617 · 10−6
x6Stifiener ring200060004126
x7Stifiener ring10025001646
x8Stifiener ring8 · 10−625 · 10−68.94 · 10−6
x9Underfill1.03.01.52
x10Underfill0.51.00.804
x11PCB board12.0 · 10−616.7 · 10−616.7 · 10−6
VariableDesign partLower boundUpper boundOptimal value
x1Die20,00030,00020,702
x2Die300750320
x3Substrate30,00040,00035,539
x4Substrate10018001614
x5Substrate10 · 10−617 · 10−617 · 10−6
x6Stifiener ring200060004126
x7Stifiener ring10025001646
x8Stifiener ring8 · 10−625 · 10−68.94 · 10−6
x9Underfill1.03.01.52
x10Underfill0.51.00.804
x11PCB board12.0 · 10−616.7 · 10−616.7 · 10−6

After the numerical solution is obtained, the component warpage at 20 °C, 200 °C, and the strain energy density of the furthest solder joint are calculated. The strain energy density is one of accurate predictors to estimate the fatigue life of the solder joints during thermal cycling [55].

A vectorized 11-dimensional input is used to parameterize the design. Nine low-fidelity and three high-fidelity data points are used as initial samples. It is noted that not all of the initial samples are feasible. There are some unknown constraints, but no known constraint is imposed in this example. We consider that the sampling locations where the FEM solutions diverge are infeasible. This condition can be regarded as an unknown constraint, because no prior knowledge regarding divergence is known beforehand but only after the simulation is finished. ansys parametric design language (APDL) software is used to evaluate the model in the batch mode with no graphical user interface. The sMF-BO-2CoGP is implemented in matlab, where an interface using python is devised to communicate with the APDL FEM model. The average computational time for one iteration is approximately 0.4 CPU h.

Figure 11 presents the convergence plot of the FCBGA design optimization, where the feasible sampling points are plotted as blue circles, whereas the infeasible sampling points are plotted as red squares. The EI acquisition function is used in this example to locate the next sampling point. It is observed that the predicted warpage is converging steadily. The numerical solver fails to converge on many cases. It has also demonstrated that the proposed sMF-BO-2CoGP is robust against diverging simulations, by its convergent objective despite numerous failed cases.

Fig. 11
Fig. 11
Close modal

The optimization results are relatively close to designs commonly used in the microelectronics packing industry. It is observed that thin and small die, as well as thick substrate, are suggested in order to minimize the component warpage.

## 5 Discussion

The main contribution of this work is the proposal of the fidelity selection criteria. The criteria is inspired by the work of Huang et al. [56], where the original criteria is proposed based on the EI acquisition function as
$EI(x,l)=EIm(x)$
(57)
$×Corr(flp(x),fmp(x))$
(58)
$×(1−σε,lsl2(x)+σε,l2)$
(59)
$×CmCl$
(60)
where m is an arbitrary level of fidelity and l is the highest level of fidelity. In this scheme, after each point is nominated at a level of fidelity, a unique sampling point is chosen by looping over all the levels. The uncertainty reduction is measured in the second term of the above equation, $(1−σε,l/sl2(x)+σε,l2)$. In our scheme, the uncertainty is measured by IMSEh,hallucinated/IMSEl,hallucinated in Eq. (29). One advantage of the proposed criteria is that it truly estimates the reduction of uncertainty at a particular level. While the uncertainty could be measured by the maximum σ2(x) for $x∈X$ for the uncertainty reduction, the maximal location is often found on the border of the bounded domain. Another advantage of the proposed criteria is that it removes the restriction of using EI acquisition and generalizes to any arbitrary acquisition function. The choice of the acquisition function is left to users as a choice. Previous work by Gauthier et al. [57,58] and Silvestrini et al. [59] have shown that the performance of IMSE supersedes the performance of maximal MSE. The scheme proposed by Huang et al. [56] in Eq. (57) can be further generalized to some other commonly used acquisition functions, such as PI and UCB. Furthermore, multiple acquisition functions can be considered simultaneously based on their performance, as in GP-Hedge scheme [60].

In the implementation, the CMA-ES framework is adopted to maximize the acquisition function a*(x). For computationally expensive high-fidelity simulations, the CMA-ES parameters must be tuned to search carefully with multiple restarts to avoid local minima. In practice, optimizing the acquisition function takes some amount of time, thus it also reduces the eficiency of the method. However, it has been rarely discussed in the literature, and there has not been much work dedicated to benchmarking and quantifying the computational cost of this process. For batch-sequential parallel BO approaches, the computational cost is much more severe, particularly with simulations that are associated with large infeasible space.

The use of the probabilistic binary classifier to learn and distinguish feasible and infeasible region also depends many factors of the problems. Essentially, the classifier needs to accurately predict the feasibility before the optimal point is obtained. This depends largely on the dimensionality of the problem considered. However, once the feasibility is accurately predicted through Eq. (25), the convergence to the global optimal point is guaranteed through the classical BO framework. The analytical convergence rate can be found in the seminal work of Rasmussen [46].

BO is indeed a fiexible framework that allows for numerous possible extensions in engineering domains [61]. One of those is multi-fidelity, which is studied in this paper. Other extensions include batch-sequential parallel sampling [38], asynchronously parallel sampling [40], mixed-integer optimization with a small number of discrete/categorical variables [39], latent variable model [62]. More sophisticated GP models, including local GP, [63,64], sparse GP [65], heteroscedastic [66], and even deep learning [67], have been developed to widen the range of multi-disciplinary applications. This area remains active for further research.

While the proposed sequential multi-fidelity sMF-BO-2CoGP aims at improving the eficiency compared to the sequential high-fidelity BO, the eficiency can be further improved by performing parallel optimization. That is to sample multiple locations concurrently (i.e., at the same time) and asynchronously (i.e., sampling points do not have to wait for others to complete). The proposed multi-fidelity framework serves as a foundation work to tackle the constrained multi-fidelity problem in an asynchronously parallel manner. The research question remains open and poses as a potential future work.

## 6 Conclusion

In this paper, a sequential multi-fidelity BO optimization, called sMF-BO-2CoGP, is proposed to solve the constrained simulation-based optimization problem. A fidelity selection criteria is proposed to determine the level of fidelity for evaluating the objective function value. Another CoKriging model is coupled into the method to classify the next sampling point and distinguish between feasible and infeasible regions.

The proposed sMF-BO-2CoGP method is demonstrated using a simple analytic 1D example, as well as an engineering thermomechanical FEM for fiip-chip package design optimization. The preliminary results provided in this study demonstrates the applicability of the proposed sMF-BO-2CoGP method.

## Acknowledgment

This research was supported by the U.S. Department of Energy, Ofice of Science, Early Career Research Program, under award 17020246 and the Sandia LDRD program. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

## References

1.
Digabel
,
S. L.
, and
Wild
,
S. M.
,
2015
, “
A Taxonomy of Constraints in Simulation-Based Optimization
,” .
2.
Kennedy
,
M. C.
, and
O'Hagan
,
A.
,
2000
,
Predicting the Output From a Complex Computer Code When Fast Approximations Are Available
,”
Biometrika
,
87
(
1
), pp.
1
13
. 10.1093/biomet/87.1.1
3.
Tran
,
A.
,
Wildey
,
T.
, and
McCann
,
S.
,
2019
, “
sBF-BO-2CoGP: A Sequential Bi-Fidelity Constrained Bayesian Optimization for Design Applications
,”
Proceedings of the ASME 2019 IDETC/CIE, Volume 1: 39th Computers and Information in Engineering Conference of International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
Anaheim, CA
,
Aug. 18–21
,
American Society of Mechanical Engineers
, p.
V001T02A073
.
4.
Brochu
,
E.
,
Cora
,
V. M.
, and
De Freitas
,
N.
,
2010
, “
A Tutorial on Bayesian Optimization of Expensive Cost Functions, With Application to Active User Modeling and Hierarchical Reinforcement Learning
”, .
5.
Shahriari
,
B.
,
Swersky
,
K.
,
Wang
,
Z.
,
,
R. P.
, and
de Freitas
,
N.
,
2016
, “
Taking the Human out of the Loop: A Review of Bayesian Optimization
,”
Proc. IEEE
,
104
(
1
), pp.
148
175
. 10.1109/JPROC.2015.2494218
6.
Frazier
,
P. I.
,
2018
, “
A Tutorial on Bayesian Optimization
,” .
7.
Jones
,
D. R.
,
Schonlau
,
M.
, and
Welch
,
W. J.
,
1998
, “
Eficient Global Optimization of Expensive Black-Box Functions
,”
J. Global Optim.
,
13
(
4
), pp.
455
492
. 10.1023/A:1008306431147
8.
Le Gratiet
,
L.
, and
Garnier
,
J.
,
2014
, “
Recursive Co-Kriging Model for Design of Computer Experiments With Multiple Levels of Fidelity
,”
Int. J. Uncertainty Quantif.
,
4
(
5
), pp.
365
386
. 10.1615/Int.J.UncertaintyQuantification.2014006914
9.
Raissi
,
M.
, and
,
G.
,
2016
, “
Deep Multi-Fidelity Gaussian Processes
,” .
10.
Raissi
,
M.
,
Perdikaris
,
P.
, and
,
G. E.
,
2017
, “
Machine Learning of Linear Difierential Equations Using Gaussian Processes
,”
J. Comput. Phys.
,
348
, pp.
683
693
. 10.1016/j.jcp.2017.07.050
11.
Perdikaris
,
P.
,
Venturi
,
D.
,
Royset
,
J.
, and
,
G.
,
2015
, “
Multi-Fidelity Modelling Via Recursive Co-Kriging and Gaussian{Markov Random Fields
,”
Proc. R. Soc. A
, Vol.
471
,
The Royal Society
, p.
20150018
.
12.
Perdikaris
,
P.
,
Raissi
,
M.
,
Damianou
,
A.
,
Lawrence
,
N.
, and
,
G. E.
,
2017
, “
Nonlinear Information Fusion Algorithms for Data-Eficient Multi-Fidelity Modelling
,”
Proc. R. Soc. A: Math., Phys. Eng. Sci.
,
473
(
2198
), p.
20160751
. 10.1098/rspa.2016.0751
13.
Poloczek
,
M.
,
Wang
,
J.
, and
Frazier
,
P.
,
2017
, “
Multi-Information Source Optimization
,”
Advances in Neural Information Processing Systems
, pp.
4288
4298
.
14.
Swersky
,
K.
,
Snoek
,
J.
, and
,
R. P.
,
2013
, “
,”
Advances in Neural Information Processing Systems
, pp.
2004
2012
.
15.
Ghoreishi
,
S. F.
, and
Allaire
,
D.
,
2019
, “
Multi-Information Source Constrained Bayesian Optimization
,”
Struct. Multidiscip. Optim.
,
59
(
3
), pp.
977
991
. 10.1007/s00158-018-2115-z
16.
Ghoreishi
,
S. F.
, and
Allaire
,
D. L.
,
2018
, “
A Fusion-Based Multi-Information Source Optimization Approach Using Knowledge Gradient Policies
,”
2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference
, p.
1159
.
17.
Ghoreishi
,
S. F.
, and
Allaire
,
D. L.
,
2018
, “
Gaussian Process Regression for Bayesian Fusion of Multi-Fidelity Information Sources
,”
2018 Multidisciplinary Analysis and Optimization Conference
, p.
4176
.
18.
Ghoreishi
,
S. F.
,
Molkeri
,
A.
,
Srivastava
,
A.
,
Arroyave
,
R.
, and
Allaire
,
D.
,
2018
, “
Multi-Information Source Fusion and Optimization to Realize ICME: Application to Dual-Phase Materials
,”
ASME J. Mech. Des.
,
140
(
11
), p.
111409
. 10.1115/1.4041034
19.
Xiao
,
M.
,
Zhang
,
G.
,
Breitkopf
,
P.
,
Villon
,
P.
, and
Zhang
,
W.
,
2018
, “
Extended Co-Kriging Interpolation Method Based on Multi-Fidelity Data
,”
Appl. Math. Comput.
,
323
, pp.
120
131
.
20.
Couckuyt
,
I.
,
Forrester
,
A.
,
Gorissen
,
D.
,
De Turck
,
F.
, and
Dhaene
,
T.
,
2012
, “
Blind Kriging: Implementation and Performance Analysis
,”
,
49
(
1
), pp.
1
13
21.
Couckuyt
,
I.
,
Dhaene
,
T.
, and
Demeester
,
P.
,
2013
,
ooDACE Toolbox, A Matlab Kriging Toolbox: Getting Started
,
Universiteit Gent
,
Ghent, Belgium
, pp.
3
15
.
22.
Couckuyt
,
I.
,
Dhaene
,
T.
, and
Demeester
,
P.
,
2014
, “
ooDACE Toolbox: a Flexible Object-Oriented Kriging Implementation
,”
J. Mach. Learn. Res.
,
15
(
1
), pp.
3183
3186
.
23.
Forrester
,
A. I.
,
Sóbester
,
A.
, and
Keane
,
A. J.
,
2007
, “
Multi-Fidelity Optimization Via Surrogate Modelling
,”
Proc. R. Soc. London A: Math., Phys. Eng. Sci.
,
463
(
2088
), pp.
3251
3269
. 10.1098/rspa.2007.1900
24.
Hernández-Lobato
,
J. M.
,
Hofiman
,
M. W.
, and
Ghahramani
,
Z.
,
2014
, “
Predictive Entropy Search for Eficient Global Optimization of Black-Box Functions
,”
Advances in Neural Information Processing Systems
, pp.
918
926
.
25.
Hernández-Lobato
,
J. M.
,
Gelbart
,
M.
,
Hofiman
,
M.
,
,
R.
, and
Ghahramani
,
Z.
,
2015
, “
Predictive Entropy Search for Bayesian Optimization With Unknown Constraints
,”
International Conference on Machine Learning
, pp.
1699
1707
.
26.
Hernández-Lobato
,
D.
,
Hernández-Lobato
,
J.
,
Shah
,
A.
, and
,
R.
,
2016
, “
Predictive Entropy Search for Multi-Objective Bayesian Optimization
,”
International Conference on Machine Learning
, pp.
1492
1501
.
27.
Hennig
,
P.
, and
Schuler
,
C. J.
,
2012
, “
Entropy Search for Information-Eficient Global Optimization
,”
J. Mach. Learn. Res.
,
13
(
Jun
), pp.
1809
1837
.
28.
Wang
,
Z.
,
Zhou
,
B.
, and
Jegelka
,
S.
,
2016
, “
Optimization As Estimation With Gaussian Processes in Bandit Settings
,”
Artificial Intelligence and Statistics
, pp.
1022
1031
.
29.
Shahriari
,
B.
,
Wang
,
Z.
,
Hofiman
,
M. W.
,
Bouchard-Côté
,
A.
, and
de Freitas
,
N.
,
2014
, “
An Entropy Search Portfolio for Bayesian Optimization
,” .
30.
Kushner
,
H. J.
,
1964
, “
A New Method of Locating the Maximum Point of An Arbitrary Multipeak Curve in the Presence of Noise
,”
ASME J. Basic Eng.
,
86
(
1
), pp.
97
106
. 10.1115/1.3653121
31.
Mockus
,
J.
,
1975
, “
On Bayesian Methods for Seeking the Extremum
,”
Optimization Techniques IFIP Technical Conference
,
Springer
, pp.
400
404
.
32.
Mockus
,
J.
,
1982
, “The Bayesian Approach to Global Optimization,”
System Modeling and Optimization
,
M.
Hazewinkel
, ed.,
,
Dordrecht
, pp.
473
481
.
33.
Bull
,
A. D.
,
2011
, “
Convergence Rates of Eficient Global Optimization Algorithms
,”
J. Mach. Learn. Res.
,
12
(
Oct
), pp.
2879
2904
.
34.
Snoek
,
J.
,
Larochelle
,
H.
, and
,
R. P.
,
2012
, “
Practical Bayesian Optimization of Machine Learning Algorithms
,”
Advances in Neural Information Processing Systems
, pp.
2951
2959
.
35.
Srinivas
,
N.
,
Krause
,
A.
,
,
S. M.
, and
Seeger
,
M.
,
2009
, “
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
,” .
36.
Srinivas
,
N.
,
Krause
,
A.
,
,
S. M.
, and
Seeger
,
M. W.
,
2012
, “
Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
,”
IEEE Trans. Inf. Theory
,
58
(
5
), pp.
3250
3265
. 10.1109/TIT.2011.2182033
37.
Daniel
,
C.
,
Viering
,
M.
,
Metz
,
J.
,
Kroemer
,
O.
, and
Peters
,
J.
,
2014
, “
Active Reward Learning
,”
Robotics: Science and Systems
,
Berkeley, CA
,
July 12–16
.
38.
Tran
,
A.
,
Sun
,
J.
,
Furlan
,
J. M.
,
Pagalthivarthi
,
K. V.
,
Visintainer
,
R. J.
, and
Wang
,
Y.
,
2019
, “
pBO-2GP-3B: A Batch Parallel Known/unknown Constrained Bayesian Optimization With Feasibility Classification and Its Applications in Computational Fluid Dynamics
,”
Comput. Methods Appl. Mech. Eng.
,
347
, pp.
827
852
. 10.1016/j.cma.2018.12.033
39.
Tran
,
A.
,
Tran
,
M.
, and
Wang
,
Y.
,
2019
, “
Constrained Mixed-Integer Gaussian Mixture Bayesian Optimization and Its Applications in Designing Fractal and Auxetic Metamaterials
,”
Struct. Multidiscip. Optim.
,
59
(
6
), pp.
2131
2154
.
40.
Tran
,
A.
,
Scott
,
M.
,
Furlan
,
J. M.
,
Pagalthivarthi
,
K. V.
,
Visintainer
,
R. J.
, and
Wildey
,
T.
,
2019
, “
aphBO-2GP-3B: A Budgeted Asynchronously-Parallel Multi-Acquisition for Known/Unknown Constrained Bayesian Optimization on High-Performing Computing Architecture
,”
Reliab. Eng. Syst. Saf.
, https://arxiv.org/abs/2003.09436
41.
Bentley
,
J. L.
,
1975
, “
Multidimensional Binary Search Trees Used for Associative Searching
,”
Commun. ACM
,
18
(
9
), pp.
509
517
. 10.1145/361002.361007
42.
Hastie
,
T.
,
Rosset
,
S.
,
Zhu
,
J.
, and
Zou
,
H.
,
2009
, “
,”
Stat. Interface
,
2
(
3
), pp.
349
360
. 10.4310/SII.2009.v2.n3.a8
43.
Breiman
,
L.
,
2001
, “
Random Forests
,”
Mach. Learn.
,
45
(
1
), pp.
5
32
. 10.1023/A:1010933404324
44.
Hearst
,
M. A.
,
Dumais
,
S. T.
,
Osuna
,
E.
,
Platt
,
J.
, and
Scholkopf
,
B.
,
1998
, “
Support Vector Machines
,”
IEEE Intell. Syst. Appl.
,
13
(
4
), pp.
18
28
. 10.1109/5254.708428
45.
Suykens
,
J. A.
, and
Vandewalle
,
J.
,
1999
, “
Least Squares Support Vector Machine Classifiers
,”
Neural Process. Lett.
,
9
(
3
), pp.
293
300
. 10.1023/A:1018628609742
46.
Rasmussen
,
C. E.
,
2004
, “Gaussian Processes in Machine Learning,”
,
Springer
, pp.
63
71
.
47.
LeCun
,
Y.
,
Bengio
,
Y.
, and
Hinton
,
G.
,
2015
, “
Deep Learning
,”
Nature
,
521
(
7553
), p.
436
. 10.1038/nature14539
48.
Hansen
,
N.
, and
Ostermeier
,
A.
,
2001
, “
Completely Derandomized Self-Adaptation in Evolution Strategies
,”
Evol. Comput.
,
9
(
2
), pp.
159
195
. 10.1162/106365601750190398
49.
Hansen
,
N.
,
Müller
,
S. D.
, and
Koumoutsakos
,
P.
,
2003
, “
Reducing the Time Complexity of the Derandomized Evolution Strategy With Covariance Matrix Adaptation (CMA-ES)
,”
Evol. Comput.
,
11
(
1
), pp.
1
18
. 10.1162/106365603321828970
50.
Kandasamy
,
K.
,
Dasarathy
,
G.
,
Schneider
,
J.
, and
Poczos
,
B.
,
2017
, “
Multi-Fidelity Bayesian Optimisation With Continuous Approximations
,”
Proceedings of the 34th International Conference on Machine Learning-Volume 70
, pp.
1799
1808
.
51.
Xiong
,
S.
,
Qian
,
P. Z.
, and
Wu
,
C. J.
,
2013
, “
Sequential Design and Analysis of High-Accuracy and Low-Accuracy Computer Codes
,”
Technometrics
,
55
(
1
), pp.
37
46
. 10.1080/00401706.2012.723572
52.
Surjanovic
,
S.
, and
Bingham
,
D.
,
Virtual Library of Simulation Experiments: Test Functions and Datasets
, https://www.sfu.ca/~ssurjano/optimization.html
53.
Datta
,
D.
, and
Figueira
,
J. R.
,
2011
, “
A Real-Integer-Discrete-Coded Particle Swarm Optimization for Design Problems
,”
Appl. Soft Comput.
,
11
(
4
), pp.
3625
3633
. 10.1016/j.asoc.2011.01.034
54.
McCann
,
S.
,
Ostrowicki
,
G. T.
,
Tran
,
A.
,
Huang
,
T.
,
Bernhard
,
T.
,
Tummala
,
R. R.
, and
Sitaraman
,
S. K.
,
2017
, “
Determination of Energy Release Rate Through Sequential Crack Extension
,”
ASME J. Electron. Packag.
,
139
(
4
), p.
041003
. 10.1115/1.4037334
55.
Darveaux
,
R.
,
2000
, “
Efiect of Simulation Methodology on Solder Joint Crack Growth Correlation
,”
Proceedings of the 50th Electronic Components & Technology Conference, IEEE
, pp.
1048
1058
.
56.
Huang
,
D.
,
Allen
,
T. T.
,
Notz
,
W. I.
, and
Miller
,
R. A.
,
2006
, “
Sequential Kriging Optimization Using Multiple-Fidelity Evaluations
,”
Struct. Multidiscip. Optim.
,
32
(
5
), pp.
369
382
. 10.1007/s00158-005-0587-0
57.
Gauthier
,
B.
, and
Pronzato
,
L.
,
2014
, “
Spectral Approximation of the IMSE Criterion for Optimal Designs in Kernel-Based Interpolation Models
,”
SIAM/ASA J. Uncertainty Quantif.
,
2
(
1
), pp.
805
825
. 10.1137/130928534
58.
Gauthier
,
B.
, and
Pronzato
,
L.
,
2017
, “
,”
Comput. Stat. Data Anal.
,
113
, pp.
375
394
. 10.1016/j.csda.2016.10.018
59.
Silvestrini
,
R. T.
,
Montgomery
,
D. C.
, and
Jones
,
B.
,
2013
, “
Comparing Computer Experiments for the Gaussian Process Model Using Integrated Prediction Variance
,”
Qual. Eng.
,
25
(
2
), pp.
164
174
. 10.1080/08982112.2012.758284
60.
Hofiman
,
M. D.
,
Brochu
,
E.
, and
de Freitas
,
N.
,
2011
, “
Portfolio Allocation for Bayesian Optimization
,”
UAI, Citeseer
, pp.
327
336
.
61.
Travaglino
,
S.
,
Murdock
,
K.
,
Tran
,
A.
,
Martin
,
C.
,
Liang
,
L.
,
Wang
,
Y.
, and
Sun
,
W.
,
2020
, “
Computational Optimization Study of Transcatheter Aortic Valve Leafiet Design Using Porcine and Bovine Leafiets
,”
ASME J. Biomech. Eng.
,
142
(
1
), p.
011007
. 10.1115/1.4044244
62.
Lawrence
,
N. D.
,
2004
, “
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
,”
Advances in Neural Information Processing Systems
, pp.
329
336
.
63.
Tran
,
A.
,
He
,
L.
, and
Wang
,
Y.
,
2018
, “
An Eficient First-Principles Saddle Point Searching Method Based on Distributed Kriging Metamodels
,”
ASCE-ASME J. Risk Uncertainty Eng. Syst., Part B: Mech. Eng.
,
4
(
1
), p.
011006
. 10.1115/1.4037459
64.
Tran
,
A.
,
Furlan
,
J. M.
,
Pagalthivarthi
,
K. V.
,
Visintainer
,
R. J.
,
Wildey
,
T.
, and
Wang
,
Y.
,
2019
, “
WearGP: A Computationally Eficient Machine Learning Framework for Local Erosive Wear Predictions Via Nodal Gaussian Processes
,”
Wear
,
422
, pp.
9
26
. 10.1016/j.wear.2018.12.081
65.
Snelson
,
E.
, and
Ghahramani
,
Z.
,
2006
, “
Sparse Gaussian Processes Using Pseudo-Inputs
,”
Advances in Neural Information Processing Systems
, pp.
1257
1264
.
66.
Kersting
,
K.
,
Plagemann
,
C.
,
Pfafi
,
P.
, and
Burgard
,
W.
,
2007
, “
Most Likely Heteroscedastic Gaussian Process Regression
,”
Proceedings of the 24th International Conference on Machine Learning, ACM
, pp.
393
400
.
67.
Snoek
,
J.
,
Rippel
,
O.
,
Swersky
,
K.
,
Kiros
,
R.
,
Satish
,
N.
,
Sundaram
,
N.
,
Patwary
,
M. M. A.
,
Prabhat
,
M.
, and