Response Surface Models

This section provides details on the Response Surface models used in Isight.

Overview
Polynomial Term Selection in Response Surface Models
R2 Analysis of Response Surface Models

Overview

Response Surface Models (RSM) in Isight use polynomials of low order (from 1 to 4) to approximate the response of an actual analysis code. To construct a model, a number of exact analyses using the simulation codes must be performed. Alternatively, a data file with a set of analyzed design points can be used. Once constructed, a Response Surface Model can be used in optimization and sensitivity studies at a small computational expense because each evaluation only involves calculating the value of a polynomial for a given set of input values. The model accuracy depends strongly on the amount of data used for its construction (the number of data points), the shape of the exact response function that is approximated, and the volume of the design space in which the model is constructed. In a sufficiently small volume of the design space, any smooth function can be approximated by a quadratic polynomial with good accuracy. For highly nonlinear functions, polynomials of the third or fourth order can be used. If the model is used outside the design space where it was constructed, its accuracy is impaired and the model must be refined.

A maximum order model (fourth order or Quartic model) is represented by a polynomial of the following form:

$\tilde{F}(x) = a_{0} + \sum_{i = 1}^{N} b_{i} x_{i} + \sum_{i = 1}^{N} c_{i i} x_{i}^{2} + \sum_{i < j} c_{i j} x_{i} x_{j} + \sum_{i = 1}^{N} d_{i} x_{i}^{3} + \sum_{i = 1}^{N} e_{i} x_{i}^{4},$

where

$N$ is the number of model inputs,

$x_{i}$ are the model inputs, and

$a, b, c, d, e$ are the polynomial coefficients.

A lower-order model (linear, quadratic, or cubic) includes only the polynomial terms up to that order. The third and fourth order models in Isight do not include any mixed polynomial terms (interactions) of order 3 or 4. Only pure cubic and quartic terms are included, which reduces the amount of data required for model construction.

Coefficients of the polynomial ( $a, b, c, d, e$ ) are determined by solving a linear system of equations (one equation for each analyzed design point).
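
The following is a minimal sketch of how such a polynomial could be assembled and fitted with NumPy. It is an illustration only, not Isight's implementation: the function names (`rsm_design_matrix`, `fit_rsm`, `predict_rsm`) and the column ordering are assumptions made for this example, and the fit uses an ordinary least squares solve.

```python
import numpy as np
from itertools import combinations

def rsm_design_matrix(X, order=4):
    """Build one column per polynomial term for a model of the given order (1-4)."""
    X = np.asarray(X, dtype=float)
    n, N = X.shape
    cols = [np.ones(n)]                          # constant term a_0
    cols += [X[:, i] for i in range(N)]          # linear terms b_i * x_i
    if order >= 2:
        cols += [X[:, i] ** 2 for i in range(N)]                           # c_ii * x_i^2
        cols += [X[:, i] * X[:, j] for i, j in combinations(range(N), 2)]  # c_ij * x_i * x_j, i < j
    if order >= 3:
        cols += [X[:, i] ** 3 for i in range(N)]  # pure cubic terms only
    if order >= 4:
        cols += [X[:, i] ** 4 for i in range(N)]  # pure quartic terms only
    return np.column_stack(cols)

def fit_rsm(X, y, order=2):
    """Least squares solution of the linear system (one equation per design point)."""
    A = rsm_design_matrix(X, order)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs

def predict_rsm(coeffs, X, order=2):
    """Evaluate the fitted polynomial at new input points."""
    return rsm_design_matrix(X, order) @ coeffs

# Hypothetical "exact analysis": a quadratic function of two inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(12, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + 3.0 * X[:, 1] ** 2
coeffs = fit_rsm(X, y, order=2)   # ordered as [1, x1, x2, x1^2, x2^2, x1*x2]
```

Because the sampled function is itself quadratic, the fitted coefficients reproduce it up to rounding error. For a general analysis code the polynomial is only an approximation.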

The Response Surface Model construction is controlled by the following options:

The order of the model polynomial (referred to as the polynomial order).
The sub-set of polynomial terms selected.

If you select this option, you can select a sub-set of polynomial terms using one of the four available term selection methods:
- Sequential Replacement
- Stepwise Regression (Efroymson’s algorithm)
- Two-at-a-time Replacement
- Exhaustive Search
For more information about term selection, see Polynomial Term Selection in Response Surface Models.

The number of design points (if Random Points is used for initialization).
The size of the design space around the baseline point in which the initial random designs are generated (if Random Points is used for initialization).

The size of the design space can be set individually for each input parameter. The bounds of the design sub-space can be entered directly (absolute values) or calculated by Isight by applying lower and upper bounds to the baseline value of each parameter (relative to baseline).

Sampling data points needed for initialization of the Response Surface Model approximations can be obtained using one of the available sampling methods in Isight. The typical initialization mode for a Response Surface Model, if no previous data are available, is Random Points. In this case Isight generates the required number of random designs inside the specified boundaries and runs an exact analysis for each of those designs. The resulting data are used to calculate the polynomial coefficients of the model by a least squares fit.
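
As a rough illustration of the Random Points idea, the sketch below draws uniformly distributed designs inside absolute lower and upper bounds. The function name `random_designs`, the uniform distribution, and the example bounds are assumptions for this example. The sampling scheme Isight actually uses is not specified here.

```python
import numpy as np

def random_designs(lower, upper, n_points, seed=None):
    """Draw n_points random designs, one row per design, inside [lower, upper)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if lower.shape != upper.shape or np.any(upper <= lower):
        raise ValueError("each upper bound must exceed its lower bound")
    rng = np.random.default_rng(seed)
    # The per-parameter bounds broadcast across the rows of the sample.
    return rng.uniform(lower, upper, size=(n_points, lower.size))

# Hypothetical design sub-space for three input parameters.
lower_bounds = [0.0, 10.0, -5.0]
upper_bounds = [1.0, 20.0, 5.0]
designs = random_designs(lower_bounds, upper_bounds, n_points=20, seed=1)
```

Each row of `designs` would then be passed to the exact analysis, and the resulting input and output pairs used for the least squares fit.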

The recommended number of sampling points for initialization is twice the number of polynomial coefficients, which for a linear polynomial is $N + 1$, for a quadratic polynomial is $(N + 1)(N + 2) / 2$, for a cubic polynomial is $(N + 1)(N + 2) / 2 + N$, and for a quartic polynomial is $(N + 1)(N + 2) / 2 + 2N$, where $N$ is the number of input variables.
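
The coefficient counts above can be checked with a few lines of code. This is a small sketch. The function names are made up for the example.

```python
def rsm_coefficient_count(n_inputs, order):
    """Number of polynomial coefficients for a model of the given order (1-4)."""
    if order not in (1, 2, 3, 4):
        raise ValueError("order must be 1, 2, 3, or 4")
    N = n_inputs
    if order == 1:
        return N + 1
    # Full quadratic model, plus N pure terms for each order above 2.
    return (N + 1) * (N + 2) // 2 + (order - 2) * N

def recommended_sample_count(n_inputs, order):
    """Recommended number of sampling points: twice the number of coefficients."""
    return 2 * rsm_coefficient_count(n_inputs, order)

# Coefficient counts for N = 5 inputs, orders 1 through 4.
counts = [rsm_coefficient_count(5, k) for k in (1, 2, 3, 4)]
```

For example, with $N = 5$ inputs a quadratic model has $6 \cdot 7 / 2 = 21$ coefficients, so 42 sampling points are recommended.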

Polynomial Term Selection in Response Surface Models

You can choose polynomial term selection when you are working with Response Surface Models (RSM). If you do not use polynomial term selection, Isight calculates all the coefficients.

Polynomial term selection has several benefits:

improves prediction reliability of the model,
eliminates predictor variables with little or no effect on the output,
reduces variance of the model, and
selects the best model when a limited number of design points are available.

The basic idea of the polynomial term selection is as follows:

Given a set of $k$ predictor variables $X_{1}, X_{2}, X_{3}, \dots, X_{k}$, select a subset of $p$ ($p < k$) predictor variables that minimizes the Residual Sum of Squares (RSS):

$\mathrm{RSS} = \sum_{i = 1}^{n} \left( Y_{i} - \sum_{j = 1}^{p} b_{j} X_{i, j} \right)^{2}.$

The best combination of the polynomial terms is selected so that the Residual Sum of Squares is minimized. Because the residuals can be nonzero only when the fit has at least one residual degree of freedom (more design points than polynomial terms), minimization of the RSS implies that the maximum number of polynomial terms selected must be lower than the number of design points used for the RSM. Otherwise, the RSS will be exactly zero and no term selection will be possible.
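
To make the RSS criterion concrete, the sketch below computes the RSS of a least squares fit that uses only a chosen subset of term columns. The helper name `rss` and the four-point data set are invented for this illustration.

```python
import numpy as np

def rss(A, y, terms):
    """RSS of the least squares fit that uses only the columns listed in `terms`."""
    sub = A[:, list(terms)]
    coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
    residuals = y - sub @ coeffs
    return float(residuals @ residuals)

# Four design points of a single input, with candidate terms 1, x, x^2, x^3.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])
A = np.column_stack([x ** k for k in range(4)])

rss_linear = rss(A, y, [0, 1])        # two terms, two residual degrees of freedom
rss_full = rss(A, y, [0, 1, 2, 3])    # as many terms as points: exact fit
```

With two terms the RSS is 2.7. With four terms and four points the fit is exact, the RSS is zero up to rounding error, and it can no longer distinguish between candidate term sets.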

The four term selection methods available in Isight have the following features:

Sequential Replacement. The Sequential Replacement algorithm is the simplest and fastest algorithm, but it does not guarantee the best model. This method is a variation of the Forward selection algorithm and has the following steps:
1. Start with the constant term and select the next best term.
2. At every step of the forward selection, for every previously selected term find the best replacement that will decrease the RSS and swap the variables.
3. Select the next best term and add it to the model.
4. Repeat the procedure until the maximum allowed number of terms is selected.
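
The Sequential Replacement steps can be sketched as follows. This is one possible reading of the steps, not Isight's code. It assumes column 0 of the term matrix is the constant term and is always kept, and it uses a small tolerance to decide whether a swap decreases the RSS. The function names and the test data (a two-level full factorial design) are invented for the example.

```python
import numpy as np
from itertools import product

def rss(A, y, terms):
    """RSS of the least squares fit that uses only the columns listed in `terms`."""
    sub = A[:, list(terms)]
    coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
    residuals = y - sub @ coeffs
    return float(residuals @ residuals)

def sequential_replacement(A, y, max_terms, tol=1e-12):
    n_terms = A.shape[1]
    selected = [0]                                   # assumed: column 0 is the constant term
    while len(selected) < max_terms:
        unused = [j for j in range(n_terms) if j not in selected]
        if not unused:
            break
        # Forward step: add the term that gives the lowest RSS.
        selected.append(min(unused, key=lambda j: rss(A, y, selected + [j])))
        # Replacement step: swap single terms while a swap still lowers the RSS.
        improved = True
        while improved:
            improved = False
            current = rss(A, y, selected)
            for pos in range(1, len(selected)):
                for j in range(n_terms):
                    if j in selected:
                        continue
                    trial = selected[:pos] + [j] + selected[pos + 1:]
                    trial_rss = rss(A, y, trial)
                    if trial_rss < current - tol:
                        selected, current, improved = trial, trial_rss, True
    return sorted(selected)

# Hypothetical data: 16-run two-level factorial in four inputs.
X = np.array(list(product([-1.0, 1.0], repeat=4)))
A = np.column_stack([np.ones(len(X)), X])            # columns: 1, x1, x2, x3, x4
y = 2.0 + 3.0 * X[:, 0] - 4.0 * X[:, 2] + 0.1 * X[:, 0] * X[:, 1] * X[:, 3]
chosen = sequential_replacement(A, y, max_terms=3)
```

In this example the response depends mainly on $x_{1}$ and $x_{3}$, so the selected columns are the constant term plus columns 1 and 3.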

Stepwise Regression (Efroymson’s algorithm). The Stepwise Regression algorithm can sometimes be as fast as the Sequential Replacement algorithm, but it also does not guarantee the best model. The Stepwise Regression algorithm uses 4.0 as the default “F-ratio-to-add-term” and “F-ratio-to-delete-term” values. You can control the values when you create a new RSM. These values will affect the selection process. This method is a variation of the Forward selection method and has the following steps:
1. Start with the constant term and select the next best term.
2. At every step of the Forward selection, add the next best term if it sufficiently decreases the RSS using the following criterion:
  $R = \frac{\mathrm{RSS}_{p} - \mathrm{RSS}_{p + 1}}{\mathrm{RSS}_{p + 1} / (n - p - 2)} > F_{\mathrm{add}}.$
3. At every step of the Forward selection, check if one of the selected terms can be dropped without appreciably increasing the RSS using the following criterion:
  $R = \frac{\mathrm{RSS}_{p - 1} - \mathrm{RSS}_{p}}{\mathrm{RSS}_{p} / (n - p - 1)} < F_{\mathrm{delete}}.$
4. Repeat the process until no more terms satisfy the first criterion or until the maximum desired number of terms is selected.
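
A compact sketch of a stepwise procedure of this kind is given below. It is not Isight's implementation. The deletion test follows the $F_{\mathrm{delete}}$ ratio in step 3. The addition test is an assumption: it uses the usual F-to-enter ratio with $n - p - 2$ residual degrees of freedom, mirroring the deletion ratio. Column 0 is assumed to be the constant term, $p$ counts the non-constant terms in the model, and all names and data are invented for the example.

```python
import numpy as np
from itertools import product

def rss(A, y, terms):
    """RSS of the least squares fit that uses only the columns listed in `terms`."""
    sub = A[:, list(terms)]
    coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
    residuals = y - sub @ coeffs
    return float(residuals @ residuals)

def f_ratio(rss_without, rss_with, dof):
    """F-type ratio for the RSS change caused by one term (assumed form)."""
    gain = rss_without - rss_with
    if gain <= 0.0:
        return 0.0
    if rss_with <= 0.0:
        return float("inf")
    return gain / (rss_with / dof)

def stepwise_regression(A, y, max_terms, f_add=4.0, f_delete=4.0):
    n, n_terms = A.shape
    selected = [0]                                   # assumed: column 0 is the constant term
    while len(selected) < max_terms:
        p = len(selected) - 1                        # non-constant terms currently in the model
        unused = [j for j in range(n_terms) if j not in selected]
        if not unused or n - p - 2 <= 0:
            break
        rss_p = rss(A, y, selected)
        best = min(unused, key=lambda j: rss(A, y, selected + [j]))
        # Addition test: stop when the best candidate does not reach f_add.
        if f_ratio(rss_p, rss(A, y, selected + [best]), n - p - 2) <= f_add:
            break
        selected.append(best)
        # Deletion test: drop the weakest term while its ratio is below f_delete.
        while len(selected) > 2:
            p = len(selected) - 1
            rss_p = rss(A, y, selected)
            ratios = {j: f_ratio(rss(A, y, [t for t in selected if t != j]), rss_p, n - p - 1)
                      for j in selected[1:]}
            weakest = min(ratios, key=ratios.get)
            if ratios[weakest] >= f_delete:
                break
            selected.remove(weakest)
    return sorted(selected)

# Hypothetical data: 16-run two-level factorial in four inputs.
X = np.array(list(product([-1.0, 1.0], repeat=4)))
A = np.column_stack([np.ones(len(X)), X])            # columns: 1, x1, x2, x3, x4
y = 2.0 + 3.0 * X[:, 0] - 4.0 * X[:, 2] + 0.1 * X[:, 0] * X[:, 1] * X[:, 3]
chosen = stepwise_regression(A, y, max_terms=5)
```

Although up to five terms are allowed here, the procedure stops after three because neither remaining candidate reaches the addition threshold.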

Two-at-a-time Replacement. The Two-at-a-time Replacement algorithm is more expensive than the two previous algorithms but has a much better chance of finding the best model. This method is a variation of the Forward selection method and has the following steps:
1. Start with the constant term and select the next best term.
2. At every step of the Forward selection, consider all possible replacements of 1 or 2 terms from the previously selected terms.
3. Find the best replacement combination that will decrease the RSS and swap the variables.
4. Select the next best term and add it to the model.
5. Repeat the procedure until the maximum allowed number of terms is selected.
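
The Two-at-a-time Replacement steps could be sketched along the following lines. As before, this is an illustrative reading rather than Isight's code. It assumes column 0 is the constant term and is never swapped out, and the function names and data are invented.

```python
import numpy as np
from itertools import combinations, product

def rss(A, y, terms):
    """RSS of the least squares fit that uses only the columns listed in `terms`."""
    sub = A[:, list(terms)]
    coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
    residuals = y - sub @ coeffs
    return float(residuals @ residuals)

def two_at_a_time_replacement(A, y, max_terms, tol=1e-12):
    n_terms = A.shape[1]
    selected = [0]                                   # assumed: column 0 is the constant term
    while len(selected) < max_terms:
        unused = [j for j in range(n_terms) if j not in selected]
        if not unused:
            break
        # Forward step: add the term that gives the lowest RSS.
        selected.append(min(unused, key=lambda j: rss(A, y, selected + [j])))
        while True:
            unused = [j for j in range(n_terms) if j not in selected]
            best_trial, best_rss = None, rss(A, y, selected)
            # Try every replacement of 1 or 2 selected terms by unused terms.
            for size in (1, 2):
                for out in combinations(selected[1:], size):
                    kept = [t for t in selected if t not in out]
                    for new in combinations(unused, size):
                        trial = kept + list(new)
                        trial_rss = rss(A, y, trial)
                        if trial_rss < best_rss - tol:
                            best_trial, best_rss = trial, trial_rss
            if best_trial is None:
                break                                # no replacement lowers the RSS
            selected = best_trial
    return sorted(selected)

# Hypothetical data: 16-run two-level factorial in four inputs.
X = np.array(list(product([-1.0, 1.0], repeat=4)))
A = np.column_stack([np.ones(len(X)), X])            # columns: 1, x1, x2, x3, x4
y = 2.0 + 3.0 * X[:, 0] - 4.0 * X[:, 2] + 0.1 * X[:, 0] * X[:, 1] * X[:, 3]
chosen = two_at_a_time_replacement(A, y, max_terms=3)
```

The inner loops show where the extra cost comes from: every pair of selected terms is tested against every pair of unused terms.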

Exhaustive Search. The Exhaustive Search algorithm is the most expensive of the available algorithms. It guarantees finding the best model at the cost of a high computational time. The number of design points and the number of selected terms can greatly affect the computational cost and can make this algorithm a nonviable option for large data sets and a large number of inputs. This method is a systematic approach to finding the best combination of terms from all possible combinations. It has the following basic steps:
1. Generate all possible combinations of terms up to the maximum allowed number of terms.
2. Calculate RSS values for all polynomials.
3. Select the best combination of terms to minimize the RSS.
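
A brute-force version of these steps can be sketched as follows. It is an illustration under assumptions: the constant term (column 0) is always included, ties keep the first combination found, and the function names and data are invented for the example.

```python
import numpy as np
from itertools import combinations, product

def rss(A, y, terms):
    """RSS of the least squares fit that uses only the columns listed in `terms`."""
    sub = A[:, list(terms)]
    coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
    residuals = y - sub @ coeffs
    return float(residuals @ residuals)

def exhaustive_search(A, y, max_terms):
    """Evaluate every term combination up to max_terms and keep the lowest RSS."""
    others = range(1, A.shape[1])                    # assumed: column 0 is the constant term
    best_terms, best_rss = [0], rss(A, y, [0])
    for size in range(1, max_terms):                 # number of non-constant terms
        for combo in combinations(others, size):
            terms = [0] + list(combo)
            value = rss(A, y, terms)
            if value < best_rss:
                best_terms, best_rss = terms, value
    return best_terms, best_rss

# Hypothetical data: 16-run two-level factorial in four inputs.
X = np.array(list(product([-1.0, 1.0], repeat=4)))
A = np.column_stack([np.ones(len(X)), X])            # columns: 1, x1, x2, x3, x4
y = 2.0 + 3.0 * X[:, 0] - 4.0 * X[:, 2] + 0.1 * X[:, 0] * X[:, 1] * X[:, 3]
best_terms, best_rss = exhaustive_search(A, y, max_terms=3)
```

The number of least squares fits grows combinatorially with the number of candidate terms, which is why this approach becomes impractical for large problems.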

R2 Analysis of Response Surface Models

$R^{2}$ analysis is a measure of how well the model polynomial approximates the actual function at the design points used for its construction. Isight’s Response Surface Model automatically performs $R^{2}$ analysis of the approximated functions when the number of distinct designs used for the response surface model is greater than the number of model coefficients.

An $R^{2}$ value of 1.00 indicates that the values of the model polynomial and the values of the response function are identical at all the design points. It is always possible to perfectly fit $n$ points using a polynomial with $n + 1$ coefficients (here $n$ denotes the number of design points, not the number of inputs). Therefore, a perfect value of the $R^{2}$ coefficient does not necessarily indicate that the actual function will match the model polynomial everywhere in the design space, unless the number of points used for analysis is considerably greater (3–10 times) than the number of polynomial coefficients. Information about $R^{2}$ analysis is reported inside the coefficients data for each output polynomial.
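
The sketch below illustrates this caution using the standard coefficient of determination, $R^{2} = 1 - \mathrm{RSS} / \mathrm{TSS}$, where TSS is the total sum of squares about the mean response. That Isight uses exactly this formula is an assumption. The function name and data are invented for the example.

```python
import numpy as np

def r_squared(A, y):
    """Standard coefficient of determination of the least squares fit of y on A."""
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coeffs
    deviations = y - y.mean()
    return 1.0 - float(residuals @ residuals) / float(deviations @ deviations)

# Four design points of a single input.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

A_linear = np.column_stack([x ** k for k in range(2)])   # 2 coefficients
A_cubic = np.column_stack([x ** k for k in range(4)])    # 4 coefficients, 4 points

r2_linear = r_squared(A_linear, y)
r2_cubic = r_squared(A_cubic, y)
```

The cubic model reaches an $R^{2}$ of 1 only because it has as many coefficients as there are design points. That value says nothing about how well it predicts between or beyond those points.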