About Cross-Validation

Cross-validation is a technique for validating an approximation model. It assesses how accurately the model will predict outputs in practice, at points that were not used to fit it.

Related Topics
Creating User-Defined Approximations

Typically, cross-validation is performed in multiple steps. Each step involves partitioning a sample of data into complementary subsets, performing the fit on one subset (called the training set), and validating the fit on the other subset (called the validation set or the testing set). Isight uses a specific type of cross-validation called leave-one-out cross-validation. In this approach, only one point is used as the testing set during each step of the procedure.

You can change the number of cross-validation points. For the most accurate cross-validation, you can use all the sampling points as cross-validation points. However, each cross-validation point adds a step to the procedure, and each step requires re-fitting the model, so selecting many points can significantly increase the run time.
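As a rough illustration of this partitioning (a sketch, not Isight's implementation), the following Python code generates leave-one-out splits over the indices of a set of sampling points. The function name `leave_one_out_splits` and the parameters `num_points`, `num_cv_points`, and `seed` are hypothetical, and the sketch assumes that no point is held out more than once.

```python
import random

def leave_one_out_splits(num_points, num_cv_points, seed=0):
    """Yield (training_indices, held_out_index), one pair per cross-validation point."""
    if not 1 <= num_cv_points <= num_points:
        raise ValueError("num_cv_points must be between 1 and num_points")
    rng = random.Random(seed)
    # Assumption: each cross-validation point is held out at most once.
    held_out = rng.sample(range(num_points), num_cv_points)
    for i in held_out:
        # The training set is every sampling point except the held-out one.
        training = [j for j in range(num_points) if j != i]
        yield training, i

splits = list(leave_one_out_splits(num_points=5, num_cv_points=3))
for training, test_point in splits:
    print(f"fit on {training}, validate on [{test_point}]")
```

Each split triggers one re-fit, so the cost grows in proportion to the number of cross-validation points chosen.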

The following steps briefly describe the cross-validation process:

  1. An approximation model is created using all the sampling data points (coefficients are calculated, term selection is performed in the case of RSM, etc.).
  2. One point is randomly selected and removed from the sampling data.
  3. The approximation model is re-fit using the reduced sampling data set. The structure of the approximation model is held unchanged during the re-fit (for example, if term selection was done for RSM, the same polynomial terms are used; only the coefficient values are recalculated).
  4. The new approximation model is used to predict the output values for the removed data point.
  5. The exact and approximate (predicted) output values are compared, and the resulting errors are recorded for the current cross-validation point.
  6. The removed data point is put back into the sampling data set.
  7. Steps 2-6 are repeated a specified number of times (once for each cross-validation point).
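
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Isight's implementation: the model is a straight line fit by least squares (so its structure, the terms 1 and x, is trivially fixed), the error is taken as predicted minus exact, and cross-validation points are drawn without repetition. The names `fit_line` and `cross_validate` are hypothetical.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a + b*x; returns the coefficients (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def cross_validate(points, num_cv_points, seed=0):
    """Return the full-data model and {point index: prediction error}."""
    rng = random.Random(seed)
    # Step 1: fit on all sampling points; the model structure is fixed here.
    full_model = fit_line(points)
    errors = {}
    # Assumption: cross-validation points are selected without repetition.
    for i in rng.sample(range(len(points)), num_cv_points):
        reduced = points[:i] + points[i + 1:]   # Step 2: remove one point
        a, b = fit_line(reduced)                # Step 3: recalculate coefficients only
        x, y_exact = points[i]
        y_pred = a + b * x                      # Step 4: predict the removed point
        errors[i] = y_pred - y_exact            # Step 5: record the error
        # Step 6 is implicit: `points` itself is never modified.
    return full_model, errors                   # Step 7: the loop ran once per point

samples = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]
full_model, errors = cross_validate(samples, num_cv_points=len(samples))
for i, err in sorted(errors.items()):
    print(f"point {i}: error = {err:+.3f}")
```

Here every sampling point is used as a cross-validation point, which corresponds to the most thorough (and most expensive) setting described earlier.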