Polynomial Regression

There are a few problems that are more or less unique to polynomial regression, and each of them has approaches for controlling it. These are nonlinear bias, polynomial wiggle, and explosive behavior.

Nonlinear Bias

Nonlinear bias is an inevitable result of using nonlinear models with data containing noise. For example, we have an excellent model to "predict" the area of a circle, A, using the diameter, D: A = pi D^2/4. A circle 2 meters in diameter is accurately computed to have an area A = 3.14159 square meters. Now, suppose we have two measurements of the diameter. One measurement is off by 10% on the high side, and the other is 10% low (i.e. 2.2 m and 1.8 m). They average to the true value of 2.0 m, and thus are an unbiased sample. However, the areas computed with these two values are 3.80133 m^2 and 2.54469 m^2. They average to 3.17301 m^2, which is 1% too high! The fact that the formula for the area of a circle exhibits nonlinear bias does not move us to replace it with a linear approximation. That would produce a worse type of bias, in which the model is inherently incapable of describing the behavior of the underlying relationship. Instead, we generally live with the problem as the better of the available options. Furthermore, there are mathematical methods that can compensate for nonlinear bias. The same is true for Multivariate Polynomial Regression models. Although including nonlinear effects in a model introduces nonlinear bias, this is preferable to ignoring the nonlinear effects.

Polynomial Wiggle

If a set of data containing noise is fit to a polynomial with too few degrees of freedom, the resulting polynomial can produce errors much greater than those of the data. This situation is called polynomial wiggle.
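The circle-area illustration of nonlinear bias can be checked numerically. This is a minimal sketch in Python; the two measurements are those from the text (2.2 m and 1.8 m):

```python
import math

def circle_area(d):
    """Area of a circle from its diameter: A = pi * D**2 / 4."""
    return math.pi * d ** 2 / 4

measured = [2.2, 1.8]  # one reading 10% high, one 10% low

# The diameter measurements are unbiased: they average to the true 2.0 m.
mean_d = sum(measured) / len(measured)

# But the areas computed from them do NOT average to the true area.
mean_a = sum(circle_area(d) for d in measured) / len(measured)

print(mean_d)               # 2.0
print(circle_area(2.0))     # 3.14159... (true area)
print(mean_a)               # 3.17301... -- about 1% too high
```

Because the squaring is symmetric in the +/-10% errors, the bias here works out to exactly 1%: the mean of 2.2² and 1.8² is 4.04, which is 1.01 times the true 4.0.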
The extreme case is interpolation, in which there are as many parameters in the polynomial as there are data points. Recall from the Basic Introduction to MPR that the degrees of freedom, df, is equal to the number of data points, n, minus the number of parameters, p (df = n - p). Thus interpolation is the situation in which df = 0. A low number of degrees of freedom may be acceptable if the noise is sufficiently small; otherwise the wiggle problem may occur.

To illustrate, we start with the equation y = e^x (see Figure below). The dark blue line is the plot of the equation over the range from 0.0 to 0.4. Next we picked four points along the line, plus one that is displaced by noise (e.g. measurement error). With five points it is possible to fit a fourth-order polynomial exactly (i.e. to interpolate). The red line shows the resulting polynomial. Notice that it passes through each point exactly. You can see that the largest error in the curve is greater than the individual error at the one "wrong" point. This is polynomial wiggle.

This problem can be minimized by ensuring that there are adequate degrees of freedom. If we fit the same five points using a quadratic polynomial, we obtain the model shown by the green line. Although the model no longer fits the four "good" points exactly, the overall fit is much improved. A quadratic model has three parameters, so the degrees of freedom in this example is 5 - 3 = 2. A related control measure is to keep the maximum exponent low, say 2 or 3.

Explosive Behavior

Another problem is explosive behavior, which is characterized by extremely large errors when the model is used outside the range of the data from which it was generated. The Figure below shows an example based on the same exponential equation used above in the discussion of polynomial wiggle. The blue line, again, is a plot of the exact exponential equation. The red diamonds show points that have had noise added to them, representing measurement error.
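The wiggle example, and what happens when the fitted polynomials are extrapolated, can be reproduced in a short sketch. NumPy is assumed, and the noise size (0.05 added to one of the five points) is an illustrative choice, since the text does not specify it:

```python
import numpy as np

# Five equally spaced x values over [0, 0.4]; y = exp(x), with one point
# displaced by "measurement noise" (assumed size 0.05).
x = np.linspace(0.0, 0.4, 5)
y = np.exp(x)
y[1] += 0.05                      # the one "wrong" point

# Interpolation: a 4th-order polynomial has 5 parameters, so df = 5 - 5 = 0.
p4 = np.polyfit(x, y, deg=4)
# Adequate degrees of freedom: a quadratic has 3 parameters, df = 5 - 3 = 2.
p2 = np.polyfit(x, y, deg=2)

grid = np.linspace(0.0, 0.4, 401)             # within the data range
err4 = np.abs(np.polyval(p4, grid) - np.exp(grid)).max()
err2 = np.abs(np.polyval(p2, grid) - np.exp(grid)).max()
print(err4, err2)  # wiggle: err4 exceeds the 0.05 noise; err2 is much smaller

# Explosive behavior: extrapolate to x = 0.8, twice the fitted range.
print(np.polyval(p4, 0.8), np.exp(0.8))  # interpolant is wildly wrong here
```

The interpolating polynomial's worst in-range error exceeds the 0.05 noise that was injected (wiggle), the quadratic's does not, and at x = 0.8 the interpolant's prediction is off by several units even though e^0.8 is only about 2.23.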
Again, we took the five points and fitted a fourth-order polynomial to them. This produced the red line, which passes through each data point exactly. However, as soon as the model is used to make a prediction outside the range of the data points used to create it, the prediction becomes wildly wrong.

The preferred way to deal with this problem is simply not to use the model outside the range of the data. This is true for any empirical model, not just for polynomials; even linear models can be very unreliable outside the range for which they were validated. When the TaylorFit software saves the model specification in a "pfm" file, it also saves the ranges of all the variables used in the model, so the user can check that a prediction lies within the proper range.

A second, but riskier, strategy is the same as the treatment for polynomial wiggle: maintain adequate degrees of freedom. When we fit a quadratic equation to the five points (df = 2), we get the model shown by the green line in the Figure. However, this only widens the range before the model explodes. It is interesting to note that if we fit the fourth-order polynomial to the data without noise, we get a good fit over a much wider range than the data, even wider than the quadratic fit to the noisy data. But eventually it, too, explodes.

Of course, Taylor's theorem proves that it is possible to attain any degree of accuracy over any desired range by choosing a polynomial of high enough degree. This is the basis for the claim that the MPR model is capable of describing any functional relationship.

Copyright 2003 Simetrica LLC. All rights reserved.