General Guidelines¶

Scaling Datasets¶

For best results, always scale your data sets to the order of one. You may do this prior to reading in your data file, or after your data file has been imported using the scaling feature in CurveExpert Basic (see Operating on Data).

Imagine a data set with x values ranging from 1000 to 10000 and a regression model where the term exp(a*x) is involved. Unless a very good initial guess for a is given, chances are that an exponential to a very large power (eg. exp(1000)) will be taken in the course of the nonlinear regression algorithm. The calculation will overflow, and the regression will fail as a result. Even if the regression happens not to fail, the regression algorithm will have an exceedingly tough time finding the correct parameters, since a small change in the free parameters will cause a tremendous change in the size of the term. The moral of the story: always scale your data!

As another example, if you have data that describes atmospheric pressure at different elevations, you might have (in metric units) a data set that looks like:

x=[-100, 0, 100, 500, 1000, 4000] meters
y=[102000,101325,101000,100500,100000,99900] Pascals

Using the Data->Scale feature in the Data menu of CurveExpert Basic, you can scale this data using a scale factor of 0.001 on the x data, and 0.00001 on the y data. So, you would then have the following data set:

x=[-0.1, 0, 0.1, 0.500, 1, 4] kilometers
y=[1.02,1.01325,1.01,1.005,1.0,0.999] bars

The second example is much more likely to allow nonlinear regressions to converge, and also will allow higher order polynomial fits to be performed with more accuracy. If you have a data set that seems particularly ill-behaved, scaling can help solve this problem.

Note that CurveExpert Basic is able to perform correctly on data in any scale, as long as the calculations do not overflow or underflow. So, if a data set is giving problems, scaling it should be the first action to take.

Set the tolerance parameter reasonably¶

Don’t set the tolerance parameter (in Edit->Preferences->Regression) too low. In regression modeling, not much advantage is to be gained by setting a very strict tolerance. Its main purpose in life is to prevent the nonlinear regression algorithm from converging on local minima, not to make the calculated parameters more accurate.

Data should be appropriate to the model¶

Make sure that the data is appropriate to the model. Especially look out for using logarithmic or exponential families of models with data that contains zeros or negatives. For example, it is not possible, in any shape or form, to obtain a negative or zero with the basic exponential model (y=ae^(bx), assuming a is positive; the inverse problem exists for a negative a). So, it is not wise to use a model that cannot reflect the trends in the data.

Sometimes, it is unavoidable¶

Some curve fits are simply ill-behaved, i.e. prone to divergence. For some data sets, it may not be possible to converge certain nonlinear regression models.