General Guidelines

Scaling Datasets

For best results, always scale your data sets so that their values are of order one (magnitudes near 1). You may do this before reading in your data file, or after the file has been imported, using the scaling feature in CurveExpert Professional (see Operating on Data).

Imagine a data set with x values ranging from 1000 to 10000 and a regression model that involves the term exp(a*x). Unless a very good initial guess for a is given, chances are that an exponential of a very large power (e.g., exp(1000)) will be evaluated in the course of the nonlinear regression algorithm. The calculation will overflow, and the regression will fail as a result. Even if the regression happens not to fail, the algorithm will have an exceedingly tough time finding the correct parameters, since a small change in the free parameters causes a tremendous change in the size of the term. The moral of the story: always scale your data!
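The overflow described above is easy to reproduce outside CurveExpert Professional. The following is a minimal Python sketch, not the program's internal algorithm; the helper name exp_term and the trial value a = 1 are assumptions chosen purely for illustration.

```python
import math


def exp_term(a, x):
    """Evaluate exp(a*x), returning None if the result overflows a double."""
    try:
        return math.exp(a * x)
    except OverflowError:
        return None


a_guess = 1.0               # hypothetical initial guess for the parameter a
x_raw = 1000.0              # unscaled x value
x_scaled = x_raw * 0.001    # the same point after scaling x to order one

raw_value = exp_term(a_guess, x_raw)        # exp(1000) does not fit in a double
scaled_value = exp_term(a_guess, x_scaled)  # exp(1) is an ordinary number

print(raw_value, scaled_value)
```

With the unscaled x value the term cannot be represented at all, while the scaled value gives an ordinary number that a regression algorithm can work with.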

As another example, if you have data that describes atmospheric pressure at different elevations, you might have (in metric units) a data set that looks like:

x=[-100, 0, 100, 500, 1000, 4000] meters
y=[102000,101325,101000,100500,100000,99900] Pascals

Using the Data->Scale feature in CurveExpert Professional, you can scale this data with a scale factor of 0.001 on the x data and 0.00001 on the y data. You would then have the following data set:

x=[-0.1, 0, 0.1, 0.500, 1, 4] kilometers
y=[1.02,1.01325,1.01,1.005,1.0,0.999] bars

The scaled data set is much more likely to allow nonlinear regressions to converge, and it also allows higher-order polynomial fits to be performed with more accuracy. If you have a data set that seems particularly ill-behaved, scaling can help solve the problem.
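As a rough illustration of why the polynomial case benefits, the sketch below applies the same scale factors in plain Python and compares the largest power of x that a fifth-order polynomial would have to handle before and after scaling. The function name scale and the choice of degree 5 are assumptions for this example; it does not reproduce what the Data->Scale feature does internally.

```python
x_m = [-100, 0, 100, 500, 1000, 4000]                    # meters
y_pa = [102000, 101325, 101000, 100500, 100000, 99900]   # Pascals


def scale(values, factor):
    """Multiply every value by a constant scale factor."""
    return [v * factor for v in values]


x_km = scale(x_m, 0.001)       # meters -> kilometers
y_bar = scale(y_pa, 0.00001)   # Pascals -> bars

# Largest x**degree term a polynomial of this degree must handle.
degree = 5  # hypothetical polynomial order
largest_raw = max(abs(x) ** degree for x in x_m)
largest_scaled = max(abs(x) ** degree for x in x_km)

print(largest_raw, largest_scaled)
```

Keep in mind that parameters fitted to scaled data are expressed in the scaled units (here kilometers and bars), so they need to be converted if values in the original units are required.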

Note that CurveExpert Professional is able to perform correctly on data in any scale, as long as the calculations do not overflow or underflow. So, if a data set is giving problems, scaling it should be the first action to take.

Weighting

Weighting defines how “important” a particular point is in the optimization calculation; it accounts for the fact that, in some cases, certain data in the dataset are more precise than others. More precisely defined points (those with less uncertainty) should be given greater influence (a greater weighting) over the final fitted parameters.

There are six choices for weighting in CurveExpert Professional:

Default: No weighting unless a standard deviation column is defined in the dataset; in that case, the weighting is based on the square of the standard deviation column, so that each point is weighted according to its uncertainty (if available).
None: No weighting under any circumstance, even if a standard deviation column is present.
Y: Weight each data point by the absolute value of its Y (dependent variable) value.
Y^2: Weight each data point by the square of its Y (dependent variable) value.
X: Weight each data point by the absolute value of its X (independent variable) value.
X^2: Weight each data point by the square of its X (independent variable) value.

Note that if there is more than one independent variable, weighting by X and X^2 will not be available.
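To illustrate how weights pull a fit toward the more precise points, here is a minimal sketch that assumes inverse-variance weights (w = 1/sigma^2, a common statistical convention) and the simplest possible model, a constant. The exact weighting formulas used inside CurveExpert Professional are not given in this section, so the function below is a hypothetical illustration rather than a description of the program.

```python
def weighted_constant_fit(ys, sigmas=None):
    """Return c minimizing sum(w_i * (y_i - c)**2), i.e. the weighted mean.

    Assumption: w_i = 1 / sigma_i**2; with no sigmas every weight is 1.
    """
    if sigmas is None:
        weights = [1.0] * len(ys)
    else:
        weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


ys = [10.0, 10.2, 14.0]     # hypothetical measurements
sigmas = [0.1, 0.1, 2.0]    # the last point is far less certain

unweighted = weighted_constant_fit(ys)
weighted = weighted_constant_fit(ys, sigmas)

print(unweighted, weighted)
```

Without weights, the uncertain third point drags the fitted constant well above the two precise measurements; with weights, the result stays between them.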

Set the tolerance parameter reasonably

Don’t set the tolerance parameter (in Edit->Preferences->Regression) too low. In regression modeling, little is gained by setting a very strict tolerance. Its main purpose is to prevent the nonlinear regression algorithm from converging on local minima, not to make the calculated parameters more accurate.
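The diminishing return from a very strict tolerance can be seen with a toy iterative minimizer. This sketch assumes the tolerance acts as a stopping threshold on the size of the parameter update; the exact meaning of the tolerance in CurveExpert Professional may differ, and minimize is a hypothetical helper, not the program's algorithm.

```python
def minimize(tolerance, start=0.0, step=0.1, max_iter=10_000):
    """Gradient descent on f(p) = (p - 2)**2; stop when the update is tiny."""
    p = start
    for iteration in range(1, max_iter + 1):
        update = step * 2.0 * (p - 2.0)   # step size times the derivative f'(p)
        p -= update
        if abs(update) < tolerance:
            return p, iteration
    return p, max_iter


loose_p, loose_iters = minimize(tolerance=1e-6)
strict_p, strict_iters = minimize(tolerance=1e-14)

print(loose_p, loose_iters)
print(strict_p, strict_iters)
```

In this toy problem the strict tolerance takes considerably more iterations, yet both runs land on essentially the same parameter value, with a difference far below any realistic measurement precision.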

Data should be appropriate to the model

Make sure that the data is appropriate to the model. Watch especially for logarithmic or exponential families of models used with data that contains zeros or negative values. For example, the basic exponential model y=ae^(bx) can never produce a zero or negative value when a is positive (and, conversely, can never produce a zero or positive value when a is negative). It is not wise to use a model that cannot reflect the trends in the data.
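A check like this can be done before choosing a model. The sketch below tests whether the signs of the y data are compatible with the basic exponential model; the function name exponential_model_can_fit is hypothetical and is not part of CurveExpert Professional.

```python
def exponential_model_can_fit(ys):
    """Check whether y = a*exp(b*x) could reproduce the signs of the data.

    The model is strictly positive for a > 0 and strictly negative for a < 0,
    so the data must be all positive or all negative, with no zeros.
    """
    all_positive = all(y > 0 for y in ys)
    all_negative = all(y < 0 for y in ys)
    return bool(ys) and (all_positive or all_negative)


print(exponential_model_can_fit([1.02, 1.01325, 1.01, 1.005, 1.0, 0.999]))
print(exponential_model_can_fit([3.0, 0.0, -1.5]))
```

The first call uses the scaled pressure data from earlier in this section, which is entirely positive; the second data set contains a zero and mixed signs, which the model cannot reproduce.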

For interpolations, data points must be sorted

If you are using an interpolation-type curve fit, the data points must be sorted by ascending x. Use the Data->Sort feature in CurveExpert Professional to sort your data points correctly.
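If the data is prepared outside the program, the same ordering can be applied before import. This is a minimal sketch with made-up, shuffled values; sort_by_x is a hypothetical helper and does not describe how Data->Sort is implemented.

```python
def sort_by_x(xs, ys):
    """Return copies of xs and ys reordered so that x is ascending."""
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])
    sorted_xs = [x for x, _ in pairs]
    sorted_ys = [y for _, y in pairs]
    return sorted_xs, sorted_ys


xs = [0.5, -0.1, 4.0, 0.0, 1.0, 0.1]            # hypothetical unsorted x values
ys = [1.005, 1.02, 0.999, 1.01325, 1.0, 1.01]   # matching y values

sorted_xs, sorted_ys = sort_by_x(xs, ys)
print(sorted_xs)
print(sorted_ys)
```

Sorting the (x, y) pairs together, rather than sorting each column separately, keeps every y value attached to its own x value.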

Sometimes, it is unavoidable

Some curve fits are simply ill-behaved, i.e., prone to divergence. For some data sets, certain nonlinear regression models may not converge at all.