ndCurveMaster

Scientific Solutions for Data Analysis and Curve Fitting

Optimizing Complex Nonlinear Regression Models using Heuristic and Random Search Methods

Finding complex multivariable nonlinear regression models (3D, 4D, 5D, 6D, etc.) and selecting the best-fitting functions results in a large number of possible combinations. Searching through all these combinations using an exact algorithm is both computationally expensive and time-consuming. Therefore, ndCurveMaster employs heuristic techniques for curve fitting and data analysis and incorporates scientific algorithms based on machine learning, such as random search, to address this challenge. In this process, the best complex nonlinear regression models are determined through randomization and iterative searching using the following methods:

These methods help discover better models. Because the algorithm is heuristic, however, both the search path and the resulting models can differ from run to run, even on the same dataset. Repeating the search, which the program supports, therefore helps the user find a solution very close to the optimum.
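The idea of a randomized, iterative search can be illustrated with a minimal Python sketch. It is not ndCurveMaster's actual algorithm: the candidate function set, the helper names `fit_rms` and `random_search`, and the synthetic data are all assumptions made for illustration. Each iteration picks a random transformation for every predictor, fits the coefficients by linear least squares, and keeps the combination with the lowest RMS error.

```python
import numpy as np

# Hypothetical candidate transformations, chosen only for this illustration.
CANDIDATES = {
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "x^0.5": lambda x: np.sqrt(x),
    "ln(x)": lambda x: np.log(x),
    "exp(x)": lambda x: np.exp(x),
    "1/x": lambda x: 1.0 / x,
}

def fit_rms(X, y, names):
    """Fit y = a0 + sum_j a_j * f_j(x_j) by least squares; return (rms, coefficients)."""
    columns = [np.ones(len(y))] + [CANDIDATES[n](X[:, j]) for j, n in enumerate(names)]
    A = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rms = np.sqrt(np.mean((A @ coef - y) ** 2))
    return rms, coef

def random_search(X, y, iterations=200, seed=None):
    """Try random function combinations and keep the best one found."""
    rng = np.random.default_rng(seed)
    keys = list(CANDIDATES)
    best = None
    for _ in range(iterations):
        # One randomly chosen transformation per predictor column.
        names = [keys[i] for i in rng.integers(len(keys), size=X.shape[1])]
        rms, coef = fit_rms(X, y, names)
        if best is None or rms < best[0]:
            best = (rms, names, coef)
    return best

# Synthetic data with positive predictors so that ln, sqrt, and 1/x are defined.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(100, 3))
y = 2.0 + 1.5 * X[:, 0] ** 2 - 0.7 * np.log(X[:, 1]) + 3.0 / X[:, 2]

rms, names, coef = random_search(X, y, iterations=500, seed=1)
print(names, float(rms))
```

Running the search with a different `seed` can return a different combination of functions, which mirrors the run-to-run variation described above.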


Enhanced Multicollinearity Detection in ndCurveMaster

ndCurveMaster provides multicollinearity detection to improve the quality of the models it creates, using the Variance Inflation Factor (VIF) and the Pearson correlation matrix.

Variance Inflation Factor (VIF) for Multicollinearity Detection

The VIF index is widely used for detecting multicollinearity (more information can be found on Wikipedia). There is no strict VIF threshold that establishes the presence of multicollinearity. Values above 10 are often considered an indication of it, but in weaker models values above 2.5 may already be a cause for concern. ndCurveMaster calculates the VIF for each predictor in each model and displays it in the last column of the regression analysis table, as shown below:

ndCurveMaster Variance Inflation Factor VIF

In addition, ndCurveMaster offers a search option for models with a VIF limit value. The user can select the "VIF cannot exceed" checkbox to only display models that do not exceed the selected VIF value. The default VIF limit value is 10, as shown below:

ndCurveMaster VIF button

The user can adjust the limit as needed.
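For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The following Python sketch computes it with NumPy and applies a limit in the spirit of the "VIF cannot exceed" option. The function names `vif` and `passes_vif_limit` and the synthetic data are assumptions for illustration, not part of ndCurveMaster.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    result = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        ss_res = np.sum(residual**2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        result[j] = ss_tot / ss_res  # identical to 1 / (1 - R^2)
    return result

def passes_vif_limit(X, limit=10.0):
    """True if no predictor exceeds the VIF limit (default 10, as in the text)."""
    return bool(np.all(vif(X) <= limit))

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)  # nearly a copy of a

independent = np.column_stack([a, b])
collinear = np.column_stack([a, b, c])

print(vif(independent))
print(vif(collinear))
print(passes_vif_limit(independent), passes_vif_limit(collinear))
```

In a nonlinear model the columns of `X` would be the transformed terms of the model rather than the raw variables.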

Utilizing the Pearson Correlation Matrix in ndCurveMaster

The Pearson Correlation Matrix window displays Pearson Correlation coefficients between each pair of variables, as shown below:

ndCurveMaster Pearson Correlation matrix

Examining the correlations between variables is the simplest way to detect multicollinearity. In general, an absolute correlation coefficient above 0.7 between two variables indicates the presence of multicollinearity.
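The 0.7 rule of thumb can be applied programmatically. The sketch below builds a Pearson correlation matrix with NumPy and lists the variable pairs whose absolute correlation exceeds the threshold. The helper `high_correlation_pairs`, the variable names, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.7):
    """Return the correlation matrix and the pairs with |r| above the threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns of X are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return corr, pairs

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = -x1 + 0.3 * rng.normal(size=200)  # strongly negatively correlated with x1
x3 = rng.normal(size=200)              # unrelated to the others

X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

corr, pairs = high_correlation_pairs(X, names)
print(np.round(corr, 3))
print(pairs)
```

The absolute value matters here: a strong negative correlation, as between `x1` and `x2`, is flagged just like a strong positive one.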


Preventing Overfitting in Multivariable Nonlinear Regression Models

Overfitting occurs when a statistical model has too many parameters relative to the size of the sample used to build it. This is a common problem in machine learning, but it is usually less of a concern for regression models with few parameters. However, ndCurveMaster's advanced algorithms make it possible to build complex multivariable models, which can also be susceptible to overfitting.

In regression analysis with one independent variable, overfitting can be easily detected by examining the graph:

ndCurveMaster Overfitting Curve

In nonlinear regression analysis with many variables, however, overfitting cannot be detected this way. ndCurveMaster therefore implements overfitting detection using the test set method. The software randomly selects part of the data for the test set:

ndCurveMaster Overfitting Program Option

and uses the remaining data for regression analysis. Overfitting is detected by comparing the root mean square (RMS) errors of the test set and the entire dataset.
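The test set method can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ndCurveMaster's implementation: the helper names, the 20% test fraction, and the factor of 2 used to flag overfitting are all hypothetical, and the comparison here is between the test set error and the error on the rows used for fitting. The overfitted model is simulated by adding irrelevant random columns until there are as many parameters as fitted points.

```python
import numpy as np

def rms(residuals):
    """Root mean square of a residual vector."""
    return float(np.sqrt(np.mean(np.square(residuals))))

def holdout_rms(A, y, test_fraction=0.2, seed=None):
    """Fit on a random subset of rows; return (rms_test, rms_fit)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    test_idx = rng.choice(n, size=n_test, replace=False)
    fit_mask = np.ones(n, dtype=bool)
    fit_mask[test_idx] = False  # test rows are excluded from the fit
    coef, *_ = np.linalg.lstsq(A[fit_mask], y[fit_mask], rcond=None)
    residuals = A @ coef - y
    return rms(residuals[test_idx]), rms(residuals[fit_mask])

def looks_overfitted(rms_test, rms_fit, factor=2.0):
    """Hypothetical criterion: test error much larger than fitting error."""
    return rms_test > factor * rms_fit

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-1.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)

# Design matrix of the true model: intercept plus x.
simple = np.column_stack([np.ones(n), x])
# 78 irrelevant random columns stand in for superfluous model terms,
# giving 80 parameters for the 80 rows used in the fit.
complex_ = np.column_stack([simple, rng.normal(size=(n, 78))])

for label, A in (("simple", simple), ("complex", complex_)):
    rms_test, rms_fit = holdout_rms(A, y, seed=1)
    print(label, rms_test, rms_fit, looks_overfitted(rms_test, rms_fit))
```

The complex model reproduces the fitted rows almost exactly, so its fitting error is near zero while its test set error is not. This is the same pattern the comparison of RMS errors is meant to expose.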

The following example shows a multivariable regression model:

Y = a0 + a1*exp(x1) + a2*x2^(-8) + a3*x3^5.6 + a4*x4^(-1) + a5*x5^9 + a6*x6^4.1 + a7*(exp(x6))^3 + a8*x5^16 + a9*(1/2)^x4 + a10*x2^(-6) + a11*x1^1.9 + a12*x3^5.2 + a13*x1^(-11) + a14*(ln(x3))^8 + a15*(exp(x5))^(-1) + a16*x4^1.9 + a17*x6^16 + a18*x2^10 + a19*(exp(x3))^5 + a20*(ln(x6))^2 + a21*x4^(-4) + a22*(exp(x5))^2 + a23*x4^4.2 + a24*x5^15 + a25*x6^15 + a26*x3^12

Standard statistical analysis of the entire dataset may not detect overfitting. However, ndCurveMaster can also check the RMS errors of the test set and the entire dataset. In the example below, the test set RMS error is 9.55 and the dataset RMS error is 0.138. ndCurveMaster detects overfitting because the test set error is 68.775 times the dataset error.

ndCurveMaster Overfitting Detect

The overfitting is clearly demonstrated in the graph below, where blue points represent the entire dataset and yellow points represent the test set:

ndCurveMaster Overfitting Detect Chart

The graph shows that the fit of the entire dataset looks perfect, but the test set points do not fit well.
