QuickStart guide

  1. Run ndCurveMaster
  2. Click on the Open button
  3. Select a dataset file and click OK
  4. In the Input Data window, select the Y and X variables. You can also click Select All to select all X variables
  5. Click OK to load a dataset
  6. Next you will see the main form of ndCurveMaster. Click on the Advanced Search button to start automated fitting.
  7. After searching, you will get the best model.


Back to Top


Toolbar

Toolbar

Detailed description of these options can be found here:

Back to Top


Open a Data Set

Click on the Open button and select the required file. These can be:

You can also click on the "Reopen" button and select previously loaded files.

This will bring up the Input Data window:

the Input Data window

In this window you may select:

then click OK.

1 Selecting this option is not recommended for processing an extremely large data set when using a low-performance computer.


Back to Top


Start model fitting to data

After opening the data set you may use the following options:

Back to Top


Polynomial regression fitting

The ndCurveMaster program may be used to fit any polynomial model for any number of variables, such as:
y = a0 + a1 · x1 + a2 · x2 + a3 · x3 + a4 · x1 · x2 + a5 · x1 · x3 + a6 · x2 · x3 + a7 · x1 · x2 · x3 + a8 · x1^2 + a9 · x2^2 + a10 · x3^2 + a11 · x1^3 + a12 · x2^3 + a13 · x3^3 + a14 · x1^2 · x2 + a15 · x1 · x2^2 + a16 · x1^2 · x3 + a17 · x1 · x3^2 + a18 · x2 · x3^2 + a19 · x2^2 · x3 + a20 · x1^2 · x2 · x3 + a21 · x1 · x2^2 · x3 + a22 · x1 · x2 · x3^2
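A model of this form is linear in its coefficients a0, a1, ..., so for a fixed set of terms the coefficients can be found by ordinary least squares. The following is a minimal illustrative sketch (not ndCurveMaster's actual code) for the two-variable poly22 form y = a0 + a1·x1 + a2·x2 + a3·x1·x2 + a4·x1^2 + a5·x2^2:

```python
import numpy as np

# Illustrative only: a polynomial model is linear in its coefficients,
# so it can be fit with ordinary least squares on a design matrix that
# has one column per polynomial term.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 200)
x2 = rng.uniform(0, 10, 200)
y = 1.0 + 2.0*x1 - 0.5*x2 + 0.3*x1*x2 + 0.1*x1**2 - 0.2*x2**2

# Design matrix: intercept, x1, x2, x1*x2, x1^2, x2^2
X = np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coeffs, 3))  # recovers [1.0, 2.0, -0.5, 0.3, 0.1, -0.2]
```

Fitting the larger three-variable model above works the same way; only the list of columns in the design matrix grows.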

After loading the data set, first click the “Load” button on the function list toolbar:

Then, select a polynomial function file in the “Open File” dialog box:

The functions are described in detail here.

Select the “Random search using only power functions, such as x^2, x^3, ... , x^6” checkbox in “Settings” to narrow down the search and find solutions among polynomial functions:

Click on the "Random Search" button to begin the search.
Back to Top


Model fitting using the "Advanced Search" option

The easiest way to fit a model is to click on the Advanced Search button to apply the fully automatic search method. This method utilizes a machine learning algorithm. Details about this method can be found here.
Before using this method, you can set a general regression model by using the Manually Reduce and/or Manually Expand options. You can also configure your search preferences by using the "Settings" option.

If any equations are marked red after a search using this method:

Back to Top


Model fitting by using the "Random Search" and "Randomly Iterated Search" methods

An alternative method is a combination of the Random Search and Randomly Iterated Search methods. An example workflow is as follows:

By clicking on the Manually Reduce/Expand or Auto Reduce/Expand buttons in any order, you can improve, freely expand or reduce the model. It is strongly recommended to use the Randomly Iterated Search option after every expansion and reduction operation.

New models are added to the collection after following the above steps.

You can click on ANY previous model from this collection to improve, expand or reduce it once again.

It is strongly recommended to click on the first model and then run the fit again; because of the heuristic technique used by ndCurveMaster for curve fitting, repeating the fit can discover better models each time:

It is therefore advisable to fit models a number of times, starting from the first linear model. New, better models will be created and added to the model collection, and you can then choose the best one.
Another method is to use the Random Search and Randomly Iterated Search methods separately.

Back to Top


Randomly Iterated Search

The Randomly Iterated Search method uses an algorithm in which predictors are randomized and base functions are iterated.
This algorithm is fast and efficient, but the solution space is limited by the iteration.
The algorithm finishes searching on its own when the correlation coefficient reaches its maximum.
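The published description above leaves the internals open, but the idea can be sketched as a hill climb: visit the predictors in random order, iterate a fixed list of base functions for each one, keep a change only if the correlation coefficient R improves, and stop when no change improves it. This is an illustrative toy, not ndCurveMaster's actual algorithm:

```python
import numpy as np

# Toy sketch of a randomly iterated search: predictor order is randomized,
# base functions are iterated, and the loop stops when R stops improving.
BASE_FUNCS = {"x": lambda x: x, "x^2": lambda x: x**2,
              "x^3": lambda x: x**3, "ln(x)": np.log}

def fit_r(y, columns):
    """Least-squares fit with intercept; return the correlation coefficient R."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return np.sqrt(max(0.0, 1 - resid.var() / y.var()))

def randomly_iterated_search(y, xs, funcs, rng):
    choice = {i: "x" for i in range(len(xs))}          # start from linear terms
    best = fit_r(y, [funcs[choice[i]](xs[i]) for i in choice])
    improved = True
    while improved:                                    # stop when R peaks
        improved = False
        for i in map(int, rng.permutation(len(xs))):   # randomized predictor order
            for name in funcs:                         # iterate base functions
                trial = {**choice, i: name}
                r = fit_r(y, [funcs[trial[j]](xs[j]) for j in trial])
                if r > best + 1e-12:
                    best, choice, improved = r, trial, True
    return choice, best

rng = np.random.default_rng(1)
x1, x2 = rng.uniform(1, 5, 300), rng.uniform(1, 5, 300)
y = 2 + 3*x1**2 + 4*np.log(x2)
choice, r = randomly_iterated_search(y, [x1, x2], BASE_FUNCS, rng)
print(choice, round(r, 4))  # the true pair x^2 and ln(x) is recovered, R = 1
```

Because each predictor only cycles through a fixed function list, the search converges quickly, which mirrors the "fast and efficient but limited" trade-off described above.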

Back to Top


Random Search

The Random Search method offers a search algorithm in which variables and base functions are fully randomized.
This method is slower than the Randomly Iterated Search method, but the solution space is unlimited thanks to the randomization process.
This algorithm does not finish searching on its own. The search can only be stopped manually by pressing the ESC key.
You can therefore search for an unlimited time when using this method.
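In miniature, a fully randomized search simply draws a random function assignment on every trial and keeps the best model found so far. In this illustrative sketch (not the program's actual code), a fixed trial budget stands in for the manual ESC stop:

```python
import numpy as np

# Toy sketch of a fully random search: the function assigned to each
# predictor is re-drawn at random on every trial; only the best model
# (lowest RMSE) is kept. A trial budget replaces the manual ESC stop.
FUNCS = [lambda x: x, lambda x: x**2, lambda x: x**3,
         np.log, np.sqrt, lambda x: np.exp(x / 5)]

def rmse_of(y, cols):
    X = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sqrt(np.mean((y - X @ coef) ** 2))

rng = np.random.default_rng(2)
x1, x2 = rng.uniform(1, 4, 200), rng.uniform(1, 4, 200)
y = 1 + 2 * x1**3 + 5 * np.sqrt(x2)

best_rmse, best_pick = np.inf, None
for _ in range(500):                      # stand-in for "until ESC is pressed"
    pick = [rng.integers(len(FUNCS)) for _ in range(2)]
    rmse = rmse_of(y, [FUNCS[pick[0]](x1), FUNCS[pick[1]](x2)])
    if rmse < best_rmse:
        best_rmse, best_pick = rmse, pick
print(best_pick, round(best_rmse, 6))
```

Because nothing constrains the draws, any combination can eventually be reached, which is why this style of search is unlimited but slower than the iterated one.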

Back to Top


Advanced Search

The Advanced Search method utilizes a machine learning algorithm for the discovery of equations. It is a combination of the Random Search and Randomly Iterated Search methods. In the first step, a set of models is found using the Random Search method within a selected period of time.
In the second step, the three top models with the lowest RMSE are searched in detail using the Randomly Iterated Search method, with the number of iterations equal to the number of predictors. Search preferences can be configured in the Settings menu.

Back to Top


Manual Reduction

The Manually Reduce option enables the user to manually select "a0" constant and predictor variables:

Manual Reduction

Back to Top


Manual Expansion

The Manually Expand option enables the user to manually add a new nonlinear predictor variable, as follows:

Back to Top


Auto Reduce

The Auto Reduce option automatically deletes from the model all statistically insignificant predictors, using backward elimination.
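Backward elimination can be sketched as follows: repeatedly refit, find the predictor with the weakest t-statistic, and drop it while it falls below a significance cutoff (|t| < 2 is roughly the 5% level for large samples). This is an illustrative sketch, not ndCurveMaster's exact test:

```python
import numpy as np

# Illustrative backward elimination: drop the predictor with the smallest
# |t|-statistic while it is below the cutoff, then refit and repeat.
def backward_eliminate(X, y, names, t_cut=2.0):
    names = list(names)
    while X.shape[1] > 1:
        Xc = np.column_stack([np.ones(len(y)), X])
        coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        resid = y - Xc @ coef
        dof = len(y) - Xc.shape[1]
        sigma2 = resid @ resid / dof                       # residual variance
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))[1:]
        t = np.abs(coef[1:]) / se                          # skip the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_cut:
            break                                          # all terms significant
        X = np.delete(X, worst, axis=1)                    # drop the weakest term
        names.pop(worst)
    return names

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
noise = rng.normal(size=n)                                 # irrelevant predictor
y = 1 + 2*x1 + 3*x2 + rng.normal(scale=0.5, size=n)
kept = backward_eliminate(np.column_stack([x1, x2, noise]), y,
                          ["x1", "x2", "noise"])
print(kept)
```

On data like this, the genuinely predictive terms x1 and x2 survive, and the irrelevant "noise" column is typically the one eliminated.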

Back to Top


Auto Expand

The Auto Expand option enables the user to automatically add a new nonlinear predictor and expand the model by using the Randomly Iterated Search method.

Back to Top


Lock

The “Lock” option enables the locking of the predictor(s) while searching. The selected predictor(s) will not be modified during a search.

Back to Top


Export and load function from file

Click the "Export" or "Load" buttons on the toolbar above the collection of functions to save or load functions from disk:

Depending on the search method selected in the "Input Data" window, function files have the following extensions: mf0, mf1, ..., mf5. The table below provides a detailed breakdown:


Extension Search method selected in the “Input Data” window
.mf0 fast search using power, exponential and logarithmic functions
.mf1 fast search using only power functions with rational exponents
.mf2 detailed search using power, exponential and logarithmic functions
.mf3 detailed search using power, exponential, logarithmic and trigonometric functions
.mf4 detailed search using only power functions
.mf5 detailed search using only power functions in the range from x^-3.5 to x^3.5

By default, function files are stored in the "Functions" folder. A collection of polynomial functions may also be found in this folder; a description of these functions can be found in the Table below:

File Function
poly2.mf* y = a0 + a1 · x1 + a2 · x1^2
poly3.mf* y = a0 + a1 · x1 + a2 · x1^2 + a3 · x1^3
poly4.mf* y = a0 + a1 · x1 + a2 · x1^2 + a3 · x1^3 + a4 · x1^4
poly5.mf* y = a0 + a1 · x1 + a2 · x1^2 + a3 · x1^3 + a4 · x1^4 + a5 · x1^5
poly6.mf* y = a0 + a1 · x1 + a2 · x1^2 + a3 · x1^3 + a4 · x1^4 + a5 · x1^5 + a6 · x1^6
poly22.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2
poly12.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x2^2
poly21.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2
poly22Full.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^2 · x2 + a7 · x1 · x2^2 + a8 · x1^2 · x2^2
poly33.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2
poly13.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x2^2 + a5 · x2^3 + a6 · x1 · x2^2
poly31.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x1^3 + a6 · x1^2 · x2
poly23.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x2^3 + a7 · x1^2 · x2 + a8 · x1 · x2^2
poly32.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x1^2 · x2 + a8 · x1 · x2^2
poly33Full.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^2 · x2 + a7 · x1 · x2^2 + a8 · x1^2 · x2^2 + a9 · x1^3 + a10 · x2^3 + a11 · x1^3 · x2 + a12 · x1^3 · x2^2 + a13 · x1 · x2^3 + a14 · x1^2 · x2^3 + a15 · x1^3 · x2^3
poly14.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x2^2 + a5 · x2^3 + a6 · x1 · x2^2 + a7 · x1 · x2^3 + a8 · x2^4
poly24.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x2^3 + a7 · x1^2 · x2 + a8 · x1 · x2^2 + a9 · x1^2 · x2^2 + a10 · x1 · x2^3 + a11 · x2^4
poly34.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2 + a10 · x1^3 · x2 + a11 · x1^2 · x2^2 + a12 · x1 · x2^3 + a13 · x2^4
poly41.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x1^3 + a6 · x1^2 · x2 + a7 · x1^4 + a8 · x1^3 · x2
poly42.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x1^2 · x2 + a8 · x1 · x2^2 + a9 · x1^4 + a10 · x1^3 · x2 + a11 · x1^2 · x2^2
poly43.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2 + a10 · x1^4 + a11 · x1^3 · x2 + a12 · x1^2 · x2^2 + a13 · x1 · x2^3
poly44.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2 + a10 · x1^4 + a11 · x1^3 · x2 + a12 · x1^2 · x2^2 + a13 · x1 · x2^3 + a14 · x2^4
poly44Full.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^2 · x2 + a7 · x1 · x2^2 + a8 · x1^2 · x2^2 + a9 · x1^3 + a10 · x2^3 + a11 · x1^3 · x2 + a12 · x1^3 · x2^2 + a13 · x1 · x2^3 + a14 · x1^2 · x2^3 + a15 · x1^3 · x2^3 + a16 · x1^4 + a17 · x2^4 + a18 · x1^4 · x2 + a19 · x1^4 · x2^2 + a20 · x1^4 · x2^3 + a21 · x1 · x2^4 + a22 · x1^2 · x2^4 + a23 · x1^3 · x2^4 + a24 · x1^4 · x2^4
poly15.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x2^2 + a5 · x2^3 + a6 · x1 · x2^2 + a7 · x1 · x2^3 + a8 · x2^4 + a9 · x1 · x2^4 + a10 · x2^5
poly25.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x2^3 + a7 · x1^2 · x2 + a8 · x1 · x2^2 + a9 · x1^2 · x2^2 + a10 · x1 · x2^3 + a11 · x2^4 + a12 · x1^2 · x2^3 + a13 · x1 · x2^4 + a14 · x2^5
poly35.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2 + a10 · x1^3 · x2 + a11 · x1^2 · x2^2 + a12 · x1 · x2^3 + a13 · x2^4 + a14 · x1^3 · x2^2 + a15 · x1^2 · x2^3 + a16 · x1 · x2^4 + a17 · x2^5
poly45.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2 + a10 · x1^4 + a11 · x1^3 · x2 + a12 · x1^2 · x2^2 + a13 · x1 · x2^3 + a14 · x2^4 + a15 · x1^4 · x2 + a16 · x1^3 · x2^2 + a17 · x1^2 · x2^3 + a18 · x1 · x2^4 + a19 · x2^5
poly51.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x1^3 + a6 · x1^2 · x2 + a7 · x1^4 + a8 · x1^3 · x2 + a9 · x1^5 + a10 · x1^4 · x2
poly52.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x1^2 · x2 + a8 · x1 · x2^2 + a9 · x1^4 + a10 · x1^3 · x2 + a11 · x1^2 · x2^2 + a12 · x1^5 + a13 · x1^4 · x2 + a14 · x1^3 · x2^2
poly53.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2 + a10 · x1^4 + a11 · x1^3 · x2 + a12 · x1^2 · x2^2 + a13 · x1 · x2^3 + a14 · x1^5 + a15 · x1^4 · x2 + a16 · x1^3 · x2^2 + a17 · x1^2 · x2^3
poly54.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2 + a10 · x1^4 + a11 · x1^3 · x2 + a12 · x1^2 · x2^2 + a13 · x1 · x2^3 + a14 · x2^4 + a15 · x1^5 + a16 · x1^4 · x2 + a17 · x1^3 · x2^2 + a18 · x1^2 · x2^3 + a19 · x1 · x2^4
poly55.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^3 + a7 · x2^3 + a8 · x1^2 · x2 + a9 · x1 · x2^2 + a10 · x1^4 + a11 · x1^3 · x2 + a12 · x1^2 · x2^2 + a13 · x1 · x2^3 + a14 · x2^4 + a15 · x1^5 + a16 · x1^4 · x2 + a17 · x1^3 · x2^2 + a18 · x1^2 · x2^3 + a19 · x1 · x2^4 + a20 · x2^5
poly55Full.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x1 · x2 + a4 · x1^2 + a5 · x2^2 + a6 · x1^2 · x2 + a7 · x1 · x2^2 + a8 · x1^2 · x2^2 + a9 · x1^3 + a10 · x2^3 + a11 · x1^3 · x2 + a12 · x1^3 · x2^2 + a13 · x1 · x2^3 + a14 · x1^2 · x2^3 + a15 · x1^3 · x2^3 + a16 · x1^4 + a17 · x2^4 + a18 · x1^4 · x2 + a19 · x1^4 · x2^2 + a20 · x1^4 · x2^3 + a21 · x1 · x2^4 + a22 · x1^2 · x2^4 + a23 · x1^3 · x2^4 + a24 · x1^4 · x2^4 + a25 · x1^5 + a26 · x2^5 + a27 · x1^5 · x2 + a28 · x1^5 · x2^2 + a29 · x1^5 · x2^3 + a30 · x1^5 · x2^4 + a31 · x1 · x2^5 + a32 · x1^2 · x2^5 + a33 · x1^3 · x2^5 + a34 · x1^4 · x2^5 + a35 · x1^5 · x2^5
poly222.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x3 + a4 · x1 · x2 + a5 · x1 · x3 + a6 · x2 · x3 + a7 · x1 · x2 · x3 + a8 · x1^2 + a9 · x2^2 + a10 · x3^2
poly333.mf* y = a0 + a1 · x1 + a2 · x2 + a3 · x3 + a4 · x1 · x2 + a5 · x1 · x3 + a6 · x2 · x3 + a7 · x1 · x2 · x3 + a8 · x1^2 + a9 · x2^2 + a10 · x3^2 + a11 · x1^3 + a12 · x2^3 + a13 · x3^3 + a14 · x1^2 · x2 + a15 · x1 · x2^2 + a16 · x1^2 · x3 + a17 · x1 · x3^2 + a18 · x2 · x3^2 + a19 · x2^2 · x3 + a20 · x1^2 · x2 · x3 + a21 · x1 · x2^2 · x3 + a22 · x1 · x2 · x3^2

Back to Top

Settings

In the Settings window, you can configure your search preferences for all search methods, including the Advanced Search method.

In the "General Preferences" box you can configure settings for all search methods and select:

The Advanced Search method combines the Random Search method in the first phase with the Randomly Iterated Search method in the second phase.
In the "Advanced Search Preferences" box you can configure this type of search and select:

Back to Top


Results

The list of equations

the Collection of best models window

The collection consists of the models themselves, together with a description of every model: its coefficients, errors and statistical parameters.

You can review every model from the collection.

You can copy the collection to the clipboard or save it to a CSV file by using the Copy and Save buttons. You can copy only the selected model by using the Copy button.

All calculation results are available for each model from the collection of models window.
Back to Top


The "Statistics" window

The Statistics window will present statistical analysis for the model:

the Statistics window

Insignificant predictors are marked in red in the Statistics window, together with the note "attempt removal".

The most significant predictor is shown in blue in this window.

A locked predictor is shown in italics.

If all equations in the model are significant, the "Auto Reduce" option is not available.

The last column in the regression analysis table presents the variance inflation factor (VIF):

VIF

The VIF index is commonly used for detection of multicollinearity (more details can be found here).
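The VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. A minimal sketch of that standard definition (not ndCurveMaster's code):

```python
import numpy as np

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest.
def vif(X):
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.column_stack([np.ones(len(target)),
                                  np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid.var() / target.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)                 # independent of x1: VIF near 1
x3 = x1 + 0.1 * rng.normal(size=1000)      # nearly collinear with x1: VIF large
print(np.round(vif(np.column_stack([x1, x2, x3])), 2))
```

The nearly collinear pair x1 and x3 produces VIF values around 100, far above the usual warning threshold of 10, while the independent x2 stays near 1.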
Back to Top


The "Pearson Correlation Matrix" window

The Pearson Correlation Matrix window shows Pearson Correlation coefficients between each pair of variables:

the Pearson Correlation Matrix window

Examining the correlations of the variables is the simplest way to detect multicollinearity. In general, an absolute correlation coefficient of more than 0.7 between two or more variables indicates the presence of multicollinearity.
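The same check is easy to reproduce outside the program; this sketch flags every variable pair whose absolute Pearson correlation exceeds the 0.7 rule of thumb:

```python
import numpy as np

# Flag multicollinearity from the Pearson correlation matrix:
# report any pair of variables with |r| > 0.7.
rng = np.random.default_rng(5)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = 0.9 * x1 + 0.1 * rng.normal(size=500)   # strongly tied to x1
data = np.column_stack([x1, x2, x3])
corr = np.corrcoef(data, rowvar=False)
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(corr[i, j]) > 0.7]
print(pairs)  # [(0, 2)]
```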
Back to Top


The "Data" window

the Normal data view window

the Full data view window

Back to Top


The "Graphs" window

Fit-line curve:

Scatterplot:

Residuals:

Standardized Residuals:

Histogram:

Back to Top


Machine learning in ndCurveMaster

ndCurveMaster utilizes a machine learning method for the discovery of equations. This method is a combination of random and iterated search. In the first step, a set of models is found through a random search within a selected period of time. In the next step, the three top models with the lowest RMSE are searched in detail using randomly iterated searches. The user can configure this type of search and select:

Back to Top


The detection of multicollinearity

ndCurveMaster offers a multicollinearity detection option through the use of a variance inflation factor (VIF) to improve the quality of the models that are developed. The VIF index is commonly used for the detection of multicollinearity (more details can be found here). There is no formal VIF value that determines the presence of multicollinearity. VIF values that exceed 10 are often regarded as an indication of multicollinearity, but in weaker models, values above 2.5 may already be a cause for concern.

ndCurveMaster calculates the VIF values of each model. These VIF values are shown in the last column of the regression analysis table for each predictor, see below:

VIF

In addition, ndCurveMaster offers a search facility for models with a VIF limit value. The user can select a “VIF cannot exceed” checkbox to only display models that do not exceed the selected VIF value. The default VIF limit value is 10, see below:

The user can raise or lower this limit.

Detailed explanation on identifying and fixing multicollinearity can be found here.

Back to Top


Heuristic techniques

ndCurveMaster uses heuristic techniques for curve fitting and implements scientific algorithms.

Finding the best combinations in 3D/4D/5D/6D/..nD models and selecting the best-fitting functions from the function set results in a large number of possible variants. Searching through all possible variants by using an exact algorithm is computationally expensive and time consuming. A heuristic approach has been implemented in ndCurveMaster software to solve this problem.

The best nonlinear functions and variable combinations are selected through randomization and looped searching through the use of the following methods:

These methods improve the discovery of better models. The disadvantage of heuristic techniques is that the solution may not be optimal, unlike with an exact approach, but multiple searches allow the user to find a solution that is closer to the optimal result.

The effect of using the heuristic algorithm is that, even when you use the same data set, each time:

It is therefore advisable to fit models to the same data repeatedly.

Back to Top


Overfitting Detection

Overfitting occurs when the statistical model has too many parameters in relation to the size of the sample from which it was constructed. This phenomenon is a problem found primarily in machine learning and will not usually apply in the case of regression models.

However, ndCurveMaster offers advanced algorithms that allow the user to build complicated multivariable models which accurately describe empirical data. Overfitting may occur under these conditions.

In regression analysis with one independent variable, you can easily detect overfitting in the graph:

But in a statistical analysis of many variables, it is not possible to detect overfitting in this way.

Therefore, an overfitting detection technique has been implemented in ndCurveMaster. The test set method is used to detect overfitting. ndCurveMaster may randomly choose part of the data and use it as a test set:

Next, ndCurveMaster performs the regression using the remaining data. Finally, ndCurveMaster detects overfitting by comparing the test set and data set RMS errors.
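The three steps just described (hold out a random test set, fit on the remainder, compare the two RMS errors) can be sketched in miniature. This example deliberately overfits a high-degree polynomial; it is illustrative only and the internals of ndCurveMaster may differ:

```python
import numpy as np

# Test-set overfitting check in miniature: a test RMSE far above the
# fit RMSE signals that the model memorized the fitting data.
rng = np.random.default_rng(6)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=30)

idx = rng.permutation(30)
test, fit = idx[:10], idx[10:]               # randomly chosen test set

degree = 15                                  # deliberately over-complex model
V_fit = np.vander(x[fit], degree + 1)
coef, *_ = np.linalg.lstsq(V_fit, y[fit], rcond=None)

def rmse(xs, ys):
    return np.sqrt(np.mean((ys - np.vander(xs, degree + 1) @ coef) ** 2))

print(round(rmse(x[fit], y[fit]), 3), round(rmse(x[test], y[test]), 3))
```

The fit-set error comes out much smaller than the test-set error, which is the same signature ndCurveMaster reports in the example below.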

Here is an example multivariable regression model:

Y = a0 + a1 · x1^(-1/2) + a2 · (ln(x3))^8 + a3 · x1^0.45 · (ln(x4))^2 + a4 · exp(x1) · x2^1.3 · x3^0.95 + a5 · exp(x2)^1.5 · x3^0.45 · ln(x4)

Standard statistical analysis of the data set does not detect the overfitting:

But ndCurveMaster can also check test data and data set RMS errors:

The test set RMS error is 6885.69, while the data set RMS error equals only 3421.39.
ndCurveMaster detects overfitting because the test set error is 2.01 times the data set error.

The overfitting is clearly shown in the graph below. The blue points represent the data set and the red points the test set:

The graph shows that the fit to the data set points looks perfect, but the fit to the test set points does not.

Back to Top


Scaling Datasets

For best results, try scaling your data sets. Imagine a data set with x values ranging from -10 000 to 100 000 and a regression model in which the term 2^x is involved. The calculation of 2^100000 will overflow, and the regression will fail as a result; therefore ndCurveMaster cannot use the 2^x function in this case.

If the data set looks like this:
X = [-10 000, -5 000, 0, 500, 1 000, 10 000, 100 000] metres, you can scale it to the following data set:
X = [-10, -5, 0, 0.5, 1, 10, 100] kilometres.
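The rescaling above, and the overflow it avoids, look like this in a short sketch:

```python
# Rescaling the metres data set to kilometres, as in the example above:
metres = [-10_000, -5_000, 0, 500, 1_000, 10_000, 100_000]
kilometres = [m / 1000 for m in metres]
print(kilometres)  # [-10.0, -5.0, 0.0, 0.5, 1.0, 10.0, 100.0]

# 2^x overflows a float at the unscaled x = 100000, but not at x = 100:
try:
    2.0 ** 100_000
except OverflowError:
    print("overflow at x = 100000")
print(2.0 ** 100.0)  # about 1.27e30
```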

Detailed information about normalization can be found here: en.wikipedia.org/wiki/Normalization_(statistics)

Back to Top


List of functions:

List of functions in the "fast search using power, exponential and logarithmic functions" search method

Back to Top


List of functions in the "fast search using only power functions with rational exponents" search method

Back to Top


List of functions in the "detailed search using power, exponential and logarithmic functions" search method

Back to Top


List of functions in the "detailed search using power, exponential, logarithmic and trigonometric functions" search method

Back to Top


List of functions in the "detailed search using only power functions" search method

Back to Top


List of functions in the "detailed search using only power functions in the range from x^-3.5 to x^3.5" search method

Back to Top