Nonlinear data modeling is a routine task in data science and analytics domain. It is extremely rare to find a natural process whose outcome varies linearly with the independent variables.

Therefore, we need an easy and robust methodology to quickly fit a measured data set against a set of variables assuming that the measured data could be a complex nonlinear function. This should be a fairly common tool in the repertoire of a data scientist or machine learning engineer.

There are few pertinent questions to consider: That is OK only when one can visualize the data clearly (feature dimension is 1 or 2). It is lot tougher for feature dimensions 3 or higher.

And it’s complete waste of time if there is cross-coupling within the features influencing the outcome. Let me show this by plots, It is easy to see that plotting only takes you so far.

For high-dimensional mutually-interacting data set, you can draw completely wrong conclusion if you try to look at the output vs. one input variable plot at a time. And, there is no easy way to visualize more than 2 variables at a time. Read more from…

thumbnail courtesy of