What kind of data do I need?
This is a frequently asked question. The short answer is as follows:
Any numerical data that can be arranged in rows and columns can be modeled by MPR.

Each row represents a single "data point": one measurement of the dependent variable together with its associated independent variables. Each column represents a different variable. In the retail store example, each store is represented by a row in the dataset, and each variable (gross margin, supplies-to-equipment ratio, market share, etc.) occupies a different column.
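As a rough illustration of this layout, here is a minimal Python sketch. The column names come from the retail store example, but the numbers are made up, and treating gross margin as the dependent variable is an assumption made only for this sketch.

```python
# Hypothetical retail-store dataset: one row per store, one column per variable.
# The values are invented for illustration only.
columns = ["gross_margin", "supplies_to_equipment_ratio", "market_share"]
rows = [
    [0.31, 1.8, 0.12],  # store 1
    [0.27, 2.1, 0.09],  # store 2
    [0.35, 1.5, 0.15],  # store 3
    [0.29, 1.9, 0.11],  # store 4
]

# Assumption for this sketch: gross margin is the dependent variable.
y_index = columns.index("gross_margin")

# Split each data point into its dependent value and its independent values.
y = [row[y_index] for row in rows]
X = [[v for j, v in enumerate(row) if j != y_index] for row in rows]

n_points, n_variables = len(rows), len(columns)
print(n_points, "data points,", n_variables, "variables")
```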

There should generally be more data points than variables (more rows than columns), although exceptions are possible. The final model cannot have more terms (coefficients) than there are data points in the dataset used for fitting.
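These two size rules can be checked before fitting. The sketch below is only an illustration; the function name `size_problems` and its messages are hypothetical and are not part of any MPR software.

```python
def size_problems(n_points, n_variables, n_terms):
    """Return a list of size-related problems (empty if none are found).

    n_points    -- number of rows (data points) used for fitting
    n_variables -- number of columns (variables)
    n_terms     -- number of terms (coefficients) in the candidate model
    """
    problems = []
    # Guideline: rows should outnumber columns (exceptions are possible).
    if n_points <= n_variables:
        problems.append("rows do not outnumber columns")
    # Hard limit: no more terms than data points.
    if n_terms > n_points:
        problems.append("model has more terms than data points")
    return problems


print(size_problems(50, 5, 8))
print(size_problems(4, 6, 10))
```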

If a data point is missing values for some variables, either the data point must be removed or the missing values must be filled in. A simple way to fill in a missing value is to use the mean of that variable over the remaining data points.
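Both options can be sketched in a few lines of Python. This sketch assumes missing values are marked with `None` and that every column has at least one observed value; the function names are hypothetical.

```python
def drop_incomplete(rows):
    """Option 1: remove every data point that has a missing value."""
    return [row for row in rows if None not in row]


def fill_with_column_means(rows):
    """Option 2: replace each missing value with the mean of the
    observed values for the same variable (column)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


data = [[1.0, 10.0], [None, 20.0], [3.0, None]]
print(drop_incomplete(data))
print(fill_with_column_means(data))
```

Dropping rows shrinks the dataset, while mean filling keeps every row at the cost of substituting an estimate for the missing measurement.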

If the dataset is a time series, there can be no missing data points; that is, no measurement in the sequence can have been skipped. Any missing data must be filled in, either as described above or by interpolating between the values before and after the gap.
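Interpolating across a gap might look like the following sketch. It assumes equally spaced measurements, uses `None` to mark a skipped measurement, uses straight-line (linear) interpolation as one possible choice, and requires the first and last values to be present. The function name is hypothetical.

```python
def interpolate_gaps(series):
    """Fill None entries by linear interpolation between the nearest
    observed values before and after each gap."""
    if series[0] is None or series[-1] is None:
        raise ValueError("first and last values must be observed")
    filled = list(series)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            start = i - 1              # last observed index before the gap
            end = i
            while filled[end] is None:  # find first observed index after the gap
                end += 1
            step = (filled[end] - filled[start]) / (end - start)
            for k in range(i, end):
                filled[k] = filled[start] + step * (k - start)
            i = end
        else:
            i += 1
    return filled


print(interpolate_gaps([1.0, None, 3.0, None, None, 6.0]))
```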

The data must be numerical. However, some qualitative variables can be transformed into numbers. In the retail stores example, some variables had a yes/no quality. These could be represented numerically as a 1 or a 0. Similarly, male/female, treated/untreated, or any other two-way distinction could be represented this way. Variables coded in this way are called coded variables or dummy variables.
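A two-way distinction can be coded with a one-line mapping. The sketch below is illustrative only; the function name `encode_two_way` and the sample labels are hypothetical.

```python
def encode_two_way(values, one_label):
    """Code a two-valued qualitative variable as 1/0.

    Entries equal to one_label become 1; the other label becomes 0.
    """
    labels = set(values)
    if len(labels) > 2 or one_label not in labels:
        raise ValueError("expected a two-way variable containing one_label")
    return [1 if v == one_label else 0 for v in values]


print(encode_two_way(["yes", "no", "no", "yes"], "yes"))
print(encode_two_way(["treated", "untreated"], "treated"))
```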

Sometimes the dependent variable itself is a dummy variable. In the wavemaker machine example, failure of the machine was coded as 1 and nonfailure as 0. When a prediction is made with such a model, the output should be rounded to 0 or 1. A different type of regression, called logistic regression, is designed for this situation. MPR can be used in place of logistic regression, and it brings with it the advantage of being able to describe nonlinearities, including interactions.
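The rounding step can be sketched as follows. The polynomial below, including its interaction term and all coefficients, is invented purely for illustration and is not a fitted model from the wavemaker example. An explicit 0.5 threshold is used instead of Python's built-in `round`, which rounds exact halves to the nearest even integer.

```python
def predict_raw(x1, x2):
    """Hypothetical fitted polynomial with an interaction term x1*x2.
    The coefficients are made up for this sketch."""
    return 0.1 + 0.2 * x1 + 0.1 * x2 + 0.5 * x1 * x2


def classify(raw, threshold=0.5):
    """Round a raw model output to the coded outcome:
    1 (failure) at or above the threshold, otherwise 0 (nonfailure)."""
    return 1 if raw >= threshold else 0


for x1, x2 in [(0, 0), (1, 0), (1, 1)]:
    print((x1, x2), classify(predict_raw(x1, x2)))
```

In this made-up model neither variable alone pushes the output past the threshold; only the interaction term does, which is the kind of effect a model limited to linear terms could not represent.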