Multiple Linear Regression Folio Analysis Results

When accessed from a multiple linear regression folio, the Analysis Summary window contains detailed information about the analysis results, including information that describes how each predictor affects the response that is currently selected on the control panel.

If the current response data has been analyzed, you can open the window by clicking the View Analysis Summary icon on the control panel.

If the current response data has not been analyzed, the icon will still be available so you can view the folio's analysis history.

Select an item in the Available Report Items panel to display it on the spreadsheet. Each item is described next.

Analysis Results

The Analysis of Variance (ANOVA) table provides general information about the effects of the predictors on the selected response. This information may be presented for individual predictors and/or for all the predictors treated as a single group, depending on your analysis setting on the control panel.

ANOVA Table Columns

Source of Variation is the source that caused the difference in the observed output values. The main effects of each predictor will be listed individually or grouped as "main effects," depending on whether you selected to use individual or grouped terms in the analysis (specified on the Analysis Settings page of the control panel). Sources displayed in red are considered to be significant.
The number of Degrees of Freedom for the Model is the number of regression coefficients for the effects included in the analysis (e.g., two coefficients might be included in the regression table for a given main effect). The number of degrees of freedom for the Residual is the total number of observations minus the number of parameters being estimated.
Sum of Squares is the amount of difference in observed output values caused by this source of variation.
Mean Squares is the average amount of difference caused by this source of variation. This is equal to Sum of Squares/Degrees of Freedom.
F Ratio is the ratio of the Mean Squares of this source of variation to the Mean Squares of pure error. A large value in this column indicates that the difference in the output caused by this source of variation is greater than the difference caused by noise (i.e., this source affects the output).
P Value (alpha error or type I error) is the probability that an equal amount of variation in the output would be observed in the case that this source does not affect the output. This value is compared to the risk level (alpha) that you specify on the Analysis Settings page of the control panel. If the p value is less than alpha, this source of variation is considered to have a significant effect on the output. In this case, the term and its p value will be displayed in red.
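The arithmetic behind the Mean Squares and F Ratio columns can be sketched as follows. This is a minimal illustration, not the software's implementation: the sums of squares and degrees of freedom are made-up numbers, the function names are hypothetical, and the residual mean square is assumed to serve as the error term. The p value would then come from the upper tail of the F distribution with the corresponding degrees of freedom, which is not computed here.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """Mean Squares = Sum of Squares / Degrees of Freedom."""
    return sum_of_squares / degrees_of_freedom


def f_ratio(ms_source, ms_error):
    """F Ratio = Mean Squares of the source / Mean Squares of the error."""
    return ms_source / ms_error


# Hypothetical ANOVA rows (illustrative values only).
ms_model = mean_square(90.0, 2)      # Model: SS = 90, DF = 2
ms_residual = mean_square(20.0, 10)  # Residual: SS = 20, DF = 10 (assumed error term)
f_model = f_ratio(ms_model, ms_residual)

print(ms_model, ms_residual, f_model)
```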
The following values are shown underneath the ANOVA table:
- S is the standard error of the noise. It represents the magnitude of the response variation caused by noise.
- R-sq is the percentage of total difference that is attributable to the factors under consideration. It is equal to Sum of Squares(factor)/Total Sum of Squares.
- R-sq(adj) is an R-sq value that is adjusted for the number of parameters in the model.

- PRESS is the prediction error sum of squares, which provides a measure of the model’s validity. The lower the PRESS value, the better the model’s predictive ability.
- R-sq(pred) is a measure of how well the model predicts new observations. It is equal to 1-PRESS/Total Sum of Squares. The larger the value, the more accurate the model’s predictions are likely to be.
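The values shown underneath the ANOVA table can be reproduced by hand for a small data set. The sketch below assumes a single predictor with an intercept, so that the fit and the leverages have simple closed forms; the function name and the data are hypothetical. The R-sq values are returned as fractions rather than percentages, and PRESS is computed with the standard shortcut that divides each residual by one minus its leverage.

```python
import math


def fit_summary(x, y):
    """Summary statistics for a straight-line least squares fit (illustrative)."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_residual = sum(e ** 2 for e in residuals)
    ss_model = ss_total - ss_residual

    # Leverage of each run for a one-predictor model with an intercept.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    # PRESS: sum of squared leave-one-out prediction errors.
    press = sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverage))

    return {
        "S": math.sqrt(ss_residual / (n - p)),
        "R-sq": ss_model / ss_total,
        "R-sq(adj)": 1.0 - (ss_residual / (n - p)) / (ss_total / (n - 1)),
        "PRESS": press,
        "R-sq(pred)": 1.0 - press / ss_total,
    }


summary = fit_summary([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 4.0, 8.0])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because each leverage lies between 0 and 1, PRESS can never be smaller than the residual sum of squares, which is why R-sq(pred) never exceeds R-sq.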

The Regression table provides specific information on the contribution of each predictor to the variation in the response and an analysis of the significance of this contribution.

Regression Table Columns

Term is the factor under consideration. Terms displayed in red are considered to be significant. In cases where there is no error in the model, significant effects are determined according to Lenth’s method and the term names are displayed in red and followed by an asterisk (*).

Coefficient is the regression coefficient of the term, which represents the contribution of the term to the variation in the response.
Standard Error is the standard deviation of the regression coefficient.
Low Confidence and High Confidence are the lower and upper confidence bounds on the regression coefficient.
T Value is the normalized regression coefficient, which is equal to Coefficient/Standard Error.
P Value (alpha error or type I error) is the probability that an equal amount of variation in the output would be observed in the case that this term does not affect the output. This value is compared to the risk level (alpha) that you specify on the Analysis Settings page of the control panel. If the p value is less than alpha, this term is considered to have a significant effect on the output. In this case, the term and its p value will be displayed in red.

Variance Inflation Factor is a measure of the correlation, if any, between the term (predictor) and the other predictors. The lower the value, the less likely it is that the predictors are correlated. If the correlation of a predictor with other predictors is extremely high, that predictor should be removed from the model. If predictors are 100% correlated, they are aliased and will automatically be removed from the model. This will be noted directly above the Regression Table.
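The Coefficient, Standard Error, T Value and Variance Inflation Factor columns can be sketched for a small two-predictor example. This is an illustration under stated assumptions, not the software's algorithm: the helper names and the data are hypothetical, the model is solved through the normal equations with a simple Gauss-Jordan inverse, and the VIF is obtained from the identity VIF = [(X'X)^-1]jj multiplied by the corrected sum of squares of predictor j. Confidence bounds and p values are omitted because they require quantiles of the t distribution.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some terms may be aliased")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regression_table(predictor_rows, y):
    """Return one dict per term (intercept first) with the table columns."""
    X = [[1.0] + list(row) for row in predictor_rows]  # add intercept column
    cols = [list(c) for c in zip(*X)]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xtx_inv = invert(xtx)
    xty = [sum(a * b for a, b in zip(c, y)) for c in cols]
    coef = [sum(a * b for a, b in zip(row, xty)) for row in xtx_inv]

    fitted = [sum(b * v for b, v in zip(coef, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (len(y) - len(coef))  # residual mean square

    table = []
    for j, b in enumerate(coef):
        se = math.sqrt(mse * xtx_inv[j][j])
        vif = None  # not defined for the intercept
        if j > 0:
            mean_j = sum(cols[j]) / len(y)
            vif = xtx_inv[j][j] * sum((v - mean_j) ** 2 for v in cols[j])
        table.append({"coefficient": b, "standard_error": se,
                      "t_value": b / se, "vif": vif})
    return table


# Hypothetical data: two correlated predictors (A, B) and one response.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
response = [4.3, 5.5, 8.5, 10.3, 13.9]
table = regression_table(rows, response)
for name, entry in zip(["Intercept", "A", "B"], table):
    print(name, entry)
```

In this example the two predictors have a correlation of 0.8, so both receive the same VIF of 1/(1 - 0.8^2), about 2.78. A VIF of 1 would indicate no correlation with the other predictor.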

The Regression Equation information is presented using multiple tables.

Regression Equation

Additional Results

All of the following tables provide information that was generated from the main calculations. The available tables will vary depending on the design type you are working with. The results that could be available include:

Alias Structure

This item is available for all designs with at least two factors. It describes the alias structure for the design, taking into account only the terms you've selected to include in the analysis. You can use this information, together with your engineering knowledge, to help determine whether any important interaction information was lost due to aliasing. When aliased terms exist, the following areas will be shown:

- Terms selected to be in the model lists all the terms that are considered for inclusion in the regression model (i.e., the selections in the Select Terms window).
- Terms included in the model lists all the selected terms that are included in the model. The alias structure determines which terms are excluded.
- Alias Structure lists the aliased effects based on the selected terms. For example, A • B = A • B + C • D means the interaction effect A • B is aliased because it is indistinguishable from effect C • D. Therefore, the model cannot include both interaction terms; it will include only one (e.g., A • B).

Alias Summary

Var/Cov Matrix

Diagnostic Information

This table is available for one-factor R-DOE designs and all other designs with two or more factors. It displays various analysis results for each run and highlights significant values. The following columns are included:

- Run Order is the randomized order, generated by the software, in which it is recommended to perform the runs to avoid biased results. Note that any changes made to the Run Order column on the Data tab will be reflected here.
- Standard Order is the basic order of runs, as specified in the design type, without randomization. Note that any changes made to the Standard Order column on the Data tab will be reflected here.
- Actual Value (Y) is the observed response value for the run, as entered in the response column on the Data tab.
- Predicted Value (YF) is the response value predicted by the model given the factor settings used in the run.
- Residual (or "regular residual") is the difference between the actual value (Y) and the predicted value (YF) for the run.
- Standardized Residual is the regular residual for the run divided by the constant standard deviation across all runs.
- Studentized Residual is the regular residual for the run divided by an estimate of its standard deviation.
- External Studentized Residual is the regular residual for the run divided by an estimate of its standard deviation, where the run in question is omitted from the estimation.
- Leverage is a measure of how much the run influences the predicted values of the model, stated as a value between 0 and 1, where 1 indicates that the actual response value of the run is exactly equal to the predicted value (i.e., the predicted value is completely dependent upon the observed value).
- Cook’s Distance is a measure of how much the output is predicted to change if the run is deleted from the analysis.
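The residual, leverage and Cook's distance columns described above can be sketched for a small data set. The example assumes a single predictor with an intercept so that the leverages have a closed form, and it uses the common textbook definitions of each quantity; the function name and the data are hypothetical, and the software's exact formulas and critical bounds are not reproduced here.

```python
import math


def diagnostics(x, y):
    """Per-run diagnostics for a straight-line least squares fit (illustrative)."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar

    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, predicted)]
    sse = sum(e ** 2 for e in residuals)
    mse = sse / (n - p)

    rows = []
    for xi, yi, fi, e in zip(x, y, predicted, residuals):
        h = 1.0 / n + (xi - x_bar) ** 2 / s_xx       # leverage of the run
        standardized = e / math.sqrt(mse)            # constant std. deviation
        studentized = e / math.sqrt(mse * (1.0 - h))  # run-specific std. deviation
        # Residual mean square re-estimated with this run omitted.
        mse_without_run = (sse - e ** 2 / (1.0 - h)) / (n - p - 1)
        external = e / math.sqrt(mse_without_run * (1.0 - h))
        cooks = studentized ** 2 / p * h / (1.0 - h)
        rows.append({
            "actual": yi, "predicted": fi, "residual": e,
            "standardized": standardized, "studentized": studentized,
            "external_studentized": external,
            "leverage": h, "cooks_distance": cooks,
        })
    return rows


runs = diagnostics([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 4.0, 8.0])
for run in runs:
    print({name: round(value, 4) for name, value in run.items()})
```

In this data set the third run has a modest regular residual but a large external studentized residual, because omitting that run sharply reduces the estimated noise.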

Values that are considered to be significant, or outliers, are displayed in red. For the residual columns, significant or critical values are those that fall outside the residual’s upper or lower bounds, calculated based on the specified alpha (risk) value.

The ReliaWiki resource portal has more information on how significant values are determined for the Leverage and Cook's Distance columns at: http://www.reliawiki.org/index.php/Multiple_Linear_Regression_Analysis.

Least Squares Means
