Use of Regression to Calculate Sum of Squares
This appendix explains why Weibull++ DOE folios use regression for all calculations related to the sum of squares. A number of textbooks present the method of direct summation to calculate the sum of squares, but this method is applicable only to balanced designs and may give incorrect results for unbalanced designs. For example, the sum of squares for factor $A$ in a balanced factorial experiment with two factors, $A$ and $B$, is given as follows:

$$SS_A = \sum_{i=1}^{n_a} n_b n \left(\bar{y}_{i..} - \bar{y}_{...}\right)^2 = \sum_{i=1}^{n_a} \frac{y_{i..}^2}{n_b n} - \frac{y_{...}^2}{n_a n_b n}$$

where $n_a$ represents the number of levels of factor $A$, $n_b$ represents the number of levels of factor $B$, and $n$ represents the number of samples for each combination of $A$ and $B$. The term $\bar{y}_{i..}$ is the mean value for the $i$th level of factor $A$, $y_{i..}$ is the sum of all observations at the $i$th level of factor $A$ and $y_{...}$ is the sum of all observations, so that $\bar{y}_{...} = y_{...}/(n_a n_b n)$ is the overall mean.
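As a quick check of the balanced case, the sketch below computes the sum of squares for factor A in two ways: from the squared deviations of the level means about the overall mean, and from the level totals and the grand total. The data array and all variable names are hypothetical and chosen only for illustration; they do not come from the experiment discussed in this appendix.

```python
import numpy as np

# Hypothetical balanced data (an assumption for illustration only):
# y[i, j, k] is replicate k at level i of factor A and level j of factor B.
y = np.array([
    [[6.0, 8.0], [4.0, 6.0]],
    [[12.0, 14.0], [20.0, 22.0]],
])
n_a, n_b, n = y.shape

grand_total = y.sum()
grand_mean = y.mean()
level_totals_a = y.sum(axis=(1, 2))   # sum of observations at each level of A
level_means_a = y.mean(axis=(1, 2))   # mean of observations at each level of A

# Deviation form: n_b * n * (level mean - overall mean)^2, summed over levels of A.
ss_a_deviation = (n_b * n * (level_means_a - grand_mean) ** 2).sum()

# Totals form: sum of (level total)^2 / (n_b * n) minus (grand total)^2 / (n_a * n_b * n).
ss_a_totals = (level_totals_a ** 2 / (n_b * n)).sum() - grand_total ** 2 / (n_a * n_b * n)

print(ss_a_deviation, ss_a_totals)
```

Both forms agree here because every cell holds the same number of observations, which is the assumption that direct summation relies on.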
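The analogous term to calculate $SS_A$ in the case of an unbalanced design is given as:

$$SS_A = \sum_{i=1}^{n_a} \frac{y_{i..}^2}{n_{i.}} - \frac{y_{...}^2}{n_{..}}$$

where $n_{i.}$ is the number of observations at the $i$th level of factor $A$ and $n_{..}$ is the total number of observations. Similarly, to calculate the sum of squares for factor $B$ and interaction $AB$, the formulas are given as:

$$SS_B = \sum_{j=1}^{n_b} \frac{y_{.j.}^2}{n_{.j}} - \frac{y_{...}^2}{n_{..}}$$

$$SS_{AB} = \sum_{i=1}^{n_a} \sum_{j=1}^{n_b} \frac{y_{ij.}^2}{n_{ij}} - \sum_{i=1}^{n_a} \frac{y_{i..}^2}{n_{i.}} - \sum_{j=1}^{n_b} \frac{y_{.j.}^2}{n_{.j}} + \frac{y_{...}^2}{n_{..}}$$

Here $y_{.j.}$ and $n_{.j}$ are the sum and the number of observations at the $j$th level of factor $B$, and $y_{ij.}$ and $n_{ij}$ are the sum and the number of observations for the combination of the $i$th level of $A$ and the $j$th level of $B$.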
Applying these relations to the unbalanced data of the last table gives a negative value for the sum of squares of the interaction $AB$, which is obviously incorrect since a sum of squares cannot be negative. For a detailed discussion of this, refer to Searle (1997, 1971).
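The failure of direct summation on unbalanced data can be reproduced with a short sketch. The five observations below are a hypothetical unbalanced two-level, two-factor data set assumed for illustration (they are not taken from the table referred to in the text), and the helper name `sum_sq_totals_over_counts` is likewise an assumption. Each term is a sum over groups of the squared group total divided by the group size.

```python
import numpy as np

# Hypothetical unbalanced 2x2 data (assumed for illustration):
# columns are level of A (0 or 1), level of B (0 or 1), response.
data = np.array([
    [0, 0, 6.0],
    [0, 1, 4.0],
    [1, 0, 42.0],
    [1, 0, 6.0],
    [1, 1, 12.0],
])
a = data[:, 0].astype(int)
b = data[:, 1].astype(int)
y = data[:, 2]

def sum_sq_totals_over_counts(labels, y):
    """Sum over groups of (group total)^2 / (number of observations in group)."""
    totals = np.bincount(labels, weights=y)
    counts = np.bincount(labels)
    return float((totals ** 2 / counts).sum())

correction = y.sum() ** 2 / y.size                # (grand total)^2 / total count
term_a = sum_sq_totals_over_counts(a, y)          # grouped by level of A
term_b = sum_sq_totals_over_counts(b, y)          # grouped by level of B
cell = a * (b.max() + 1) + b                      # one label per (A, B) combination
term_cells = sum_sq_totals_over_counts(cell, y)

ss_a = term_a - correction
ss_b = term_b - correction
ss_ab = term_cells - term_a - term_b + correction

print(ss_a, ss_b, ss_ab)  # ss_ab comes out negative for this data
```

For this assumed data the interaction term evaluates to -22, which cannot be a valid sum of squares.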
The correct sum of squares can be calculated as shown next. The calculation uses the $y$ and $X$ matrices written for the design of the last table.
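Then the sum of squares for the interaction $AB$ can be calculated as:

$$SS_{AB} = y'\left[H - \frac{1}{n_{..}} J\right] y - y'\left[H_{\sim AB} - \frac{1}{n_{..}} J\right] y$$

where $H = X(X'X)^{-1}X'$ is the hat matrix, $J$ is the matrix of ones and $n_{..}$ is the total number of observations. The matrix $H_{\sim AB}$ can be calculated using

$$H_{\sim AB} = X_{\sim AB}\left(X_{\sim AB}' X_{\sim AB}\right)^{-1} X_{\sim AB}'$$

where $X_{\sim AB}$ is the design matrix, $X$, excluding the last column that represents the interaction effect $AB$. Evaluating this expression for the data of the last table gives the sum of squares for the interaction $AB$.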
This is the value that is calculated by the DOE folio (see the first figure below for the experiment design and the second figure below for the analysis).
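The regression calculation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the DOE folio's implementation: the data are a hypothetical unbalanced two-level design, the factors are coded as -1 and +1, and the helper `hat_matrix` is a name introduced here. The interaction sum of squares is taken as the regression sum of squares of the full model minus that of the model without the interaction column.

```python
import numpy as np

# Hypothetical unbalanced data with -1/+1 coded factor levels (assumed for illustration).
a = np.array([-1, -1, 1, 1, 1], dtype=float)
b = np.array([-1, 1, -1, -1, 1], dtype=float)
y = np.array([6.0, 4.0, 42.0, 6.0, 12.0])
n = y.size

# Design matrix: intercept, A, B and the AB interaction as the last column.
X = np.column_stack([np.ones(n), a, b, a * b])
X_reduced = X[:, :-1]  # same matrix without the interaction column

def hat_matrix(design):
    """Return design (design' design)^-1 design'."""
    return design @ np.linalg.inv(design.T @ design) @ design.T

J = np.ones((n, n))  # matrix of ones
H = hat_matrix(X)
H_reduced = hat_matrix(X_reduced)

ss_r_full = y @ (H - J / n) @ y             # regression sum of squares, full model
ss_r_reduced = y @ (H_reduced - J / n) @ y  # regression sum of squares, no interaction
ss_ab = ss_r_full - ss_r_reduced

print(ss_r_full, ss_r_reduced, ss_ab)
```

Because the reduced model is nested in the full model, this difference is never negative, unlike the direct summation result for unbalanced data.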