One of the questions an instructor dreads most from a mathematically
unsophisticated audience is, "What exactly is degrees of freedom?" It's
not that there's no answer. The mathematical answer is a single phrase,
"The rank of a quadratic form." The problem is translating that to an
audience whose knowledge of mathematics does not extend beyond high
school mathematics. It is one thing to say that degrees of freedom is an
index and to describe how to calculate it for certain situations, but
none of these pieces of information tells what degrees of freedom
**means**.

As an alternative to "the rank of a quadratic form", I've always enjoyed Jack Good's 1973 article "What are Degrees of Freedom?" in *The American Statistician* (27, 227-228), in which he equates degrees of freedom to the difference in dimensionalities of parameter spaces. However, this is only a partial answer. It explains what degrees of freedom is for many chi-square tests and the numerator degrees of freedom for F tests, but it doesn't do as well with t tests or the denominator degrees of freedom for F tests.

At the moment, I'm inclined to define **degrees of freedom**
as **a way of keeping score.** A data set
contains a number of observations, say, *n*. They constitute
*n* individual pieces of information. These pieces of information
can be used to estimate either parameters or variability. In general,
each item being estimated costs one degree of freedom. The remaining
degrees of freedom are used to estimate variability. All we have to do
is count properly.

**A single sample:** There are *n* observations. There's one
parameter (the mean) that needs to be estimated. That leaves *n-1*
degrees of freedom for estimating variability.
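
One way to see why estimating the mean "costs" a degree of freedom: once the mean is fixed, the deviations from it must sum to zero, so only *n-1* of them are free to vary. The sketch below illustrates this with made-up data; the sample values and helper names are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance

def deviations(sample):
    """Return each observation's deviation from the sample mean."""
    m = mean(sample)
    return [x - m for x in sample]

def last_deviation(first_deviations):
    """Recover the final deviation from the other n-1.

    Deviations from the mean always sum to zero, so the last one
    is determined by the rest.
    """
    return -sum(first_deviations)

sample = [4, 8, 6, 5, 7]          # hypothetical data, n = 5
d = deviations(sample)            # deviations from the mean of 6
n = len(sample)

# The last deviation carries no new information.
assert last_deviation(d[:-1]) == d[-1]

# Sum of squared deviations divided by the n-1 remaining degrees of freedom.
sample_variance = sum(x * x for x in d) / (n - 1)
print(sample_variance)            # agrees with statistics.variance(sample)
```

Dividing by *n-1* rather than *n* is the scorekeeping in action: one of the *n* pieces of information was spent on the mean.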

**Two samples:** There are *n_{1}+n_{2}*
observations. There are two means to be estimated. That leaves
*n_{1}+n_{2}-2* degrees of freedom for estimating variability.

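**One-way ANOVA with g groups:** There are
*n_{1}+...+n_{g}* observations. There are *g* means to be estimated.
That leaves *n_{1}+...+n_{g}-g* degrees of freedom for estimating
variability. This accounts for the denominator degrees of freedom for
the F ratio.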

The primary null hypothesis being tested by one-way ANOVA is that the
*g* population means are equal. Under the null hypothesis, there
is a single common mean. Under the alternative hypothesis, there are *g*
individual means. Therefore, there are *g-1*--that is *g* (H_{1}) minus
*1* (H_{0})--degrees of freedom for testing the null hypothesis. This
accounts for the numerator degrees of freedom for the F ratio.

There is another way of viewing the numerator degrees
of freedom for the F ratio. The null hypothesis says there is no
variability in the *g* population means. There are *g* sample
means. Therefore, there are *g-1* degrees of freedom for assessing
variability among the *g* means.
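
The one-way ANOVA bookkeeping can be sketched in a few lines. This assumes the usual accounting in which each of the *g* group means costs one degree of freedom; the function name and the group sizes are hypothetical.

```python
def one_way_anova_df(group_sizes):
    """Tally degrees of freedom for a one-way ANOVA.

    group_sizes: list with the number of observations in each group.
    """
    g = len(group_sizes)
    total_n = sum(group_sizes)
    return {
        "numerator": g - 1,          # g means (H1) minus 1 common mean (H0)
        "denominator": total_n - g,  # observations left after estimating g means
        "total": total_n - 1,        # observations left after estimating 1 mean
    }

sizes = [5, 7, 6]                    # hypothetical group sizes, g = 3
df = one_way_anova_df(sizes)
print(df)  # {'numerator': 2, 'denominator': 15, 'total': 17}
```

The numerator and denominator degrees of freedom add up to the total, which is a quick check that the count was done properly.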

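**Multiple regression with p predictors:** There are *n* observations
with *p+1* parameters to be estimated--one regression coefficient for
each of the predictors plus the intercept. That leaves *n-p-1* degrees
of freedom for estimating variability. This accounts for the Error
degrees of freedom in the ANOVA table.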

The null hypothesis tested in the ANOVA table is that all of the
coefficients of the predictors are 0. The null hypothesis is that there
are no coefficients to be estimated. The alternative hypothesis is that
there are *p* coefficients to be estimated. Therefore, there are
*p-0* or *p* degrees of freedom for testing the null
hypothesis. This accounts for the Regression degrees of freedom in the
ANOVA table.

There is another way of viewing the Regression degrees of freedom. The
null hypothesis says the expected response is the same for all values of
the predictors. Therefore there is one parameter to estimate--the common
response. The alternative hypothesis specifies a model with *p+1*
parameters--*p* regression coefficients plus an intercept.
Therefore, there are *p*--that is *p+1* (H_{1}) minus
*1* (H_{0})--regression degrees of freedom for testing the
null hypothesis.
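
The same scorekeeping applies to the regression ANOVA table. The sketch below assumes a model with an intercept, so *p+1* parameters are estimated in all; the function name and the values of *n* and *p* are hypothetical.

```python
def regression_anova_df(n, p):
    """Tally degrees of freedom for a regression with p predictors plus an intercept."""
    if n <= p + 1:
        raise ValueError("need more observations than parameters (n > p + 1)")
    return {
        "regression": p,     # p+1 parameters (H1) minus 1 common response (H0)
        "error": n - p - 1,  # observations left after estimating p+1 parameters
        "total": n - 1,      # observations left after estimating 1 common response
    }

df = regression_anova_df(n=30, p=3)  # hypothetical sample size and predictor count
print(df)  # {'regression': 3, 'error': 26, 'total': 29}
```

As with one-way ANOVA, the Regression and Error degrees of freedom sum to the total of *n-1*.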

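Okay, so where's the quadratic form? Let's look at the variance of
a single sample. If *y* is an n by 1 vector of observations,
then the sample variance can be written as

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2 = \frac{1}{n-1}\, y'\left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}'\right)y ,$$

where $I$ is the n by n identity matrix and $\mathbf{1}$ is an n by 1 vector of ones.

The number of degrees of freedom is equal to the rank of the n by n matrix $I - \tfrac{1}{n}\mathbf{1}\mathbf{1}'$, which is *n-1*.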
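
As a numerical check on the quadratic-form view, the sketch below assumes the matrix in question is the centering matrix I - (1/n)11' (the standard choice for the sample variance, stated here as an assumption). It computes the rank exactly with rational arithmetic. The helper names and data are hypothetical.

```python
from fractions import Fraction

def centering_matrix(n):
    """Build I - (1/n) 1 1' with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def rank(matrix):
    """Rank by Gaussian elimination on a copy; exact for Fraction entries."""
    rows = [row[:] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank_count = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank_count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        pivot_row = rows[rank_count]
        # Eliminate this column from the rows below the pivot.
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / pivot_row[col]
            if factor != 0:
                rows[r] = [a - factor * b for a, b in zip(rows[r], pivot_row)]
        rank_count += 1
    return rank_count

def quadratic_form(y, matrix):
    """Compute y' M y."""
    return sum(y[i] * matrix[i][j] * y[j]
               for i in range(len(y)) for j in range(len(y)))

y = [4, 8, 6, 5, 7]                 # hypothetical data, n = 5
C = centering_matrix(len(y))
print(rank(C))                      # 4, that is n - 1
print(quadratic_form(y, C))         # 10, the sum of squared deviations
```

The rank comes out one less than the number of observations, matching the *n-1* obtained by counting.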