While studying regression in the first year of my PhD, my classmates and I were briefly perplexed by AIC. Let us revisit it:

AIC, the Akaike Information Criterion, is a measure of risk used for model selection. The following are equivalent ways to describe what we want to do with AIC. We choose the model S to

**(A)** maximize “goodness of fit” minus “complexity” (conceptual)

**(B)** minimize RSS/σ² with a complexity penalty (commonly used and taught)

We were puzzled when we discovered that, with σ² estimated as s² = RSS/(n-1), the ratio RSS/s² is exactly n-1, so the AIC criterion in form **(B)** above suddenly becomes n-1 + 2|S|. That grows exactly linearly in n and |S|, with no upper bound, and says nothing about how well the model fits! Wha... what? After a quick frenzy of googling and brainstorming, we found this in the R help documentation:

“`extractAIC` uses for *-2 log L* the formulae *RSS/s – n* (corresponding to Mallows’ *Cp*) in the case of known scale *s* and *n log (RSS/n)* for unknown scale.”
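Before unpacking that quote, the puzzle itself is easy to reproduce numerically. Here is a minimal Python sketch, assuming simulated data (the true line, noise level, seed, and sample size are all made up for illustration), a penalty of 2|S|, and s² re-estimated from each candidate model's own RSS as described above. The helper names `rss_of_fit` and `criterion_b` are hypothetical.

```python
import random


def rss_of_fit(x, y, use_slope):
    """Residual sum of squares for an intercept-only or simple linear least-squares fit."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    slope = 0.0
    if use_slope:
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        slope = sxy / sxx
    intercept = ybar - slope * xbar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def criterion_b(rss, n, size):
    """Form (B): RSS / s^2 + 2|S|, with s^2 = RSS / (n - 1) from the same fit."""
    s2 = rss / (n - 1)
    return rss / s2 + 2 * size


# Simulated data: an assumption for illustration only.
random.seed(0)
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

for label, use_slope, size in [("intercept only", False, 1),
                               ("intercept + slope", True, 2)]:
    rss = rss_of_fit(x, y, use_slope)
    print(label, "RSS =", round(rss, 2), "criterion (B) =", criterion_b(rss, n, size))
```

The two fits have very different RSS, yet the criterion comes out as n-1 + 2|S| for both (up to floating-point rounding), so in this form it would always prefer the smaller model.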

Then we had to think for a moment. The truth, revealed:

It turns out that -2*log(Likelihood) under normal errors contains the term **n*log(σ²)**, which we commonly ignore because it has nothing to do with the fit as long as **σ** is known. However, for the normal model where we estimate σ² from the fit itself using RSS/(n-1), the commonly used form **(B)** above is useless, and we must consider the entire expression for AIC:

**(C)** minimize n*log(σ²) + RSS/σ² with a complexity penalty (the full normal likelihood, up to an additive constant that is the same for every model)
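To check that the full expression behaves sensibly, here is a small Python sketch. It assumes the standard normal log-likelihood, -2 log L = n*log(2πσ²) + RSS/σ², with σ² replaced by its maximum-likelihood estimate RSS/n, and a penalty of 2|S|. The function names and the two RSS values are hypothetical, chosen only for illustration.

```python
import math


def neg2_loglik(rss, n, sigma2):
    """-2 * log-likelihood of a normal linear model with error variance sigma2."""
    return n * math.log(2 * math.pi * sigma2) + rss / sigma2


def aic_full(rss, n, size):
    """Full AIC with sigma^2 replaced by its maximum-likelihood estimate RSS/n."""
    sigma2_hat = rss / n
    return neg2_loglik(rss, n, sigma2_hat) + 2 * size


def aic_unknown_scale(rss, n, size):
    """The 'unknown scale' form from the R help quote, n*log(RSS/n), plus 2|S|."""
    return n * math.log(rss / n) + 2 * size


n = 50
# The two forms should differ only by this model-independent constant.
const = n * math.log(2 * math.pi) + n

# Hypothetical RSS values for two candidate models (not real results).
for label, rss, size in [("small model", 400.0, 1), ("larger model", 60.0, 2)]:
    full = aic_full(rss, n, size)
    short = aic_unknown_scale(rss, n, size)
    print(label, round(full, 3), round(short, 3), round(full - short - const, 9))
```

Once σ² is estimated, the RSS/σ² piece collapses to the constant n, and all of the information about the fit moves into the log term. That is why the two forms rank models identically, and why the criterion now rewards a smaller RSS.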

This is a simple problem, but it still nags at me. I have not yet consulted anyone about this, although I suspect there is a better reason behind the common dismissal of the log(σ) term. (After all, AIC for normal models is used every day, all day.)
