Evaluating the Model Fit
How well does the line explain the data?
We found the best line. But how good is it? We need a way to measure what the line explains vs. what it misses.
House prices vary a lot. Some of that variation is because of house size (which our line captures), and some is due to other factors we don't model.
To measure this, we split the total variation into two parts: what the line explains and what it doesn't.
Step 1: How much do prices vary overall? Measure each point's distance from the average price.
Each purple line shows how far a house's price is from the average. TSS, the total sum of squares, is the sum of all those gaps, squared.
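Step 1 is one line of NumPy. A minimal sketch with made-up prices, not the chapter's dataset:

```python
import numpy as np

# Hypothetical house prices in dollars (illustrative only)
price = np.array([245_000, 312_000, 279_000, 308_000, 365_000], dtype=float)

# Each price's gap from the average price, squared and summed
tss = np.sum((price - price.mean()) ** 2)
print(tss)  # total sum of squares, in squared dollars
```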
Step 2: How much does our line improve over the mean? The explained variation.
Each green line shows the distance between the line's prediction (ŷ) and the mean (ȳ). Summing those gaps, squared, gives SSR, the explained (regression) sum of squares: what the line adds over just guessing the average.
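Step 2 can be sketched the same way; `np.polyfit` does the least-squares fit. The sizes and prices are made up for illustration:

```python
import numpy as np

# Hypothetical data: sizes in sq ft, prices in dollars (illustrative only)
size  = np.array([1400, 1600, 1700, 1875, 2350], dtype=float)
price = np.array([245_000, 312_000, 279_000, 308_000, 410_000], dtype=float)

# Least-squares line: price ≈ slope * size + intercept
slope, intercept = np.polyfit(size, price, 1)
y_hat = slope * size + intercept

# Explained variation: squared gaps between predictions and the mean price
ssr = np.sum((y_hat - price.mean()) ** 2)
```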
Step 3: What's left over? The gaps between predictions and actual values — our old friends, the residuals.
Each red line is a residual, the part our line couldn't explain. These are the same errors from Chapter 2; squared and summed, they give SSE, the sum of squared errors.
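Step 3 in code, again on toy numbers rather than the chapter's dataset:

```python
import numpy as np

# Hypothetical data: sizes in sq ft, prices in dollars (illustrative only)
size  = np.array([1400, 1600, 1700, 1875, 2350], dtype=float)
price = np.array([245_000, 312_000, 279_000, 308_000, 410_000], dtype=float)

slope, intercept = np.polyfit(size, price, 1)
y_hat = slope * size + intercept

# Residuals: actual minus predicted, i.e. what the line misses
residuals = price - y_hat
sse = np.sum(residuals ** 2)
```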
TSS = SSR + SSE. The total variation splits neatly into explained + unexplained.
Every point's total gap from the mean splits into two parts: the part the line explains + the part left over.
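The split is easy to verify numerically; it holds exactly for a least-squares line with an intercept. Toy numbers again:

```python
import numpy as np

# Hypothetical data (illustrative only)
size  = np.array([1400, 1600, 1700, 1875, 2350], dtype=float)
price = np.array([245_000, 312_000, 279_000, 308_000, 410_000], dtype=float)

slope, intercept = np.polyfit(size, price, 1)
y_hat = slope * size + intercept

tss = np.sum((price - price.mean()) ** 2)  # total variation
ssr = np.sum((y_hat - price.mean()) ** 2)  # explained
sse = np.sum((price - y_hat) ** 2)         # unexplained

print(np.isclose(tss, ssr + sse))  # True
```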
R² = SSR / TSS = 1 − SSE / TSS: the line explains this fraction of the total price variation.
House size explains 83.1% of the variation in price. The remaining 16.9% is unexplained.
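R² falls out of the sums of squares, and both forms of the formula agree. (Toy data, so the value here won't match the chapter's 83.1%.)

```python
import numpy as np

# Hypothetical data (illustrative only)
size  = np.array([1400, 1600, 1700, 1875, 2350], dtype=float)
price = np.array([245_000, 312_000, 279_000, 308_000, 410_000], dtype=float)

slope, intercept = np.polyfit(size, price, 1)
y_hat = slope * size + intercept

tss = np.sum((price - price.mean()) ** 2)
sse = np.sum((price - y_hat) ** 2)
ssr = tss - sse

r_squared = ssr / tss  # fraction of variation explained
# equivalently: r_squared = 1 - sse / tss
```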
R² tells us the proportion explained. But how far off are our predictions in dollars?
SSE = 11,622,569,746 … but that number is in squared dollars ($²). Hard to interpret! We need to convert it back to regular dollars.
RMSE = √(SSE / n) ≈ $27,836: roughly speaking, our predictions are typically off by about $27,836. Unlike R², RMSE is in the same units as the data, so you can directly feel how big the errors are.
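In code: divide SSE by the number of houses and take the square root, which converts squared dollars back to dollars. (This sketch divides by n; some texts divide by n − 2, the residual degrees of freedom, which gives a slightly larger number on small datasets. Toy data again, so the value won't match the chapter's $27,836.)

```python
import numpy as np

# Hypothetical data (illustrative only)
size  = np.array([1400, 1600, 1700, 1875, 2350], dtype=float)
price = np.array([245_000, 312_000, 279_000, 308_000, 410_000], dtype=float)

slope, intercept = np.polyfit(size, price, 1)
y_hat = slope * size + intercept

sse = np.sum((price - y_hat) ** 2)  # squared dollars
rmse = np.sqrt(sse / len(price))    # back to dollars
```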