Chapter 4 of 5

Evaluating the Model Fit

How well does the line explain the data?

We found the best line. But how good is it? We need a way to measure what the line explains vs. what it misses.

House prices vary a lot. Some of that variation is because of house size (which our line captures), and some is due to other factors we don't model.

To measure this, we split the total variation into two parts: what the line explains and what it doesn't.


Step 1: How much do prices vary overall? Measure each point's distance from the average price.

[Figure: scatter plot of Price ($) against House Size (sqft), with a horizontal line at the average price ȳ]
\text{TSS} = \sum (y_i - \bar{y})^2
TSS (Total Sum of Squares)

Each purple line shows how far a house's price is from the average. To get TSS, square each of those gaps and add the squares together.
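As a rough illustration of Step 1, here is a minimal Python sketch. The four prices are made up for this example and are not the houses from the chart; `total_sum_of_squares` is just an illustrative helper name.

```python
# Hypothetical prices in dollars, invented for illustration only.
prices = [150_000, 200_000, 250_000, 300_000]

def total_sum_of_squares(y):
    """Square each value's gap from the mean, then add the squares."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

tss = total_sum_of_squares(prices)
print(f"TSS = {tss:,.0f}")
```

Note that the gaps are squared one by one before summing. Summing the raw gaps first would always give zero, because the gaps above and below the mean cancel out.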


Step 2: How much does our line improve over the mean? The explained variation.

[Figure: scatter plot of Price ($) against House Size (sqft), with the average price ȳ and the fitted OLS line]
\text{SSR} = \sum (\hat{y}_i - \bar{y})^2
SSR (Explained)

Each green line shows the distance between where our line predicts (ŷ) and the mean (ȳ). This is what the line adds over just guessing the average.
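The sketch below shows one way to compute the explained variation. It assumes a small made-up data set (not the chart's houses) and fits the line with the standard closed-form least-squares formulas; `fit_line` is a hypothetical helper name.

```python
# Hypothetical data, invented for illustration only.
sizes = [100, 200, 300, 400]                    # square feet
prices = [150_000, 190_000, 260_000, 300_000]   # dollars

def fit_line(x, y):
    """Closed-form least-squares slope and intercept for one predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(sizes, prices)

# Where the line predicts each house's price (ŷ).
predicted = [intercept + slope * x for x in sizes]

# SSR: squared gaps between each prediction and the mean price (ȳ).
mean_price = sum(prices) / len(prices)
ssr = sum((y_hat - mean_price) ** 2 for y_hat in predicted)
print(f"SSR = {ssr:,.0f}")
```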


Step 3: What's left over? The gaps between predictions and actual values — our old friends, the residuals.

[Figure: scatter plot of Price ($) against House Size (sqft), with the fitted OLS line]
\text{SSE} = \sum (y_i - \hat{y}_i)^2
SSE (Unexplained)

Each red line is a residual — the part our line couldn't explain. These are the same errors from Chapter 2, now squared and summed.
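A small sketch of Step 3, assuming we already have actual and predicted prices. Both lists are invented for illustration; they do not come from the chart.

```python
# Hypothetical actual and predicted prices in dollars (illustration only).
actual = [150_000, 190_000, 260_000, 300_000]
predicted = [147_000, 199_000, 251_000, 303_000]

# A residual is the actual value minus the predicted value.
residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]

# SSE: square each residual, then add the squares.
sse = sum(r ** 2 for r in residuals)
print(residuals)
print(f"SSE = {sse:,}")
```

Squaring makes every residual count as a positive amount, so misses above and below the line cannot cancel each other.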


TSS = SSR + SSE. The total variation splits neatly into explained + unexplained.

\text{TSS} = \text{SSR} + \text{SSE}
TSS (total) = 68,833,829,694
SSR (explained) = 57,211,259,948
SSE (unexplained) = 11,622,569,746

Every point's total gap from the mean splits into two parts: the part the line explains + the part left over.
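One detail worth checking in code: each point's gap always splits into the two parts, but the squared sums only add up neatly for the least-squares line (fitted with an intercept). The sketch below uses made-up data; the slope 520 and intercept 95,000 are the least-squares values for this toy data, and the second line is an arbitrary alternative chosen for comparison.

```python
# Hypothetical data, invented for illustration only.
sizes = [100, 200, 300, 400]                    # square feet
prices = [150_000, 190_000, 260_000, 300_000]   # dollars

def sums_of_squares(slope, intercept, x, y):
    """Return (TSS, SSR, SSE) for the line y = intercept + slope * x."""
    mean_y = sum(y) / len(y)
    predicted = [intercept + slope * xi for xi in x]
    tss = sum((yi - mean_y) ** 2 for yi in y)
    ssr = sum((p - mean_y) ** 2 for p in predicted)
    sse = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    return tss, ssr, sse

# Least-squares line for this toy data: the parts add up to the total.
ols_tss, ols_ssr, ols_sse = sums_of_squares(520, 95_000, sizes, prices)
print(ols_tss == ols_ssr + ols_sse)

# An arbitrary other line: the parts no longer add up to the total.
alt_tss, alt_ssr, alt_sse = sums_of_squares(500, 100_000, sizes, prices)
print(alt_tss == alt_ssr + alt_sse)
```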


R²: the fraction of the total price variation that the line explains.

R^2 = 1 - \frac{\text{SSE}}{\text{TSS}} = \frac{\text{SSR}}{\text{TSS}}
R² = 83.1%, on a scale from 0% (no fit) to 100% (perfect fit)

House size explains 83.1% of the variation in price. The remaining 16.9% is unexplained.
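To see where 83.1% comes from, this short sketch plugs the sums from the decomposition above into both forms of the R² formula. The variable names are only illustrative.

```python
# Sums of squares as reported in this chapter.
tss = 68_833_829_694   # total
ssr = 57_211_259_948   # explained
sse = 11_622_569_746   # unexplained

# Two equivalent ways to compute R².
r2_from_sse = 1 - sse / tss
r2_from_ssr = ssr / tss

print(f"R² = {r2_from_sse:.1%}")              # share explained
print(f"Unexplained = {1 - r2_from_sse:.1%}")  # share left over
```

The two forms agree because TSS = SSR + SSE, so dividing everything by TSS gives 1 = R² + SSE/TSS.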


R² tells us the proportion explained. But how far off are our predictions in dollars?

SSE = 11,622,569,746, but that number is in squared dollars (dollars²), which is hard to interpret. We need to convert it back to regular dollars.

\text{RMSE} = \sqrt{\frac{\text{SSE}}{n}} = \sqrt{\frac{11,622,569,746}{15}}
SSE: 11,622,569,746 (dollars²)
Divide by n = 15: 774,837,983 (mean squared error, dollars²)
Take the square root: about $27,836 (back to dollars)
RMSE (Root Mean Square Error): about $27,836

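Our predictions are typically off by about $27,836. (RMSE weights large misses more heavily than small ones, so it is a "typical" error rather than a plain average of the misses.) Unlike R², RMSE is in the same units as the data, so you can directly feel how big the errors are.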
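The same conversion can be sketched in a few lines of Python, using the SSE and n = 15 from the formula above. The variable names are only illustrative.

```python
import math

sse = 11_622_569_746   # sum of squared errors, in dollars²
n = 15                 # number of houses

mse = sse / n          # mean squared error, still in dollars²
rmse = math.sqrt(mse)  # the square root brings us back to dollars

print(f"MSE  = {mse:,.0f} dollars²")
print(f"RMSE = ${rmse:,.0f}")
```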
