Predicting Home Sales Price in the Midwest (Part 1)
- Griffen Herrera
- Feb 22, 2023
Updated: Feb 23, 2023
In this project, a city tax assessor wants a model that predicts the sale price of a residential home in Midwestern cities from the characteristics of the home and the surrounding property.
The dataset covers 522 home sale transactions from 2002.
Response variable (Y) is the sale price of a residence.
Explanatory variable (X1) is the number of bedrooms.
Explanatory variable (X2) is the number of bathrooms.
Explanatory variable (X3) is the size of the garage.
Next, let's look at the estimated model and my interpretation of the results.

β0 = -45,886.3 (intercept)
β1 = 935.4 (partial slope for number of bedrooms)
β2 = 67,818.9 (partial slope for number of bathrooms)
β3 = 67,332.3 (partial slope for garage size)
The estimated regression model in equation form is:
Ŷ = -45,886.3 + 935.4 X1 + 67,818.9 X2 + 67,332.3 X3

A few takeaways from this equation: for every additional bedroom, the expected sale price increases by $935.40, holding the number of bathrooms and garage size constant. For every additional bathroom, the expected sale price increases by $67,818.90, holding the number of bedrooms and garage size constant. Lastly, for every one-unit increase in garage size, the expected sale price increases by $67,332.30, holding the number of bedrooms and bathrooms constant.
Based on the adjusted R-squared value, about 54% of the variability in sale price is explained by the number of bedrooms, number of bathrooms, and garage size.
From the results I got in R, the predicted sale price for a house with 3 bedrooms, 3 bathrooms, and a 2-car garage is about $295,041.20. From the confidence interval, I am 95% confident that the true mean sale price for houses with 3 bedrooms, 3 bathrooms, and a 2-car garage is between $284,025.70 and $306,056.60.
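The point prediction can be reproduced by hand from the fitted coefficients. Here is a minimal Python sketch of that arithmetic (the original analysis was done in R). The function name `predict_price` is my own, and garage size is assumed to be measured in number of cars.

```python
# Fitted coefficients reported above (intercept, bedrooms, bathrooms, garage size).
B0, B1, B2, B3 = -45886.3, 935.4, 67818.9, 67332.3


def predict_price(bedrooms, bathrooms, garage_cars):
    """Point prediction of sale price from the estimated regression equation."""
    return B0 + B1 * bedrooms + B2 * bathrooms + B3 * garage_cars


price = predict_price(3, 3, 2)
print(f"${price:,.2f}")
```

Plugging in 3 bedrooms, 3 bathrooms, and a 2-car garage gives the same figure of roughly $295,041.20 reported from R.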

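Based on the prediction interval, I am 95% confident that the sale price of an individual house with 3 bedrooms, 3 bathrooms, and a 2-car garage will fall between $111,422.30 and $478,660.00. This interval is much wider than the confidence interval because it covers a single sale rather than the average sale price.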
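As a quick sanity check, both intervals should be centered on the same point prediction and differ only in width. The sketch below is my own back-of-the-envelope check, not part of the R output. It assumes each interval has the form estimate ± t × SE and uses the critical value t(0.975, 518) = 1.965 quoted in this post.

```python
# Interval endpoints reported above for a 3-bedroom, 3-bathroom, 2-car-garage house.
ci_low, ci_high = 284025.70, 306056.60   # 95% confidence interval (mean price)
pi_low, pi_high = 111422.30, 478660.00   # 95% prediction interval (single house)

T_CRIT = 1.965  # t(0.975, 518), the critical value quoted in this post


def center_and_half_width(low, high):
    """Return the midpoint and half-width of an interval."""
    return (low + high) / 2, (high - low) / 2


ci_center, ci_half = center_and_half_width(ci_low, ci_high)
pi_center, pi_half = center_and_half_width(pi_low, pi_high)

# Assumption: each interval is estimate +/- t * SE, so the half-width
# divided by t gives the implied standard error.
se_mean = ci_half / T_CRIT
se_pred = pi_half / T_CRIT

print(f"centers: {ci_center:,.2f} and {pi_center:,.2f}")
print(f"implied SE of the mean: {se_mean:,.0f}, of a single sale: {se_pred:,.0f}")
```

Both midpoints land on the point prediction (to rounding), while the implied standard error for a single sale is far larger than for the mean, which is why the prediction interval is so much wider.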

To find out which individual variables significantly drive the price of a house, I need hypothesis tests. First I will test the significance of each partial slope at α = 0.05. See the results below.

Hypothesis Testing:
H0: β1=0, Ha: β1≠0
H0: β2=0, Ha: β2≠0
H0: β3=0, Ha: β3≠0
H0: β1 = β2 = β3 = 0, Ha: at least one βj ≠ 0 (j = 1, 2, 3)
For the bedroom variable, the t-test gives a t-statistic of 0.188 < t(0.975, 518) = 1.965 and a p-value of 0.8507, which is much larger than α = 0.05. I don't have enough evidence to reject H0, so the number of bedrooms is not a significant predictor in the full model.
For the bathroom variable, the same t-test gives a t-statistic of 13.168 > t(0.975, 518) = 1.965 and a p-value less than 2e-16, which is much smaller than α = 0.05. I have enough evidence to reject H0 and conclude Ha: the number of bathrooms is a significant predictor.
Lastly, for the garage size variable, the same t-test gives a t-statistic of 9.383 > t(0.975, 518) = 1.965 and a p-value less than 2e-16, which is much smaller than α = 0.05. I have enough evidence to reject H0 and conclude Ha: garage size is a significant predictor.
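The decision rule behind these three tests can be written down in a few lines. This is a minimal Python sketch using only the numbers reported in this post. The helper name `is_significant` is my own, and the standard errors are backed out from t = estimate / SE rather than read from the R output.

```python
T_CRIT = 1.965  # t(0.975, 518) as quoted above

# (estimate, t-statistic) pairs reported in this post.
slopes = {
    "bedrooms": (935.4, 0.188),
    "bathrooms": (67818.9, 13.168),
    "garage size": (67332.3, 9.383),
}


def is_significant(t_stat, t_crit=T_CRIT):
    """Two-sided t-test: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_crit


for name, (estimate, t_stat) in slopes.items():
    implied_se = estimate / t_stat  # because t = estimate / standard error
    verdict = "reject H0" if is_significant(t_stat) else "fail to reject H0"
    print(f"{name}: t = {t_stat}, implied SE = {implied_se:,.0f}, {verdict}")
```

The bedroom slope stands out: its implied standard error is several times larger than the estimate itself, which is why the test cannot distinguish it from zero.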
Now I will conduct the F-test for overall model significance. Based on the results above, the critical value is F(0.05, 3, 518) = 2.622 and the F-statistic is 206.9, which is much larger than the critical value. The p-value is also essentially 0, which is much smaller than α = 0.05. I have enough evidence to reject the null hypothesis and conclude Ha: the overall model is significant.
Next I ask: are the number of bathrooms (X2) and garage size (X3) jointly significant in the MLR model? A partial F-test at α = 0.05 will answer this.
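The overall F-statistic and R-squared are two views of the same fit, so one can be recovered from the other. The sketch below is my own consistency check, assuming the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)) with n = 522 sales and k = 3 predictors.

```python
n, k = 522, 3      # number of sales and number of predictors
f_stat = 206.9     # overall F-statistic reported above

# Rearranging F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) to solve for R^2.
r2 = (k * f_stat) / (k * f_stat + (n - k - 1))

# Adjusted R^2 penalizes R^2 for the number of predictors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

The adjusted value comes out at about 0.54, in line with the 54% of variability reported earlier.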

Hypothesis Testing:
H0: β2= β3 =0
Ha: β2 ≠ 0 or β3 ≠ 0 (at least one is nonzero)
Based on the partial F-test result above, the p-value is 2.2e-16, which is much smaller than α = 0.05. I have enough evidence to reject H0 and conclude Ha: the number of bathrooms and garage size are jointly significant in this model. Given these results, I want to take a closer look at the number of bedrooms as the only predictor in a restricted model.
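For reference, the partial F-test compares the full model with a restricted model that drops bathrooms and garage size. The statistic has the standard form below. The symbols SSE_R and SSE_F (error sums of squares of the restricted and full models) are my own labels, and their values are not reported in this post.

```latex
F^{*} = \frac{\left(SSE_{R} - SSE_{F}\right) / q}{SSE_{F} / (n - k - 1)},
\qquad q = 2,\quad n = 522,\quad k = 3
```

Under H0, F* follows an F distribution with 2 and 518 degrees of freedom, so a large F* (equivalently, a tiny p-value) means that dropping the two variables makes the fit significantly worse.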

Looking at the results above, the partial slope for the number of bedrooms was insignificant in the t-test on the full model (p-value = 0.8507 > α = 0.05). However, when a t-test is run with the number of bedrooms (X1) as the only predictor of sale price (Y), the number of bedrooms is a significant predictor (p-value < 2.2e-16, smaller than α = 0.05). Now I want to see all of the variables in a visualization, which will hopefully offer some insight.

Based on the scatterplot above, the number of bedrooms and the number of bathrooms appear to be correlated with each other. To confirm that hunch, I will look at the correlation matrix to check whether the two variables are actually correlated.

Overall, I see that the number of bedrooms is highly correlated with the number of bathrooms. Lastly, I want to see whether removing the number of bathrooms from the model affects the significance of the other variables.
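Each entry of a correlation matrix is a Pearson correlation between two columns. The sketch below shows that calculation in plain Python. The function name `pearson_r` is my own, and the toy values are made up purely for illustration; they are not taken from the 522-sale dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values, NOT from the actual home sales data.
toy_bedrooms = [2, 3, 3, 4, 5]
toy_bathrooms = [1, 2, 3, 3, 4]

r = pearson_r(toy_bedrooms, toy_bathrooms)
print(f"r = {r:.3f}")
```

When two predictors have a correlation close to 1, the model struggles to separate their individual effects, which is the pattern seen here with bedrooms and bathrooms.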

After removing the number of bathrooms from the model, the number of bedrooms now appears to be significant (p = 4.19e-12 < α = 0.05), and the overall model is also significant (p = 2.2e-16 < α = 0.05).
In conclusion, based on the data provided, there is evidence that the number of bedrooms, the number of bathrooms, and the size of the garage collectively have a significant positive influence on the sale price of a residence in Midwestern cities. The model had a multicollinearity problem; however, after removing one of the highly correlated predictors (number of bathrooms), the resulting predictive model (with just number of bedrooms and garage size) is more reasonable.