COMMON MISTEAKS MISTAKES IN USING STATISTICS: Spotting and Avoiding Them


Overinterpreting High R²

1. Just what is considered high R² varies from field to field. In many areas of the social and biological sciences, an R² of about 0.50 or 0.60 is considered high. Yet Cook and Weisberg¹ give an example of a simulated data set with 50 predictors and 100 observations in which the response was independent of all the predictors (so every regressor has coefficient zero in the true mean function), and still R² = 0.59.²

2. High R² can also result from overfitting. In the example of overfitting by a quartic curve, R² was 1.00, because the curve passed through every data point.

3. A regression model with an apparently high R² may still fit less well than one that uses a suitable transformation. For example, fitting a linear regression to the following data (DC output of a windmill vs windspeed) gives R² = 0.87. (Data in blue, regression line in red.)

Windmill DC output vs windspeed, with regression line

For some purposes this might be a good enough fit. However, the data show a clear curved trend, so a suitable transformation is likely to give a better fit. Since the predictor, windspeed, is a rate (miles per hour), its reciprocal (hours per mile) is a natural transformation to try. Doing so gives R² = 0.98, and the plot below shows that a linear fit does indeed make more sense for the transformed data than for the untransformed data.

DC output vs hours per mile, plus regression line, showing a very good fit.
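The Cook and Weisberg example in point 1 is easy to imitate. The Python sketch below (using NumPy; the function names are hypothetical, and the standard normal predictors and response are assumptions, not Cook and Weisberg's actual setup) repeatedly generates 100 observations of 50 predictors plus a response that ignores them, then computes the least-squares R². Under these assumptions, with normal errors and an intercept in the model, R² follows a Beta(p/2, (n - p - 1)/2) distribution whose mean is p/(n - 1) = 50/99 ≈ 0.505, so a value near 0.59 is not surprising. The sketch also prints the adjusted R², a standard correction for the number of predictors that is not discussed on this page.

```python
import numpy as np


def r_squared(X, y):
    """Least-squares R^2 for y regressed on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)


def null_r_squared(n=100, p=50, seed=0):
    """R^2 when y is generated independently of all p predictors (assumed normal)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # the response ignores X entirely
    return r_squared(X, y)


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (plus an intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    n, p = 100, 50
    draws = [null_r_squared(n, p, seed=s) for s in range(200)]
    print(f"mean R^2 over 200 null data sets: {np.mean(draws):.3f}")
    print(f"theoretical mean p/(n-1):         {p / (n - 1):.3f}")
    print(f"adjusted R^2 when R^2 = 0.59:     {adjusted_r_squared(0.59, n, p):.3f}")
```

The simulated mean should land close to 0.505, and an R² of 0.59 with 50 predictors and 100 observations corresponds to an adjusted R² of only about 0.17: raw R² grows with the number of predictors even when none of them matter.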
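Point 2 can be checked in the same spirit. A quartic has five coefficients, so it can pass exactly through any five points with distinct x values. This sketch uses five hypothetical points with a truly linear trend plus noise (the data and the random seed are assumptions for illustration, not the data from the overfitting example) and compares R² for a straight line and for a quartic, both fitted with NumPy's `polyfit`.

```python
import numpy as np


def polynomial_r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Five hypothetical points: a linear trend plus random noise.
rng = np.random.default_rng(1)
x = np.arange(5.0)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=5)

print(f"straight line R^2: {polynomial_r_squared(x, y, 1):.3f}")
print(f"quartic R^2:       {polynomial_r_squared(x, y, 4):.6f}")
```

The quartic's R² equals 1 up to rounding error, even though its extra curvature is fitting noise rather than any real trend.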
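The windmill data themselves are not reproduced here, so this sketch of point 3 uses hypothetical data generated from a curve whose mean is linear in 1/speed (the coefficients 3.0 and 7.0, the noise level, and the speed range are arbitrary assumptions). It fits a straight line once against speed and once against its reciprocal, then reports both R² values.

```python
import numpy as np


def linear_r_squared(x, y):
    """R^2 of a straight-line least-squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical data (NOT the windmill measurements): the mean output is
# linear in 1/speed, so it rises with speed but levels off.
rng = np.random.default_rng(2)
speed = np.linspace(2.5, 10.0, 25)  # miles per hour
output = 3.0 - 7.0 / speed + rng.normal(0.0, 0.08, size=speed.size)

r2_speed = linear_r_squared(speed, output)        # predictor: miles per hour
r2_recip = linear_r_squared(1.0 / speed, output)  # predictor: hours per mile

print(f"R^2 with speed as predictor:   {r2_speed:.3f}")
print(f"R^2 with 1/speed as predictor: {r2_recip:.3f}")
```

Because the synthetic data were built around the reciprocal, the transformed fit should win here; the sketch shows the mechanism, not a guarantee that 1/x is the right transformation for other data. Since only the predictor is transformed, both R² values describe variation in the same response, which is what makes the comparison fair.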

Notes:
1. R.D. Cook and S. Weisberg (1999), Applied Regression Including Computing and Graphics, Wiley, p. 281.
2. However, the p-value of the F-statistic for significance of the overall regression was 0.13. Yet six of the terms were individually significant at the 0.05 level.

Last updated June 13, 2014
