www.john-weber.com  

Chapter 2: Examining Relationships

Section 2.4: Cautions about Correlation and Regression

Correlation and regression are powerful tools for describing the relationship between two variables, but they have limitations that we must keep in mind:

  1. Correlation and regression describe only linear relationships.
  2. Correlation and regression are NOT resistant measures: a few outlying or influential observations can change them substantially.
  3. Always graph the data before interpreting correlation and regression.
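
The first two limitations can be checked numerically. The following is a minimal sketch using made-up data; the helper pearson_r and all of the data values are assumptions introduced for illustration, not part of the course materials.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Limitation 1: a perfect but curved relationship (y = x^2) can have r = 0.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [x ** 2 for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Limitation 2: r is not resistant. These made-up points lie exactly on y = 2x ...
x_line = [1, 2, 3, 4, 5, 6]
y_line = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(x_line, y_line)

# ... until a single extreme observation (7, -20) is added.
r_outlier = pearson_r(x_line + [7], y_line + [-20])

print("curved relationship:", r_curve)
print("straight line:", r_clean)
print("straight line plus one outlier:", r_outlier)
```

In this sketch the curved data give r = 0 even though y is completely determined by x, and one outlier moves r from 1 to a negative value (about -0.33). A scatterplot would reveal both problems at a glance, which is the point of limitation 3.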

Other cautions include:

Extrapolation

Extrapolation is the use of a regression line for prediction outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often inaccurate, because few relationships are linear for all values of x.
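
As a rough illustration, the sketch below fits a least-squares line to four points taken from a curved relationship and then predicts both inside and far outside the observed range of x. The function names fit_line and true_y, the square-root relationship, and the data values are all assumptions chosen for the example.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def true_y(x):
    # Made-up curved relationship, used only for illustration.
    return 10 * sqrt(x)

# Observed data cover only x = 1 to x = 16.
xs = [1, 4, 9, 16]
ys = [true_y(x) for x in xs]
slope, intercept = fit_line(xs, ys)

def predict(x):
    return intercept + slope * x

# Prediction inside the observed range versus far outside it.
error_inside = abs(predict(9) - true_y(9))
error_outside = abs(predict(100) - true_y(100))

print("error at x = 9 (inside range):", round(error_inside, 2))
print("error at x = 100 (outside range):", round(error_outside, 2))
```

Within the observed range the line is off by only about 2 units, but at x = 100 it predicts about 204 when the assumed true value is 100. The line is a reasonable summary only where there are data to support it.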

Using averaged data

Correlations based on averages are usually too high when applied to individuals, because averaging smooths out the variation among individuals. You should NOT apply results based on averaged data to individuals.
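
The effect of averaging can be seen in a small constructed example. The sketch below assumes four hypothetical groups of three individuals each; the group centers, the offsets, and the helper pearson_r are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: four groups whose centers lie on the line y = x.
# Each individual is its group center plus a fixed, made-up offset (dx, dy).
centers = [1, 2, 3, 4]
offsets = [(-2, 2), (2, 2), (0, -4)]  # offsets sum to zero in x and in y
groups = [[(c + dx, c + dy) for dx, dy in offsets] for c in centers]

# Correlation across all 12 individuals.
ind_x = [x for g in groups for x, _ in g]
ind_y = [y for g in groups for _, y in g]
r_individuals = pearson_r(ind_x, ind_y)

# Correlation across the 4 group averages.
avg_x = [sum(x for x, _ in g) / len(g) for g in groups]
avg_y = [sum(y for _, y in g) / len(g) for g in groups]
r_averages = pearson_r(avg_x, avg_y)

print("r for individuals:", round(r_individuals, 2))
print("r for group averages:", round(r_averages, 2))
```

Here the group averages fall exactly on a line (r = 1), while the individuals show only a weak association (r of about 0.21). A prediction for one individual based on the averaged data would look far more reliable than it is.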

Lurking variables

A lurking variable is a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied. Correlation and regression can be misleading if you ignore important lurking variables.
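
A small constructed example shows how a lurking variable can produce a strong association by itself. In the sketch below, a hypothetical variable z drives both x and y; the variable names, the values, and the helper pearson_r are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical lurking variable z takes four levels.
z_levels = [0, 10, 20, 30]

# At each level of z, x and y vary independently of each other:
# the four made-up offsets have zero correlation between dx and dy.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

# Both x and y depend on z, but neither depends on the other.
data = [(z, z + dx, 2 * z + dy) for z in z_levels for dx, dy in offsets]

# Ignoring z: x and y look very strongly associated.
r_overall = pearson_r([x for _, x, _ in data], [y for _, _, y in data])

# Holding z fixed: the association between x and y disappears.
r_within = [
    pearson_r([x for zz, x, _ in data if zz == z],
              [y for zz, _, y in data if zz == z])
    for z in z_levels
]

print("r between x and y, ignoring z:", round(r_overall, 3))
print("r between x and y at each level of z:", r_within)
```

Ignoring z, the correlation between x and y is about 0.995, yet at every fixed level of z it is 0. The strong overall association comes entirely from the common dependence on z, which is exactly what a study that omits z would miss.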

Association is not causation

Correlation is not causation. An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y. In some cases, a strong association may be explained by a lurking variable and not by some connection between x and y.

The best way to get good evidence of causation is to do an experiment (see Section 3.2).

When an experiment cannot be done, the following criteria help establish a strong (though not foolproof) case for causation:

  1. The association is strong.
  2. The association is consistent over many studies.
  3. Higher values of the explanatory variable are associated with higher values of the response variable.
  4. The alleged cause precedes the effect in time.
  5. The alleged cause is plausible.

