Using Technology to Teach

Correlation and Linear Regression


Objective:

1. Students will be able to analyze the correlation in a set of paired data using the TI-82 and Geometer's Sketchpad (GSP).

2. Students will be able to investigate the regression line of a set of data using the TI-82 and GSP.

Materials:

TI-82 or TI-83

Geometer's Sketchpad (GSP)

Graph Paper

Straight Edge

Warm-up:

A scientist records three sets of data from a set of experiments. He wants to know if the data in each set are related in some way. See if you can help him out.

1. Plot the points (1, 3), (3, 7), (1/2, 2), (0, 1), (-1, -1).

2. Plot the points (2, 2), (2, 3), (1, 1), (3, 2), (3, 4) on a separate graph.

3. Plot the points (1, 7), (3, 1), (5, 5), (8, 1), (9, 6), (7, 8), (2, 3), (10, 0), (5, 3) on a separate graph.

4. Answer the following questions for data sets 1, 2, and 3: Do the points in each set appear to be related in some way? If so, how? (Hint: Try a line.)

Activities:

Correlation: a linear relationship between two variables in which each implies or complements the other; that is, knowing the value of one helps predict the value of the other.

For example, all of the points in data set 1 lie on the line y = 2x + 1, and all of the points in data set 2 lie near the line y = 2x - 2. If we choose a particular x, we can predict the value of y for data set 1 by plugging that x into y = 2x + 1. We can estimate y for data set 2 the same way by plugging the given x into y = 2x - 2, although for this set the prediction will not always be exact.
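The "plug x into the line" idea can be checked with a few lines of Python. This is a minimal sketch using the warm-up points; the helper name `predict` is our own and is not part of the TI-82 or GSP.

```python
# A minimal sketch: predict y from a line y = slope * x + intercept.
def predict(slope, intercept, x):
    """Return the y-value on the line y = slope*x + intercept at the given x."""
    return slope * x + intercept


# Warm-up data set 1: every point should lie exactly on y = 2x + 1.
data_set_1 = [(1, 3), (3, 7), (0.5, 2), (0, 1), (-1, -1)]
# Warm-up data set 2: points only roughly follow y = 2x - 2.
data_set_2 = [(2, 2), (2, 3), (1, 1), (3, 2), (3, 4)]

for x, y in data_set_1:
    print(f"set 1: x = {x}, observed y = {y}, predicted y = {predict(2, 1, x)}")

for x, y in data_set_2:
    guess = predict(2, -2, x)
    # The gap is how far the observed y sits above (+) or below (-) the line.
    print(f"set 2: x = {x}, observed y = {y}, predicted y = {guess}, gap = {y - guess}")
```

For data set 1 every prediction matches the observed y. For data set 2 the gaps are 0, 1, 1, -2, and 0, which is what "near the line" means here.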

1. As a group, discuss: how would you measure how strong a correlation is?

Correlation Coefficient: a numerical measure (denoted by r) of how strong a linear correlation is (i.e., a measure of how strongly the x and y variables are related, or how close the data points are to lying on a line)

Properties of the Correlation Coefficient, r:

The value of r is always between -1 and 1, inclusive: -1 ≤ r ≤ 1. The value r = 1 or r = -1 occurs only when all of the points lie exactly on a line.

The magnitude of r indicates the strength of a linear relation, whereas its sign indicates its direction. The closer the magnitude of r is to 1, the stronger the linear relation between x and y; the closer the magnitude of r is to 0, the weaker the linear relation. If r is negative, the data trend downward, so the line that relates them has a negative slope. If r is positive, the data trend upward, so the line has a positive slope.

Formula for the Correlation Coefficient, r:
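The formula did not survive in this copy of the lesson. For reference, one standard way to write the (Pearson) correlation coefficient for n points $(x_1, y_1), \dots, (x_n, y_n)$ with means $\bar{x}$ and $\bar{y}$ is shown first, followed by an algebraically equivalent form that is often easier to use by hand. The version used in class may have been written differently.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n\sum y_i^2 - \left(\sum y_i\right)^2\right)}}
```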

2. Find the correlation coefficient for data set 1 using the above formula.

3. Find the correlation coefficient for data set 2 using your TI-82.

4. Find the correlation coefficient for data set 3 using your TI-82.
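To check the values found in questions 2 to 4, the deviation form of the correlation formula translates directly into Python. This is a sketch using only the standard library; `correlation` is our own helper name, not a TI-82 command.

```python
import math


def correlation(points):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of products and squares of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_yy = sum((y - mean_y) ** 2 for _, y in points)
    return s_xy / math.sqrt(s_xx * s_yy)


# The three warm-up data sets.
data_sets = {
    1: [(1, 3), (3, 7), (0.5, 2), (0, 1), (-1, -1)],
    2: [(2, 2), (2, 3), (1, 1), (3, 2), (3, 4)],
    3: [(1, 7), (3, 1), (5, 5), (8, 1), (9, 6), (7, 8), (2, 3), (10, 0), (5, 3)],
}

for label, points in data_sets.items():
    print(f"data set {label}: r = {correlation(points):.3f}")
```

On the warm-up data it gives r = 1 for data set 1 (every point on the line), about 0.68 for data set 2, and about -0.21 for data set 3: a perfect positive relation, a weaker positive one, and a weak negative one.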

5. Do our values for r satisfy the properties of the correlation coefficient? Do they confirm what we said earlier about the strength of the relations? Now that we know how strong these relations are, how would we find the line that best describes each data set?

Regression Line: a straight line that can be used to predict the value of an unknown y for a given x; the line that best fits the data

6. Using GSP, predict the regression equation for data set 2. What do you notice about the sum of the distances between the data points and this line?

7. Using GSP, predict the regression equation for data set 3. What do you notice about the sum of the distances between the data points and this line?
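Questions 6 and 7 ask about the distances between the points and a guessed line. The sketch below assumes "distance" means the signed vertical distance (observed y minus the line's y at the same x); if the class measured perpendicular distances in GSP, the numbers will differ. It compares two guesses for data set 2: the line y = 2x - 2 from the discussion above and a second line, y = x + 0.2, that we chose only for comparison. Exact fractions keep the sums free of rounding noise.

```python
from fractions import Fraction

# Warm-up data set 2.
data_set_2 = [(2, 2), (2, 3), (1, 1), (3, 2), (3, 4)]


def residuals(points, slope, intercept):
    """Signed vertical distances: observed y minus the line's y at the same x."""
    return [y - (slope * x + intercept) for x, y in points]


# Two guessed lines, stored as (slope, intercept) in exact arithmetic.
candidate_lines = {
    "y = 2x - 2": (Fraction(2), Fraction(-2)),
    "y = x + 0.2": (Fraction(1), Fraction(1, 5)),
}

for name, (slope, intercept) in candidate_lines.items():
    gaps = residuals(data_set_2, slope, intercept)
    total = sum(gaps)                     # plain sum of signed distances
    total_sq = sum(d ** 2 for d in gaps)  # sum of squared distances
    print(f"{name}: sum = {float(total):.2f}, sum of squares = {float(total_sq):.2f}")
```

Both guesses give a total vertical distance of 0, because any line through the point of means $(\bar{x}, \bar{y}) = (2.2, 2.4)$ does, so the plain sum cannot single out one best line. Their sums of squared distances differ (6 versus 2.8), and the least-squares line is the one that makes that second sum as small as possible.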

Equation for the Regression Line (the line fitted by the method of least squares):

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

Here x is the given value (the independent variable), $\hat{y}$ is the predicted value of y, $\hat{\beta}_0$ is the y-intercept of the regression line, and $\hat{\beta}_1$ is its slope.
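The equation above gives the form of the line but not how its intercept $\hat{\beta}_0$ and slope $\hat{\beta}_1$ are computed. The standard least-squares estimates, which minimize the sum of squared vertical distances $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, are:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

Because of the second formula, the regression line always passes through the point $(\bar{x}, \bar{y})$.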

8. Calculate and graph the regression line for data set 2 using the formula. How does this compare with your prediction in question 6?

9. Calculate and graph the regression line for data set 3 using your TI-82. How does this compare with your prediction in question 7?
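The least-squares formulas can be sketched in Python to check questions 8 and 9. The helper name `least_squares_line` is ours; the TI-82 reports the same line in its own notation.

```python
def least_squares_line(points):
    """Return (intercept, slope) of the least-squares line for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = s_xy / s_xx                  # beta_1 hat
    intercept = mean_y - slope * mean_x  # beta_0 hat: line passes through the means
    return intercept, slope


# Warm-up data sets 2 and 3.
data_set_2 = [(2, 2), (2, 3), (1, 1), (3, 2), (3, 4)]
data_set_3 = [(1, 7), (3, 1), (5, 5), (8, 1), (9, 6), (7, 8), (2, 3), (10, 0), (5, 3)]

for label, points in (("2", data_set_2), ("3", data_set_3)):
    b0, b1 = least_squares_line(points)
    print(f"data set {label}: intercept = {b0:.3f}, slope = {b1:.3f}")
```

For data set 2 this gives an intercept of about 0.36 and a slope of about 0.93, so the least-squares line is noticeably flatter than the eyeballed line y = 2x - 2. For data set 3 it gives an intercept of about 4.81 and a slope of about -0.19.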

Assessment:

A researcher wants to know whether there is a relationship between age and the amount of television a person watches per day. He surveyed people in several age groups and recorded each group's average daily television viewing in the table below. The researcher did not take statistics in high school, so he does not know how to find the relationship in his data. See if you can help him out:

1. Construct a scatterplot of the data.

2. Determine how strongly the data are related by calculating the correlation coefficient.

3. Determine how the data are related by calculating and graphing the regression line.

First use the formulas from class to calculate the correlation coefficient and the regression line by hand, and then use your TI-82 to check your answers and view the graph.

Age, x    Average hours of television viewed per day, y

18        3.9
24        2.6
36        2.0
40        2.3
58        1.2

