Data Analysis/Least Squares Solution

Day 3

 

Objectives

The objective today is for students to delve into the least squares solution: what it actually means and how it is computed. Students will explore real-world problems in Fathom to see whether a relationship exists between variables. The main focus of the day is exploration in Fathom, so that students can make conjectures about an application that they create themselves.


Goals

1. Students will understand what is meant by the least squares solution and how it is computed.

2. Students will develop a clear understanding of correlation and outliers.

3. Students will be able to create, analyze, explain and make conjectures about a scatter plot and what can be concluded from the information that it provides.

4. Students will investigate real-world applications.


Materials

1. Fathom software


1. Review what was learned on the previous day about the least squares solution: what it means and what it tells us. Make sure that students understand that it provides the equation of a line, and that the slope of that line can be read off for the data set being explored. Ask the students to think about the following.

Questions for students:

a) What is an outlier and what can it do to a data set? Should outliers always be included when computing the least squares solution?

b) Give an example of a relationship between two attributes for which it would be easy to make predictions from data that has already been collected.

Ask for responses to the questions posed and ask for explanations so that reasoning can be verified.

2. The least squares solution will now be found by hand. Bring the following data set up on the screen and explain how the least squares solution is found. The data points, given as (height, weight), are: (60, 125); (64, 134); (72, 205); (63, 178); (66, 165); (67, 170); (74, 195).

 

 

3. The first step is to find the mean of each set of values that was entered. Explain that the least squares (best-fit) line always passes through the point formed by these two means.

Compute the mean of Weight: (125 + 134 + 205 + 178 + 165 + 170 + 195)/7 = 1172/7 = 167.43 (y)

Compute the mean of Height: (60 + 64 + 72 + 63 + 66 + 67 + 74)/7 = 466/7 = 66.57 (x)

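Explain that the least squares line is written in the familiar form of a line, y = mx + b, where m is the slope and b is the y-intercept. The slope is the key ingredient and can be found using the following, where X and Y are the individual data values, x and y are their means, and Σ means to add up over all of the data points:

m = Σ[(X - x)(Y - y)] / Σ(X - x)^2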

4. Prior to class, type in the values needed to calculate the least squares solution. The following table has been set up to calculate the equation of the least squares line: Least squares solution table.
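For teachers who want to check the table outside of Fathom, the columns can be rebuilt with a short script. This is only a sketch in Python (not part of the original lesson); the variable and function names are made up for illustration, and the data points are the seven (height, weight) pairs from step 2.

```python
# (height, weight) pairs from step 2 of the lesson.
data = [(60, 125), (64, 134), (72, 205), (63, 178),
        (66, 165), (67, 170), (74, 195)]

n = len(data)
x_mean = sum(x for x, _ in data) / n   # mean height
y_mean = sum(y for _, y in data) / n   # mean weight


def deviation_rows(points, x_bar, y_bar):
    """Build one table row per point: X, Y, X-x, Y-y, (X-x)^2, (X-x)(Y-y)."""
    rows = []
    for x, y in points:
        dx = x - x_bar
        dy = y - y_bar
        rows.append((x, y, dx, dy, dx * dx, dx * dy))
    return rows


rows = deviation_rows(data, x_mean, y_mean)
sum_dx2 = sum(r[4] for r in rows)    # column total for (X-x)^2
sum_dxdy = sum(r[5] for r in rows)   # column total for (X-x)(Y-y)

print(f"{'X':>4} {'Y':>4} {'X-x':>8} {'Y-y':>8} {'(X-x)^2':>9} {'(X-x)(Y-y)':>11}")
for x, y, dx, dy, dx2, dxdy in rows:
    print(f"{x:>4} {y:>4} {dx:>8.2f} {dy:>8.2f} {dx2:>9.2f} {dxdy:>11.2f}")
print(f"Sum of (X-x)^2    = {sum_dx2:.2f}")
print(f"Sum of (X-x)(Y-y) = {sum_dxdy:.2f}")
```

The script keeps full precision in every row and rounds only when printing, so its column totals can be used to check a table that was filled in by hand.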

From the table we find the following sums: Σ(X - x)^2 = 147.71; Σ(Y - y)^2 = 5213.71; Σ(X - x)(Y - y) = 738.29

Substituting these sums back into the slope equation:

m = 738.29/147.71 = 5.00 (rounded to two decimal places). As we said before, the line must pass through the means, so we can use them to find b.

m = 5.00

x = 66.57

y = 167.43

167.43 = 5.00(66.57) + b

167.43 = 332.85 + b, b = -165.42

So an approximation of the equation of the line is y = 5.00x - 165.42

Our hand calculation gives a slope of 5.00, which matches the slope shown on the Fathom graph, and a y-intercept of -165.42, where the graph shows -165. Explain to the class that rounding along the way when calculating by hand produces an approximation close to the actual least squares solution. The reason for using Fathom, or perhaps a calculator, to find the least squares solution is that data sets can be huge, and it would be nearly impossible to calculate such sets accurately by hand.
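The whole hand calculation can also be checked with a few lines of code. The sketch below is in Python, which the lesson itself does not use, and the function name `least_squares_line` is hypothetical. It applies the slope formula from step 3 and then uses the means to find b, without rounding any intermediate values, so its result should land close to the coefficients that Fathom reports on the graph.

```python
def least_squares_line(points):
    """Return (m, b) for the least squares line y = mx + b through points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    if s_xx == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    m = s_xy / s_xx
    # The line passes through the point of means, so y_bar = m * x_bar + b.
    b = y_bar - m * x_bar
    return m, b


# (height, weight) pairs from step 2 of the lesson.
data = [(60, 125), (64, 134), (72, 205), (63, 178),
        (66, 165), (67, 170), (74, 195)]

m, b = least_squares_line(data)
print(f"slope m     = {m:.3f}")
print(f"intercept b = {b:.2f}")
```

Because the script rounds only when printing, any small gap between its output and a hand result comes from rounding in the hand arithmetic, which is the point the class discussion makes.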

5. Revisit the term outlier again by introducing a new student to the class. This new student is coming from California and has advanced ahead six grade levels and is considered a genius for his age. Here are the attributes for Jack, the new student in the class:

Jack has the following measurements: Arm span - 34.5; Wrist Circumference - 2.75; Forearm width - 3.25; and Height - 41.0. With the introduction of this new student the new box plot for height looks like this:

6. Ask the students what the single dot to the far left represents. What does the new student do to this group of data? Is this considered an outlier according to the graph? Should outliers always be included in the data set? Ask for feedback on these questions.

7. Talk more about outliers and give a precise definition and explanation. You can tell the students that an outlier is an observation that is numerically distant from the rest of the data. Statistics that are derived from data sets that include outliers will often be misleading. For example, if one is calculating the average temperature of 10 objects in a room, and most are between 20 and 25 °C but an oven is at 350 °C, the median of the data may be 23 °C while the mean temperature will be about 55 °C. In this case, the median better reflects the temperature of a randomly sampled object than the mean does. Outliers may indicate data points that belong to a different population than the rest of the sample set. However, a small number of outliers is expected in normal distributions.
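The oven example can be made concrete with a quick calculation. In the Python sketch below, the ten temperatures are invented for illustration (the lesson gives only the range, the oven temperature, and the approximate median and mean); they were chosen so that nine objects fall between 20 and 25 °C and one oven sits at 350 °C.

```python
from statistics import mean, median

# Hypothetical temperatures in degrees Celsius: nine room-temperature
# objects plus one oven. These values are made up for illustration.
temps = [20, 21, 22, 22, 23, 23, 24, 24, 25, 350]

# The same list with the oven removed, to see how much it matters.
without_oven = [t for t in temps if t != 350]

print("With the oven:    median =", median(temps), " mean =", mean(temps))
print("Without the oven: median =", median(without_oven),
      " mean =", round(mean(without_oven), 2))
```

With these made-up values the median stays at 23 whether or not the oven is included, while the mean drops from about 55 to under 23 once the oven is removed, which is why the median is the better summary here.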

8. Using the classroom laptops at their desks, have the students create a small data set of their own. Make sure they create a collection, a table and then a graph. Ask students to include an outlier in the data set. Walk around the room to verify that the students have created an outlier and can see what happens when points in the table are changed.
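What students should see when they add an outlier can be previewed numerically. The Python sketch below refits the least squares line to the height and weight data from step 2 after adding one extra point. That extra point, (41, 200), is hypothetical: the height echoes the new student's height, but the weight of 200 is a made-up value chosen only to show the effect, and the function name is likewise invented.

```python
def least_squares_line(points):
    """Return (m, b) for the least squares line y = mx + b through points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    m = s_xy / s_xx
    return m, y_bar - m * x_bar


# (height, weight) pairs from step 2 of the lesson.
class_data = [(60, 125), (64, 134), (72, 205), (63, 178),
              (66, 165), (67, 170), (74, 195)]

# Hypothetical outlier: the weight value is invented for illustration.
outlier = (41, 200)

m_before, b_before = least_squares_line(class_data)
m_after, b_after = least_squares_line(class_data + [outlier])

print(f"Without the outlier: y = {m_before:.2f}x + ({b_before:.2f})")
print(f"With the outlier:    y = {m_after:.2f}x + ({b_after:.2f})")
```

A single point far from the others is enough to flatten the line almost completely in this sketch, which gives students a concrete reason to ask whether an outlier belongs in the data set.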


Rationale

By having students understand the least squares solution, we can complete this unit on linear models and what they can tell us. Students can check the correlation between variables and judge the strength of their predictions. Observing what outliers can do to a data set will also help students decide whether such data should be included. The third day of the lesson is mainly for students to explore in Fathom and see what happens to graphs when data points change.


Assessment/Homework

Using Fathom, students will be asked to develop a collection of data, build a Fathom table of that data and graph the least squares solution. The least squares solution must also be calculated by hand. At least eight data points must be used, and the data must be reasonably accurate for the attributes being used. Some examples will be given, such as Year/100 m dash world record time, Income/Number of years of education and Population of a City/Number of schools. Students must also include three examples of an outlier that could occur in the data set and three predictions for data that was not included in the set they established.
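To illustrate what a prediction from a least squares line looks like, the Python sketch below fits the line to the height and weight data from the lesson and evaluates it at three heights that are not in the data set. The heights 65, 70 and 80 and the function names are hypothetical choices for illustration. The script also flags any height outside the range of the collected data, since the line is less trustworthy there.

```python
def fit_line(points):
    """Return (slope, intercept) of the least squares line through points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def predict(height, slope, intercept):
    """Predicted weight for a given height, using y = mx + b."""
    return slope * height + intercept


# (height, weight) pairs from the lesson.
data = [(60, 125), (64, 134), (72, 205), (63, 178),
        (66, 165), (67, 170), (74, 195)]

slope, intercept = fit_line(data)
low = min(x for x, _ in data)
high = max(x for x, _ in data)

# Hypothetical heights that do not appear in the data set.
for height in (65, 70, 80):
    note = "" if low <= height <= high else " (outside the data range, use with caution)"
    print(f"height {height}: predicted weight {predict(height, slope, intercept):.1f}{note}")
```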

 
