www.john-weber.com  

Chapter 3: Producing Data

Introduction

Recall that in Chapters 1 and 2 we analyzed data using graphs and numerical summaries. Now we want to discuss how to collect data to answer our own questions.

Definitions:

population – the entire group of individuals about which we want information.
sample – a part of the population, used when the population is too big to study as a whole. The sample MUST be representative of the population.

Types of studies:

observational study – a researcher observes individuals and measures variables of interest but does not attempt to influence the responses.
experiment – a researcher deliberately imposes some treatment on individuals in order to observe their responses; used to understand cause and effect.

Two variables (explanatory variables or lurking variables) are confounded when their effects on a response variable cannot be distinguished from each other.

Section 3.1: Designing Samples

Samples are used instead of the population due to time restrictions, cost and convenience. We collect data on the sample and draw conclusions about the population.

Key ideas about sample design:

Simple random sample

This is a sample chosen by chance (see the definition on p. 171 of the text). This type of sample is not biased because every individual has an equal chance of being chosen; more precisely, every possible sample of the same size is equally likely to be the one selected. The sample is selected using random numbers.

Random number generators:

  1. Table B of text – we will NOT use this table in the course.
  2. Phone book!
  3. TI-83 calculator.

Choosing an SRS: (see class activity #8)

  1. Assign a numerical label to every individual in the population.
  2. Use a random number generator to select the labels at random.
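The two steps above can be sketched in Python. This is only an illustration, not the class activity itself: the function name `simple_random_sample`, the `seed` parameter, and the list of student names are all made up for the example, and Python's `random` module stands in for the TI-83.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n from a list of individuals (illustrative helper)."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    # Step 1: assign a numerical label (1, 2, ..., N) to every individual.
    labels = list(range(1, len(population) + 1))
    # Step 2: select n distinct labels at random (no label is chosen twice).
    chosen = rng.sample(labels, n)
    # Look up the individual that carries each chosen label.
    return [population[label - 1] for label in sorted(chosen)]

# Hypothetical population of 8 students.
students = ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"]
print(simple_random_sample(students, 3, seed=1))
```

Because `rng.sample` draws without replacement, every set of 3 students is equally likely to be the sample, which is what makes this an SRS.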

Other sampling designs

  1. Probability sample – gives each member of the population a known chance to be selected. SRS is a probability sample.
  2. Stratified random sample – can provide more precise information about a population by selecting random samples within identified strata. Here are the steps:
    1. divide the population into groups (called strata) of similar individuals
    2. choose an SRS in each stratum
    3. combine all the SRSs to form a full sample
  3. Multistage sample.
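The three steps for a stratified random sample listed above might look like the following in Python. This is a minimal sketch under stated assumptions: the function `stratified_sample`, its parameters, and the student data are hypothetical, and the same number of individuals is taken from each stratum purely for simplicity.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Combine one SRS from each stratum into a full sample (illustrative helper)."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata of similar individuals.
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)
    # Steps 2 and 3: choose an SRS in each stratum and combine them.
    full_sample = []
    for name in sorted(strata):
        # rng.sample raises ValueError if a stratum has fewer than n_per_stratum members.
        full_sample.extend(rng.sample(strata[name], n_per_stratum))
    return full_sample

# Hypothetical population: (name, class year) pairs, stratified by class year.
students = [
    ("Ana", "freshman"), ("Ben", "freshman"), ("Cara", "freshman"),
    ("Dev", "senior"), ("Eli", "senior"), ("Fay", "senior"),
]
sample = stratified_sample(students, lambda s: s[1], 2, seed=1)
print(sample)
```

Unlike a single SRS of 4 students, this design guarantees that both class years appear in the sample.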

SRS is a building block of more elaborate sampling designs. But the analysis of more complex designs is beyond the scope of this course.

Cautions about sample surveys

  1. Undercoverage – occurs when some groups of the population are left out of the sample. To choose a random sample from the entire population, we need a list of the entire population, which is typically difficult to obtain. Thus, most samples suffer from some degree of undercoverage.
  2. Nonresponse – occurs when individuals cannot be contacted or refuse to participate.
  3. Response bias (can be reduced by interviewer training):
    1. when an individual responds with what s/he thinks the interviewer wants
    2. when an individual lies (especially about illegal or unpopular behaviour)
    3. when an individual relates past events as occurring more recently
  4. Wording effects – confusing or leading questions can introduce bias.
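To see why undercoverage (item 1 above) matters even when the sample itself is chosen at random, consider the toy example below. All numbers are invented for illustration: a population of 1000 households in which the 300 households missing from the list we sample from happen to have a lower value of the variable being measured.

```python
import random
import statistics

def mean_of_srs(frame, n, seed=None):
    """Mean of an SRS of size n drawn from the list we can actually sample from."""
    rng = random.Random(seed)
    return statistics.mean(rng.sample(frame, n))

# Hypothetical population (values are made up): 700 listed households each
# report 60, and 300 unlisted households each report 30.
listed = [60] * 700
unlisted = [30] * 300
population = listed + unlisted

print("true population mean:", statistics.mean(population))
print("mean of SRS from listed households only:", mean_of_srs(listed, 50, seed=1))
```

The sample is drawn by chance, yet it misses the true mean because the unlisted households never had a chance to be selected. Random selection cannot repair a list that leaves part of the population out.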

NEVER trust results of surveys until you read the exact questions and know the number of non-responses.

Inference about the population

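Random samples eliminate bias in how the sample is chosen, but it is unlikely that the results from a sample are exactly the same as those for the entire population. Results vary from sample to sample, and larger random samples tend to give more accurate results than smaller ones.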
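A small simulation can illustrate how sample size affects accuracy. This is a sketch with invented numbers: a hypothetical population of 10,000 individuals, 60% of whom would answer "yes" to some question. The function name `spread_of_sample_proportions` and its parameters are assumptions made for the example.

```python
import random
import statistics

def spread_of_sample_proportions(population, n, repetitions=1000, seed=0):
    """Draw many SRSs of size n and return the standard deviation of the
    sample proportions (a measure of how much results vary between samples)."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        sample = rng.sample(population, n)   # one SRS of size n
        proportions.append(sum(sample) / n)  # proportion of "yes" (coded 1)
    return statistics.pstdev(proportions)

# Hypothetical population: 1 = "yes", 0 = "no"; the true proportion is 0.6.
population = [1] * 6000 + [0] * 4000

for n in (25, 100, 400):
    print(n, round(spread_of_sample_proportions(population, n), 3))
```

The printed spread should shrink as n grows: proportions from larger samples cluster more tightly around the true value of 0.6.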

