www.john-weber.com
Chapter 3: Producing Data
Introduction
Recall that in Chapters 1 and 2 we analyzed data using graphs and numerical summaries. Now we
want to discuss how to collect data to answer our own questions.
Definitions:
population | the entire group of individuals from which we want information. |
sample | a part of the population, used when the population is too big to study in full. The sample
MUST be representative of the population. |
Types of studies:
observational study | a researcher observes individuals and measures variables of interest
but does not attempt to influence the responses. |
experiment | a researcher deliberately imposes some treatment on individuals in order
to observe their responses; used to understand cause-and-effect relationships. |
Two variables (explanatory variables or lurking variables) are confounded when their effects on a
response variable cannot be distinguished from each other.
Section 3.1: Designing Samples
Samples are used instead of the whole population because of time restrictions, cost, and convenience. We collect
data on the sample and draw conclusions about the population.
Key ideas about sample design:
- The sample design is the method used to choose the sample from the population. Poor designs produce misleading
conclusions.
- A design is biased if it systematically favors certain outcomes.
- Poor designs:
  - voluntary response sample: consists of people who choose themselves by responding to a general appeal. An
  example is Wolf Blitzer's question of the day. These samples are biased because people with strong opinions
  are the most likely to respond.
  - convenience sample: consists of people who are easy to reach. Such samples are also usually biased, because
  the people who are easiest to reach may differ systematically from the rest of the population.
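To see why a voluntary response sample is biased, here is a small simulation sketch. Everything in it is hypothetical: the 1-to-5 opinion scale, the population of 10,000 people, and the response rule (strong opinions, 1 or 5, respond with probability 0.5; everyone else with probability 0.05). The function names `strong_share` and `voluntary_response_sample` are made up for illustration.

```python
import random

def strong_share(opinions):
    """Fraction of opinions that are 'strong' (1 or 5 on a 1-to-5 scale)."""
    return sum(1 for x in opinions if x in (1, 5)) / len(opinions)

def voluntary_response_sample(population, seed=0):
    """Keep only the people who choose to respond.

    Hypothetical response rule: strong opinions (1 or 5) respond with
    probability 0.5, everyone else with probability 0.05.
    """
    rng = random.Random(seed)
    return [x for x in population
            if rng.random() < (0.5 if x in (1, 5) else 0.05)]

# Hypothetical population of 10,000 people; 20% hold strong opinions.
population = [1] * 1000 + [5] * 1000 + [2] * 2000 + [3] * 4000 + [4] * 2000
volunteers = voluntary_response_sample(population)
print(f"strong opinions in population: {strong_share(population):.2f}")
print(f"strong opinions among volunteers: {strong_share(volunteers):.2f}")
```

Under these made-up rates, strong opinions make up 20% of the population but roughly 70% of the volunteers, so the sample systematically overstates how strongly people feel.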
Simple random sample
This is a sample chosen by chance (see the definition on p. 171 of the text): an SRS of size n is chosen so that
every set of n individuals has an equal chance to be the sample actually selected. This type of sample is not
biased, because every individual has an equal chance to be chosen. Selecting the sample requires random numbers.
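A short worked derivation shows where the "equal chance" comes from. It assumes the textbook definition of an SRS (every set of n individuals is equally likely to be the sample) and uses N for the population size; N and n are notation introduced here. Of the \(\binom{N}{n}\) equally likely samples, the ones containing a particular individual are those that fill the other n - 1 spots from the remaining N - 1 people:

```latex
P(\text{individual } i \text{ is chosen})
  = \frac{\binom{N-1}{n-1}}{\binom{N}{n}}
  = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!}
  = \frac{n}{N}
```

For example, an SRS of 5 students from a class of 30 gives each student a 5/30 = 1/6 chance of being chosen.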
Random number generators:
- Table B of the text (we will NOT use this table in this course).
- Phone book!
- TI-83 calculator.
Choosing an SRS: (see class activity #8)
- Assign a numerical label to every individual in the population.
- Use a random number generator to select the labels at random.
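The two steps above can be sketched in Python, with the standard `random` module standing in for the calculator. The function name `choose_srs` and the student names are hypothetical. Because a generator that picks one label at a time (such as the TI-83's randInt) can return the same label twice, the sketch simply skips any label already chosen.

```python
import random

def choose_srs(population, n, seed=None):
    """Choose an SRS of size n by labeling individuals and drawing labels.

    Step 1: give the k-th individual the label k (1, 2, ..., N).
    Step 2: draw random labels, skipping repeats, until n distinct
    labels have been chosen.
    """
    if not 0 <= n <= len(population):
        raise ValueError("n must be between 0 and the population size")
    rng = random.Random(seed)
    labels = {k: person for k, person in enumerate(population, start=1)}
    chosen = []
    while len(chosen) < n:
        label = rng.randint(1, len(population))  # inclusive on both ends
        if label not in chosen:                   # skip repeated labels
            chosen.append(label)
    return [labels[k] for k in chosen]

# Hypothetical population of eight students.
students = ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus", "Hal"]
print(choose_srs(students, 3, seed=42))
```

In everyday Python code, `random.sample(students, 3)` does the same job in one call; the longer version is only meant to mirror the label-and-draw procedure.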
Other sampling designs
- Probability sample: a sample chosen by chance that gives each member of the population a known chance to be
selected. An SRS is a probability sample.
- Stratified random sample: can give more exact information about a population than an SRS of the same size,
by choosing separate random samples within identified strata. Here are the steps:
  - divide the population into groups (called strata) of similar individuals
  - choose an SRS in each stratum
  - combine these SRSs to form the full sample
- Multistage sample.
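The three steps for a stratified random sample can be sketched as follows. This is an illustration under assumptions: the strata (class years), their members, the per-stratum sample sizes, and the function name `stratified_sample` are all hypothetical. Here each stratum contributes half its members, so sizes are proportional to stratum size; that is one common choice, not a requirement of the method.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Stratified random sample.

    strata: dict mapping stratum name -> list of individuals (step 1 done).
    sizes:  dict mapping stratum name -> number to take from that stratum.
    """
    rng = random.Random(seed)
    full_sample = []
    for name, members in strata.items():
        srs = rng.sample(members, sizes[name])  # step 2: an SRS in each stratum
        full_sample.extend(srs)                 # step 3: combine the SRSs
    return full_sample

# Hypothetical class divided into strata by year.
strata = {
    "freshman": ["F1", "F2", "F3", "F4", "F5", "F6"],
    "sophomore": ["S1", "S2", "S3", "S4"],
    "junior": ["J1", "J2"],
}
sizes = {"freshman": 3, "sophomore": 2, "junior": 1}
print(stratified_sample(strata, sizes, seed=7))
```

Unlike a single SRS of 6 from all 12 students, this design guarantees that every class year appears in the sample.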
The SRS is the building block of more elaborate sampling designs, but the analysis of these more complex designs
is beyond the scope of this course.
Cautions about sample surveys
- To choose a random sample from the entire population, we need a list of the entire population. Such a list is
rarely available, so most samples suffer from some degree of undercoverage (i.e., some groups of the population
are left out of the process of choosing the sample).
- Nonresponse: occurs when an individual chosen for the sample cannot be contacted or refuses to participate.
- Response bias (can be reduced by interviewer training):
  - when an individual responds with what s/he thinks the interviewer wants to hear
  - when an individual lies (esp., about illegal or unpopular behaviour)
  - when an individual recalls past events as having happened more recently than they did
- Wording effects: confusing or leading questions can introduce bias.
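Undercoverage is not fixed by careful randomization or by a bigger sample. The simulation sketch below takes a genuine SRS, but only from a list that leaves out part of the population. All numbers are hypothetical: 10,000 people, 2,000 of them missing from the list (think of people not in a phone book), with 40% support for some proposal among those listed and 80% among those left out. The function name `undercoverage_demo` is made up.

```python
import random

def undercoverage_demo(n, seed=0):
    """Return (true support rate, SRS estimate from an incomplete list).

    Hypothetical population: 1 = supports a proposal, 0 = does not.
    """
    rng = random.Random(seed)
    covered = [1] * 3200 + [0] * 4800   # on the list: 40% support
    left_out = [1] * 1600 + [0] * 400   # missing from the list: 80% support
    everyone = covered + left_out
    true_rate = sum(everyone) / len(everyone)
    sample = rng.sample(covered, n)     # a genuine SRS, but only of the list
    return true_rate, sum(sample) / n

for n in (100, 1000, 5000):
    true_rate, estimate = undercoverage_demo(n)
    print(f"n={n}: true rate {true_rate:.2f}, estimate {estimate:.2f}")
```

The true rate is 0.48, but as n grows the estimates settle near 0.40, the rate among the people on the list, rather than near the truth.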
NEVER trust the results of a survey until you have read the exact questions asked and know the number of nonresponses.
Inference about the population
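Choosing the sample at random eliminates bias in the selection of the sample, but it is unlikely that the results
of a sample will be exactly the same as those for the entire population. The larger the random sample, the more
accurate the results: estimates from larger samples vary less from sample to sample.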
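A simulation sketch makes the effect of sample size concrete. It assumes a hypothetical population of 10,000 people, 30% of whom answer "yes" (coded 1), draws 1,000 SRSs of each size, and measures how much the sample proportion varies. The function name `spread_of_estimates` is made up for illustration.

```python
import random
import statistics

def spread_of_estimates(population, n, reps=1000, seed=0):
    """Standard deviation of the sample proportion over many SRSs of size n."""
    rng = random.Random(seed)
    estimates = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.stdev(estimates)

# Hypothetical population: 30% answer "yes" (1), 70% answer "no" (0).
population = [1] * 3000 + [0] * 7000
for n in (25, 100, 400):
    print(n, round(spread_of_estimates(population, n), 3))
```

The spread shrinks roughly by half each time n is multiplied by 4. This only reduces chance variation; it does nothing about bias from undercoverage, nonresponse, or wording.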
Back to John Weber's MATH 1431 Page