__Day 1__

__Descriptive Statistics Review:__

**Shape**: Mound-shaped, uniform, bimodal, symmetrical, and skewed (left or right).

**Center**: Mean, median, and mode.

**Mean**: The sum of the values divided by the number of values. For example, for the data set 4, 5, 6, 8, 12, the mean is (4 + 5 + 6 + 8 + 12)/5 = 35/5 = 7.
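The mean calculation above can be checked with a minimal Python sketch. The helper name `sample_mean` is hypothetical; `statistics.mean` is from Python's standard library and is shown only as a cross-check.

```python
import statistics

def sample_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

data = [4, 5, 6, 8, 12]
print(sample_mean(data))      # 7.0
print(statistics.mean(data))  # same value from the standard library
```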

**Median**: The middle value of the data once it is sorted in order. If the number of values n is odd, the median is the single middle value; the data set above has five values, so its median is the third value, 6. If n is even, take the two middle values, add them together, and divide by two. For example, if your sample has n = 30, the median is the average of the 15th and 16th values.
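The odd and even cases can be sketched in Python as follows. The function name `sample_median` is hypothetical, and the six-value list is the notes' data set with one made-up value (20) appended to show the even case.

```python
import statistics

def sample_median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

print(sample_median([4, 5, 6, 8, 12]))      # 6
print(sample_median([4, 5, 6, 8, 12, 20]))  # 7.0 (average of 6 and 8)
```

For n = 30, `mid` is 15, so the code averages positions 14 and 15 of the zero-indexed list, which are the 15th and 16th values.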

**Mode**: The value that appears most often in the sample.

What is the mode of the data set 5, 6, 7, 7, 8? The answer is 7, because it appears more often than any other value in the set.
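A minimal Python sketch of finding the mode follows. It returns a list because a data set can have more than one value tied for the highest count; the name `sample_modes` is hypothetical, and `statistics.multimode` (standard library, Python 3.8+) is used as a cross-check.

```python
import statistics
from collections import Counter

def sample_modes(values):
    """Return every value that ties for the highest count."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(sample_modes([5, 6, 7, 7, 8]))          # [7]
print(statistics.multimode([5, 6, 7, 7, 8]))  # [7]
```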

__Day 2__

**Spread**: Range, standard deviation, and variance.

**Range**: the difference
between the maximum value of the sample and the minimum value
of the sample.
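As a small illustration, the range of the Day 1 data set can be computed in Python. The name `data_range` is hypothetical (it avoids shadowing Python's built-in `range`).

```python
def data_range(values):
    """Maximum value minus minimum value."""
    return max(values) - min(values)

print(data_range([4, 5, 6, 8, 12]))  # 8
```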

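**Standard Deviation**: measures the typical amount by which the values in a data set differ from the mean. It is the square root of the **variance**, which for a sample is the sum of the squared deviations from the mean divided by n - 1.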
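The sketch below computes the sample variance and standard deviation of the Day 1 data set, assuming the usual sample definitions with an n - 1 divisor. The helper names are hypothetical; `statistics.variance` and `statistics.stdev` are standard-library functions that use the same n - 1 divisor and serve as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [4, 5, 6, 8, 12]
print(sample_variance(data))  # 10.0  (deviations -3, -2, -1, 1, 5; squares sum to 40; 40/4)
print(sample_std_dev(data))   # about 3.162
print(statistics.stdev(data))
```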

__Day 3__

**Univariate data set:** a data set consisting of observations on a single attribute.

__Types of Data:__

__1. Categorical:__ the individual observations are categorical responses (category labels rather than numbers). For example, a person's hometown is a categorical response.

__2. Numerical:__ each observation is a number. For example, each individual's GPA is a numerical response.

In some studies, you will focus on two different attributes. For example, a student's GPA and the number of classes the student is enrolled in might be recorded for each individual in a group. The resulting data set would consist of pairs of numbers, such as (3.4, 4). This is called a **bivariate data set**.

__Day 4__

__Types of Samples:__

1. __Simple Random Sample:__ a sample chosen in such a way that every possible sample of n objects has an equal chance of being selected.

2. __Stratified Random Sample:__ Divide the population into separate groups called strata, then select a simple random sample from each stratum. (For example, divide the USA into regions as strata and then use a simple random sample to choose states within each region.)

3. __Systematic Sample:__ Select every kth item from an ordered list of the population, starting from a randomly chosen item among the first k (for example, every 10th item in the list).
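The three sampling methods above can be sketched with Python's standard `random` module. The function names, the population of 100 numbered units, and the two example strata are hypothetical, made up for illustration; a seeded `random.Random` is used only so the example is repeatable.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate simple random sample from each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(1)            # fixed seed so the example is repeatable
population = list(range(1, 101))  # hypothetical population of 100 numbered units
strata = {                        # hypothetical strata (regions) and their members (states)
    "Northeast": ["ME", "NH", "VT", "MA"],
    "West": ["CA", "OR", "WA", "NV"],
}

print(simple_random_sample(population, 5, rng))
print(stratified_sample(strata, 2, rng))
print(systematic_sample(population, 10, rng))  # 10 items spaced 10 apart
```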

__Day 5__

__Types of Bias within Sampling:__

__Bias:__ the tendency of a sampling method to systematically favor some part of the population.

__1. Selection Bias:__ Tendency for samples to differ from the corresponding population as a result of systematic exclusion of some part of the population.

__2. Measurement Bias:__ Tendency for samples to differ from the corresponding population because the method of observation tends to produce values that differ from the true value.

__3. Nonresponse Bias:__
Tendency for samples to differ from the corresponding population
because the data is not obtained from all individuals selected
for inclusion in the sample.

__4. Response Bias:__
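Tendency for samples to differ from the corresponding population because respondents give inaccurate answers, often because of how a question is worded (for example, a leading question).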

Homework Problems for Sampling

__Day 6__

**Displaying Categorical Data:** (1) frequency distributions, (2) bar charts, (3) pie charts

**(1) Frequency Distribution for Categorical Data:** a table that displays the possible categories along with the associated frequencies or relative frequencies.

The **Frequency** for a particular category
is the number of times the category appears in the data set.

The **Relative Frequency** for a particular
category is the fraction or proportion of the time that the category
appears in the data set. It is calculated as:

relative frequency = frequency / number of observations in the data set
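A frequency distribution like the one described above can be built with `collections.Counter` from Python's standard library. The `frequency_table` name and the hometown responses are hypothetical, made up for illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(observations)
    n = len(observations)
    return {category: (freq, freq / n) for category, freq in counts.items()}

# Hypothetical hometown responses, made up for illustration
hometowns = ["Austin", "Dallas", "Austin", "Houston", "Austin",
             "Dallas", "Houston", "Austin", "El Paso", "Dallas"]

for category, (freq, rel) in frequency_table(hometowns).items():
    print(f"{category:<8} {freq:>3} {rel:>5.2f}")
```

The relative frequencies always add up to 1, since together the categories account for every observation.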

**(2) Bar Charts:** use when displaying categorical data.

a. Draw a horizontal line, and write the category names or labels below the line at regularly spaced intervals.

b. Draw a vertical line, and label the scale using either frequency or relative frequency.

c. Place a rectangular bar above each category label. The height is determined by the category's frequency (or relative frequency), and all bars should have the same width. With the same width, both the height and the area of each bar are proportional to the relative frequency.
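The construction steps above are for drawing by hand. As a rough stand-in for a real graphic, the sketch below prints a text bar chart in which every bar uses the same symbol per unit of frequency, so bar length is proportional to frequency. The bars run horizontally here rather than vertically, and the `text_bar_chart` name and hometown data are hypothetical.

```python
from collections import Counter

def text_bar_chart(observations, symbol="#"):
    """Return one line per category with a bar whose length equals its frequency."""
    counts = Counter(observations)
    width = max(len(str(category)) for category in counts)
    lines = []
    for category, freq in counts.items():
        # same symbol per unit for every bar, so length is proportional to frequency
        lines.append(f"{category:<{width}} | {symbol * freq} ({freq})")
    return "\n".join(lines)

# Hypothetical hometown responses, made up for illustration
hometowns = ["Austin", "Dallas", "Austin", "Houston", "Austin",
             "Dallas", "Houston", "Austin", "El Paso", "Dallas"]
print(text_bar_chart(hometowns))
```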

**(3) Pie Charts:** use with categorical data that have a relatively small number of possible categories. Pie charts are most useful for illustrating proportions of the whole data set for various categories.

Construct by:

1. Drawing a circle to represent the entire data set.

2. For each category, calculate the slice size in degrees: slice size = category relative frequency × 360 (since there are 360 degrees in a circle).

3. Draw a slice of appropriate size for each category. This can be tricky, so most pie charts are generated using a graphing calculator or a statistical software package.
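The slice-size arithmetic in step 2 can be sketched in Python. The `pie_slice_degrees` name and hometown data are hypothetical; the code multiplies before dividing, which gives the same result as relative frequency × 360 but avoids a rounding step.

```python
from collections import Counter

def pie_slice_degrees(observations):
    """Slice size in degrees = (frequency / n) * 360, computed as frequency * 360 / n."""
    counts = Counter(observations)
    n = len(observations)
    return {category: freq * 360 / n for category, freq in counts.items()}

# Hypothetical hometown responses, made up for illustration
hometowns = ["Austin", "Dallas", "Austin", "Houston", "Austin",
             "Dallas", "Houston", "Austin", "El Paso", "Dallas"]

for category, degrees in pie_slice_degrees(hometowns).items():
    print(f"{category}: {degrees:.0f} degrees")
```

Here Austin, with a relative frequency of 4/10, gets 0.4 × 360 = 144 degrees, and all slices together make up the full 360-degree circle.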

__Day 7__

__Probability__

**P(E)** (when all outcomes in the sample space are equally likely): number of outcomes favorable to E / number of outcomes in the sample space
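The counting definition of P(E) can be sketched with exact fractions from Python's standard `fractions` module. The fair-die example and the `classical_probability` name are illustrative assumptions; the formula only applies when every outcome is equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = outcomes favorable to E / outcomes in the sample space (equally likely outcomes)."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}  # one roll of a fair six-sided die
even = {2, 4, 6}
print(classical_probability(even, die))  # 1/2
```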

__Basic properties of Probability:__

1. For any event E, 0 ≤ P(E) ≤ 1.

2. If S is the sample space for an experiment, P(S) = 1.

3. If two events E and F are disjoint, then P(E or F) = P(E) + P(F).

4. For any event E, P(E) + P(not E) = 1, so

P(not E) = 1 - P(E) and P(E) = 1 - P(not E)
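The four properties above can be checked on a small example. This sketch assumes a fair six-sided die, where every outcome is equally likely; the helper `prob` and the events `low` and `high` are hypothetical names for illustration.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}  # fair six-sided die: every outcome equally likely

def prob(event):
    """Equally-likely probability of an event (a subset of the die's outcomes)."""
    return Fraction(len(event & die), len(die))

low = {1, 2}   # disjoint from high
high = {5, 6}
not_low = die - low

assert all(0 <= prob({x}) <= 1 for x in die)       # property 1
assert prob(die) == 1                              # property 2: P(S) = 1
assert prob(low | high) == prob(low) + prob(high)  # property 3: disjoint events
assert prob(low) + prob(not_low) == 1              # property 4: complement rule
print(prob(not_low))  # 2/3
```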

**Counting Principle**: If you can do one task in m ways and, for each of these, you can do another task in n ways, then the two tasks can be done together in m × n ways.
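The counting principle can be checked by listing every pairing with `itertools.product` from Python's standard library. The shirt and pants lists are hypothetical, chosen so that m = 3 and n = 4.

```python
from itertools import product

shirts = ["red", "blue", "green"]                 # m = 3 ways to do task 1 (hypothetical)
pants = ["jeans", "khakis", "shorts", "slacks"]   # n = 4 ways to do task 2 (hypothetical)

outfits = list(product(shirts, pants))            # every (shirt, pants) pairing
print(len(outfits), len(shirts) * len(pants))     # 12 12
```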

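**Combinations (order does not matter)**: the number of ways you can select a committee whose members have no distinct roles. For example, choosing 3 people from 5 gives (5 choose 3) = 10 possible committees.

**Permutations (order matters)**: if the 3 committee members fill distinct roles (for example, chair, secretary, and treasurer), there are 5 × 4 × 3 = 60 possible choices for the committee.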
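Both counts can be verified in Python. `math.comb` and `math.perm` are standard-library functions (Python 3.8+), and the explicit listings from `itertools` confirm them; the five candidate names are hypothetical.

```python
import math
from itertools import combinations, permutations

people = ["A", "B", "C", "D", "E"]  # 5 hypothetical candidates

print(math.comb(5, 3))  # 10 committees (order does not matter)
print(math.perm(5, 3))  # 60 ordered selections, 5 * 4 * 3 (order matters)

# Cross-check by listing every selection explicitly
print(len(list(combinations(people, 3))), len(list(permutations(people, 3))))  # 10 60
```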

__Day 8__

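**Probability of Mutually Exclusive Events:** If two events A and B are mutually exclusive, then the probability that either A or B occurs is the sum of their probabilities:

P(A or B) = P(A) + P(B)

For any two events that are not necessarily mutually exclusive, the general addition rule subtracts the overlap:

P(A or B) = P(A) + P(B) - P(A and B)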
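The two addition rules can be compared on a fair six-sided die, an example assumed here for illustration; the helper `prob` and the event names are hypothetical.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}  # fair six-sided die

def prob(event):
    """Equally-likely probability of an event on the die."""
    return Fraction(len(event), len(die))

even = {2, 4, 6}
greater_than_4 = {5, 6}  # overlaps with even at 6, so not mutually exclusive
low = {1, 2}             # mutually exclusive with greater_than_4

# General addition rule: subtract the overlap once
print(prob(even | greater_than_4),
      prob(even) + prob(greater_than_4) - prob(even & greater_than_4))  # 2/3 2/3
# Mutually exclusive events: the overlap is empty, so the probabilities simply add
print(prob(low | greater_than_4), prob(low) + prob(greater_than_4))     # 2/3 2/3
```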

**General Multiplication Rule:** For any two events E and F, P(E and F) = P(E | F)P(F), where the conditional probability P(E | F) = P(E and F)/P(F) (defined when P(F) > 0).

__Conditional Probability__

P(A | B), read "the probability of A given B," equals P(A and B) divided by P(B).

Example: P(A) = .7, P(B) = .6

P(A and B) = .54

P(A | B) = .54/.60 = .90
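The conditional probability example can be reproduced in Python, using the notes' values P(A) = .7, P(B) = .6, and P(A and B) = .54. The function name `conditional_probability` is hypothetical. The sketch also shows the general multiplication rule recovering P(A and B) and the general addition rule giving P(A or B).

```python
def conditional_probability(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

p_a, p_b, p_a_and_b = 0.7, 0.6, 0.54
p_a_given_b = conditional_probability(p_a_and_b, p_b)
print(round(p_a_given_b, 3))          # 0.9 (divide by P(B) = .6, not P(A) = .7)

# General multiplication rule recovers P(A and B)
print(round(p_a_given_b * p_b, 2))    # 0.54

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(round(p_a + p_b - p_a_and_b, 2))  # 0.76
```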

Homework Problems for Probability

__Day 9__

A **Confidence Interval** for a population
characteristic is an interval of plausible values for the characteristic.
It is constructed so that, with a chosen degree of confidence,
the value of the characteristic will be captured inside the interval.

The **Confidence Level** associated with
a confidence interval estimate is the success rate of the method
used to construct the interval.

__Large-Sample Confidence Interval for pi__

The general formula for a confidence interval for a population proportion pi when:

1. p is the sample proportion from a random sample, and

2. the sample size n is large (np > 10 and n(1 - p) > 10)

is

p ± (z critical value) × sqrt(p(1 - p)/n)

The desired confidence level determines which z critical value is used. The three most commonly used confidence levels, 90%, 95%, and 99%, use z critical values 1.645, 1.96, and 2.58, respectively.
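The large-sample interval for pi can be sketched in Python using the z critical values listed in the notes. The `proportion_ci` name and the sample (40 successes out of n = 100) are hypothetical. The sample-size check mirrors the notes' np > 10 and n(1 - p) > 10 conditions.

```python
import math

# z critical values for the three common confidence levels listed in the notes
Z_CRITICAL = {0.90: 1.645, 0.95: 1.96, 0.99: 2.58}

def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample interval p +/- z * sqrt(p(1 - p)/n) for a population proportion."""
    if n * p_hat <= 10 or n * (1 - p_hat) <= 10:
        raise ValueError("large-sample conditions np > 10 and n(1 - p) > 10 not met")
    margin = Z_CRITICAL[confidence] * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 40 successes out of n = 100, so p = 0.40
low, high = proportion_ci(0.40, 100)
print(f"({low:.3f}, {high:.3f})")  # (0.304, 0.496)
```

A higher confidence level uses a larger z critical value, which widens the interval.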

__One-Sample z Confidence Interval for mu__

The general formula for a confidence interval for a population mean mu when:

1. x-bar is the sample mean from a random sample,

2. the sample size n is large (generally n > 30), and

3. sigma, the population standard deviation, is known

is

x-bar ± (z critical value) × (sigma / sqrt(n))
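The one-sample z interval can be sketched in the same way, using the z critical values from the notes. The `mean_z_ci` name and the numbers (x-bar = 50, sigma = 10, n = 36) are hypothetical. The sketch assumes, as the notes do, that sigma is known and n is large.

```python
import math

# z critical values for the three common confidence levels listed in the notes
Z_CRITICAL = {0.90: 1.645, 0.95: 1.96, 0.99: 2.58}

def mean_z_ci(x_bar, sigma, n, confidence=0.95):
    """x-bar +/- z * sigma / sqrt(n), assuming sigma is known and n is large."""
    margin = Z_CRITICAL[confidence] * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical numbers: x-bar = 50, sigma = 10, n = 36
low, high = mean_z_ci(50, 10, 36)
print(f"({low:.2f}, {high:.2f})")  # (46.73, 53.27)
```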

__Day 10__

__Hypothesis Testing__

A test of **hypotheses** is a method for using sample data to decide between two competing claims (hypotheses) about a population characteristic.

For example, claims about e-mail might include:

mu = 1000, where mu is the mean number of characters in an e-mail message.

pi < .01, where pi is the proportion of e-mail messages that are undeliverable.

For the first claim, one hypothesis might be mu = 1000 and the other mu ≠ 1000.

The **Null Hypothesis**, denoted Hnot, is a claim about a population characteristic that is initially assumed to be true.

The **Alternative Hypothesis**, denoted Ha, is the competing claim.

In doing a test of Hnot versus Ha, the hypothesis Hnot will be rejected in favor of Ha only if sample evidence strongly suggests that Hnot is false. If the sample does not contain such evidence, Hnot will not be rejected. The two possible conclusions are then reject Hnot or fail to reject Hnot.

The form of a null hypothesis is:

Hnot: population characteristic=hypothesized value where the hypothesized value is a specific number determined by the problem context.

The alternative hypothesis will have one of the following three forms:

1. Ha: population characteristic > hypothesized value

2. Ha: population characteristic < hypothesized value

3. Ha: population characteristic ≠ hypothesized value

__Errors in Hypothesis Testing:__

**Type I error**:
the error of rejecting Hnot when Hnot is true.

**Type II error**: the error of failing to reject Hnot when Hnot is false.

The probability of a **type I error** is
denoted by **Alpha** and is called the level of significance
of the test. Thus, a test with alpha =.01 is said to have a level
of significance of .01 or to be a level .01 test.

The probability of a **type II error** is denoted by **beta** (β).