7964
Lecture 8


In Lecture 7, we discussed the binomial and the Poisson probability distributions. Here, we'll discuss random variables, as well as the all-important normal distribution.



I. Discrete and Continuous Random Variables


II. The Normal Distribution

In reality, of course, a uniform distribution is unusual, and generally would not represent interesting questions.

Let's consider the normal distribution.

The following (taken from Hale) is a normal distribution:



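The normal distribution is actually a family of individual normal distributions. Each normal distribution looks different--in terms of peak and spread--based on two properties or parameters: its mean, which fixes where the peak sits, and its standard deviation, which fixes how spread out the curve is.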


How does the normal distribution relate to the binomial distribution? Well, if X is a binomial random variable with a large n (and a success probability p that is not too close to 0 or 1), then X is also approximately a normal random variable. In other words, the normal distribution approximates the binomial distribution.
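To make the approximation concrete, here is a minimal sketch that compares an exact binomial probability with its normal approximation. The values n = 100, p = 0.5, and the cutoff k = 55 are illustrative assumptions, not numbers from this lecture. The sketch uses the standard facts that a binomial random variable has mean np and standard deviation sqrt(np(1 - p)), and it adds a continuity correction of 0.5, a common refinement that these notes do not cover.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial random variable X with n trials."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for a normal random variable Y with mean mu and SD sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Illustrative (assumed) values, not taken from the lecture.
n, p, k = 100, 0.5, 55

mu = n * p                          # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of the binomial

exact = binom_cdf(k, n, p)
# Continuity correction: evaluate the normal curve at k + 0.5.
approx = normal_cdf(k + 0.5, mu, sigma)

print(f"Exact binomial P(X <= {k}): {exact:.4f}")
print(f"Normal approximation:      {approx:.4f}")
```

With a large n, the two printed probabilities should agree closely; shrinking n or pushing p toward 0 or 1 makes the gap grow.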


III. The Empirical Rule for the Normal Distribution

According to the empirical rule, and as illustrated above, all normal density curves satisfy the following property:

Also,

Click here to see an example of a normal distribution: heights of American women.
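The empirical rule is usually stated as roughly 68%, 95%, and 99.7% of values falling within 1, 2, and 3 standard deviations of the mean. As a rough check, the sketch below computes those areas directly from the standard normal curve. The helper names are hypothetical, chosen only for this illustration.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_within(k):
    """Probability that a normal value lies within k SDs of its mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {prob_within(k):.4f}")
```

Because every normal distribution can be rescaled to the standard normal, the same three areas apply regardless of the mean and standard deviation.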


IV. Z-scores

Z-scores are essentially standardized scores for X (given that X has a normal distribution).
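A z-score is computed by subtracting the mean from the observed value and dividing by the standard deviation, so it tells us how many standard deviations the value sits above or below the mean. A minimal sketch follows; the numbers in the example call are hypothetical and are not data from this lecture.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Hypothetical example: a value of 70 from a distribution with
# mean 65 and standard deviation 2.5.
z = z_score(70, 65, 2.5)
print(f"z = {z}")
```

A positive z-score means the value is above the mean, and a negative one means it is below.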


V. Statistics and Sampling Distributions

Let's step back and define a few terms, before we go on to discuss why the normal distribution is so very important.


Sampling Distributions for Sample Proportions

Suppose we conduct a binomial experiment with n trials, and get successes on x of the trials. Or, suppose we measure a categorical variable for a representative sample of n individuals, and x of them have responses in a certain category. In each case, we can calculate the sample proportion, p-hat = x/n. In the first case, it's the proportion of trials that had that particular outcome (e.g., the proportion of red cars out of 200 cars); in the second case, it might be the proportion of individuals who answered "yes" to a question about support for the President.

Another example: suppose we wanted to know what proportion of a large population carries the gene for a certain disease. We could sample 25 people, and use the sample proportion from that sample to estimate the true parameter -- the true proportion. Suppose that in reality, in truth, 40% of the population carries the gene.

Consider four different samples of 25 people taken from this population. Remember that we are trying to estimate the proportion of the population with the gene (that is, we are trying to estimate the parameter proportion), based on the sample statistic or sample proportion. We do not know the population proportion (that is, we do not know the parameter). Here is what we would have concluded about the proportion of people who carry the gene, given four possible samples with X as specified:

In practice, only one sample is collected--and there is no way to determine for sure whether or not the sample is an accurate reflection of the population. However, researchers have calculated what to expect for the vast majority of possible samples.
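One way to see what "the vast majority of possible samples" looks like is to simulate it. The sketch below repeatedly draws samples of 25 people from a population in which 40% carry the gene (the figures used in the example above) and records the sample proportion each time. The number of repetitions (10,000) and the random seed are arbitrary assumptions made for this illustration. It also compares the spread of the simulated proportions with the standard formula sqrt(p(1 - p)/n) for the standard deviation of a sample proportion.

```python
import math
import random

TRUE_P = 0.40         # true population proportion (from the gene example)
N = 25                # people per sample (from the gene example)
NUM_SAMPLES = 10_000  # number of simulated samples (an arbitrary assumption)

rng = random.Random(8)  # fixed seed so the run is repeatable

def sample_proportion(n, p, rng):
    """Draw one sample of n people and return the proportion with the gene."""
    x = sum(1 for _ in range(n) if rng.random() < p)
    return x / n

p_hats = [sample_proportion(N, TRUE_P, rng) for _ in range(NUM_SAMPLES)]

mean_p_hat = sum(p_hats) / NUM_SAMPLES
sd_p_hat = math.sqrt(
    sum((ph - mean_p_hat) ** 2 for ph in p_hats) / (NUM_SAMPLES - 1)
)
theoretical_sd = math.sqrt(TRUE_P * (1 - TRUE_P) / N)

print(f"Mean of the sample proportions: {mean_p_hat:.4f}")
print(f"SD of the sample proportions:   {sd_p_hat:.4f}")
print(f"Theoretical SD:                 {theoretical_sd:.4f}")
```

Individual samples can land well above or below 40%, but the sample proportions as a group should center near the true value, with a spread close to the theoretical one.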

VI. Normal Curve Approximation Rules: Sample Proportions & Means