# Math 225

## Introduction to Biostatistics

### Problems in Lecture #7

1. The Poisson distribution applies when the random variable is a count of random events where:
1. No two events can occur at exactly the same place (or time).
2. The numbers of events in disjoint regions are independent.
3. The expected number of events in a region is proportional to the size of the region.

2. The Poisson distribution is useful for approximating the binomial distribution when n is large and p is small; the matching Poisson mean is n*p.

3. The Poisson distribution has a single parameter, mu.

4. The Poisson probability of exactly x events is:

P(X=x) = e^(-mu) * mu^x / x! for x = 0, 1, 2, ...

5. The mean of the Poisson distribution is mu. If you graph the discrete distribution, it balances at mu.

6. The standard deviation of the Poisson distribution is sqrt(mu). A "typical" distance between a realized observation and "mu" is about the size of the standard deviation.
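
The formula, mean, and standard deviation above can be checked numerically. The following is a minimal Python sketch using only the standard library; the function name `poisson_pmf` and the value mu = 3.5 are illustrative choices made here, not part of the lecture.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) * mu^x / x! for a Poisson count with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu = 3.5          # illustrative value, not from the lecture
xs = range(100)   # the tail beyond x = 99 is negligible for this mu

# Sum the probabilities, then compute the mean and standard deviation
# directly from their definitions.
total = sum(poisson_pmf(x, mu) for x in xs)
mean = sum(x * poisson_pmf(x, mu) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)
sd = math.sqrt(variance)

print(f"total probability = {total:.6f}")
print(f"mean = {mean:.6f}   (mu = {mu})")
print(f"sd   = {sd:.6f}   (sqrt(mu) = {math.sqrt(mu):.6f})")
```

The summed mean and standard deviation should agree with mu and sqrt(mu) up to floating-point error.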

7. #### Poisson Problems

In a laboratory experiment, there is a small chance of a bacterial colony appearing at a precise location on an agar plate. Furthermore, locations of different bacterial colonies are independent of each other. If you expect on average to see 0.02 bacterial colonies per cm^2, how likely is it to find 0, 1, 2, ... bacterial colonies on a 100 cm^2 agar plate?

Solution: The number of bacterial colonies may have a Poisson distribution.

The expected number, or mean, is 100*(0.02) = 2.

P(X=0) = e^(-2) * 2^0 / 0! = 0.1353

P(X=1) = e^(-2) * 2^1 / 1! = 0.2707

P(X=2) = e^(-2) * 2^2 / 2! = 0.2707

P(X=3) = e^(-2) * 2^3 / 3! = 0.1804

and so on.
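
These values can be reproduced in a few lines of Python. This is a sketch using only the standard library; the helper name `poisson_pmf` is introduced here for illustration.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability of exactly x events when the mean is mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

density = 0.02        # expected colonies per cm^2 (from the problem)
area = 100            # plate area in cm^2 (from the problem)
mu = density * area   # expected number of colonies on the plate

probs = [poisson_pmf(x, mu) for x in range(4)]
for x, p in enumerate(probs):
    print(f"P(X={x}) = {p:.4f}")

# "And so on": the probability of 4 or more colonies is whatever remains.
p_four_or_more = 1 - sum(probs)
print(f"P(X>=4) = {p_four_or_more:.4f}")
```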

8. In the previous problem, what is the balancing point (mean or expected value) of the distribution? What is the size of a typical deviation from this mean (standard deviation)?

Solution: The mean is 2, the standard deviation is sqrt(2) = 1.414.
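
As a cross-check that ties this problem to the binomial approximation in item 2, one can imagine the plate divided into many tiny cells, each holding at most one colony. The sketch below assumes n = 10,000 cells of 0.01 cm^2 each, so that p = 0.02 * 0.01 = 0.0002 per cell. The grid size is an assumption made here for illustration; it is not stated in the problem.

```python
import math

n = 10_000        # hypothetical number of tiny cells on the plate
p = 0.02 * 0.01   # chance of a colony in one 0.01 cm^2 cell
mu = n * p        # Poisson mean; equals 2 up to rounding

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    """Poisson probability of exactly x events when the mean is mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

for x in range(4):
    b = binomial_pmf(x, n, p)
    q = poisson_pmf(x, mu)
    print(f"x={x}: binomial {b:.4f}, Poisson {q:.4f}")

# Typical deviation from the mean under each model.
binomial_sd = math.sqrt(n * p * (1 - p))
poisson_sd = math.sqrt(mu)
print(f"binomial sd = {binomial_sd:.4f}, Poisson sd = {poisson_sd:.4f}")
```

With p this small, the two sets of probabilities and the two standard deviations should agree to about three decimal places.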

9. #### Normal Distribution

The normal distribution is the continuous probability distribution popularly known as the "bell-shaped curve". For continuous random variables, probability is represented by the area under a curve. The total area under the curve is 1. The probability that a random variable falls between numbers a and b is the area under the curve between a and b.

10. #### Parameters of the Normal Distribution

The normal distribution is described by two parameters. The mean "mu" is the location of the balancing point of the distribution, which, by symmetry, is also the median. The standard deviation "sigma" represents the distance from the mean to either point where the curve becomes steepest, one below the mean and one above.

11. #### The Standard Normal Curve

The standard normal curve is the normal distribution with mean mu=0 and standard deviation sigma=1. Probabilities for any normal distribution may be determined from the standard normal curve by the standardization formula

`z = (x-mu)/sigma`.

In particular, the probability that a normal random variable `x` with mean mu and standard deviation sigma is between numbers `a` and `b` is equal to the probability that a standard normal random variable `z` is between `(a-mu)/sigma` and `(b-mu)/sigma`.
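
The equivalence can be checked with Python's `statistics.NormalDist`. This is a small sketch; the values mu = 120, sigma = 15, a = 100, and b = 150 are made up for illustration, and `standardize` is a helper name introduced here.

```python
from statistics import NormalDist

mu, sigma = 120.0, 15.0   # illustrative values, not from the lecture
a, b = 100.0, 150.0       # illustrative interval

X = NormalDist(mu, sigma)   # the normal distribution of interest
Z = NormalDist()            # standard normal: mean 0, sd 1

def standardize(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma

# Probability computed directly on X ...
direct = X.cdf(b) - X.cdf(a)
# ... and the same probability computed on Z after standardizing a and b.
via_z = Z.cdf(standardize(b, mu, sigma)) - Z.cdf(standardize(a, mu, sigma))

print(f"direct = {direct:.6f}, via z-scores = {via_z:.6f}")
```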

12. #### The 68-95-99.7 Rule

It is useful to keep a few benchmark figures in mind.
1. For any normal curve, the area within one standard deviation of the mean is about 68%.
2. For any normal curve, the area within two standard deviations of the mean is about 95%.
3. For any normal curve, the area within three standard deviations of the mean is about 99.7%.
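
These benchmarks are easy to confirm numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the second distribution (mean 500, standard deviation 100) is included only to illustrate that the rule does not depend on the particular mean and standard deviation, and `area_within` is a helper name introduced here.

```python
from statistics import NormalDist

def area_within(dist, k):
    """Area under dist within k standard deviations of its mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

standard = NormalDist()        # mean 0, sd 1
other = NormalDist(500, 100)   # an arbitrary normal curve for comparison

areas = {k: area_within(standard, k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} sd: {area:.4f} (standard), {area_within(other, k):.4f} (other)")
```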

13. #### The Standard Normal Table

The standard normal table gives the area under the standard normal curve to the left of a number z. The number z is rounded to two decimal places. The ones place and the tenths place are listed at the left side of the table. The hundredths place is listed across the top. Areas are listed in the center of the table.

The normal table here does not have information for negative z values. Because of symmetry, this is not necessary: P(Z < -z) = P(Z > z) = 1 - P(Z < z).

14. #### Sample Standard Normal Table Calculations

It is a good idea to sketch a normal curve and shade in the desired area. Let the sketch guide the arithmetic.

Area to the left of a positive z.
P(Z < 2.05) = 0.9798.

Area to the left of a negative z.
P(Z < -2.05) = P(Z > 2.05) = 1 - 0.9798 = 0.0202.

Area between two positive z scores.
P(1.23 < Z < 2.05) = P(Z < 2.05) - P(Z < 1.23) = 0.9798 - 0.8907 = 0.0891.

Area between a positive and a negative z score.
P(-1.23 < Z < 2.05) = P(Z < 2.05) - P(Z < -1.23) = P(Z < 2.05) - P(Z > 1.23) = 0.9798 - (1 - 0.8907) = 0.8705.

Area outside two z scores.
P(|Z| > 1.23) = P(Z < -1.23) + P(Z > 1.23) = 2*P(Z > 1.23) = 2*(1 - 0.8907) = 0.2186.
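
The same five areas can be computed in software rather than read from the table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration. Because table entries are rounded to four places, answers built from them can differ from the software values in the last digit.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # area to the left of z under the standard normal curve

left_pos = phi(2.05)                       # P(Z < 2.05)
left_neg = phi(-2.05)                      # P(Z < -2.05)
between_pos = phi(2.05) - phi(1.23)        # P(1.23 < Z < 2.05)
between_mixed = phi(2.05) - phi(-1.23)     # P(-1.23 < Z < 2.05)
outside = phi(-1.23) + (1 - phi(1.23))     # P(|Z| > 1.23)

for name, value in [
    ("P(Z < 2.05)", left_pos),
    ("P(Z < -2.05)", left_neg),
    ("P(1.23 < Z < 2.05)", between_pos),
    ("P(-1.23 < Z < 2.05)", between_mixed),
    ("P(|Z| > 1.23)", outside),
]:
    print(f"{name} = {value:.4f}")
```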

15. #### Using the normal table backwards.

Percentiles.
Find the number z such that the area to the left of z is 0.7000.

The area to the left of z=0.52 is 0.6985 and the area to the left of z=0.53 is 0.7019.

The z-score we want is about half-way in between these, say z = 0.525.

Percentiles for a negative z.
Find the number z such that the area to the left of z is 0.1000.

This z will be negative. The corresponding positive z has an area of 0.1000 to its right, or an area of 0.9000 to its left.

The closest such z is z=1.28.

Therefore, the 10th percentile is at about z=-1.28.

Finding the cut-offs of a middle area.
Find the number z such that the area between -z and z is 0.7000.

The middle 70% leaves 30% left over, half on each side. Thus -z is at the 15th percentile and z is at the 85th percentile.

The closest such z is z=1.04.

Therefore, the area between -1.04 and 1.04 is about 70%.
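
Reading the table backwards corresponds to evaluating the inverse of the cumulative distribution function. The following sketch uses `NormalDist.inv_cdf` from the Python standard library; the variable names are illustrative. The software values are more precise than the two-decimal z-scores a table can give, so they should be close to, but not identical with, the answers above.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal curve

# Percentile: area to the left is 0.7000.
z_70 = Z.inv_cdf(0.70)

# Percentile for a negative z: area to the left is 0.1000.
z_10 = Z.inv_cdf(0.10)

# Middle area: 0.7000 between -z and z leaves 0.15 in each tail,
# so z sits at the 85th percentile.
middle = 0.70
tail = (1 - middle) / 2
z_mid = Z.inv_cdf(1 - tail)

print(f"70th percentile: z = {z_70:.3f}")
print(f"10th percentile: z = {z_10:.3f}")
print(f"middle 70%: between {-z_mid:.3f} and {z_mid:.3f}")
```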

16. #### Problems on other normal distributions.

To solve any problem for an arbitrary normal curve, translate it to a problem with the standard normal curve. Standardize with the formula

z = (x-mu)/sigma

or

x = mu + z*sigma

For these problems, assume that mu = 500 and sigma = 100.

Area to the left.
P(X < 350) = P(Z < (350-500)/100) = P(Z < -1.50) = P(Z > 1.50) = 1 - 0.9332 = 0.0668.

Area between two values.
P(450 < X < 720) = P((450-500)/100 < Z < (720-500)/100) = P(-0.50 < Z < 2.20) = P(Z < 2.20) - P(Z > 0.50) = 0.9861 - (1-0.6915) = 0.6776.

Area outside two values.
P(|X-500| > 161) = P(X < 339) + P(X > 661) = P(Z < (339-500)/100) + P(Z > (661-500)/100) = P(Z < -1.61) + P(Z > 1.61) = 2*P(Z > 1.61) = 2*(1-0.9463) = 0.1074.
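
As a check on the three area calculations, software can work with the mu = 500, sigma = 100 curve directly, without standardizing by hand. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are introduced here for illustration.

```python
from statistics import NormalDist

X = NormalDist(mu=500, sigma=100)

p_left = X.cdf(350)                         # P(X < 350)
p_between = X.cdf(720) - X.cdf(450)         # P(450 < X < 720)
p_outside = X.cdf(339) + (1 - X.cdf(661))   # P(|X - 500| > 161)

print(f"P(X < 350)       = {p_left:.4f}")
print(f"P(450 < X < 720) = {p_between:.4f}")
print(f"P(|X-500| > 161) = {p_outside:.4f}")
```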

Percentiles.
Find the value x that cuts off the top 40% of the values.

If x cuts off the top 40%, the area to its left is 0.6000.

The z score is close to 0.25, so the x score is 0.25 standard deviations above the mean.

x = 500 + 0.25(100) = 525.

Middle area.
Find the values x and y that cut off the middle 40% of the values.

There is 60% left over, 30% on each side. Thus, x is the 30th percentile and y is the 70th percentile.

The z-score of y is 0.525 and the z-score of x is -0.525.

x = 500 - 0.525(100) = 447.5 and y = 500 + 0.525(100) = 552.5.
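
The percentile and middle-area answers can be checked the same way, either by converting a z-score with x = mu + z*sigma or by asking for the percentile of the mu = 500, sigma = 100 curve directly. This is a sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative. The results should land within a fraction of a unit of the table-based answers, which rely on z-scores rounded from the table.

```python
from statistics import NormalDist

mu, sigma = 500, 100
X = NormalDist(mu, sigma)
Z = NormalDist()

# Top 40% cut-off: the area to the left is 0.60.
z_60 = Z.inv_cdf(0.60)
top_40_cutoff = mu + z_60 * sigma      # x = mu + z*sigma
top_40_direct = X.inv_cdf(0.60)        # same value, computed directly

# Middle 40%: 30% in each tail, so the 30th and 70th percentiles.
lower = X.inv_cdf(0.30)
upper = X.inv_cdf(0.70)

print(f"top 40% cut-off: x = {top_40_cutoff:.1f}")
print(f"middle 40%: from {lower:.1f} to {upper:.1f}")
```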

17. z-scores tell the number of standard deviations an observation is from the mean.