9. Continuous Probability Distributions

General Terminology and Notation

A continuous random variable is one for which the range (set of possible values) is an interval (or a collection of intervals) on the real number line. Continuous variables have to be treated a little differently than discrete ones, the reason being that $P(X=x)$ has to be zero for each $x$, in order to avoid mathematical contradiction. The distribution of a continuous random variable is called a continuous probability distribution. To illustrate, consider the simple spinning pointer in Figure spinner.
[Figure spinner: a spinner, a device for generating a continuous random variable (in a zero-gravity, virtually frictionless environment)]

where all numbers in the interval (0,4] are equally likely. The probability of the pointer stopping precisely at the number $x$ must be zero, because if each point had the same positive probability, the total probability for $R=\{x:0<x\leq4\}$ would be infinite, since the set $R$ is uncountable. Thus, for a continuous random variable the probability at each point is 0. This means we can no longer use the probability function to describe a distribution. Instead there are two other functions commonly used to describe continuous distributions.


Cumulative Distribution Function:

For discrete random variables we defined the c.d.f., $F(x)=P(X\leq x)$. This function still works for continuous random variables. For the spinner, the probability the pointer stops between 0 and 1 is 1/4 if all values $x$ are equally ``likely''; between 0 and 2 the probability is 1/2; between 0 and 3 it is 3/4; and so on. In general, $F(x) = x/4$ for $0 < x \leq 4$.

Also, $F(x) = 0$ for $x \leq 0$ since there is no chance of the pointer stopping at a number $\leq 0$, and $F(x) = 1$ for $x > 4$ since the pointer is certain to stop at a number below $x$ if $x > 4$.

Most properties of a c.d.f. are the same for continuous variables as for discrete variables. These are:

1. $F(-\infty)=0$ and $F(\infty)=1$
2. $F(x)$ is a non-decreasing function of $x$
3. $P(a<X\leq b)=F(b)-F(a)$.

Note that, as indicated before, for a continuous distribution we have $P(X=x)=0$ at each point $x$. Also, since the probability is 0 at each point: $$P(a<X<b)=P(a\leq X<b)=P(a<X\leq b)=P(a\leq X\leq b)=F(b)-F(a).$$ (For a discrete random variable, each of these 4 probabilities could be different.) For the continuous distributions in this chapter, we do not worry about whether intervals are open, closed, or half-open, since the probability of these intervals is the same.
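For example, with the spinner c.d.f. $F(x)=x/4$ on $(0,4]$, $$P(1<X\leq 3)=F(3)-F(1)=\tfrac{3}{4}-\tfrac{1}{4}=\tfrac{1}{2},$$ and this is also the value of $P(1\leq X\leq 3)$, $P(1<X<3)$ and $P(1\leq X<3)$.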

Probability Density Function (p.d.f.): While the c.d.f. can be used to find probabilities, it does not give an intuitive picture of which values of $x$ are more likely, and which are less likely. To develop such a picture suppose that we take a short interval of $X$-values, $[x,x+\Delta x]$. The probability $X$ lies in the interval is $$P(x\leq X\leq x+\Delta x)=F(x+\Delta x)-F(x).$$ This makes it easy to compare the probabilities for two intervals of the same length $\Delta x$. Now suppose we consider what happens as $\Delta x$ becomes small, and we divide the probability by $\Delta x$. This leads to the following definition.

Definition

The probability density function (p.d.f.) $f(x)$ for a continuous random variable $X$ is the derivative $$f(x)=\frac{dF(x)}{dx},$$ where $F(x)$ is the c.d.f. for $X$.

We will see that $f(x)$ represents the relative likelihood of different $x$-values. To do this we first note some properties of a p.d.f. It is assumed that $f(x)$ is a continuous function of $x$ at all points for which $0<F(x)<1$.


Properties of a probability density function

  1. $P(a<X\leq b)=F(b)-F(a)=\int_{a}^{b}f(x)dx$
    This follows from the definition of $f(x)$.

  2. $f(x) \geq0$
    (Since $F(x)$ is non-decreasing).

  3. $\int_{-\infty}^{\infty}f(x)dx=1$
    This is because $\int_{-\infty}^{\infty}f(x)dx=F(\infty)-F(-\infty)=1$.

  4. $F(x)=\int_{-\infty}^{x}f(u)du$.
    This is just property 1 with $a = -\infty$.

To see that $f(x)$ represents the relative likelihood of different outcomes, we note that for $\Delta x$ small, $$P(x\leq X\leq x+\Delta x)=F(x+\Delta x)-F(x)\approx f(x)\Delta x.$$ Thus, $f(x)\neq P(X=x)$ but $f(x)\Delta x$ is the approximate probability that $X$ is inside the interval $[x,x+\Delta x]$. A plot of the function $f(x)$ shows such values clearly and for this reason it is very common to plot the p.d.f.'s of continuous random variables.


Example: Consider the spinner example, where $$F(x)=\frac{x}{4}\quad\text{for }0<x\leq 4.$$ Thus, the p.d.f. is $f(x)=F^{\prime}(x)$, or $$f(x)=\frac{1}{4}\quad\text{for }0<x\leq 4,$$ and outside this interval the p.d.f. is $0$. Figure uniformpdf shows the probability density function $f(x)$; for obvious reasons this is called a ``uniform'' distribution.
[Figure uniformpdf: Uniform p.d.f.]

Remark: Continuous probability distributions are, like discrete distributions, mathematical models. Thus, the distribution assumed for the spinner above is a model, though it seems likely it would be a good model for many real spinners.


Remark: It may seem paradoxical that $P(X=x)=0$ for a continuous r.v. and yet we record the outcomes $X=x$ in real "experiments" with continuous variables. The catch is that all measurements have finite precision; they are in effect discrete. For example, the height $60+\pi$ inches is within the range of the height $X$ of people in a population but we could never observe the outcome $X=60+\pi$ if we selected a person at random and measured their height.

To summarize, in measurements we are actually observing something like $$x-\frac{\Delta}{2}\leq X\leq x+\frac{\Delta}{2},$$ where $\Delta$ may be very small, but not zero. The probability of this outcome is not zero: it is (approximately) $f(x) \Delta$.

We now consider a more complicated mathematical example of a continuous random variable. Then we'll consider real problems that involve continuous variables. Remember that it is always a good idea to sketch or plot the p.d.f. $f(x)$ for a r.v.


Example:

Let MATH be a p.d.f.

Find

  1. $k$

  2. $F(x)$

  3. MATH


Solution:

  1. Set $\int_{-\infty}^{\infty}f(x)dx=1$ to solve for $k$. When finding the area of a region bounded by different functions we split the integral into pieces.


[Figure: sketch of $f(x)$ for this example]

(We normally wouldn't even write down the parts with $\int 0\,dx$.) MATH

  2. Doing the easy pieces, which are often left out, first:

MATHMATH

(see shaded area below)

[Figure: shaded area representing $F(x)$]

MATH

i.e.

MATH

As a rough check, since for a continuous distribution there is no probability at any point, $F(x)$ should have the same value as we approach each boundary point from above and from below.

e.g. MATH

This quick check won't prove your answer is right, but will detect many careless errors.

  3. MATH


Defined Variables or Change of Variable:
When we know the p.d.f. or c.d.f. for a continuous random variable $X$ we sometimes want to find the p.d.f. or c.d.f. for some other random variable $Y$ which is a function of $X$. The procedure for doing this is summarized below. It is based on the fact that the c.d.f. $F_{Y}(y)$ for $Y$ equals $P(Y\leq y)$, and this can be rewritten in terms of $X$ since $Y$ is a function of $X$. Thus:

  1. Write the c.d.f. of $Y$ as a function of $X$.

  2. Use $F_{X}(x)$ to find $F_{Y}(y)$. Then if you want the p.d.f. $f_{Y}(y)$, you can differentiate the expression for $F_{Y}(y)$.

  3. Find the range of values of $y$.


Example: In the earlier spinner example, $$F_{X}(x)=\frac{x}{4}\quad\text{for }0<x\leq 4.$$ Let $Y=1/X$. Find $f(y)$.


Solution:

$$F_{Y}(y)=P(Y\leq y)=P\left(\tfrac{1}{X}\leq y\right)=P\left(X\geq\tfrac{1}{y}\right)=1-F_{X}\left(\tfrac{1}{y}\right)$$ For step (2), we can do either: $$F_{Y}(y)=1-\frac{1/y}{4}=1-\frac{1}{4y}\quad\text{so}\quad f_{Y}(y)=F_{Y}^{\prime}(y)=\frac{1}{4y^{2}}\quad\text{for }y\geq\tfrac{1}{4}$$ (As $x$ goes from 0 to 4, $y=\frac{1}{x}$ goes between $\infty$ and $\frac{1}{4}$.) or $$f_{Y}(y)=\frac{d}{dy}\left[1-F_{X}\left(\tfrac{1}{y}\right)\right]=f_{X}\left(\tfrac{1}{y}\right)\frac{1}{y^{2}}=\frac{1}{4y^{2}}\quad\text{for }y\geq\tfrac{1}{4}.$$ Generally if $F_{X}(x)$ is known it is easier to substitute first, then differentiate. If $F_{X}(x)$ is in the form of an integral that can't be solved, it is usually easier to differentiate first, then substitute $f_{X}(x)$.

Extension of Expectation, Mean, and Variance to Continuous Distributions

Definition

When $X$ is continuous, we still define $$E(g(X))=\int_{-\infty}^{\infty}g(x)f(x)dx.$$

With this definition, all of the earlier properties of expectation and variance still hold; for example with $\mu=E(X)$, $$\sigma^{2}=\operatorname{Var}(X)=E\left[(X-\mu)^{2}\right]=E(X^{2})-\mu^{2}.$$

(This definition can be justified by writing $\int g(x)f(x)dx$ as a limit of a Riemann sum and recognizing the Riemann sum as being in the form of an expectation for discrete random variables.)

Example: In the spinner example with $f(x)=\frac{1}{4}$ for $0<x\leq 4$, $$\mu=E(X)=\int_{0}^{4}\frac{x}{4}dx=\left.\frac{x^{2}}{8}\right|_{0}^{4}=2\quad\text{and}\quad\sigma^{2}=E(X^{2})-\mu^{2}=\int_{0}^{4}\frac{x^{2}}{4}dx-4=\frac{16}{3}-4=\frac{4}{3}.$$


Example: Let $X$ have p.d.f. MATH Then MATH


Problems:

  1. Let $X$ have p.d.f. MATH Find

    1. $k$

    2. the c.d.f., $F(x)$

    3. MATH

    4. the mean and variance of $X$.

    5. let $Y = X^{2}$. Derive the p.d.f. of $Y$.

  2. A continuous distribution has c.d.f. MATH for $x > 0$, where $n$ is a positive constant.

    1. Evaluate $k$.

    2. Find the p.d.f., $f(x)$.

    3. What is the median of this distribution? (The median is the value of $x$ such that half the time we get a value below it and half the time above it.)

Continuous Uniform Distribution

Just as we did for discrete r.v.'s, we now consider some special types of continuous probability distributions. These distributions arise in certain settings, described below. This section considers what we call uniform distributions.

Physical Setup:

Suppose $X$ takes values in some interval $[a,b]$ (it doesn't actually matter whether the interval is open or closed) with all subintervals of a fixed length being equally likely. Then $X$ has a continuous uniform distribution. We write $X\sim U(a,b)$.
Illustrations:

  1. In the spinner example $X \sim U (0,4]$.

  2. Computers can generate a random number $X$ which appears as though it is drawn from the distribution $U(0,1)$. This is the starting point for many computer simulations of random processes; an example is given below.

The probability density function and the cumulative distribution function:

Since all points are equally likely (more precisely, intervals contained in $[a,b]$ of a given length, say 0.01, all have the same probability), the probability density function must be a constant $f(x)=k$ for $a\leq x\leq b$, for some constant $k$. To make $\int_{a}^{b}f(x)dx=1$, we require $k=\frac{1}{b-a}$. Therefore $$f(x)=\begin{cases}\frac{1}{b-a} & a\leq x\leq b\\ 0 & \text{otherwise}\end{cases}\qquad\text{and}\qquad F(x)=\begin{cases}0 & x<a\\ \frac{x-a}{b-a} & a\leq x\leq b\\ 1 & x>b.\end{cases}$$


Mean and Variance: $$\mu=E(X)=\int_{a}^{b}\frac{x}{b-a}dx=\frac{b^{2}-a^{2}}{2(b-a)}=\frac{a+b}{2},\qquad E(X^{2})=\int_{a}^{b}\frac{x^{2}}{b-a}dx=\frac{a^{2}+ab+b^{2}}{3},$$ $$\sigma^{2}=E(X^{2})-\mu^{2}=\frac{a^{2}+ab+b^{2}}{3}-\left(\frac{a+b}{2}\right)^{2}=\frac{(b-a)^{2}}{12}.$$


Example: Suppose $X$ has the continuous p.d.f. $$f(x)=0.1e^{-0.1x},\quad x>0.$$ (This is called an exponential distribution and is discussed in the next section. It is used in areas such as queueing theory and reliability.) We'll show that the new random variable $$Y=e^{-0.1X}$$ has a uniform distribution, $U(0,1)$. To see this, we follow the steps in Section 9.1:
$$F_{Y}(y)=P(Y\leq y)=P\left(e^{-0.1X}\leq y\right)=P(-0.1X\leq\ln y)=P(X\geq -10\ln y)$$

Since $F_{X}(x)=\int_{0}^{x}0.1e^{-0.1u}du=1-e^{-0.1x}$ for $x>0$, we get

$$F_{Y}(y)=1-F_{X}(-10\ln y)=e^{-0.1(-10\ln y)}=e^{\ln y}=y\quad\text{for }0<y<1.$$

(The range of $Y$ is (0,1) since $X>0$.) Thus $f_{Y}(y)=F_{Y}^{\prime}(y)=1$ for $0<y<1$ and so $Y\sim U(0,1)$.


Many computer software systems have ``random number generator'' functions that will simulate observations $Y$ from a $U(0,1)$ distribution. (These are more properly called pseudo-random number generators because they are based on deterministic algorithms. In addition they give observations $Y$ that have finite precision so they cannot be exactly like continuous $U(0,1)$ random variables. However, good generators give $Y$'s that appear indistinguishable in most ways from $U(0,1)$ r.v.'s.) Given such a generator, we can also simulate r.v.'s $X$ with the exponential distribution above by the following algorithm:

  1. Generate $Y \sim U(0, 1)$ using the computer random number generator.

  2. Compute $X = -10\ln Y$.

Then $X$ has the desired distribution. This is a particular case of a method described in Section 9.4 for generating random variables from a general distribution. In R software the command runif(n) produces a vector consisting of $n$ independent $U(0,1)$ values.
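For instance, the algorithm is only a couple of lines in R (a sketch; the sample size n = 10000 is an arbitrary choice used to check the result):

    # Simulate n observations from the exponential distribution with theta = 10
    # via the inverse-c.d.f. method: X = -10 ln Y, where Y ~ U(0,1).
    n <- 10000
    y <- runif(n)        # step 1: Y ~ U(0,1)
    x <- -10 * log(y)    # step 2: X = -10 ln Y
    mean(x)              # should be close to E(X) = theta = 10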


Problem:

  1. If $X$ has c.d.f. $F(x)$, then $Y = F(X)$ has a uniform distribution on [0,1]. (Show this.) Suppose you want to simulate observations from a distribution with MATH, by using the random number generator on a computer to generate $U[0,1)$ numbers. What value would $X$ take when you generated the random number .27125?

Exponential Distribution

The continuous random variable $X$ is said to have an exponential distribution if its p.d.f. is of the form $$f(x)=\lambda e^{-\lambda x},\quad x>0,$$ where $\lambda>0$ is a real parameter value. This distribution arises in various problems involving the time until some event occurs. The following gives one such setting.


Physical Setup: In a Poisson process for events in time let $X$ be the length of time we wait for the first event occurrence. We'll show that $X$ has an exponential distribution. (Recall that the number of occurrences in a fixed time has a Poisson distribution. The difference between the Poisson and exponential distributions lies in what is being measured.)


Illustrations:

  1. The length of time $X$ we wait with a Geiger counter until the emission of a radioactive particle is recorded follows an exponential distribution.

  2. The length of time between phone calls to a fire station (assuming calls follow a Poisson process) follows an exponential distribution.


Derivation of the probability density function and the c.d.f.

$F(x)=P(X\leq x)$ = $P$(time to first occurrence $\leq x$)
= $1-P$(time to first occurrence $>x$)
= $1-P$(no occurrences in the interval $(0,x)$)

Check that you understand this last step. If the time to the first occurrence $> x$, there must be no occurrences in $(0, x)$, and vice versa.

We have now expressed $F(x)$ in terms of the number of occurrences in a Poisson process by time $x$. But the number of occurrences has a Poisson distribution with mean $\mu=\lambda x$, where $\lambda$ is the average rate of occurrence. Therefore $$F(x)=1-\frac{(\lambda x)^{0}e^{-\lambda x}}{0!}=1-e^{-\lambda x}\quad\text{for }x>0.$$ Since $f(x)=\frac{d}{dx}F(x)=\lambda e^{-\lambda x}$ for $x>0$, we obtain $$f(x)=\begin{cases}\lambda e^{-\lambda x} & x>0\\ 0 & \text{otherwise,}\end{cases}$$ which is the formula we gave above.

Alternate Form: It is common to use the parameter $\theta=1/\lambda$ in the exponential distribution. (We'll see below that $\theta=E(X)$.) This makes $$f(x)=\frac{1}{\theta}e^{-x/\theta}\quad\text{and}\quad F(x)=1-e^{-x/\theta}\quad\text{for }x>0.$$

Exercise:

Suppose trees in a forest are distributed according to a Poisson process. Let $X$ be the distance from an arbitrary starting point to the nearest tree. The average number of trees per square metre is $\lambda$. Derive $f(x)$ the same way we derived the exponential p.d.f. You're now using the Poisson distribution in 2 dimensions (area) rather than 1 dimension (time).

Mean and Variance:

Finding $\mu$ and $\sigma^{2}$ directly involves integration by parts. An easier solution uses the gamma function, which extends the notion of factorials beyond the integers to the positive real numbers.

Definition

The Gamma Function: $$\Gamma(\alpha)=\int_{0}^{\infty}x^{\alpha-1}e^{-x}dx$$ is called the gamma function of $\alpha$, where $\alpha>0$.

Note that $\alpha$ is 1 more than the power of $x$ in the integrand, e.g. $\int_{0}^{\infty}x^{4}e^{-x}dx=\Gamma(5)$. There are 3 properties of gamma functions which we'll use.

  1. $\Gamma(\alpha)=(\alpha-1)\Gamma(\alpha-1)$ for $\alpha>1$
    Proof: Using integration by parts, $$\Gamma(\alpha)=\int_{0}^{\infty}x^{\alpha-1}e^{-x}dx=\left[-x^{\alpha-1}e^{-x}\right]_{0}^{\infty}+(\alpha-1)\int_{0}^{\infty}x^{\alpha-2}e^{-x}dx,$$ and provided that $\alpha>1$, $\left[-x^{\alpha-1}e^{-x}\right]_{0}^{\infty}=0$. Therefore $\Gamma(\alpha)=(\alpha-1)\Gamma(\alpha-1)$.

  2. $\Gamma(\alpha)=(\alpha-1)!$ if $\alpha$ is a positive integer.
    Proof: It is easy to show that $\Gamma(1)=1$. Using property 1 repeatedly, we obtain $\Gamma(2)=1\,\Gamma(1)=1!$, $\Gamma(3)=2\,\Gamma(2)=2!$, $\Gamma(4)=3\,\Gamma(3)=3!$, etc.
    Generally, $\Gamma(n+1)=n!$ for integer $n$.

  3. $\Gamma\left(\tfrac{1}{2}\right)=\sqrt{\pi}$
    (This can be proved using double integration.)

Returning to the exponential distribution: $$\mu=E(X)=\int_{0}^{\infty}x\,\frac{1}{\theta}e^{-x/\theta}dx.$$ Let $y=\frac{x}{\theta}$. Then $dx=\theta\,dy$ and $$\mu=\int_{0}^{\infty}\theta\,ye^{-y}dy=\theta\,\Gamma(2)=\theta.$$


Note: Read questions carefully. If you're given the average rate of occurrence in a Poisson process, that is $\lambda$. If you're given the average time you wait for an occurrence, that is $\theta$.

To get $\sigma^{2}=\operatorname{Var}(X)$, we first find

$$E(X^{2})=\int_{0}^{\infty}x^{2}\,\frac{1}{\theta}e^{-x/\theta}dx=\int_{0}^{\infty}\theta^{2}y^{2}e^{-y}dy=\theta^{2}\,\Gamma(3)=2\theta^{2},\qquad\text{so}\qquad\sigma^{2}=E(X^{2})-\mu^{2}=2\theta^{2}-\theta^{2}=\theta^{2}.$$


Example:

Suppose #7 buses arrive at a bus stop according to a Poisson process with an average of 5 buses per hour. (i.e. $\lambda=5$/hr. So $\theta=\frac{1}{5}$ hr. or 12 min.) Find the probability (a) you have to wait longer than 15 minutes for a bus (b) you have to wait more than 15 minutes longer, having already been waiting for 6 minutes.


Solution:

  1. If $X$ is the waiting time in hours, $$P\left(X>\tfrac{15}{60}\right)=1-F\left(\tfrac{1}{4}\right)=e^{-5/4}$$
    $=e^{-1.25}=.2865$

  2. If $X$ is the total waiting time, the question asks for the probability $$P\left(X>\tfrac{21}{60}\,\Big|\,X>\tfrac{6}{60}\right)=\frac{P(X>\tfrac{21}{60})}{P(X>\tfrac{6}{60})}=\frac{e^{-5(21/60)}}{e^{-5(6/60)}}=\frac{e^{-1.75}}{e^{-0.5}}=e^{-1.25}=.2865.$$ Does this surprise you? The fact that you've already waited 6 minutes doesn't seem to matter. This illustrates the ``memoryless property'' of the exponential distribution: $$P(X>c+b\mid X>b)=P(X>c).$$ Fortunately, buses don't follow a Poisson process so this example needn't cause you to stop using the bus.
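If software is handy, these numbers are easy to check; a small R sketch (pexp parametrizes the exponential by the rate $\lambda$, here 5 per hour):

    1 - pexp(15/60, rate = 5)                                 # P(wait > 15 min) = e^{-1.25} = 0.2865
    (1 - pexp(21/60, rate = 5)) / (1 - pexp(6/60, rate = 5))  # P(X > 21/60 | X > 6/60); same value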


Problems:

  1. In a bank with on-line terminals, the time the system runs between disruptions has an exponential distribution with mean $\theta$ hours. One quarter of the time the system shuts down within 8 hours of the previous disruption. Find $\theta$.

  2. Flaws in painted sheets of metal occur over the surface according to the conditions for a Poisson process, at an intensity of $\lambda$ per $m^{2}$. Let $X$ be the distance from an arbitrary starting point to the second closest flaw. (Assume sheets are of infinite size!)

    1. Find the p.d.f., $f(x)$.

    2. What is the average distance to the second closest flaw?

A Method for Computer Generation of Random Variables.

Most computer software has a built-in ``pseudo-random number generator'' that will simulate observations $U$ from a $U(0,1)$ distribution, or at least a reasonable approximation to this uniform distribution. If we wish a random variable with a non-uniform distribution, the standard approach is to take a suitable function of $U$. By far the simplest and most common method for generating non-uniform variates is based on the inverse cumulative distribution function. For an arbitrary c.d.f. $F(x)$, define $F^{-1}(y)=\min\{x;F(x)\geq y\}$. This is a real inverse (i.e. $F(F^{-1}(y))=y$ and $F^{-1}(F(x))=x$) in the case that the c.d.f. is continuous and strictly increasing, so for example for a continuous distribution. However, in the more general case of a possibly discontinuous non-decreasing c.d.f. (such as the c.d.f. of a discrete distribution) the function continues to enjoy at least some of the properties of an inverse. $F^{-1}$ is useful for generating a random variable having c.d.f. $F(x)$ from $U$, a uniform random variable on the interval $[0,1]$.


Theorem

If $F$ is an arbitrary c.d.f. and $U$ is uniform on $[0,1]$, then the random variable defined by $X=F^{-1}(U)$ has c.d.f. $F(x)$.

Proof:

The proof is a consequence of the fact that $$[U<F(x)]\subset[F^{-1}(U)\leq x]\subset[U\leq F(x)].$$ You can check this graphically by checking, for example, that if $[U<F(x)]$ then $[F^{-1}(U)\leq x]$ (this confirms the left-hand inclusion). Taking probabilities on all sides, and using the fact that $P[U<F(x)]=P[U\leq F(x)]=F(x)$ since $U$ is uniform on $[0,1]$, we discover that $P[X\leq x]=F(x)$.


[Figure inversetransform: inverting a c.d.f.]

The relation $X=F^{-1}(U)$ implies that $F(X)\geq U$ and, for any point $z<X$, $F(z)<U$. For example, for the rather unusual looking piecewise linear cumulative distribution function in Figure inversetransform, we find the solution $X=F^{-1}(U)$ by drawing a horizontal line at $U$ until it strikes the graph of the c.d.f. (or where the graph would have been if we had joined the ends at the jumps); then $X$ is the $x$-coordinate of this point. This is true in general: $X$ is the $x$-coordinate of the point where a horizontal line first strikes the graph of the c.d.f. We provide one simple example of generating random variables by this method, for the geometric distribution.


Example: A geometric random number generator

For the Geometric distribution (where $X$ is the number of failures before the first success, so $f(x)=p(1-p)^{x}$), the cumulative distribution function is given by $$F(x)=1-(1-p)^{x+1}\quad\text{for }x=0,1,2,\dots$$

Then if $U$ is a uniform random number in the interval $[0,1]$, we seek an integer $X$ such that $$F(X-1)<U\leq F(X)$$ (you should confirm that this is the value of $X$ at which the above horizontal line strikes the graph of the c.d.f.), and solving these inequalities gives $$1-(1-p)^{X}<U\leq 1-(1-p)^{X+1},\qquad\text{or}\qquad X<\frac{\ln(1-U)}{\ln(1-p)}\leq X+1,$$ so we compute the value of $\frac{\ln(1-U)}{\ln(1-p)}$ and round down to the next lower integer.
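A small R sketch of this generator (the values p = 0.3 and 5000 draws are arbitrary illustrative choices; in practice R's built-in rgeom, which uses the same failures-before-success convention, does this job):

    p <- 0.3
    u <- runif(5000)
    x <- floor(log(1 - u) / log(1 - p))  # round ln(1-U)/ln(1-p) down to an integer
    mean(x)                              # should be close to the geometric mean (1-p)/p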




Exercise: An exponential random number generator.

Show that the inverse transform method above results in the generator for the exponential distribution: generate $U\sim U(0,1)$ and compute $X=-\theta\ln(1-U)$ (equivalently $X=-\theta\ln U$, since $1-U$ is also $U(0,1)$).

Normal Distribution

Physical Setup:

A random variable $X$ defined on $(-\infty,\infty)$ has a normal distribution if it has probability density function of the form $$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}},\quad-\infty<x<\infty,$$ where $-\infty<\mu<\infty$ and $\sigma> 0$ are parameters. It turns out (and is shown below) that $E(X) = \mu$ and Var$(X)=\sigma^{2}$ for this distribution; that is why its p.d.f. is written using the symbols $\mu$ and $\sigma$. We write $X\sim N(\mu,\sigma^{2})$ to denote that $X$ has a normal distribution with mean $\mu$ and variance $\sigma^{2}$ (standard deviation $\sigma$).

The normal distribution is the most widely used distribution in probability and statistics. Physical processes leading to the normal distribution exist but are a little complicated to describe. (For example, it arises in physics via statistical mechanics and maximum entropy arguments.) It is used for many processes where $X$ represents a physical dimension of some kind, but also in many other settings. We'll see other applications of it below. The shape of the p.d.f. $f(x)$ above is what is often termed a ``bell shape'' or ``bell curve'', symmetric about $\mu$; the standard normal case $\mu=0$, $\sigma=1$ is shown in Figure normpdf. (You should be able to verify the shape without graphing the function.)


[Figure normpdf: the standard normal probability density function]

Illustrations:

  1. Heights or weights of males (or of females) in large populations tend to follow normal distributions.

  2. The logarithms of stock prices are often assumed to be normally distributed.

The cumulative distribution function: The c.d.f. of the normal distribution $N(\mu,\sigma^{2})$ is $$F(x)=\int_{-\infty}^{x}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(u-\mu)^{2}}{2\sigma^{2}}}du,$$ as shown in Figure normcdf. This integral cannot be given a simple mathematical expression, so numerical methods are used to compute its value for given values of $x$, $\mu$ and $\sigma$. This function is included in many software packages and some calculators.


[Figure normcdf: the standard normal c.d.f.]

In the statistical packages R and S-Plus we get $F(x)$ above using the function pnorm$(x,\mu,\sigma)$.

Before computers, people produced tables of probabilities $F(x)$ using mechanical calculators. Fortunately it is necessary to do this only for a single normal distribution: the one with $\mu= 0$ and $\sigma= 1$. This is called the ``standard'' normal distribution and denoted $N(0,1)$.

It's easy to see that if $X\sim N(\mu,\sigma^{2})$ then the ``new'' r.v. $Z=(X-\mu)/\sigma$ is distributed as $Z\sim N(0,1)$. (Just use the change of variables methods in Section 9.1.) We'll use this to compute $F(x)$ and probabilities for $X$ below, but first we show that $f(x)$ integrates to 1 and that $E(X)=\mu$ and Var$(X)=\sigma^{2}$. For the first result, substituting $z=(x-\mu)/\sigma$ and then $y=z^{2}/2$ gives $$\int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}dx=\frac{2}{\sqrt{2\pi}}\int_{0}^{\infty}e^{-z^{2}/2}dz=\frac{1}{\sqrt{\pi}}\int_{0}^{\infty}y^{-1/2}e^{-y}dy=\frac{\Gamma(\frac{1}{2})}{\sqrt{\pi}}=1.$$


Mean, Variance, Moment generating function: Recall that an odd function $g(x)$ has the property that $g(-x)=-g(x)$. If $g(x)$ is an odd function then $\int_{-\infty}^{\infty}g(x)dx=0$, provided the integral exists.

Consider $$E(X-\mu)=\int_{-\infty}^{\infty}(x-\mu)\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}dx.$$

Let $y=x-\mu$. Then $$E(X-\mu)=\int_{-\infty}^{\infty}g(y)dy,$$
where $g(y)=\frac{y}{\sigma\sqrt{2\pi}}e^{-\frac{y^{2}}{2\sigma^{2}}}$ is an odd function, so that $E(X-\mu)=0$. But since $E(X-\mu)=E(X)-\mu$, this implies $E(X)=\mu$ and so $\mu$ is the mean. To obtain the variance, $$\operatorname{Var}(X)=E\left[(X-\mu)^{2}\right]=\int_{-\infty}^{\infty}(x-\mu)^{2}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}dx=\frac{2}{\sigma\sqrt{2\pi}}\int_{0}^{\infty}y^{2}e^{-\frac{y^{2}}{2\sigma^{2}}}dy.$$ We can obtain a gamma function by letting $w=\frac{y^{2}}{2\sigma^{2}}$. Then $y=\sigma\sqrt{2w}$, $dy=\frac{\sigma}{\sqrt{2w}}dw$ and $$\operatorname{Var}(X)=\frac{2}{\sigma\sqrt{2\pi}}\int_{0}^{\infty}2\sigma^{2}w\,e^{-w}\frac{\sigma}{\sqrt{2w}}dw=\frac{2\sigma^{2}}{\sqrt{\pi}}\int_{0}^{\infty}w^{1/2}e^{-w}dw=\frac{2\sigma^{2}}{\sqrt{\pi}}\,\Gamma\left(\tfrac{3}{2}\right)=\frac{2\sigma^{2}}{\sqrt{\pi}}\cdot\tfrac{1}{2}\Gamma\left(\tfrac{1}{2}\right)=\sigma^{2},$$ and so $\sigma^{2}$ is the variance. We now find the moment generating function of the $N(\mu,\sigma^{2})$ distribution. If $X$ has the $N(\mu,\sigma^{2})$ distribution, then $$M_{X}(t)=E(e^{tX})=\int_{-\infty}^{\infty}e^{tx}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}dx=e^{\mu t+\frac{\sigma^{2}t^{2}}{2}},$$ where the last step follows by completing the square in the exponent, since the remaining integrand is just a $N(\mu+\sigma^{2}t,\sigma^{2})$ probability density function and is therefore equal to one. This confirms the values we already obtained for the mean and the variance of the normal distribution: $$M_{X}^{\prime}(0)=\mu,\qquad M_{X}^{\prime\prime}(0)=\mu^{2}+\sigma^{2},$$ from which we obtain $\operatorname{Var}(X)=M_{X}^{\prime\prime}(0)-\left[M_{X}^{\prime}(0)\right]^{2}=\sigma^{2}$.


Finding Normal Probabilities Via $N(0,1)$ Tables: As noted above, $F(x)$ does not have an explicit closed form so numerical computation is needed. The following result shows that if we can compute the c.d.f. for the standard normal distribution $N(0,1)$, then we can compute it for any other normal distribution $N(\mu,\sigma^{2})$ as well.


Theorem

Let $X\sim N(\mu,\sigma^{2})$ and define $Z=(X-\mu)/\sigma$. Then $Z\sim N(0,1)$ and $$P(X\leq x)=F(x)=P\left(Z\leq\frac{x-\mu}{\sigma}\right).$$

Proof: The fact that $Z\sim N(0,1)$ has p.d.f. $$f(z)=\frac{1}{\sqrt{2\pi}}e^{-z^{2}/2},\quad-\infty<z<\infty,$$ follows immediately by change of variables. Alternatively, we can just note that $$P(X\leq x)=P\left(\frac{X-\mu}{\sigma}\leq\frac{x-\mu}{\sigma}\right)=P\left(Z\leq\frac{x-\mu}{\sigma}\right).$$


A table of probabilities $F(z)=P(Z\leq z)$ is given on the last page of these notes. A space-saving feature is that only the values for $z > 0$ are shown; for negative values we use the fact that the $N(0,1)$ p.d.f. is symmetric about 0.

The following examples illustrate how to get probabilities for $Z$ using the tables.


Examples: Find the following probabilities, where $Z\sim N(0,1)$.

  1. $P(Z\leq 2.11)$

  2. MATH

  3. MATH

  4. MATH

  5. MATH

Solution:

  1. Look up 2.11 in the table by going down the left column to 2.1 then across to the heading .01. We find the number .9826. Then $P(Z\leq 2.11)=F(2.11)=.9826$. See Figure exercisenormal1.

    [Figure exercisenormal1]

  2. MATH

  3. MATH

  4. Now we have to use symmetry: MATH See Figure exercisenormal2.

    [Figure exercisenormal2]

  5. MATH
    MATH
    MATH

In addition to using the tables to find the probabilities for given numbers, we sometimes are given the probabilities and asked to find the number. With R or S-Plus software, the function qnorm$(p,\mu,\sigma)$ gives the 100$p$-th percentile (where $0<p<1$). We can also use tables to find desired values.
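For reference, the R functions pnorm and qnorm do these lookups directly; the calls below reproduce values used in the examples of this section:

    pnorm(2.11)   # P(Z <= 2.11) = 0.9826
    qnorm(0.85)   # c such that P(Z <= c) = .85; about 1.0364
    qnorm(0.10)   # d such that P(Z <= d) = .10; about -1.2816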


Examples:

  1. Find a number $c$ such that $P(Z\leq c)=.85$

  2. Find a number $d$ such that $P(Z>d)=.90$

  3. Find a number $b$ such that $P(-b\leq Z\leq b)=.95$

Solutions:

  1. We can look in the body of the table to get an entry close to .8500. This occurs for $z$ between 1.03 and 1.04; $z = 1.04$ gives the closest value to .85. For greater accuracy, the table at the bottom of the last page is designed for finding numbers, given the probability. Looking beside the entry .85 we find $z = 1.0364$.

  2. Since $P(Z>d)=.90$, we have $F(d)=P(Z\leq d)=.10$. There is no entry for which $F(z) = .10$ so we again have to use symmetry, since $d$ will be negative.

    $P(Z\leq d)=.10$
    $P(Z\geq -d)=.10$ (by symmetry)
    $P(Z\leq -d)=1-.10=.90$
    $-d=1.2816$ (from the table)
    $d=-1.2816$


    [Figure exercisenormal3]

    The key to this solution lies in recognizing that $d$ will be negative. If you can picture the situation it will probably be easier to handle the question than if you rely on algebraic manipulations.


Exercise: Will $a$ be positive or negative if MATH? What if MATH?

  3. If $P(-b\leq Z\leq b)=.95$ we again use symmetry.

    [Figure exercisenormal4]

    The probability outside the interval $(-b,b)$ must be .05, and this is evenly split between the area above $b$ and the area below $-b$. Thus $$P(Z>b)=.025\quad\text{and so}\quad F(b)=P(Z\leq b)=.975.$$

Looking in the table, $b = 1.96$.

To find $N(\mu,\sigma^{2})$ probabilities in general, we use the theorem given earlier, which implies that if $X\sim N(\mu,\sigma^{2})$ then $$P(X\leq x)=P\left(Z\leq\frac{x-\mu}{\sigma}\right)=F\left(\frac{x-\mu}{\sigma}\right),$$ where $Z\sim N(0,1)$.


Example: Let $X\sim N(3,25)$.

  1. Find MATH

  2. Find a number $c$ such that MATH.


Solution:

  1. MATH

  2. MATH
    [Figure exercisenormal5]

Gaussian Distribution: The normal distribution is also known as the Gaussian distribution. The notation $X\sim G(\mu,\sigma)$ means that $X$ has Gaussian (normal) distribution with mean $\mu$ and standard deviation $\sigma$. So, for example, if $X\sim N(1,4)$ then we could also write $X\sim G(1,2)$.


Example: The heights of adult males in Canada are close to normally distributed, with a mean of 69.0 inches and a standard deviation of 2.4 inches. Find the 10th and 90th percentiles of the height distribution. (Recall that the $a$-th percentile is such that $a\%$ of the population has height less than this value.)


Solution: We are being told that if $X$ is the height of a randomly selected Canadian adult male, then $X\sim G(69.0,2.4)$, or equivalently $X\sim N(69.0,5.76)$. To find the 90th percentile $c$, we use $$P(X\leq c)=P\left(Z\leq\frac{c-69.0}{2.4}\right)=.90.$$ From the table we see $P(Z\leq1.2816)=.90$ so we need $$\frac{c-69.0}{2.4}=1.2816,$$ which gives $c=72.08$ inches. Similarly, to find $c$ such that $P(X\leq c)=.10$ we find that $P(Z\leq-1.2816)=.10$, so we need $$\frac{c-69.0}{2.4}=-1.2816,$$ or $c=65.92$ inches, as the 10th percentile.
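As a numerical check, qnorm in R accepts the mean and standard deviation directly (a sketch):

    qnorm(c(0.10, 0.90), mean = 69.0, sd = 2.4)  # 65.92 and 72.08 inches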


Linear Combinations of Independent Normal Random Variables

Linear combinations of normal r.v.'s are important in many applications. Since we have not covered continuous multivariate distributions, we can only quote the second and third of the following results without proof. The first result follows easily from the change of variables method.

  1. Let $X\sim N(\mu,\sigma^{2})$ and $Y = aX + b$, where $a$ and $b$ are constant real numbers. Then $Y\sim N(a\mu+b,\,a^{2}\sigma^{2})$.

  2. Let $X\sim N(\mu_{1},\sigma_{1}^{2})$ and $Y\sim N(\mu_{2},\sigma_{2}^{2})$ be independent, and let $a$ and $b$ be constants.
    Then $aX+bY\sim N(a\mu_{1}+b\mu_{2},\,a^{2}\sigma_{1}^{2}+b^{2}\sigma_{2}^{2})$.
    In general if $X_{i}\sim N(\mu_{i},\sigma_{i}^{2})$ are independent and $a_{i}$ are constants,
    then $\sum a_{i}X_{i}\sim N\left(\sum a_{i}\mu_{i},\,\sum a_{i}^{2}\sigma_{i}^{2}\right)$.

  3. Let $X_{1},X_{2},\dots,X_{n}$ be independent $N(\mu,\sigma^{2})$ random variables.
    Then $\sum X_{i}\sim N(n\mu,n\sigma^{2})$ and $\bar{X}=\frac{1}{n}\sum X_{i}\sim N(\mu,\sigma^{2}/n)$.

    Actually, the only new result here is that the distributions are normal. The means and variances of linear combinations of r.v.'s were previously obtained in section 8.3.


Example: Let $X\sim N(3,5)$ and $Y\sim N(6,14)$ be independent. Find $P(X>Y)$.

Solution: Whenever we have variables on both sides of the inequality we should collect them on one side, leaving us with a linear combination: $P(X>Y)=P(X-Y>0)$. By result 2 above, $X-Y\sim N(3-6,\;5+14)=N(-3,19)$, so $$P(X-Y>0)=P\left(Z>\frac{0-(-3)}{\sqrt{19}}\right)=P(Z>0.69)=1-.7549=.2451.$$
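A quick numerical check in R, using the distribution $X-Y\sim N(-3,19)$ derived above (note pnorm takes the standard deviation, not the variance):

    1 - pnorm(0, mean = -3, sd = sqrt(19))  # P(X - Y > 0), about 0.246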


Example: Three cylindrical parts are joined end to end to make up a shaft in a machine; 2 type A parts and 1 type B. The lengths of the parts vary a little, and have the distributions: MATH and MATH. The overall length of the assembled shaft must lie between 46.8 and 47.5 or else the shaft has to be scrapped. Assume the lengths of different parts are independent. What percent of assembled shafts have to be scrapped?


Exercise: Why would it be wrong to represent the length of the shaft as 2A + B? How would this length differ from the solution given below?


Solution: Let $L$, the length of the shaft, be $L=A_{1}+A_{2}+B$.

Then MATH and so MATH

i.e. 23.18% are acceptable and 76.82% must be scrapped. Obviously we have to find a way to reduce the variability in the lengths of the parts. This is a common problem in manufacturing.

Exercise: How could we reduce the percent of shafts being scrapped? (What if we reduced the variance of $A$ and $B$ parts each by 50%?)
Example: The heights of adult females in a large population are well represented by a normal distribution with mean 64 in. and variance 6.2 in.$^{2}$.

  1. Find the proportion of females whose height is between 63 and 65 inches.

  2. Suppose 10 women are randomly selected, and let $\bar{X}$ be their average height (i.e. $\bar{X}=\frac{1}{10}\sum_{i=1}^{10}X_{i}$, where $X_{1},\dots,X_{10}$ are the heights of the 10 women). Find MATH.

  3. How large must $n$ be so that a random sample of $n$ women gives an average height $\bar{X}$ so that $P(|\bar{X}-\mu|\leq 1)\geq .95$?


Solution:

  1. $X\sim N(64,6.2)$, so for the height $X$ of a random woman, $$P(63<X<65)=P\left(\frac{63-64}{\sqrt{6.2}}<Z<\frac{65-64}{\sqrt{6.2}}\right)=P(-0.40<Z<0.40)=2(.6554)-1=.3108.$$

  2. $\bar{X}\sim N\left(64,\frac{6.2}{10}\right)=N(64,.62)$, so MATH

  3. If $\bar{X}\sim N\left(64,\frac{6.2}{n}\right)$ then $$P(|\bar{X}-64|\leq 1)=P\left(|Z|\leq\frac{1}{\sqrt{6.2/n}}\right)=P(|Z|\leq .402\sqrt{n})\geq .95$$

iff $.402\sqrt{n}\geq 1.96$. (This is because $P(|Z|\leq 1.96)=.95$.) So $P(|\bar{X}-64|\leq 1)\geq .95$ iff $\sqrt{n}\geq\frac{1.96}{.402}$, which is true if $n\geq 23.77$. Thus we require $n\geq 24$ since $n$ is an integer.


Remark: This shows that if we were to select a random sample of $n=24$ persons, then their average height $\bar{X}$ would be within 1 inch of the average height $\mu$ of the whole population of women. So if we did not know $\mu$ then we could estimate it to within $\pm1$ inch (with probability .95) by taking this small a sample.

Exercise: Find how large $n$ would have to be to make MATH.
These ideas form the basis of statistical sampling and estimation of unknown parameter values in populations and processes. If $X\sim N(\mu,\sigma^{2})$ and we know roughly what $\sigma$ is, but don't know $\mu$, then we can use the fact that $\bar{X}\sim N(\mu,\sigma^{2}/n)$ to find the probability that the mean $\bar{X}$ from a sample of size $n$ will be within a given distance of $\mu$.
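The sample-size calculation of the last example is a one-liner in R (a sketch with the 1-inch tolerance and .95 probability used above):

    z <- qnorm(0.975)          # 1.96, since P(|Z| <= 1.96) = .95
    ceiling(6.2 * z^2 / 1^2)   # smallest n with P(|Xbar - mu| <= 1) >= .95; gives 24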


Problems:

  1. Let $X \sim N (10,4)$ and $Y \sim N (3,100)$ be independent. Find the probability

    1. $8.4 < X < 12.2$

    2. $2Y > X$

    3. $\overline{Y} < 0$ where $\overline{Y}$ is the sample mean of 25 independent observations on $Y$.

  2. Let $X$ have a normal distribution. What percent of the time does $X$ lie within one standard deviation of the mean? Two standard deviations? Three standard deviations?

  3. Let $X \sim N (5,4)$. An independent variable $Y$ is also normally distributed with mean 7 and standard deviation 3. Find:

    1. The probability $2X$ differs from $Y$ by more than 4.

    2. The minimum number, $n$, of independent observations needed on $X$ so that
      MATH is the sample mean)

Use of the Normal Distribution in Approximations

The normal distribution can, under certain conditions, be used to approximate probabilities for linear combinations of variables having a non-normal distribution. This remarkable property follows from an amazing result called the central limit theorem. There are actually several versions of the central limit theorem. The version given below is one of the simplest.

Central Limit Theorem (CLT):

The major reason that the normal distribution is so commonly used is that it tends to approximate the distribution of sums of random variables. For example, if we throw $n$ fair dice and $S_{n}$ is the sum of the outcomes, what is the distribution of $S_{n}$? The tables below provide the number of ways in which a given value can be obtained; the corresponding probability is obtained by dividing by $6^{n}$. For example, on the throw of $n=1$ die the possible outcomes are 1, 2, ..., 6, with probabilities all $1/6$, as indicated in the first panel of the histogram in Figure clt.


[Figure clt: probability histograms of the sum of $n$ discrete uniform $\{1,2,3,4,5,6\}$ random variables]

If we sum the values on two fair dice, the possible outcomes are the values 2,3,...,12 as shown in the following table and the probabilities are the values below:

Values: 2 3 4 5 6 7 8 9 10 11 12
Probabilities $\times\,36$: 1 2 3 4 5 6 5 4 3 2 1

The probability histogram of these values is shown in the second panel. Finally, for the sum of the values on three independent dice, the values range from 3 to 18 and have probabilities which, when multiplied by $6^{3}$, result in the values

1 3 6 10 15 21 25 27 27 25 21 15 10 6 3 1

to which we can fit three separate quadratic functions, one in the middle region and one in each of the two tails. The histogram of these values, shown in the third panel of Figure clt, already resembles a normal probability density function. In general, these distributions show a simple pattern. For $n=1$, the probability function is a constant (a polynomial of degree 0). For $n=2$, it is two linear functions spliced together. For $n=3$, the histogram can be constructed from three quadratic pieces (polynomials of degree $n-1$). These probability histograms rapidly approach the shape of the normal probability density function, as is the case with the sum or the average of independent random variables from most distributions. You can simulate the throws of any number of dice and illustrate the behaviour of the sums at the URL http://www.math.csusb.edu/faculty/stanton/probstat/clt.html.

Let $X_{1},X_{2},\dots,X_{n}$ be independent random variables all having the same distribution, with mean $\mu$ and variance $\sigma^{2}$. Then as $n\rightarrow\infty$, the distribution of $\sum X_{i}$ is approximately $N(n\mu,n\sigma^{2})$ and the distribution of $\bar{X}$ is approximately $N(\mu,\sigma^{2}/n)$. This is actually a rough statement of the result since, as $n\rightarrow\infty$, both the $N(n\mu,n\sigma^{2})$ and $N(\mu,\sigma^{2}/n)$ distributions fail to exist. (The former because both $n\mu$ and $n\sigma^{2}\rightarrow\infty$, the latter because $\sigma^{2}/n\rightarrow 0$.) A precise version of the results is:

Theorem

If $X_{1},X_{2},\dots,X_{n}$ are independent random variables all having the same distribution, with mean $\mu$ and variance $\sigma^{2}$, then as $n\rightarrow\infty$, the cumulative distribution function of the random variable $$\frac{\sum X_{i}-n\mu}{\sigma\sqrt{n}}$$ approaches the $N(0,1)$ c.d.f. Similarly, the c.d.f. of $$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$$ approaches the standard normal c.d.f.

Although this is a theorem about limits, we will use it when $n$ is large, but finite, to approximate the distribution of $\sum X_{i}$ or $\overline{X}$ by a normal distribution, so the rough version of the theorem above is adequate for our purposes.

Notes:

  1. This theorem works for essentially all distributions which $X_{i}$ could have. The only exception occurs when $X_{i}$ has a distribution whose mean or variance don't exist. There are such distributions, but they are rare.

  2. We will use the Central Limit Theorem to approximate the distribution of sums MATH or averages $\bar{X}$. The accuracy of the approximation depends on $n$ (bigger is better) and also on the actual distribution the $X_{i}$'s come from. The approximation works better for small $n$ when $X_{i}$'s p.d.f. is close to symmetric.

  3. If you look at the section on linear combinations of independent normal random variables you will find two results which are very similar to the central limit theorem. These are:

    For $X_{1},\cdots,X_{n}$ independent and $N(\mu,\sigma^{2})$, $\sum X_{i}\sim N(n\mu,n\sigma^{2})$, and $\bar{X}\sim N(\mu,\sigma^{2}/n)$.

Thus, if the $X_{i}$'s themselves have a normal distribution, then $\sum X_{i}$ and $\overline{X}$ have exactly normal distributions for all values of $n$. If the $X_{i}$'s do not have a normal distribution themselves, then $\sum X_{i}$ and $\overline{X}$ have approximately normal distributions when $n$ is large. From this distinction you should be able to guess that if the $X_{i}$'s distribution is somewhat normal shaped the approximation will be good for smaller values of $n$ than if the $X_{i}$'s distribution is very non-normal in shape. (This is related to the second remark in (2)).

Example: Hamburger patties are packed 8 to a box, and each box is supposed to have 1 kg of meat in it. The weights of the patties vary a little because they are mass produced, and the weight $X$ of a single patty is actually a random variable with mean $\mu=0.128$ kg and standard deviation $\sigma=0.005$ kg. Find the probability a box has at least 1 kg of meat, assuming that the weights of the 8 patties in any given box are independent.

Solution: Let $X_{1},\dots,X_{8}$ be the weights of the 8 patties in a box, and $Y=\sum_{i=1}^{8}X_{i}$ be their total weight. By the Central Limit Theorem, $Y$ is approximately $N(8\mu,8\sigma^{2})=N(1.024,.0002)$; we'll assume this approximation is reasonable even though $n=8$ is small. (This is likely ok because $X$'s distribution is likely fairly close to normal itself.) Thus $$P(Y\geq 1)\approx P\left(Z\geq\frac{1-1.024}{\sqrt{.0002}}\right)=P(Z\geq -1.70)=.9554.$$ (We see that only about 95% of the boxes actually have 1 kg or more of hamburger. What would you recommend be done to increase this probability to 99%?)
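As a check of the arithmetic (an R sketch; the normal model is the CLT approximation, not the exact distribution of $Y$):

    mu <- 8 * 0.128            # 1.024 kg
    sigma <- 0.005 * sqrt(8)   # standard deviation of the total weight
    1 - pnorm(1, mu, sigma)    # P(Y >= 1), about 0.955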

Example: Suppose fires reported to a fire station satisfy the conditions for a Poisson process, with a mean of 1 fire every 4 hours. Find the probability the 500th fire of the year is reported on the 84th day of the year.

Solution: Let $X_{i}$ be the time between the $(i-1)$st and $i$th fires ($X_{1}$ is the time to the 1st fire). Then $X_{i}$ has an exponential distribution with $\theta=1/\lambda=4$ hrs, or $\theta=1/6$ day. Since $\sum_{i=1}^{500}X_{i}$ is the time until the 500th fire, we want to find $P\left(83<\sum_{i=1}^{500}X_{i}\leq 84\right)$. While the exponential distribution is not close to normal shaped, we are summing a large number of independent exponential variables. Hence, by the central limit theorem, $\sum X_{i}$ has approximately a $N(500\mu,500\sigma^{2})$ distribution, where $\mu=E(X_{i})$ and $\sigma^{2}=\operatorname{Var}(X_{i})$.

For exponential distributions, $\mu=\theta=\frac{1}{6}$ and $\sigma^{2}=\theta^{2}=\frac{1}{36}$, so $\sum X_{i}$ is approximately $N\left(\frac{500}{6},\frac{500}{36}\right)$ and $$P\left(83<\sum X_{i}\leq 84\right)\approx P\left(\frac{83-\frac{500}{6}}{\sqrt{500/36}}<Z\leq\frac{84-\frac{500}{6}}{\sqrt{500/36}}\right)=P(-0.09<Z\leq 0.18)=.5714-.4641=.1073.$$
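The same approximation can be evaluated directly in R (a sketch using the $N(500/6,\,500/36)$ distribution derived above):

    mu <- 500 / 6
    sigma <- sqrt(500 / 36)
    pnorm(84, mu, sigma) - pnorm(83, mu, sigma)  # P(83 < sum <= 84), about 0.107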

Example: This example is frivolous but shows how the normal distribution can approximate even sums of discrete r.v.'s. In an orchard, suppose the number $X$ of worms in an apple has probability function:

$x$ 0 1 2 3
$f(x)$ .4 .3 .2 .1

Find the probability a basket with 250 apples in it has between 225 and 260 (inclusive) worms in it.


Solution: For a single apple, $\mu=E(X_{i})=0(.4)+1(.3)+2(.2)+3(.1)=1$ and $\sigma^{2}=E(X_{i}^{2})-\mu^{2}=2-1=1$. By the central limit theorem, $\sum_{i=1}^{250}X_{i}$ has approximately a $N(250\mu,250\sigma^{2})=N(250,250)$ distribution, where $X_{i}$ is the number of worms in the $i$th apple;
i.e. $$P\left(225\leq\sum X_{i}\leq 260\right)\approx P\left(\frac{225-250}{\sqrt{250}}\leq Z\leq\frac{260-250}{\sqrt{250}}\right)=P(-1.58\leq Z\leq .63)=.7357-.0571=.679.$$

While this approximation is adequate, we can improve its accuracy, as follows. When $X_{i}$ has a discrete distribution, as it does here, $\sum X_{i}$ will always remain discrete no matter how large $n$ gets. So the distribution of $\sum X_{i}$, while normal shaped, will never be precisely normal. Consider a probability histogram of the distribution of $\sum X_{i}$, as shown in Figure p167. (Only part of the histogram is shown.)


[Figure p167: part of the probability histogram of $\sum X_{i}$, with the approximating normal p.d.f. superimposed]

The area of each bar of this histogram is the probability at the $x $ value in the centre of the interval. The smooth curve is the p.d.f. for the approximating normal distribution. Then MATH is the total area of all bars of the histogram for $x$ from 225 to 260. These bars actually span continuous $x$ values from 224.5 to 260.5. We could then get a more accurate approximation by finding the area under the normal curve from 224.5 to 260.5.

i.e. $$P\left(225\leq\sum X_{i}\leq 260\right)\approx P\left(\frac{224.5-250}{\sqrt{250}}\leq Z\leq\frac{260.5-250}{\sqrt{250}}\right)=P(-1.61\leq Z\leq .66)=.7454-.0537=.692.$$ Unless making this adjustment greatly complicates the solution, it is preferable to make this ``continuity correction''.
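The effect of the correction is easy to see numerically (an R sketch using the $N(250,250)$ approximation):

    s <- sqrt(250)
    pnorm(260, 250, s) - pnorm(225, 250, s)      # without continuity correction, about 0.68
    pnorm(260.5, 250, s) - pnorm(224.5, 250, s)  # with continuity correction, about 0.69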


Notes:

  1. A continuity correction should not be applied when approximating a continuous distribution by the normal distribution. Since it involves going halfway to the next possible value of $x$, there would be no adjustment to make if $x$ takes real values.

  2. Rather than trying to guess or remember when to add .5 and when to subtract .5, it is often helpful to sketch a histogram and shade the bars we wish to include. It should then be obvious which value to use.

Example: Normal approximation to the Poisson Distribution

Let $X$ be a random variable with a Poisson$(\lambda)$ distribution and suppose $\lambda$ is large. For the moment suppose that $\lambda$ is an integer and recall that if we add $\lambda$ independent Poisson random variables, each with parameter $1$, then the sum has the Poisson distribution with parameter $\lambda$. In general, a Poisson random variable with large expected value can be written as the sum of a large number of independent random variables, and so the central limit theorem implies that it must be close to normally distributed. We can prove this using moment generating functions. In Section 7.5 we found the moment generating function of a Poisson random variable $X$: $$M_{X}(t)=e^{\lambda(e^{t}-1)}.$$ Then the standardized random variable is $$Z=\frac{X-\lambda}{\sqrt{\lambda}}$$ and this has moment generating function $$M_{Z}(t)=E\left(e^{tZ}\right)=e^{-t\sqrt{\lambda}}M_{X}\left(t/\sqrt{\lambda}\right)=\exp\left(-t\sqrt{\lambda}+\lambda\left(e^{t/\sqrt{\lambda}}-1\right)\right).$$ This is easier to work with if we take logarithms, $$\ln M_{Z}(t)=-t\sqrt{\lambda}+\lambda\left(e^{t/\sqrt{\lambda}}-1\right).$$ Now as $\lambda\rightarrow\infty$, $$e^{t/\sqrt{\lambda}}-1=\frac{t}{\sqrt{\lambda}}+\frac{t^{2}}{2\lambda}+O(\lambda^{-3/2})$$ and so $$\ln M_{Z}(t)=-t\sqrt{\lambda}+t\sqrt{\lambda}+\frac{t^{2}}{2}+O(\lambda^{-1/2})\rightarrow\frac{t^{2}}{2}.$$ Therefore the moment generating function of the standardized Poisson random variable $Z$ approaches $e^{t^{2}/2}$, the moment generating function of the standard normal, and this implies that the Poisson distribution approaches the normal as $\lambda\rightarrow\infty$.
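Numerically the approximation is already quite good for moderate $\lambda$; an R sketch (the value $\lambda=100$ is an arbitrary illustration):

    lambda <- 100
    ppois(110, lambda)                  # exact Poisson c.d.f. at 110: about 0.853
    pnorm(110.5, lambda, sqrt(lambda))  # normal approximation with continuity correction: about 0.853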

Normal approximation to the Binomial Distribution

It is well-known that the binomial distribution, at least for large values of $n$, resembles a bell-shaped or normal curve. The most common demonstration of this is with a mechanical device common in science museums called a ``Galton board'' or ``quincunx'', which drops balls through a mesh of equally spaced pins (see Figure balldrop and the applet at http://javaboutique.internet.com/BallDrop/). Notice that if balls either go to the right or left at each of the 8 levels of pins, independently of the movement of the other balls, then $X=$ the number of moves to the right has a $Bin(8,\frac{1}{2})$ distribution. If the balls are dropped from location $0$ (on the $x$-axis) then the ball eventually rests at location $2X-8$, which is approximately normally distributed since $X$ is approximately normal.
[Figure balldrop: a ``Galton board'' or ``quincunx'']

The following result is easily proved using the Central Limit Theorem.

Theorem

Let $X$ have a binomial distribution, $Bi(n,p)$. Then for $n$ large, the r.v. $$W=\frac{X-np}{\sqrt{np(1-p)}}$$ is approximately $N(0,1)$.

Proof: We use indicator variables $X_{i}\;(i=1,\dots,n)$ where $X_{i}=1$ if the $i$th trial in the binomial process is an ``$S$'' outcome and 0 if it is an ``$F$'' outcome. Then $X=\sum_{i=1}^{n}X_{i}$ and we can use the CLT. Since $E(X_{i})=p$ and $\operatorname{Var}(X_{i})=p(1-p)$, we have that as $n\rightarrow\infty$, $$\frac{X-np}{\sqrt{np(1-p)}}$$ is $N(0,1)$, as stated.



An alternative proof uses moment generating functions and is essentially a proof of this particular case of the Central Limit Theorem. Recall that the moment generating function of the binomial random variable $X$ is $$M_{X}(t)=\left(pe^{t}+1-p\right)^{n}.$$ As we did with the standardized Poisson random variable, we can show with some algebraic effort that the moment generating function of $W$ satisfies $$M_{W}(t)\rightarrow e^{t^{2}/2}\quad\text{as }n\rightarrow\infty,$$ proving that the standardized binomial random variable $W$ approaches the standard normal distribution.

Remark: We can write the normal approximation either as $W\sim N(0,1)$ or as $X\sim N(np,\,np(1-p))$.

Remark: The continuity correction method can be used here. The following numerical example illustrates the procedure.

Example: If (i) $X\sim Bi(20,.4)$, use the theorem to find the approximate probability $P(4\leq X\leq12)$, and (ii) if $X\sim Bi(100,.4)$ find the approximate probability $P(34\leq X\leq48)$. Compare the answer with the exact value in each case.

Solution: (i) By the theorem above, $X\sim N(8,4.8)$ approximately. Without the continuity correction, $$P(4\leq X\leq 12)\approx P\left(\frac{4-8}{\sqrt{4.8}}\leq Z\leq\frac{12-8}{\sqrt{4.8}}\right)=P(-1.83\leq Z\leq 1.83)=.933,$$ where $Z\sim N(0,1)$. Using the continuity correction method, we get $$P(4\leq X\leq 12)\approx P\left(\frac{3.5-8}{\sqrt{4.8}}\leq Z\leq\frac{12.5-8}{\sqrt{4.8}}\right)=P(-2.05\leq Z\leq 2.05)=.960.$$ The exact probability is $\sum_{x=4}^{12}\binom{20}{x}(.4)^{x}(.6)^{20-x}$, which (using the R function pbinom( )) is .963. As expected the continuity correction method gives a more accurate approximation.

(ii) $X\sim N(40,24)$ approximately, so without the continuity correction $$P(34\leq X\leq 48)\approx P\left(\frac{34-40}{\sqrt{24}}\leq Z\leq\frac{48-40}{\sqrt{24}}\right)=P(-1.22\leq Z\leq 1.63)=.837.$$ With the continuity correction $$P(34\leq X\leq 48)\approx P\left(\frac{33.5-40}{\sqrt{24}}\leq Z\leq\frac{48.5-40}{\sqrt{24}}\right)=P(-1.33\leq Z\leq 1.73)=.866.$$ The exact value, $\sum_{x=34}^{48}\binom{100}{x}(.4)^{x}(.6)^{100-x}$, equals .866 (to 3 decimals). The error of the normal approximation decreases as $n$ increases, but it is a good idea to use the continuity correction when it is convenient.
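Both comparisons can be reproduced in R, where pbinom gives the exact binomial c.d.f.:

    pbinom(12, 20, 0.4) - pbinom(3, 20, 0.4)               # exact: 0.963
    pnorm(12.5, 8, sqrt(4.8)) - pnorm(3.5, 8, sqrt(4.8))   # approximation with CC: about 0.960
    pbinom(48, 100, 0.4) - pbinom(33, 100, 0.4)            # exact: 0.866
    pnorm(48.5, 40, sqrt(24)) - pnorm(33.5, 40, sqrt(24))  # approximation with CC: about 0.866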

Example: Let $p$ be the proportion of Canadians who think Canada should adopt the US dollar.

  1. Suppose 400 Canadians are randomly chosen and asked their opinion. Let $X$ be the number who say yes. Find the probability that the proportion, $\frac{X}{400}$, of people who say yes is within .02 of $p$, if $p$ is .20.

  2. Find the number, $n$, who must be surveyed so there is a 95% chance that $\frac{X}{n}$ lies within .02 of $p$. Again suppose $p$ is .20.

  3. Repeat (b) when the value of $p$ is unknown.


Solution:

  1. $X\sim Bi(400,p)$ with $p=.2$. Using the normal approximation we take $X\sim N\left(400(.2),\;400(.2)(.8)\right)=N(80,64)$.

    If $\frac{X}{400}$ lies within $p \pm.02$, then $.18\leq\frac{X}{400}\leq .22$, so $72 \leq X \leq 88$. Thus, we find

    $$P(72\leq X\leq 88)\approx P\left(\frac{71.5-80}{8}\leq Z\leq\frac{88.5-80}{8}\right)=P(-1.06\leq Z\leq 1.06)=2(.8554)-1=.711$$

  2. Since $n$ is unknown, it is difficult to apply a continuity correction, so we omit it in this part. By the normal approximation, $X\sim N\left(.2n,\;.16n\right)$ approximately. Therefore, $$P\left(\left|\frac{X}{n}-.2\right|\leq .02\right)=P(.18n\leq X\leq .22n)=.95$$ is the condition we need to satisfy. This gives $$P\left(\frac{.18n-.2n}{\sqrt{.16n}}\leq Z\leq\frac{.22n-.2n}{\sqrt{.16n}}\right)=P\left(-.05\sqrt{n}\leq Z\leq .05\sqrt{n}\right)=.95.$$ Therefore, $F(.05\sqrt{n})=.975$ and so $.05\sqrt{n}=1.9600$, giving $n=1536.64$. In other words, we need to survey 1537 people to be at least 95% sure that $\frac{X}{n}$ lies within .02 either side of $p$.

  3. Now using the normal approximation to the binomial, approximately $X\sim N\left(np,\;np(1-p)\right)$, and so $\frac{X}{n}\sim N\left(p,\frac{p(1-p)}{n}\right)$. We wish to find $n$ such that $$P\left(p-.02\leq\frac{X}{n}\leq p+.02\right)=.95.$$ As in part (b), $$P\left(\frac{-.02\sqrt{n}}{\sqrt{p(1-p)}}\leq Z\leq\frac{.02\sqrt{n}}{\sqrt{p(1-p)}}\right)=.95,\quad\text{so}\quad\frac{.02\sqrt{n}}{\sqrt{p(1-p)}}=1.96.$$ Solving for $n$, $$n=\left(\frac{1.96}{.02}\right)^{2}p(1-p)=9604\,p(1-p).$$ Unfortunately this does not give us an explicit expression for $n$ because we don't know $p$. The way out of this dilemma is to find the maximum value $9604\,p(1-p)$ could take. If we choose $n$ this large, then we can be sure of having the required precision in our estimate, $\frac{X}{n}$, for any $p$. It's easy to see that $p(1-p)$ is a maximum when $p=\frac{1}{2}$. Therefore we take $$n=9604\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)=2401;$$ i.e., if we survey 2401 people we can be 95% sure that $\frac{X}{n}$ lies within .02 of $p$, regardless of the value of $p$.
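A short R sketch of this worst-case sample-size formula ($p=1/2$ maximizes $p(1-p)$):

    p <- 0.5
    ceiling((qnorm(0.975) / 0.02)^2 * p * (1 - p))  # n = 2401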


Remark: This method is used when poll results are reported in the media: you often see or hear that ``this poll is accurate to within 3 percent, 19 times out of 20''. This is saying that $n$ was big enough so that $P\left(\left|\frac{X}{n}-p\right|\leq .03\right)$ was 95%. (This requires $n$ of about 1067.)


Problems:

  1. Tomato seeds germinate (sprout to produce a plant) independently of each other, with probability 0.8 of each seed germinating. Give an expression for the probability that at least 75 seeds out of 100 which are planted in soil germinate. Evaluate this using a suitable approximation.

  2. A metal parts manufacturer inspects each part produced. 60% are acceptable as produced, 30% have to be repaired, and 10% are beyond repair and must be scrapped. It costs the manufacturer $10 to repair a part, and $100 (in lost labour and materials) to scrap a part. Find the approximate probability that the total cost associated with inspecting 80 parts will exceed $1200.


Problems on Chapter 9

  1. The diameters $X$ of spherical particles produced by a machine are randomly distributed according to a uniform distribution on [.6,1.0] (cm). Find the distribution of $Y$, the volume of a particle.

  2. A continuous random variable $X$ has p.d.f. MATH

    1. When people are asked to make up a random number between 0 and 1, it has been found that the distribution of the numbers, $X$, has p.d.f. close to MATH (rather than the $U[0,1]$ distribution which would be expected). Find the mean and variance of $X$.

    2. For 100 ``random'' numbers from the above distribution find the probability their sum lies between 49.0 and 50.5.

    3. What would the answer to (b) be if the 100 numbers were truly $U[0,1]$?

  3. Let $X$ have p.d.f. MATH, and let MATH. Find the p.d.f. of $Y$.

  4. A continuous random variable $X$ which takes values between 0 and 1 has probability density function MATH

    1. For what values of $\alpha$ is this a p.d.f.? Explain.

    2. Find MATH and $E(X)$

    3. Find the probability density function of $T = 1/X$.

  5. The magnitudes of earthquakes in a region of North America can be modelled by an exponential distribution with mean 2.5 (measured on the Richter scale).

  6. A certain type of light bulb has lifetimes that follow an exponential distribution with mean 1000 hours. Find the median lifetime (that is, the lifetime $x$ such that 50% of the light bulbs fail before $x$).

  7. The examination scores obtained by a large group of students can be modelled by a normal distribution with a mean of 65% and a standard deviation of 10%.

  8. The number of litres $X$ that a filling machine in a water bottling plant deposits in a nominal two litre bottle follows a normal distribution MATH, where $\sigma= .01$ (litres) and $\mu$ is the setting on the machine.

  9. A turbine shaft is made up of 4 different sections. The lengths of those sections are independent and have normal distributions with $\mu$ and $\sigma$: (8.10, .22), (7.25, .20),
    (9.75, .24), and (3.10, .20). What is the probability an assembled shaft meets the specifications $28 \pm.26$?

  10. Let $X \sim G (9.5,2)$ and MATH be independent.

    Find:

    1. MATH

    2. $P (X + 4 Y > 0) $

    3. a number $b$ such that $P (X > b) = .90$.

  11. The amount, $A$, of wine in a bottle MATH (Note: $l$ means liters.)

    1. The bottle is labelled as containing $1l$. What is the probability a bottle contains less than $1l$?

    2. Casks are available which have a volume, $V$, which is $N(22l, .16l^{2})$. What is the probability the contents of 20 randomly chosen bottles will fit inside a randomly chosen cask?

  12. In problem 8.18, calculate the probability of passing the exam, both with and without guessing if (a) each $p_{i}$ = .45; (b) each $p_{i} = .55$.
    What is the best strategy for passing the course if (a) $p_{i} = .45$ (b) $p_{i} = .55$?

  13. Suppose that the diameters in millimeters of the eggs laid by a large flock of hens can be modelled by a normal distribution with a mean of 40 mm. and a variance of 4 mm$^{2}$. The wholesale selling price is 5 cents for an egg less than 37 mm in diameter, 6 cents for eggs between 37 and 42 mm, and 7 cents for eggs over 42 mm. What is the average wholesale price per egg?

  14. In a survey of $n$ voters from a given riding in Canada, the proportion $\frac{x}{n}$ who say they would vote Conservative is used to estimate $p$, the probability a voter would vote P.C. ($x$ is the number of Conservative supporters in the survey.) If Conservative support is actually 16%, how large should $n$ be so that with probability .95, the estimate will be in error at most .03?

  15. When blood samples are tested for the presence of a disease, samples from 20 people are pooled and analysed together. If the analysis is negative, none of the 20 people is infected. If the pooled sample is positive, at least one of the 20 people is infected so they must each be tested separately; i.e., a total of 21 tests is required. The probability a person has the disease is .02.

    1. Find the mean and variance of the number of tests required for each group of 20.

    2. For 2000 people, tested in groups of 20, find the mean and variance of the total number of tests. What assumption(s) has been made about the pooled samples?

    3. Find the approximate probability that more than 800 tests are required for the 2000 people.

  16. Suppose 80% of people who buy a new car say they are satisfied with the car when surveyed one year after purchase. Let $X$ be the number of people in a group of 60 randomly chosen new car buyers who report satisfaction with their car. Let $Y$ be the number of satisfied owners in a second (independent) survey of 62 randomly chosen new car buyers. Using a suitable approximation, find MATH. A continuity correction is expected.

  17. Suppose that the unemployment rate in Canada is 7%.

  18. Gambling. Your chances of winning or losing money can be calculated in many games of chance as described here.

    Suppose each time you play a game (or place a bet) of $1, the probability you win (thus ending up with a profit of $1) is .49 and the probability you lose (meaning your ``profit'' is -$1) is .51.

    1. Let $X$ represent your profit after $n$ independent plays or bets. Give a normal approximation for the distribution of $X$.

    2. If $n = 20$, determine $P(X \geq0)$. (This is the probability you are ``ahead" after 20 plays.) Also find $P(X \geq0)$ if $n = 50$ and $n = 100 $. What do you conclude?

      Note: For many casino games (roulette, blackjack) there are bets for which your probability of winning is only a little less than .5. However, as you play more and more times, the probability you lose (end up ``behind") approaches 1.

    3. Suppose now you are the casino. If all players combined place $n = 100,000$ $1 bets in an evening, let $X$ be your profit. Find the value $c$ with the property that $P(X > c) = .99$. Explain in words what this means.

  19. Gambling: Crown and Anchor. Crown and Anchor is a game that is sometimes played at charity casinos or just for fun. It can be played with a ``wheel of fortune" or with 3 dice, in which each die has its 6 sides labelled with a crown, an anchor, and the four card suits club, diamond, heart and spade, respectively. You bet an amount (let's say $1) on one of the 6 symbols: let's suppose you bet on ``heart". The 3 dice are then rolled simultaneously and you win $\$t$ if $t$ hearts turn up ($t = 0, 1, 2, 3$).

    1. Let $X$ represent your profits from playing the game $n$ times. Give a normal approximation for the distribution of $X$.

    2. Find (approximately) the probability that $X>0$ if (i) $n = 10$, (ii) $n = 50$.

  20. Binary classification. Many situations require that we ``classify" a unit of some type as being one of two types, which for convenience we will term Positive and Negative. For example, a diagnostic test for a disease might be positive or negative; an email message may be spam or not spam; a credit card transaction may be fraudulent or not. The problem is that in many cases we cannot tell for certain whether a unit is Positive or Negative, so when we have to decide which a unit is, we may make errors. The following framework helps us to deal with these problems.

    For a randomly selected unit from the population being considered, define the indicator random variable MATH Suppose that we cannot know for certain whether $Y = 0$ or $Y = 1$ for a given unit, but that we can get a measurement $X$ with the property that MATH where $\mu_{1} > \mu_{0}$. We now decide to classify units as follows, based on their measurement $X$: select some value $d$ between $\mu_{0}$ and $\mu _{1}$, and then

    1. if $X \geq d$, classify the unit as Positive

    2. if $X < d$, classify the unit as Negative

  21. Binary classification and spam detection. The approach in the preceding question can be used for problems such as spam detection, which was discussed earlier in Problems 4.17 and 4.18. Instead of using binary features as in those problems, suppose that for a given email message we compute a measure $X$, designed so that $X$ tends to be high for spam messages and low for regular (non-spam) messages. (For example $X$ can be a composite measure based on the presence or absence of certain words in a message, as well as other features.) We will treat $X$ as a continuous random variable.

    Suppose that for spam messages, the distribution of $X$ is approximately MATH, and that for regular messages, it is approximately MATH, where $\mu_{1} > \mu_{0}$. This is the same setup as for Problem 9.21. We will filter spam by picking a value $d$, and then filtering any message for which $X \geq d$. The trick here is to decide what value of $d$ to use.

  22. Random chords of a circle. Given a circle, find the probability that a chord chosen at random is longer than the side of an inscribed equilateral triangle. For example, in Figure bertrand, the line joining $A$ and $B$ satisfies the condition; the other lines do not.
    [Figure bertrand: Bertrand's Paradox]

    This is called Bertrand's paradox (see the Java applet at http://www.cut-the-knot.org/bertrand.shtml) and there are various possible solutions, depending on exactly how you interpret the phrase ``a chord chosen at random''. For example, since the only important thing is the position of the second point relative to the first one, we can fix the point $A$ and consider only the chords that emanate from this point. Then it becomes clear that 1/3 of the outcomes (those with angle with the tangent at that point between 60 and 120 degrees) will result in a chord longer than the side of an equilateral triangle. But a chord is fully determined by its midpoint. Chords whose length exceeds the side of an equilateral triangle have their midpoints inside a smaller circle with radius equal to 1/2 that of the given one. If we choose the midpoint of the chord at random and uniformly from the points within the circle, what is the probability that the corresponding chord has length greater than the side of the triangle? Can you think of any other interpretations which lead to different answers?

  23. A model for stock returns. A common model for stock returns is as follows: the number of trades $N$ of stock XXX in a given day has a Poisson distribution with parameter $\lambda$. At each trade, say the $i$'th trade, the change in the price of the stock is $X_{i}$ and has a normal distribution with mean $0$ and variance $\sigma^{2}$ say, and these changes are independent of one another and independent of $N$. Find the moment generating function of the total change in stock price over the day. Is this a distribution that you recognise? What is its mean and variance?

  24. Let $X_{1},X_{2},\dots,X_{n}$ be independent random variables with a Normal distribution having mean $1$ and variance $2$. Find the moment generating function for

    1. $X_{1}$

    2. $X_{1}+X_{2}$

    3. $S_{n}=X_{1}+X_{2}+\cdots+X_{n}$

    4. $n^{-1/2}(S_{n}-n)$