








An overview of probability theory, focusing on random variables, their probability density functions (pdf) and cumulative distribution functions (cdf), as well as bivariate distributions. It covers discrete and continuous random variables, their means and variances, and the concept of independent random variables.
prepared by Professor Jenny Baglivo
© Copyright 2004 by Jenny A. Baglivo. All Rights Reserved.
These notes are a brief review of probability theory. The complete set of notes for probability theory is available at the MT426 website.
0 Brief Review of Probability
0.1 Random variable (RV), PDF, CDF, quantiles
0.2 Expectation, E(X), Var(X), SD(X)
0.3 Bivariate distributions
0.4 Linear combinations of independent RVs
0.5 Linear combinations of independent normal RVs
0.6 Central limit theorem
0.7 Random samples and sample summaries
A random variable, X, is a function from the sample space of an experiment to the real numbers. The range of X is the set of values the random variable assumes.
X is said to be (1) a discrete random variable if its range is either a finite set or a countably infinite set, (2) a continuous random variable if its range is an interval or a union of intervals, or (3) a mixed random variable if it is neither discrete nor continuous.
Cumulative distribution function (CDF). The CDF of X is the function
F (x) = P (X ≤ x) for all real numbers x.
If X is a discrete random variable, then F (x) is a step function; if X is continuous, then F (x) is continuous.
Probability density function (PDF). If X is a discrete random variable, then the PDF is a probability:
p(x) = P (X = x) for all real numbers x.
If X is a continuous random variable, then the PDF is a rate:
f(x) = (d/dx) F(x) whenever the derivative exists.
Quantiles. The pth quantile (or 100pth percentile), xp, is the point satisfying the equation
P (X ≤ xp) = p.
To find xp, solve the equation F (x) = p for x.
Example 0.1 (Exponential RV) Let X be the continuous random variable with PDF
f(x) = λe^(−λx) when x ≥ 0 and 0 otherwise,
where λ is a positive constant.
Given t > 0,
∫_{x=0}^{t} λe^(−λx) dx = 1 − e^(−λt). Thus, the CDF of X is
F(x) = 1 − e^(−λx) when x > 0 and 0 otherwise.
Given 0 < p < 1, 1 − e^(−λx) = p ⟹ x = −log(1 − p)/λ,
where log() is the natural logarithm function. Thus, the pth quantile of X can be written as follows: xp = −log(1 − p)/λ.
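For a concrete check, here is a minimal sketch (not part of the original notes) that evaluates the quantile formula and compares it with scipy.stats.expon.ppf; the rate λ = 2.0 and the level p = 0.75 are arbitrary illustrative choices.

```python
# Minimal sketch: exponential quantile x_p = -log(1 - p)/lambda, checked against scipy.
import math
from scipy import stats

lam = 2.0   # rate parameter lambda (assumed value, for illustration only)
p = 0.75    # quantile level (assumed value)

x_p = -math.log(1 - p) / lam            # quantile formula from the notes
print(x_p)                              # ≈ 0.6931
print(1 - math.exp(-lam * x_p))         # CDF at x_p recovers p = 0.75
print(stats.expon.ppf(p, scale=1/lam))  # same quantile via scipy (scale = 1/lambda)
```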
Example 0.2 (Binomial RV) Let X be the binomial random variable with n = 4 and p = 0.20, so that p(0) = 0.4096, p(1) = 0.4096, p(2) = 0.1536, p(3) = 0.0256, and p(4) = 0.0016. The mean and variance of X can be computed as follows:
E(X) = 0 × 0.4096 + 1 × 0.4096 + 2 × 0.1536 + 3 × 0.0256 + 4 × 0.0016 = 0.80
E(X^2) = 0^2 × 0.4096 + 1^2 × 0.4096 + 2^2 × 0.1536 + 3^2 × 0.0256 + 4^2 × 0.0016 = 1.28
Var(X) = E(X^2) − E(X)^2 = 1.28 − 0.80^2 = 0.64
Lastly, the standard deviation of X is SD(X) = √0.64 = 0.80.
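The same computations, done directly from the tabulated PDF values, might look like the following sketch (not from the original notes):

```python
# Minimal sketch: mean, variance, and SD of a discrete RV from its PDF table.
values = [0, 1, 2, 3, 4]
probs = [0.4096, 0.4096, 0.1536, 0.0256, 0.0016]

mean = sum(x * p for x, p in zip(values, probs))       # E(X)   = 0.80
mean2 = sum(x**2 * p for x, p in zip(values, probs))   # E(X^2) = 1.28
var = mean2 - mean**2                                  # Var(X) = 0.64
sd = var ** 0.5                                        # SD(X)  = 0.80
print(mean, mean2, var, sd)
```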
Example 0.3 (Uniform RV) Let X be the uniform random variable on the interval [−10, 20]. The PDF of X is as follows:
f(x) = 1/30 when −10 ≤ x ≤ 20 and 0 otherwise.
The mean and variance of X can be computed as follows:
E(X) = ∫_{x=−10}^{20} x (1/30) dx = 5
E(X^2) = ∫_{x=−10}^{20} x^2 (1/30) dx = 100
Var(X) = E(X^2) − E(X)^2 = 100 − 5^2 = 75
Lastly, the standard deviation of X is SD(X) = √75 ≈ 8.66.
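In the continuous case the sums become integrals; the following sketch (an illustration, not part of the original notes) checks the uniform moments by numerical integration with scipy.integrate.quad:

```python
# Minimal sketch: moments of the uniform[-10, 20] distribution by numerical integration.
from scipy.integrate import quad

f = lambda x: 1 / 30   # density on [-10, 20]

mean, _ = quad(lambda x: x * f(x), -10, 20)      # E(X)   = 5.0
mean2, _ = quad(lambda x: x**2 * f(x), -10, 20)  # E(X^2) = 100.0
var = mean2 - mean**2                            # Var(X) = 75.0
print(mean, var, var ** 0.5)                     # SD(X) ≈ 8.66
```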
A probability distribution describing the joint variability of two or more random variables is called a joint distribution. A bivariate distribution is the joint distribution of a pair of random variables.
The joint CDF of the random pair (X, Y ) is the function
F (x, y) = P (X ≤ x, Y ≤ y) for all real pairs (x, y).
Discrete random pairs. The joint PDF of the discrete random pair (X, Y ) is defined as follows:
p(x, y) = P (X = x and Y = y) = P (X = x, Y = y) for all real pairs (x, y).
We sometimes write pXY (x, y) to emphasize the two random variables.
X and Y are said to be independent when
pXY (x, y) = pX (x) pY (y) for all real pairs (x, y),
where pX (x) and pY (y) are the marginal frequency functions of X and Y , respectively. (That is, P (X = x, Y = y) = P (X = x)P (Y = y) for all real pairs (x, y).)
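As an illustration of the independence criterion (not part of the original notes), the sketch below compares a joint PDF table with the product of its marginals; the joint table is a made-up example constructed so that the check passes.

```python
# Minimal sketch: checking p(x, y) = pX(x) * pY(y) for a discrete random pair.
# The joint table below is hypothetical (chosen to be independent).
joint = {
    (0, 0): 0.12, (0, 1): 0.28,
    (1, 0): 0.18, (1, 1): 0.42,
}

px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p   # marginal of X
    py[y] = py.get(y, 0.0) + p   # marginal of Y

independent = all(abs(p - px[x] * py[y]) < 1e-12 for (x, y), p in joint.items())
print(independent)   # True for this table
```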
Continuous random pairs. The joint PDF of the continuous random pair (X, Y ) is defined as follows:
f(x, y) = ∂²/(∂x ∂y) P(X ≤ x, Y ≤ y) = ∂²/(∂y ∂x) P(X ≤ x, Y ≤ y)
when the joint CDF has continuous second partial derivatives. The notation fXY (x, y) is sometimes used to emphasize the two random variables.
X and Y are said to be independent when
fXY (x, y) = fX (x) fY (y) for all real pairs (x, y),
where fX (x) and fY (y) are the marginal density functions of X and Y , respectively.
Expectation. Let g(X, Y ) be a real-valued function. The mean (or expected value or expectation) of g(X, Y ) can be computed as follows:
E(g(X, Y)) = Σ_{(x,y)} g(x, y) p(x, y) in the discrete case, and E(g(X, Y)) = ∫∫_{(x,y)} g(x, y) f(x, y) dx dy in the continuous case,
where the sum (integral) is over all pairs with nonzero joint PDF. The expectation is defined as long as the sum (integral) converges absolutely.
Properties of sums and integrals, and the fact that the joint PDF of independent random variables equals the product of the marginal PDFs, can be used to prove the following:
E(a + b1 g1(X, Y) + b2 g2(X, Y)) = a + b1 E(g1(X, Y)) + b2 E(g2(X, Y)).
If X and Y are independent, then E(g(X)h(Y)) = E(g(X)) E(h(Y)).
Covariance and correlation. Let X and Y be random variables with finite means (μx, μy) and finite standard deviations (σx, σy).
The covariance of X and Y is defined as follows:
Cov(X, Y ) = E((X − μx)(Y − μy)).
The notation σxy = Cov(X, Y ) is often used to denote the covariance. The correlation of X and Y is defined as follows:
Corr(X, Y) = Cov(X, Y)/(σx σy) = σxy/(σx σy).
The notation ρ = Corr(X, Y ) is used to denote the correlation. The parameter ρ is called the correlation coefficient.
Figure 0.1: Joint PDF for a random pair (left) and region of nonzero density (right).
Since X is a binomial RV with n = 4 and p = 0.20:
E(X) = np = 0.80 and Var(X) = np(1 − p) = 0.64.
Since Y is a binomial RV with n = 4 and p = 0.50:
E(Y) = np = 2 and Var(Y) = np(1 − p) = 1.
Further,
E(XY) = Σ_{x=0}^{4} Σ_{y=0}^{4−x} x y p(x, y) = 0(0.464) + 1(0.108) + 2(0.252) + 3(0.116) + 4(0.060) = 1.2.
These computations imply that
Cov(X, Y) = E(XY) − E(X)E(Y) = 1.20 − (0.80)(2.00) = −0.40
and
Corr(X, Y) = Cov(X, Y)/(SD(X) SD(Y)) = −0.40/√((0.64)(1.00)) = −0.50.
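The arithmetic for this pair can be reproduced directly from the moments computed above; the sketch below (not part of the original notes) restates it in code.

```python
# Minimal sketch: covariance and correlation of the binomial pair from its moments.
import math

E_X, Var_X = 0.80, 0.64   # X ~ binomial(n = 4, p = 0.20)
E_Y, Var_Y = 2.00, 1.00   # Y ~ binomial(n = 4, p = 0.50)
E_XY = 1.20               # from the double sum above

cov = E_XY - E_X * E_Y                  # -0.40
corr = cov / math.sqrt(Var_X * Var_Y)   # -0.50
print(cov, corr)
```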
Example 0.5 (Tetrahedral Distribution) Let (X, Y ) be a continuous pair whose joint PDF is as follows:
f(x, y) = (2/9)(y − x) when 0 < x < y < 3 and 0 otherwise.
The left part of Figure 0.1 shows the solid (of volume 1) under the surface z = f (x, y) and above the xy-plane over the region of non-zero density. The right part of the figure shows the region of non-zero density (0 < x < y < 3).
Then
E(X) = ∫_{y=0}^{3} ∫_{x=0}^{y} x f(x, y) dx dy = ∫_{y=0}^{3} (y^3/27) dy = 3/4,
E(X^2) = ∫_{y=0}^{3} ∫_{x=0}^{y} x^2 f(x, y) dx dy = ∫_{y=0}^{3} (y^4/54) dy = 9/10,
and Var(X) = E(X^2) − (3/4)^2 = 27/80.
Similarly,
E(Y) = ∫_{y=0}^{3} ∫_{x=0}^{y} y f(x, y) dx dy = ∫_{y=0}^{3} (y^3/9) dy = 9/4,
E(Y^2) = ∫_{y=0}^{3} ∫_{x=0}^{y} y^2 f(x, y) dx dy = ∫_{y=0}^{3} (y^4/9) dy = 27/5,
and Var(Y) = E(Y^2) − (9/4)^2 = 27/80.
Further,
E(XY) = ∫_{y=0}^{3} ∫_{x=0}^{y} x y f(x, y) dx dy = ∫_{y=0}^{3} (y^4/27) dy = 9/5.
These computations imply that
Cov(X, Y) = E(XY) − E(X)E(Y) = 9/5 − (3/4)(9/4) = 0.1125
and
Corr(X, Y) = Cov(X, Y)/(SD(X) SD(Y)) = 0.1125/√((27/80)(27/80)) = 0.1125/(27/80) = 1/3.
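The moments above can be double-checked numerically; the sketch below (an illustration, not part of the original notes) uses scipy.integrate.dblquad over the triangle 0 < x < y < 3.

```python
# Minimal sketch: numerical check of the tetrahedral-distribution moments.
from scipy.integrate import dblquad

f = lambda x, y: (2 / 9) * (y - x)   # joint density on 0 < x < y < 3

def moment(g):
    # integrate g(x, y) * f(x, y) with x in (0, 3) and y in (x, 3);
    # dblquad expects the integrand arguments in the order (y, x)
    val, _ = dblquad(lambda y, x: g(x, y) * f(x, y), 0, 3, lambda x: x, lambda x: 3)
    return val

print(moment(lambda x, y: 1.0))      # total probability, 1.0
print(moment(lambda x, y: x))        # E(X)  = 0.75
print(moment(lambda x, y: y))        # E(Y)  = 2.25
print(moment(lambda x, y: x * y))    # E(XY) = 1.80
```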
We consider linear combinations of the form
Y = a + b 1 X 1 + b 2 X 2 + · · · + bnXn
(a and the bi are constants), where X1, X2, ..., Xn are independent random variables; that is,
fX(x1, x2, ..., xn) = f1(x1) f2(x2) · · · fn(xn)
for all n-tuples of real numbers, where fi(x) is the marginal PDF of Xi. For such a linear combination, E(Y) = a + b1 E(X1) + · · · + bn E(Xn) and Var(Y) = b1^2 Var(X1) + · · · + bn^2 Var(Xn).
Theorem 0.9 (Independent Normal RVs) Let X 1 , X 2 ,.. ., Xn be independent normal random variables and let
Y = a + b 1 X 1 + b 2 X 2 + · · · + bnXn.
Then Y is a normal random variable.
In particular, if X1, X2, ..., Xn is a random sample from a normal distribution with mean μ and standard deviation σ, then the sample mean X̄ = (X1 + X2 + · · · + Xn)/n is a normal random variable with mean μ and variance σ^2/n.
Exercise 0.10 (Composite Score) The personnel department of a large corporation gives two aptitude tests to job applicants. One measures verbal ability; the other, quantitative ability. From many years’ experience, the company has found that the verbal scores tend to be normally distributed with a mean of 50 and a standard deviation of 10. The quantitative scores are normally distributed with a mean of 100 and a standard deviation of 20, and appear to be independent of the verbal scores. A composite score, C, is assigned to each applicant, where C = 2 (Verbal Score) + 3 (Quantitative Score).
If company policy prohibits hiring anyone whose composite score is below 375, what percentage of applicants will be summarily rejected?
Solution: Let V be the verbal score and Q be the quantitative score of an applicant. Then the composite score C = 2V + 3Q has summary measures
E(C) = 2E(V) + 3E(Q) = 400 and Var(C) = 2^2 Var(V) + 3^2 Var(Q) = 4000.
Further, since V and Q are independent normal random variables, C is a normal random variable.
We want
P(C < 375) = Φ((375 − 400)/√4000) ≈ Φ(−0.40) = 1 − Φ(0.40) = 0.3446,
where Φ(·) is the CDF of the standard normal random variable.
Thus, about 34.5% of the applicants will be summarily rejected.
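The same probability can be computed without rounding the z-value; a minimal sketch using scipy.stats.norm (not part of the original notes):

```python
# Minimal sketch: P(C < 375) for C ~ Normal(400, sqrt(4000)).
from scipy.stats import norm

mean_C = 2 * 50 + 3 * 100                       # 400
sd_C = (2**2 * 10**2 + 3**2 * 20**2) ** 0.5     # sqrt(4000) ≈ 63.25

print(norm.cdf(375, loc=mean_C, scale=sd_C))    # ≈ 0.346, close to the rounded 0.3446
```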
Exercise 0.11 (Sample Mean) Let X1, X2, ..., Xn be a random sample from a normal distribution with mean 10 and standard deviation 3. How large must n be in order that P(X̄ ≤ 10.5) is at least 99%?
Solution: The sample mean X̄ is a normal random variable with summary measures
E(X̄) = E(X) = 10 and Var(X̄) = Var(X)/n = 9/n.
We need to solve 0.99 = P(X̄ ≤ 10.5) for n:
0.99 = Φ((10.5 − 10)/√(9/n)) ⟹ (10.5 − 10)/√(9/n) = z0.99 = 2.33 ⟹ n ≈ 195.44.
Since we want the probability to be at least 99% and since n must be a whole number, our solution is n = 196.
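A quick numerical version of this solve (not part of the original notes); note that the unrounded 99th-percentile z-value gives n = 195, while the notes' rounded z = 2.33 gives n = 196:

```python
# Minimal sketch: solving P(sample mean <= 10.5) >= 0.99 for the sample size n.
import math
from scipy.stats import norm

mu, sigma = 10, 3
z = norm.ppf(0.99)                  # ≈ 2.326 (the notes round this to 2.33)
n_exact = (z * sigma / 0.5) ** 2    # ≈ 194.8 with the unrounded z (195.44 with z = 2.33)
print(n_exact, math.ceil(n_exact))  # round up to the next whole number
```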
Exact sums. Other situations where we know exact sums are as follows:
Assume that X 1 , X 2 , X 3 ,... is an infinite sequence of IID random variables, each with the same distribution as X. Let
Sm = X 1 + X 2 + · · · + Xm
be the sum of the first m terms, and let
X̄m = Sm/m
be the average of the first m terms.
If the X distribution has a finite mean and variance, then the central limit theorem says that the distributions of the sample sum Sm and the sample mean X̄m are approximately normal when m is large.
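A small simulation makes the approximation visible; the sketch below (not part of the original notes) draws sample means of a very non-normal Exponential(1) distribution, whose averages should behave like N(1, 1/m) for large m. The choices m = 50 and 10,000 replications are arbitrary.

```python
# Minimal sketch: simulating the central limit theorem for an exponential parent.
import numpy as np

rng = np.random.default_rng(0)
m = 50
means = rng.exponential(scale=1.0, size=(10_000, m)).mean(axis=1)

# For Exponential(1): mean 1 and SD 1, so the sample mean should be roughly N(1, 1/m).
print(means.mean(), means.std())   # ≈ 1.0 and ≈ 1/sqrt(50) ≈ 0.141
```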
The formal statement is as follows: if the X distribution has mean μ and standard deviation σ, then for every real number z, P((Sm − mμ)/(σ√m) ≤ z) = P((X̄m − μ)/(σ/√m) ≤ z) → Φ(z) as m → ∞, where Φ(·) is the standard normal CDF.
Recall that a random sample of size n from the X distribution is a list of n mutually independent random variables, each with the same distribution as X.
Sample mean, sample variance. If X1, X2, ..., Xn is a random sample from a distribution with mean μ and standard deviation σ, then the sample mean, X̄, is the random variable
X̄ = (1/n)(X1 + X2 + · · · + Xn),
the sample variance, S^2, is the random variable
S^2 = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)^2,
and the sample standard deviation, S, is the positive square root of the sample variance. The following theorem can be proven using properties of expectation:
Theorem 0.14 (Sample Summaries) If X̄ is the sample mean and S^2 is the sample variance of a random sample of size n from a distribution with mean μ and standard deviation σ, then E(X̄) = μ, Var(X̄) = σ^2/n, and E(S^2) = σ^2.
Note that in statistical applications, the observed value of the sample mean is used to estimate an unknown mean μ and the observed value of the sample variance is used to estimate an unknown variance σ^2.
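In code, the estimates look like the following sketch (not part of the original notes); the simulated data and its parameters are hypothetical, and the n − 1 divisor corresponds to ddof=1:

```python
# Minimal sketch: sample mean and sample variance as estimates of mu and sigma^2.
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=10, scale=3, size=500)   # hypothetical data with mu = 10, sigma = 3

xbar = sample.mean()      # estimate of mu
s2 = sample.var(ddof=1)   # sample variance, divides by n - 1
print(xbar, s2)           # should land near 10 and 9
```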
Sample correlation. A random sample of size n from the joint (X, Y ) distribution is a list of n mutually independent random pairs, each with the same distribution as (X, Y ).
If (X 1 , Y 1 ), (X 2 , Y 2 ),... , (Xn, Yn) is a random sample of size n from a bivariate distribution with correlation ρ = Corr(X, Y ), then the sample correlation, R, is the random variable
R = Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) / √( Σ_{i=1}^{n} (Xi − X̄)^2 · Σ_{i=1}^{n} (Yi − Ȳ)^2 ),
where X̄ and Ȳ are the sample means of the X and Y samples, respectively.
Note that in statistical applications, the observed value of the sample correlation is used to estimate an unknown correlation ρ.
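A direct computation of R, checked against numpy.corrcoef on simulated data (an illustration, not part of the original notes; the data-generating choices are arbitrary):

```python
# Minimal sketch: the sample correlation R from its definition, versus numpy.corrcoef.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)   # hypothetical correlated pair

r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
print(r, np.corrcoef(x, y)[0, 1])    # the two values agree
```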
Example 0.15 (Brain-Body Weights) (Allison & Cicchetti, Science, 194:732-734, 1976; lib.stat.cmu.edu/DASL.) As part of a study on sleep in mammals, researchers collected information on the average brain weight and average body weight for 43 different species.
Figure 0.2: Log-brain weight (vertical axis) versus log-body weight (horizontal axis) for the brain-body study, with linear prediction equation superimposed.
(Body-weight, brain-weight) combinations ranged from (0.05kg,0.14g) for the short-tail shrew to (2547.0kg,4603.0g) for the Asian elephant. The data for “man” are
(62kg,1320g) = (136.4lb,2.9lb).
Let X be the common logarithm of body weight in kilograms, and Y be the common logarithm of brain weight in grams. Sample summaries are as follows:
Figure 0.2 compares the common logarithms of average brain weight in grams (vertical axis) and average body weight in kilograms (horizontal axis) for the 43 species.
The linear prediction equation
y = ȳ + r (sy/sx)(x − x̄)
is superimposed (as well as contours of the best fitting bivariate normal density).
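To show how such a prediction line is built from sample summaries, here is a sketch (not part of the original notes); the summary values below are placeholders, not the study's actual summaries, which are omitted from this excerpt:

```python
# Minimal sketch: building y = ybar + r*(sy/sx)*(x - xbar) from sample summaries.
import math

xbar, ybar = 1.3, 1.9        # hypothetical means of log10 body weight and log10 brain weight
sx, sy, r = 1.3, 1.2, 0.96   # hypothetical SDs and sample correlation

slope = r * sy / sx
intercept = ybar - slope * xbar
predict = lambda x: intercept + slope * x

print(slope, intercept)
print(predict(math.log10(62)))   # predicted log10 brain weight at the body weight for "man"
```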