
Brief Review of Probability

prepared by Professor Jenny Baglivo

© Copyright 2004 by Jenny A. Baglivo. All Rights Reserved.

These notes are a brief review of probability theory. The complete set of notes for probability theory is at the MT426 website.

0 Brief Review of Probability

0.1 Random variable (RV), PDF, CDF, quantiles
0.2 Expectation, E(X), Var(X), SD(X)
0.3 Bivariate distributions
0.4 Linear combinations of independent RVs
0.5 Linear combinations of independent normal RVs
0.6 Central limit theorem
0.7 Random samples and sample summaries

0 Brief Review of Probability

0.1 Random variable (RV), PDF, CDF, quantiles

A random variable, X, is a function from the sample space of an experiment to the real numbers. The range of X is the set of values the random variable assumes.

X is said to be (1) a discrete random variable if its range is either a finite set or a countably infinite set, (2) a continuous random variable if its range is an interval or a union of intervals, or (3) a mixed random variable if it is neither discrete nor continuous.

Cumulative distribution function (CDF). The CDF of X is the function

F(x) = P(X ≤ x) for all real numbers x.

If X is a discrete random variable, then F(x) is a step function; if X is continuous, then F(x) is continuous.

Probability density function (PDF). If X is a discrete random variable, then the PDF is a probability:

p(x) = P(X = x) for all real numbers x.

If X is a continuous random variable, then the PDF is a rate:

f(x) = (d/dx) F(x) whenever the derivative exists.

Quantiles. The pth quantile (or 100pth percentile), x_p, is the point satisfying the equation

P(X ≤ x_p) = p.

To find x_p, solve the equation F(x) = p for x.

Example 0.1 (Exponential RV) Let X be the continuous random variable with PDF

f(x) = λe^{−λx} when x ≥ 0 and 0 otherwise,

where λ is a positive constant.

Given t > 0,

∫_{0}^{t} λe^{−λx} dx = 1 − e^{−λt}.

Thus, the CDF of X is

F(x) = 1 − e^{−λx} when x > 0 and 0 otherwise.

Given 0 < p < 1,

1 − e^{−λx} = p  ⟹  x = − log(1 − p)/λ,

where log() is the natural logarithm function. Thus, the pth quantile of X can be written as follows: x_p = − log(1 − p)/λ.
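As a quick numerical check of this quantile formula, the following Python sketch compares the closed form with scipy's exponential quantile function, which parametrizes the distribution by scale = 1/λ. (The rate λ = 2 is an arbitrary illustrative choice, not from the example.)

    # Check x_p = -log(1 - p)/lambda against scipy (which uses scale = 1/lambda).
    import math
    from scipy.stats import expon

    lam = 2.0                                   # illustrative rate, chosen arbitrarily
    for p in (0.25, 0.50, 0.95):
        xp = -math.log(1 - p) / lam             # quantile formula derived above
        print(p, xp, expon.ppf(p, scale=1/lam)) # last two columns agree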

Example 0.2 (Binomial RV) Let X be the binomial random variable based on n = 4 Bernoulli trials with success probability p = 0.2, so that p(x) = C(4, x)(0.2)^x(0.8)^{4−x} for x = 0, 1, 2, 3, 4; the PDF values are 0.4096, 0.4096, 0.1536, 0.0256, and 0.0016.

The mean and variance of X can be computed as follows:

E(X) = 0 × 0.4096 + 1 × 0.4096 + 2 × 0.1536 + 3 × 0.0256 + 4 × 0.0016 = 0.80

E(X²) = 0² × 0.4096 + 1² × 0.4096 + 2² × 0.1536 + 3² × 0.0256 + 4² × 0.0016 = 1.28

Var(X) = E(X²) − E(X)² = 1.28 − 0.80² = 0.64

Lastly, the standard deviation of X is SD(X) = √0.64 = 0.8.
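The same arithmetic in a few lines of Python (a minimal sketch; the PDF values are those listed above):

    # Mean, variance, and SD of a discrete RV from its PDF.
    pmf = {0: 0.4096, 1: 0.4096, 2: 0.1536, 3: 0.0256, 4: 0.0016}

    EX = sum(x * p for x, p in pmf.items())        # E(X)   = 0.80
    EX2 = sum(x * x * p for x, p in pmf.items())   # E(X^2) = 1.28
    var = EX2 - EX**2                              # Var(X) = 0.64
    print(EX, EX2, var, var**0.5)                  # SD(X)  = 0.80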

Example 0.3 (Uniform RV) Let X be the uniform random variable on the interval [−10, 20]. The PDF of X is as follows:

f(x) = 1/30 when −10 ≤ x ≤ 20 and 0 otherwise.

The mean and variance of X can be computed as follows:

E(X) = ∫_{−10}^{20} x (1/30) dx = 5

E(X²) = ∫_{−10}^{20} x² (1/30) dx = 100

Var(X) = E(X²) − E(X)² = 100 − 5² = 75

Lastly, the standard deviation of X is SD(X) = √75 ≈ 8.66.
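A numerical-integration check of these moments (a sketch using scipy.integrate.quad):

    # Verify the uniform[-10, 20] moments by numerical integration.
    from scipy.integrate import quad

    f = lambda x: 1/30                             # density on [-10, 20]
    EX, _ = quad(lambda x: x * f(x), -10, 20)      # 5
    EX2, _ = quad(lambda x: x * x * f(x), -10, 20) # 100
    print(EX, EX2, EX2 - EX**2)                    # variance 75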

0.3 Bivariate distributions

A probability distribution describing the joint variability of two or more random variables is called a joint distribution. A bivariate distribution is the joint distribution of a pair of random variables.

The joint CDF of the random pair (X, Y) is the function

F(x, y) = P(X ≤ x, Y ≤ y) for all real pairs (x, y).

Discrete random pairs. The joint PDF of the discrete random pair (X, Y) is defined as follows:

p(x, y) = P(X = x and Y = y) = P(X = x, Y = y) for all real pairs (x, y).

We sometimes write p_XY(x, y) to emphasize the two random variables.

X and Y are said to be independent when

p_XY(x, y) = p_X(x) p_Y(y) for all real pairs (x, y),

where p_X(x) and p_Y(y) are the marginal frequency functions of X and Y, respectively. (That is, P(X = x, Y = y) = P(X = x)P(Y = y) for all real pairs (x, y).)

Continuous random pairs. The joint PDF of the continuous random pair (X, Y) is defined as follows:

f(x, y) = ∂²/∂x∂y P(X ≤ x, Y ≤ y) = ∂²/∂y∂x P(X ≤ x, Y ≤ y)

when the joint CDF has continuous second partial derivatives. The notation f_XY(x, y) is sometimes used to emphasize the two random variables.

X and Y are said to be independent when

f_XY(x, y) = f_X(x) f_Y(y) for all real pairs (x, y),

where f_X(x) and f_Y(y) are the marginal density functions of X and Y, respectively.

Expectation. Let g(X, Y) be a real-valued function. The mean (or expected value or expectation) of g(X, Y) can be computed as follows:

E(g(X, Y)) = Σ_{(x,y)} g(x, y) p(x, y) in the discrete case, and

E(g(X, Y)) = ∫∫_{(x,y)} g(x, y) f(x, y) dx dy in the continuous case,

where the sum (integral) is over all pairs with nonzero joint PDF. The expectation is defined as long as the sum (integral) converges absolutely.

Properties of sums and integrals, and the fact that the joint PDF of independent random variables equals the product of the marginal PDFs, can be used to prove the following:

1. If a, b_1, and b_2 are constants, and g_i(X, Y) are real-valued functions (i = 1, 2), then

E(a + b_1 g_1(X, Y) + b_2 g_2(X, Y)) = a + b_1 E(g_1(X, Y)) + b_2 E(g_2(X, Y)).

2. If X and Y are independent, and g(X) and h(Y) are real-valued functions, then

E(g(X)h(Y)) = E(g(X))E(h(Y)).
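Property 2 can be illustrated by simulation. The sketch below draws independent samples and compares the two sides of the identity; the distributions and the functions g, h are arbitrary choices, not from the notes.

    # Monte Carlo check: E(g(X)h(Y)) = E(g(X))E(h(Y)) for independent X, Y.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(size=1_000_000)            # X ~ Exp(1)
    y = rng.normal(size=1_000_000)                 # Y ~ N(0, 1), independent of X
    g, h = np.sqrt, np.cos                         # arbitrary real-valued functions
    print(np.mean(g(x) * h(y)), np.mean(g(x)) * np.mean(h(y)))  # nearly equal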

Covariance and correlation. Let X and Y be random variables with finite means (μ_x, μ_y) and finite standard deviations (σ_x, σ_y).

The covariance of X and Y is defined as follows:

Cov(X, Y) = E((X − μ_x)(Y − μ_y)).

The notation σ_xy = Cov(X, Y) is often used to denote the covariance. The correlation of X and Y is defined as follows:

Corr(X, Y) = Cov(X, Y)/(σ_x σ_y) = σ_xy/(σ_x σ_y).

The notation ρ = Corr(X, Y) is used to denote the correlation. The parameter ρ is called the correlation coefficient.


Figure 0.1: Joint PDF for a random pair (left) and region of nonzero density (right).

Example 0.4 (Trinomial Distribution) Let (X, Y) be the discrete random pair whose joint PDF is trinomial with n = 4 trials and category probabilities 0.20 and 0.50:

p(x, y) = (4!/(x! y! (4 − x − y)!)) (0.2)^x (0.5)^y (0.3)^{4−x−y} when x ≥ 0, y ≥ 0, and x + y ≤ 4, and 0 otherwise.

Since X is a binomial RV with n = 4 and p = 0.20:

E(X) = np = 0.80 and Var(X) = np(1 − p) = 0.64.

Since Y is a binomial RV with n = 4 and p = 0.50:

E(Y) = np = 2 and Var(Y) = np(1 − p) = 1.

Further,

E(XY) = Σ_{x=0}^{4} Σ_{y=0}^{4−x} x y p(x, y) = 0(0.464) + 1(0.108) + 2(0.252) + 3(0.116) + 4(0.060) = 1.2.

These computations imply that

Cov(X, Y) = E(XY) − E(X)E(Y) = 1.20 − (0.80)(2.00) = −0.40

and

Corr(X, Y) = Cov(X, Y)/(SD(X)SD(Y)) = −0.40/√((0.64)(1.00)) = −0.5.
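In code, the covariance and correlation follow directly from the moments above (a minimal sketch):

    # Covariance and correlation from E(XY), E(X), E(Y), Var(X), Var(Y).
    EX, EY, EXY = 0.80, 2.00, 1.20
    var_x, var_y = 0.64, 1.00

    cov = EXY - EX * EY                    # -0.40
    corr = cov / (var_x * var_y) ** 0.5    # -0.50
    print(cov, corr)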

Example 0.5 (Tetrahedral Distribution) Let (X, Y) be a continuous pair whose joint PDF is as follows:

f(x, y) = (2/9)(y − x) when 0 < x < y < 3 and 0 otherwise.

The left part of Figure 0.1 shows the solid (of volume 1) under the surface z = f(x, y) and above the xy-plane over the region of non-zero density. The right part of the figure shows the region of non-zero density (0 < x < y < 3).

Then

E(X) = ∫_{0}^{3} ∫_{0}^{y} x f(x, y) dx dy = ∫_{0}^{3} (y³/27) dy = 3/4,

E(X²) = ∫_{0}^{3} ∫_{0}^{y} x² f(x, y) dx dy = ∫_{0}^{3} (y⁴/54) dy = 9/10,

and Var(X) = E(X²) − (3/4)² = 27/80.

Similarly,

E(Y) = ∫_{0}^{3} ∫_{0}^{y} y f(x, y) dx dy = ∫_{0}^{3} (y³/9) dy = 9/4,

E(Y²) = ∫_{0}^{3} ∫_{0}^{y} y² f(x, y) dx dy = ∫_{0}^{3} (y⁴/9) dy = 27/5,

and Var(Y) = E(Y²) − (9/4)² = 27/80.

Further,

E(XY) = ∫_{0}^{3} ∫_{0}^{y} x y f(x, y) dx dy = ∫_{0}^{3} (y⁴/27) dy = 9/5.

These computations imply that

Cov(X, Y) = E(XY) − E(X)E(Y) = 9/5 − (3/4)(9/4) = 0.1125

and

Corr(X, Y) = Cov(X, Y)/(SD(X)SD(Y)) = 0.1125/√((27/80)(27/80)) = 1/3.
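These double integrals can also be verified numerically; the sketch below uses scipy.integrate.dblquad over the triangular region 0 < x < y < 3:

    # Verify the tetrahedral-example moments by numerical double integration.
    from scipy.integrate import dblquad

    f = lambda x, y: (2/9) * (y - x)   # joint density on 0 < x < y < 3

    # dblquad integrates func(inner, outer): here the inner variable x runs
    # over (0, y) and the outer variable y runs over (0, 3).
    EX, _ = dblquad(lambda x, y: x * f(x, y), 0, 3, lambda y: 0, lambda y: y)       # 0.75
    EY, _ = dblquad(lambda x, y: y * f(x, y), 0, 3, lambda y: 0, lambda y: y)       # 2.25
    EXY, _ = dblquad(lambda x, y: x * y * f(x, y), 0, 3, lambda y: 0, lambda y: y)  # 1.8
    print(EXY - EX * EY)               # covariance 0.1125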

0.4 Linear combinations of independent RVs

We consider linear combinations of the form

Y = a + b_1 X_1 + b_2 X_2 + ··· + b_n X_n

(a and b_i are constants), where

1. The X_1, X_2, ..., X_n are mutually independent discrete random variables or mutually independent continuous random variables. That is, where the X_i are discrete with joint PDF

p_X(x_1, x_2, ..., x_n) = p_1(x_1) p_2(x_2) ··· p_n(x_n)

for all n-tuples of real numbers (p_i(x) is the marginal PDF of X_i), or where the X_i are continuous with joint PDF

f_X(x_1, x_2, ..., x_n) = f_1(x_1) f_2(x_2) ··· f_n(x_n)

for all n-tuples of real numbers (f_i(x) is the marginal PDF of X_i).

2. Each X_i has a finite mean and a finite standard deviation.

Theorem 0.9 (Independent Normal RVs) Let X_1, X_2, ..., X_n be independent normal random variables and let

Y = a + b_1 X_1 + b_2 X_2 + ··· + b_n X_n.

Then Y is a normal random variable.

In particular, if X_1, X_2, ..., X_n is a random sample from a normal distribution with mean μ and standard deviation σ, then

1. The sample sum is a normal RV with mean nμ and variance nσ², and
2. The sample mean is a normal RV with mean μ and variance σ²/n.

Exercise 0.10 (Composite Score) The personnel department of a large corporation gives two aptitude tests to job applicants. One measures verbal ability; the other, quantitative ability. From many years’ experience, the company has found that the verbal scores tend to be normally distributed with a mean of 50 and a standard deviation of 10. The quantitative scores are normally distributed with a mean of 100 and a standard deviation of 20, and appear to be independent of the verbal scores. A composite score, C, is assigned to each applicant, where C = 2 (Verbal Score) + 3 (Quantitative Score).

If company policy prohibits hiring anyone whose composite score is below 375, what percentage of applicants will be summarily rejected?

Solution: Let V be the verbal score and Q be the quantitative score of an applicant. Then the composite score C = 2V + 3Q has summary measures

E(C) = 2E(V) + 3E(Q) = 400 and Var(C) = 2²Var(V) + 3²Var(Q) = 4000.

Further, since V and Q are independent normal random variables, C is a normal random variable.

We want

P(C < 375) = Φ((375 − 400)/√4000) ≈ Φ(−0.40) = 1 − Φ(0.40) = 0.3446,

where Φ(·) is the CDF of the standard normal random variable.

Thus, about 34.5% of the applicants will be summarily rejected.
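The same probability in Python (a sketch using scipy.stats.norm; computing the z-score without rounding gives about 0.346 rather than the table value 0.3446):

    # P(C < 375) for C normal with mean 400 and variance 4000.
    from math import sqrt
    from scipy.stats import norm

    mean_c = 2 * 50 + 3 * 100              # E(C) = 400
    var_c = 2**2 * 10**2 + 3**2 * 20**2    # Var(C) = 4000
    print(norm.cdf((375 - mean_c) / sqrt(var_c)))   # about 0.346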

Exercise 0.11 (Sample Mean) Let X_1, X_2, ..., X_n be a random sample from a normal distribution with mean 10 and standard deviation 3. How large must n be in order that P(X̄ ≤ 10.5) is at least 99%?

Solution: The sample mean X̄ is a normal random variable with summary measures

E(X̄) = E(X) = 10 and Var(X̄) = Var(X)/n = 9/n.

We need to solve 0.99 = P(X̄ ≤ 10.5) for n:

0.99 = Φ((10.5 − 10)/√(9/n))  ⟹  (10.5 − 10)/√(9/n) = z_0.99 = 2.33  ⟹  n ≈ 195.44.

Since we want the probability to be at least 99% and since n must be a whole number, our solution is n = 196.
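A short computational version of this solution (a sketch; it uses the rounded table value z = 2.33 from the notes):

    # Sample size for P(Xbar <= 10.5) >= 0.99 when X ~ N(10, 3^2).
    import math

    z = 2.33                               # table value used in the notes
    n = (z * 3 / 0.5) ** 2                 # from (10.5 - 10)/sqrt(9/n) = z
    print(n, math.ceil(n))                 # 195.44..., so n = 196
    # With the exact quantile (scipy.stats.norm.ppf(0.99) = 2.3263...),
    # the bound comes out just under 195.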

Exact sums. Other situations where we know exact sums are as follows:

1. If X_1 is a binomial random variable based on n_1 Bernoulli trials with success probability p, X_2 is a binomial random variable based on n_2 Bernoulli trials with success probability p, and X_1 and X_2 are independent, then X_1 + X_2 has a binomial distribution with parameters n_1 + n_2 and p.
2. If X_1 is a Poisson random variable with parameter λ_1, X_2 is a Poisson random variable with parameter λ_2, and X_1 and X_2 are independent, then X_1 + X_2 has a Poisson distribution with parameter λ_1 + λ_2 (see the simulation sketch after this list).
3. If X_1 is a negative binomial random variable with parameters r_1 and p, X_2 is a negative binomial random variable with parameters r_2 and p, and X_1 and X_2 are independent, then X_1 + X_2 has a negative binomial distribution with parameters r_1 + r_2 and p.
4. If X_1 has a gamma distribution with parameters α_1 and β, X_2 has a gamma distribution with parameters α_2 and β, and X_1 and X_2 are independent, then X_1 + X_2 has a gamma distribution with parameters α_1 + α_2 and β.
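As a sanity check, fact 2 can be illustrated by simulation (a sketch; the rates λ_1 = 1.5 and λ_2 = 2.5 are arbitrary illustrative choices):

    # Simulated sum of independent Poissons: mean and variance near lam1 + lam2.
    import numpy as np

    rng = np.random.default_rng(0)
    lam1, lam2 = 1.5, 2.5
    s = rng.poisson(lam1, 100_000) + rng.poisson(lam2, 100_000)
    print(s.mean(), s.var())               # both close to 4, as for Poisson(4)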

0.6 Central limit theorem

Assume that X_1, X_2, X_3, ... is an infinite sequence of IID random variables, each with the same distribution as X. Let

S_m = X_1 + X_2 + ··· + X_m

be the sum of the first m terms, and let

X̄_m = S_m/m

be the average of the first m terms.

If the X distribution has a finite mean μ and finite variance σ², then the central limit theorem says that the distributions of the sample sum S_m and the sample mean X̄_m are approximately normal when m is large.

The formal statement is as follows: for every real number z,

lim_{m→∞} P((S_m − mμ)/(σ√m) ≤ z) = lim_{m→∞} P((X̄_m − μ)/(σ/√m) ≤ z) = Φ(z),

where Φ(·) is the CDF of the standard normal random variable.
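A simulation illustration of the theorem (a sketch; the exponential distribution with μ = σ = 1 is an arbitrary non-normal choice):

    # Standardized means of exponential samples behave like a standard normal.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    m = 200                                         # terms per average
    xbar = rng.exponential(size=(50_000, m)).mean(axis=1)
    z = (xbar - 1.0) / (1.0 / np.sqrt(m))           # standardize: mu = sigma = 1
    print(np.mean(z <= 1.0), norm.cdf(1.0))         # both near 0.8413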

0.7 Random samples and sample summaries

Recall that a random sample of size n from the X distribution is a list of n mutually independent random variables, each with the same distribution as X.

Sample mean, sample variance. If X_1, X_2, ..., X_n is a random sample from a distribution with mean μ and standard deviation σ, then the sample mean, X̄, is the random variable

X̄ = (1/n)(X_1 + X_2 + ··· + X_n),

the sample variance, S², is the random variable

S² = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄)²,

and the sample standard deviation, S, is the positive square root of the sample variance. The following theorem can be proven using properties of expectation:

Theorem 0.14 (Sample Summaries) If X̄ is the sample mean and S² is the sample variance of a random sample of size n from a distribution with mean μ and standard deviation σ, then

1. E(X̄) = μ and Var(X̄) = σ²/n.
2. E(S²) = σ².

Note that in statistical applications, the observed value of the sample mean is used to estimate an unknown mean μ and the observed value of the sample variance is used to estimate an unknown variance σ².
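Both parts of Theorem 0.14 can be checked by simulation; the sketch below uses the uniform[−10, 20] distribution from Example 0.3 (μ = 5, σ² = 75) and an arbitrary sample size n = 12:

    # Simulation check of Theorem 0.14.
    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.uniform(-10, 20, size=(100_000, 12))    # 100,000 samples, n = 12
    xbar = samples.mean(axis=1)
    s2 = samples.var(axis=1, ddof=1)                      # n - 1 divisor
    print(xbar.mean(), xbar.var(), s2.mean())             # near 5, 75/12, and 75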

Sample correlation. A random sample of size n from the joint (X, Y) distribution is a list of n mutually independent random pairs, each with the same distribution as (X, Y).

If (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) is a random sample of size n from a bivariate distribution with correlation ρ = Corr(X, Y), then the sample correlation, R, is the random variable

R = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / √(Σ_{i=1}^{n} (X_i − X̄)² · Σ_{i=1}^{n} (Y_i − Ȳ)²),

where X̄ and Ȳ are the sample means of the X and Y samples, respectively.

Note that in statistical applications, the observed value of the sample correlation is used to estimate an unknown correlation ρ.
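The definition of R translates directly into code; the sketch below (with an arbitrary simulated correlated pair) matches numpy's built-in correlation:

    # Sample correlation R from the definition, checked against numpy.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2 * x + rng.normal(size=100)       # correlated pair for illustration
    dx, dy = x - x.mean(), y - y.mean()
    r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())
    print(r, np.corrcoef(x, y)[0, 1])      # identical values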

Example 0.15 (Brain-Body Weights) (Allison & Cicchetti, Science, 194:732-734, 1976; lib.stat.cmu.edu/DASL.) As part of a study on sleep in mammals, researchers collected information on the average brain weight and average body weight for 43 different species.


Figure 0.2: Log-brain weight (vertical axis) versus log-body weight (horizontal axis) for the brain-body study, with linear prediction equation superimposed.

(Body-weight, brain-weight) combinations ranged from (0.05kg,0.14g) for the short-tail shrew to (2547.0kg,4603.0g) for the Asian elephant. The data for “man” are

(62kg,1320g) = (136.4lb,2.9lb).

Let X be the common logarithm of body weight in kilograms, and Y be the common logarithm of brain weight in grams. Sample summaries are as follows:

1. Mean log-body weight is x̄ = 0.311738, with a SD of s_x = 1.33113.
2. Mean log-brain weight is ȳ = 1.17421, with a SD of s_y = 1.09415.
3. Correlation between log-body weight and log-brain weight is r = 0.951693.

Figure 0.2 compares the common logarithms of average brain weight in grams (vertical axis) and average body weight in kilograms (horizontal axis) for the 43 species.

The linear prediction equation

ŷ = ȳ + r (s_y/s_x)(x − x̄)

is superimposed (as well as contours of the best fitting bivariate normal density).
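Using the sample summaries above, the prediction equation can be evaluated in a few lines; predicting the log-brain weight of "man" (body weight 62 kg) shows the observed 1320 g lying well above the line:

    # Evaluate the linear prediction equation at the body weight of "man".
    import math

    xbar, sx = 0.311738, 1.33113           # log10 body weight (kg): mean, SD
    ybar, sy = 1.17421, 1.09415            # log10 brain weight (g): mean, SD
    r = 0.951693

    x_man = math.log10(62)                 # about 1.79
    y_hat = ybar + r * (sy / sx) * (x_man - xbar)
    print(y_hat, 10**y_hat)                # about 2.33, i.e. roughly 215 g predicted
    # The observed brain weight for "man", 1320 g (log10 = 3.12), is well above the line.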