Bernoulli and Binomial Distributions: Probability of Success in Independent Trials, Study notes of Biostatistics

These study notes cover the Bernoulli and Binomial distributions, focusing on the probability of success in independent trials: definitions, probability mass functions, mean and variance, with examples and exercises. They come from the university course MATH1015 Biostatistics at SydU, first semester of the 2013 academic year.

What you will learn

  • What is the probability mass function of a Binomial distribution?
  • What is the difference between a Bernoulli and a Binomial distribution?
  • How do you calculate the mean and variance of a Bernoulli distribution?
  • What is a Bernoulli distribution?

MATH1015 Biostatistics Week 5
SydU MATH1015 (2013) First semester
5 The Binomial Distribution

The binomial distribution plays a very important role in many life science problems. To develop this distribution, we first look at a related distribution called the Bernoulli distribution.

5.1 Bernoulli Distribution (P.43)

Many life science experiments result in responses which have only two possible outcomes, “success” (S) and “failure” (F). Such responses are called dichotomous. For example, a doctor is interested in whether a recent medical examination gives a ‘positive’ or ‘negative’ result for cancer for a patient.

Examples:

  1. Gender of a newborn: “boy” (B) or “girl” (G).
  2. Result of an experiment: “success” (S) or “failure” (F).
  3. Result of an examination: “pass” (P) or “fail” (F).

Definition: A random variable whose responses are dichotomous is called a Bernoulli random variable.

Note: In many problems, it is convenient to code “success” (S) as 1 and “failure” (F) as 0.

In example 3, let X = 1 if the examination mark M is at least 50 and 0 otherwise. Then X is the result of dichotomising the random variable M such that

X = 1 if M ≥ 50 (pass/success)
    0 if M < 50 (fail/failure).

In general X = 1 denotes the event of a success (S).

  • Let P (X = 1) = P (Success) = p.
  • Therefore, clearly X = 0 denotes the event of a failure (F) and P (X = 0) = P (Failure) = 1 − p, where 0 ≤ p ≤ 1.

Let p(r) = P(X = r). Therefore, the above can be written as

p(r) = p      if r = 1
       1 − p  if r = 0.

Therefore, the probability distribution of a Bernoulli RV X can be given as

x          1   0
P(X = x)   p   1 − p

Then

  • mean of X is E(X) = 1 × p + 0 × (1 − p) = p
  • variance of X is Var(X) = p(1 − p)

Note: E(X^2) = 1^2 × p + 0^2 × (1 − p) = p, and therefore Var(X) = E(X^2) − [E(X)]^2 = p − p^2 = p(1 − p).
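As a quick numerical check (a sketch, not part of the original notes; Python is used here although the course tutorials use R), the mean and variance formulas follow directly from the probability table:

```python
def bernoulli_mean_var(p):
    """Mean and variance of a Bernoulli(p) RV, computed straight from
    the probability table x: 1, 0 with P(X = x): p, 1 - p."""
    mean = 1 * p + 0 * (1 - p)        # E(X) = p
    e_x2 = 1**2 * p + 0**2 * (1 - p)  # E(X^2) = p
    var = e_x2 - mean**2              # Var(X) = p - p^2 = p(1 - p)
    return mean, var

mean, var = bernoulli_mean_var(0.3)      # illustrative p
print(mean)                              # 0.3
print(abs(var - 0.3 * 0.7) < 1e-12)      # True: Var(X) = p(1 - p)
```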

5.2.1 Probability mass function

Consider the following example to understand the probability distribution of the number of successes associated with X ∼ B(n, p).

Example: When n = 4, it is clear from a tree diagram that there are 2^4 = 16 possible outcomes altogether. Let X be the number of successes (S). All 16 outcomes are given below:

x   Outcomes                              Probability of each   No. of outcomes
4   SSSS                                  p^4                   1
3   SSSF, SSFS, SFSS, FSSS                p^3(1 − p)            4
2   SSFF, SFSF, SFFS, FSSF, FSFS, FFSS    p^2(1 − p)^2          6
1   SFFF, FSFF, FFSF, FFFS                p(1 − p)^3            4
0   FFFF                                  (1 − p)^4             1

Then the probability distribution, or pmf (probability mass function), of X is

P(X = 4) = p^4
P(X = 3) = 4p^3(1 − p)
P(X = 2) = 6p^2(1 − p)^2
P(X = 1) = 4p(1 − p)^3
P(X = 0) = (1 − p)^4
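This pmf can be reproduced by brute-force enumeration; the sketch below (not in the original notes, with an illustrative p) counts the 16 outcomes in Python:

```python
from itertools import product
from math import comb

p = 0.3  # illustrative success probability

# Enumerate all 2^4 = 16 S/F sequences and accumulate P(X = x).
pmf = {x: 0.0 for x in range(5)}
counts = {x: 0 for x in range(5)}
for outcome in product("SF", repeat=4):
    x = outcome.count("S")              # number of successes in this sequence
    pmf[x] += p**x * (1 - p)**(4 - x)   # probability of this particular sequence
    counts[x] += 1

print([counts[x] for x in range(5)])    # [1, 4, 6, 4, 1], as in the table

# The accumulated probabilities match the binomial-coefficient form above:
for x in range(5):
    assert abs(pmf[x] - comb(4, x) * p**x * (1 - p)**(4 - x)) < 1e-12
```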

Binomial Coefficients

The number of ways of selecting 2 items from 4 is denoted by 4C2, and this value is 6. These values are called the number of combinations, or the binomial coefficients. That is, 4C2 = 6 gives the number of ways of choosing which 2 of the 4 trials are “S”. It is read as “4 choose 2”.

There are many ways to calculate 4C2. On a calculator, look for the nCr or (n r) button: press 4, then nCr, then 2. Do you get 6?

Exercise: Find 6C3, 7C2 and 9C7 on your calculator.

Answer: 20, 21 and 36.
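These coefficients are also available programmatically; a short check (a sketch, not part of the notes) using Python's standard library:

```python
from math import comb  # comb(n, r) computes "n choose r" exactly

print(comb(4, 2))                          # 6
print(comb(6, 3), comb(7, 2), comb(9, 7))  # 20 21 36, as in the exercise
```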

Return to any particular sequence of x “S”s and n − x “F”s, i.e.,

SS···S (x S's)  FF···F ((n − x) F's)

Each such sequence occurs with the same probability p^x(1 − p)^(n−x), and the number of ways of choosing which x of the n trials are “S” is nCx. Therefore, the probability of exactly x successes out of n independent trials is given by

P(X = x) = nCx p^x(1 − p)^(n−x),   x = 0, 1, ..., n.

Clearly, a binomial random variable is the sum of n independent Bernoulli random variables.
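The general pmf can be written as a short function; the sketch below (Python rather than the course's R, not part of the notes) also checks that the probabilities sum to 1:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 6, 0.7  # illustrative values
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(abs(total - 1.0) < 1e-9)  # True: the pmf sums to 1 over x = 0, ..., n
```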

Example: A biologist estimates that the chance of germination for a type of bean seed is 0.7. A student was given 6 seeds. Let X be the number of seeds that germinate out of the 6. Assuming that the germinations of the seeds are independent, explain why the distribution of X is binomial. What are the values of n and p? What are the probabilities that the student gets

(a) all seeds germinated,

(b) just one seed not germinated, and

(c) at most four seeds germinated?

Solution: Since the germinations of the 6 seeds are independent and each outcome is binary (germinated or not) with the same probability of germination, 0.7, the distribution of X is binomial, i.e. X ∼ B(6, 0.7) with n = 6 and p = 0.7.

(a) P(X = 6) = 0.7^6 = 0.1176.

(b) P(X = 5) = 6C5 × 0.7^5 × 0.3 = 6 × 0.16807 × 0.3 = 0.3025.

(c) P(X ≤ 4) = 1 − P(X ≥ 5) = 1 − [P(X = 5) + P(X = 6)] = 1 − 0.4202 = 0.5798.

[Figure: bar chart of the B(6, 0.7) pmf over x = 0, 1, ..., 6, with the bars corresponding to parts (a), (b) and (c) marked.]
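The three answers can be checked numerically; a sketch (not part of the notes) in Python:

```python
from math import comb

n, p = 6, 0.7  # X ~ B(6, 0.7), as in the germination example
pmf = lambda x: comb(n, x) * p**x * (1 - p)**(n - x)

print(round(pmf(6), 4))               # (a) P(X = 6)  = 0.1176
print(round(pmf(5), 4))               # (b) P(X = 5)  = 0.3025
print(round(1 - pmf(5) - pmf(6), 4))  # (c) P(X <= 4) = 0.5798
```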

Exercise: Book P.55, Q2.

5.3 Binomial plot

A binomial distribution can be plotted using a bar chart. The following plots show different binomial distributions as p varies.

[Figure: bar charts of binomial pmfs for p = 0.1, 0.3, 0.5, 0.7 and 0.9 in panels (a), (b), (c), (d) and (e), with x on the horizontal axis and probability on the vertical axis; captioned “Binomial distribution for n=”.]

  1. Plot (c) is symmetric. Plots (a) and (b) are skewed (have a heavy tail) to the right, since p is low, so low values occur with higher probability and high values with lower probability. On the other hand, plots (d) and (e) are skewed to the left.
  2. Plots (b) & (d) and (a) & (e) are mirror images of each other, as their values of p sum to 1.

The following gives another set of plots when p = 0.1 and n increases. Clearly when n is small, the distribution is skewed but it becomes more symmetric as n increases.

5.4 Use of table and computer (P.47-48)

Example: A pharmaceutical firm has discovered a new diagnostic test that has a 90% chance of indicating a positive result for a patient who is infected with a certain disease. If it is tried on 5 infected patients, find the probability that 4 will be detected.

Solution: Let X be the number of patients, out of the 5 infected patients, for whom the test is positive. Since the results are independent across patients and each outcome is binary with the same probability of a positive result, the distribution of X is binomial, i.e. X ∼ B(5, 0.9) with n = 5 and p = 0.9.

Clearly there are 5C4 = 5 ways of choosing 4 “S” from 5 patients. Therefore,

P(X = 4) = 5C4 × 0.9^4 × 0.1 = 5 × 0.6561 × 0.1 = 0.32805 ≈ 0.3280.

Use of computer: In R, pbinom(x,n,p) gives P(X ≤ x), and dbinom(x,n,p) gives P(X = x).

Attend the tutorial for the use of R.

Use of binomial table: It gives P(X ≤ x) for n = 2, ..., 12 and p = 0.1, 0.2, ..., 0.9. For example, using the binomial table with n = 5, p = 0.4 and x = 1, 2, we have


P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = 0.6826 − 0.3370 = 0.3456.

Exercise: Let X ∼ B(5, 0.9). Find (a) P(X ≤ 4); (b) P(X = 4).

Solution: From the binomial table with n = 5, p = 0.9, x = 3, 4,

(a) P(X ≤ 4) = 0.4095.
(b) P(X = 4) = P(X ≤ 4) − P(X ≤ 3) = 0.4095 − 0.0815 = 0.3280.
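Both table lookups can be verified by direct computation (a sketch, not in the notes):

```python
from math import comb

n, p = 5, 0.9  # X ~ B(5, 0.9)
pmf = lambda x: comb(n, x) * p**x * (1 - p)**(n - x)
cdf = lambda x: sum(pmf(k) for k in range(x + 1))

print(round(cdf(4), 4))              # (a) P(X <= 4) = 0.4095
print(abs(pmf(4) - 0.32805) < 1e-9)  # (b) True: P(X = 4) = 0.32805, i.e. 0.3280
```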

Exercises:

  1. Check that if X ∼ B(7, 0.2), then P(X = 3) = 0.1147;
  2. Check that if X ∼ B(11, 0.1), then P(X = 4) = 0.0158.
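Both exercise values can be computed directly from the pmf (a sketch, not in the notes; note that the exact pmf gives 0.1147 to four decimal places for the first one):

```python
from math import comb

pmf = lambda x, n, p: comb(n, x) * p**x * (1 - p)**(n - x)

print(round(pmf(3, 7, 0.2), 4))    # 0.1147  (exact value 0.114688)
print(round(pmf(4, 11, 0.1), 4))   # 0.0158
```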

5.5 Mean and variance of binomial distribution (P.51-52)

Theorem: If X ∼ B(n, p), then the mean and variance of X are given by

μ = E(X) = np and σ^2 = Var(X) = np(1 − p)

because X = Y 1 + Y 2 + · · · + Yn is the sum of n independent Bernoulli r.v. Yi each with E(Yi) = p and Var(Yi) = p(1 − p).

Example: Let X ∼ B(8, 0.6). Find E(X), Var(X) and SD(X).

Solution: We have n = 8 and p = 0.6.

E(X) = np = 8 × 0.6 = 4.8.

Var(X) = np(1 − p) = 8 × 0.6 × 0.4 = 1.92.

SD(X) = √Var(X) = √1.92 ≈ 1.39.

[Figure: plot of Var(X) = np(1 − p) against p over 0 ≤ p ≤ 1, attaining its maximum at p = 0.5.]

Note that, for a given n, Var(X) = np(1 − p) increases as p moves from 0 towards 0.5 and attains its maximum at p = 0.5, since the uncertainty is greatest when success and failure are equally likely. Var(X) also increases with n for a given p.
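The example and the note can both be checked numerically (a sketch, not in the notes):

```python
from math import sqrt

n, p = 8, 0.6  # X ~ B(8, 0.6), as in the example
mean = n * p            # E(X)   = np
var = n * p * (1 - p)   # Var(X) = np(1 - p)
sd = sqrt(var)          # SD(X)  = sqrt(Var(X))
print(round(mean, 2), round(var, 2), round(sd, 2))  # 4.8 1.92 1.39

# For fixed n, np(1 - p) is maximised at p = 0.5:
assert all(8 * 0.5 * 0.5 >= 8 * q * (1 - q) for q in (0.1, 0.3, 0.5, 0.7, 0.9))
```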

The sum of 2 binomial random variables:

If X1 ∼ B(n1, p) and X2 ∼ B(n2, p), then X = X1 + X2 ∼ B(n1 + n2, p),

since X = (Y1 + · · · + Yn1) + (Yn1+1 + · · · + Yn1+n2), where the first n1 Bernoulli terms sum to X1 ∼ B(n1, p) and the remaining n2 terms sum to X2 ∼ B(n2, p). Note that this result does NOT apply if p differs between X1 and X2.

Example: Let X1 ∼ B(5, 0.4), X2 ∼ B(7, 0.4) and X3 ∼ B(7, 0.2). Find the distributions of X1 + X2 and X1 + X3.

Solution: X1 + X2 ∼ B(5 + 7, 0.4) = B(12, 0.4), but X1 + X3 is not binomial, since the success probabilities differ (0.4 vs 0.2).
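The sum result can be illustrated by convolving pmfs; the sketch below (not part of the notes) checks that the pmf of X1 + X2, computed by convolution, matches B(12, 0.4):

```python
from math import comb

def pmf(n, p):
    """List of P(X = x), x = 0, ..., n, for X ~ B(n, p)."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def convolve(a, b):
    """pmf of the sum of two independent RVs with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj  # P(sum = i + j) accumulates a_i * b_j
    return out

summed = convolve(pmf(5, 0.4), pmf(7, 0.4))  # pmf of X1 + X2
target = pmf(12, 0.4)                        # pmf of B(12, 0.4)
print(max(abs(s - t) for s, t in zip(summed, target)) < 1e-12)  # True
```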

Example: (Soft drinks) Two rival soft drinks, C and P, taste the same. In a blindfold test, 12 people are asked (independently) to state their preference for one or the other.

(a) What is the probability that the majority prefer P?
(b) How many people out of the 12 would be expected to prefer P?

Solution: Let X denote the number of people who prefer P out of the 12. Since the drinks taste the same, each person prefers P with probability 0.5 independently, so X ∼ B(12, 0.5) with n = 12 and p = 0.5.

(a) P(X ≥ 7) = 1 − P(X ≤ 6)

= 1 − [12C0 × 0.5^0 × 0.5^12 + 12C1 × 0.5^1 × 0.5^11 + · · · + 12C6 × 0.5^6 × 0.5^6]

= 1 − 0.5^12 [12C0 + 12C1 + · · · + 12C6]

= 1 − 0.6128 = 0.3872. (Table with n = 12, p = 0.5, x = 6)

(b) E(X) = np = 12 × 0 .5 = 6.
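A quick check of both parts (a sketch, not in the notes); with p = 0.5 every one of the 2^12 = 4096 outcome sequences is equally likely, so only the counts of sequences matter:

```python
from math import comb

# (a) P(X >= 7) = (number of sequences with 7 or more "P" preferences) / 4096
majority = sum(comb(12, k) for k in range(7, 13)) / 2**12
print(round(majority, 4))   # 0.3872

# (b) Expected number of people preferring P
print(12 * 0.5)             # 6.0
```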