probability

given a sample space $S$ and an associated sigma-algebra $\mathcal{B}$, a probability function is a function $P$ with domain $\mathcal{B}$ that satisfies

1. $P(A) \ge 0$ for all $A \in \mathcal{B}$.
2. $P(S) = 1$.
3. if $A_1, A_2, \ldots \in \mathcal{B}$ are pairwise disjoint, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.

[cite:;taken from @berger_inference_2002 chapter 1 basics of probability theory; definition 1.2.4]

any function $P$ that satisfies the axioms of probability is called a probability function. the axiomatic definition makes no attempt to tell what particular function $P$ to choose; it merely requires $P$ to satisfy the axioms. for any sample space many different probability functions can be defined. which one(s) reflects what is likely to be observed in a particular experiment is still to be discussed.
we need general methods of defining probability functions that we know will always satisfy the three axioms above, also known as Kolmogorov's axioms. we do not want to have to check the axioms for each new probability function. the following theorem gives a common method of defining a legitimate probability function.

let $S = \{s_1, \ldots, s_n\}$ be a finite set. let $\mathcal{B}$ be any sigma algebra of subsets of $S$. let $p_1, \ldots, p_n$ be nonnegative numbers that sum to 1. for any $A \in \mathcal{B}$, define $P(A)$ by
$P(A) = \sum_{\{i : s_i \in A\}} p_i$ (the sum over an empty set is defined to be 0). then $P$ is a probability function on $\mathcal{B}$. this remains true if $S = \{s_1, s_2, \ldots\}$ is a countable set.
[cite:;taken from @berger_inference_2002 chapter 1 basics of probability theory; theorem 1.2.6]
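the theorem can be checked mechanically on a small example. the sketch below is an illustration under our own assumptions, not taken from the source: the sample space `S = {"a", "b", "c"}`, its weights, and the helper names `power_set`, `make_probability` and `satisfies_axioms` are all hypothetical. it uses the power set of $S$ as the sigma algebra and exact `Fraction` weights, so the sum-to-1 check is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(S):
    """All subsets of a finite set S as frozensets (the largest sigma algebra on S)."""
    items = list(S)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def make_probability(weights):
    """Given weights p_i for outcomes s_i, return P(A) = sum of p_i over {i : s_i in A}."""
    if any(p < 0 for p in weights.values()):
        raise ValueError("weights must be nonnegative")
    if sum(weights.values()) != 1:
        raise ValueError("weights must sum to 1")

    def P(A):
        # Starting the sum at 0 makes P(empty set) = 0, matching the theorem's convention.
        return sum((weights[s] for s in A), Fraction(0))

    return P


def satisfies_axioms(P, S, events):
    """Check the three axioms over a finite collection of events.

    Assumes `events` is closed under union (as the power set is). Then any sequence of
    pairwise disjoint events has only finitely many nonempty members, so countable
    additivity reduces to additivity over disjoint pairs, extended by induction.
    """
    if any(P(A) < 0 for A in events):           # axiom 1
        return False
    if P(frozenset(S)) != 1:                     # axiom 2
        return False
    return all(P(A | B) == P(A) + P(B)           # axiom 3, pairwise form
               for A in events for B in events if not (A & B))


# Hypothetical example: three outcomes with weights 1/2, 1/3, 1/6.
S = {"a", "b", "c"}
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
P = make_probability(weights)
events = power_set(S)
print(satisfies_axioms(P, S, events))  # True
print(P(frozenset({"a", "c"})))        # 2/3
```

every pair of disjoint events in the power set is tested, so this is an exhaustive check for this particular $S$; it illustrates theorem 1.2.6 on one instance but does not prove it.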

[cite:;refer to @berger_inference_2002 chapter 1 basics of probability theory; example 1.2.7]

before we leave the axiomatic development of probability, there is one further point to consider. axiom 3 of the definition above, which is commonly known as the Axiom of Countable Additivity, is not universally accepted among statisticians. indeed, it can be argued that axioms should be simple, self-evident statements. comparing axiom 3 to the other axioms, which are simple and self-evident, may lead us to doubt whether it is reasonable to assume the truth of axiom 3.
the Axiom of Countable Additivity is rejected by a school of statisticians led by de Finetti (1972), who chooses to replace this axiom with the Axiom of Finite Additivity.

if $A \in \mathcal{B}$ and $B \in \mathcal{B}$ are disjoint, then
$P(A \cup B) = P(A) + P(B)$.

while this axiom may not be entirely self-evident, it is certainly simpler than the Axiom of Countable Additivity (and is implied by it).
assuming only finite additivity, while perhaps more plausible, can lead to unexpected complications in statistical theory: complications that, at this level, do not necessarily enhance understanding of the subject. we therefore proceed under the assumption that the Axiom of Countable Additivity holds.
[cite:;taken from @berger_inference_2002 chapter 1 basics of probability theory]

if $P$ is a probability function and $A$ is any set in $\mathcal{B}$, then

(a) $P(\emptyset) = 0$;
(b) $P(A) \le 1$;
(c) $P(A^c) = 1 - P(A)$.

[cite:;taken from @berger_inference_2002 theorem 1.2.8]
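each part follows from the axioms alone. a short derivation, written as our own sketch rather than quoted from the source, proves them in the order (a), (c), (b), since (c) uses (a) and (b) uses (c):

```latex
% (a): A_1 = S and A_i = \emptyset for i >= 2 are pairwise disjoint with union S, so by axiom 3
%      (then subtract the finite value P(S) = 1 from both sides)
P(S) = P(S) + \sum_{i=2}^{\infty} P(\emptyset)
\;\Longrightarrow\; \sum_{i=2}^{\infty} P(\emptyset) = 0
\;\Longrightarrow\; P(\emptyset) = 0 \quad \text{(all terms are equal and } \ge 0\text{)}

% (c): A and A^c are disjoint with union S; use axiom 3 (padding with \emptyset) and (a)
1 = P(S) = P(A \cup A^c) = P(A) + P(A^c)
\;\Longrightarrow\; P(A^c) = 1 - P(A)

% (b): by (c) and axiom 1 applied to A^c
P(A) = 1 - P(A^c) \le 1
```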

if $P$ is a probability function and $A$ and $B$ are any sets in $\mathcal{B}$, then

(a) $P(B \cap A^c) = P(B) - P(A \cap B)$;
(b) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$;
(c) if $A \subset B$, then $P(A) \le P(B)$.

[cite:;taken from @berger_inference_2002 theorem 1.2.9]
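a similar sketch (our own, not quoted from the source) derives theorem 1.2.9. it only needs additivity for two disjoint sets, which follows from axiom 3 by padding with empty sets and using $P(\emptyset) = 0$:

```latex
% (a): B = (B \cap A) \cup (B \cap A^c), a disjoint union
P(B) = P(B \cap A) + P(B \cap A^c)
\;\Longrightarrow\; P(B \cap A^c) = P(B) - P(A \cap B)

% (b): A \cup B = A \cup (B \cap A^c), a disjoint union; then substitute (a)
P(A \cup B) = P(A) + P(B \cap A^c) = P(A) + P(B) - P(A \cap B)

% (c): if A \subset B then A \cap B = A, so by (a) and axiom 1
0 \le P(B \cap A^c) = P(B) - P(A)
\;\Longrightarrow\; P(A) \le P(B)
```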

the following theorem gives some useful results for dealing with a collection of sets.

if $P$ is a probability function, then

(a) $P(A) = \sum_{i=1}^{\infty} P(A \cap C_i)$ for any partition $C_1, C_2, \ldots$;
(b) $P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i)$ for any sets $A_1, A_2, \ldots$ (Boole's inequality).

[cite:;taken from @berger_inference_2002 theorem 1.2.11]
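both parts can be checked numerically on a small example. the sketch below assumes a fair six-sided die with equally likely outcomes (a special case of theorem 1.2.6); the partition, the events and the helper `is_partition` are hypothetical choices for illustration. it uses a finite partition, which fits part (a) once the partition is padded with empty sets.

```python
from fractions import Fraction

# Hypothetical sample space: one fair six-sided die.
S = frozenset(range(1, 7))


def P(A):
    """Equally likely outcomes: P(A) = |A| / |S|."""
    return Fraction(len(A), len(S))


def is_partition(sets, S):
    """True if the sets are pairwise disjoint and their union is S."""
    sets = list(sets)
    pairwise_disjoint = all(not (sets[i] & sets[j])
                            for i in range(len(sets)) for j in range(i + 1, len(sets)))
    return pairwise_disjoint and frozenset().union(*sets) == S


# (a) P(A) equals the sum of P(A ∩ C_i) over a partition C_1, C_2, C_3.
C = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
A = frozenset({2, 3, 4, 6})
total = sum((P(A & Ci) for Ci in C), Fraction(0))
print(P(A), total)  # 2/3 2/3

# (b) Boole's inequality for overlapping events.
events = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 5})]
union = frozenset().union(*events)
bound = sum((P(Ai) for Ai in events), Fraction(0))
print(P(union), bound)  # 5/6 4/3
```

the gap in (b) comes from outcomes counted more than once in the sum (here outcome 3 lies in all three events), which is why the bound can exceed 1.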

formula (b) of theorem 1.2.9 gives a useful inequality for the probability of an intersection. since $P(A \cup B) \le 1$, we have from that formula, after some rearranging,
$P(A \cap B) \ge P(A) + P(B) - 1$. this inequality is a special case of what is known as Bonferroni's inequality. Bonferroni's inequality allows us to bound the probability of a simultaneous event (the intersection) in terms of the probabilities of the individual events.
there is a similarity between Boole's inequality and Bonferroni's inequality. in fact, they are essentially the same thing. we could have used Boole's inequality to derive the inequality above. if we apply Boole's inequality to $A_1^c, \ldots, A_n^c$, we have
$P\left(\bigcup_{i=1}^{n} A_i^c\right) \le \sum_{i=1}^{n} P(A_i^c)$ and using the facts that $\bigcup_{i=1}^{n} A_i^c = \left(\bigcap_{i=1}^{n} A_i\right)^c$ and $P(A_i^c) = 1 - P(A_i)$, we obtain
$1 - P\left(\bigcap_{i=1}^{n} A_i\right) \le n - \sum_{i=1}^{n} P(A_i)$. this becomes, on rearranging terms,
$P\left(\bigcap_{i=1}^{n} A_i\right) \ge \sum_{i=1}^{n} P(A_i) - (n - 1)$, which is a more general version of the two-set Bonferroni inequality above.

[cite:;taken from @berger_inference_2002]
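the general Bonferroni bound can be compared with the exact probability of an intersection on a small example. the sketch below assumes two fair dice (36 equally likely ordered outcomes); the events $A_1$, $A_2$, $A_3$ and the helper `bonferroni_lower_bound` are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two fair dice, 36 equally likely ordered pairs.
S = frozenset(product(range(1, 7), repeat=2))


def P(A):
    """Equally likely outcomes: P(A) = |A| / |S|."""
    return Fraction(len(A), len(S))


def bonferroni_lower_bound(events):
    """Sum of P(A_i) minus (n - 1): the general Bonferroni lower bound on P(intersection)."""
    return sum((P(A) for A in events), Fraction(0)) - (len(events) - 1)


A1 = frozenset(w for w in S if w[0] != 6)     # first die is not 6
A2 = frozenset(w for w in S if w[1] != 6)     # second die is not 6
A3 = frozenset(w for w in S if sum(w) <= 10)  # total is at most 10

events = [A1, A2, A3]
intersection = frozenset.intersection(*events)
print(P(intersection), bonferroni_lower_bound(events))  # 25/36 7/12

# Two-set special case: P(A1 & A2) >= P(A1) + P(A2) - 1.
print(P(A1 & A2), bonferroni_lower_bound([A1, A2]))     # 25/36 2/3
```

the three-event bound $7/12$ is valid but loose here, since the exact value is $25/36$; the inequality only needs the individual probabilities $P(A_i)$, at the cost of sharpness.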