APPENDIX 2: PROBABILITY

from Ellery Eells, *Probabilistic Causality*. Cambridge University Press, 1991,
pp. 399-402.

In this appendix, I will present some of the basic ideas of the mathematical theory of
probability. As in the case of Appendix 1, this will not be a comprehensive or detailed
survey -- it is only intended to introduce the basic formal probability concepts and rules
used in this book, and to clarify the terminology and notation used in this book. Here I
will discuss only the __abstract and formal__ calculus of probability; in Chapter 1,
the question of __interpretation__ is addressed.

A probability function, __Pr__, is any function (or rule of association) that
assigns to (or associates with) each element __X__ of some Boolean algebra __B__
(see Appendix 1) a real number, __Pr__(__X__), in accordance with the following
three conditions:

For all __X__ and __Y__ in __B__,

(1) __Pr__(__X__) __≥__ 0;

(2) __Pr__(__X__) = 1, if __X__ is a tautology (that is, if __X__ is
logically true, or __X__ = __1__ in __B__);

(3) __Pr__(__X__v__Y__) = __Pr__(__X__) + __Pr__(__Y__), if __X__&__Y__
is a contradiction (that is, if __X__&__Y__ is logically false, or __X__&__Y__
= __0__ in __B__).

These three conditions are the __probability axioms__, also called "the
Kolmogorov axioms" (for Kolmogorov 1933). A function __Pr__ that satisfies the
axioms, relative to an algebra __B__, is said to be a __probability function on B__
-- that is, with "domain" __B__ (that is, the set of propositions of __B__)
and range the closed interval [0,1]. In what follows, reference to an assumed algebra __B__
will be implicit.
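Although the discussion here is purely formal, the axioms can be illustrated in a small computational model. The following sketch (an illustration added for concreteness, with assumed weights) represents each proposition as a subset of a four-element set of possibilities, and lets __Pr__ sum the weights of the points in that subset:

```python
from fractions import Fraction

# A toy model: propositions are subsets of a four-point space "omega",
# and Pr sums assumed point weights (the weights are illustrative).
weight = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
          "w3": Fraction(1, 8), "w4": Fraction(1, 8)}
omega = set(weight)

def pr(event):
    """Probability of a proposition, modeled as a subset of omega."""
    return sum(weight[w] for w in event)

# Axiom (1): probabilities are non-negative.
assert all(pr({w}) >= 0 for w in omega)
# Axiom (2): the tautology (here, all of omega) has probability 1.
assert pr(omega) == 1
# Axiom (3): additivity for mutually exclusive propositions.
a, b = {"w1"}, {"w2", "w3"}
assert a & b == set() and pr(a | b) == pr(a) + pr(b)
```

Any assignment of non-negative weights summing to 1 would serve equally well; exact `Fraction` arithmetic keeps the checks free of rounding error.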

In Appendix 1, I explained how the __propositional__ calculus is applicable to
"propositions" understood as sentences or statements as well as to
"propositions" understood as factors or properties -- and the same goes for the __probability__
calculus. Roughly speaking, "__Pr__(__X__) = __r__" can be understood
either as asserting that a __sentence or statement__ __X__ has a probability of __r__
of being __true__ (in a given situation), or as asserting that a __factor or property__
has a probability of __r__ of being __exemplified__ (in a given instance or
population). Specifying an interpretation of the propositions is part of what must be done to
"interpret" a probability function on an algebra; the other part is interpreting
"__Pr__". Various interpretations of probability (such as frequency, degree
of belief, and partial logical entailment interpretations) are discussed in Chapter 1;
here, the focus is on the formal calculus.

Here are some easy consequences of the probability axioms.

(4) __Pr__(~__X__) = 1 - __Pr__(__X__), for all __X__.

__Proof__: Since __X__v~__X__ is a tautology, by (2), __Pr__(__X__v~__X__) = 1; and by (3), __Pr__(__X__v~__X__)
= __Pr__(__X__) + __Pr__(~__X__). So, 1 = __Pr__(__X__) + __Pr__(~__X__),
and thus __Pr__(~__X__) = 1 - __Pr__(__X__).

(5) __Pr__(__X__) = 0, if __X__ is a contradiction.

__Proof__: ~__X__ is a tautology, so by (2), __Pr__(~__X__) = 1. By (4), __Pr__(~__X__)
= 1 - __Pr__(__X__). So, 1 = 1 - __Pr__(__X__), and thus __Pr__(__X__) = 0.

(6) __Pr__(__X__) = __Pr__(__Y__), if __X__ and __Y__ are logically
equivalent.

__Proof__: __X__ and ~__Y__ are mutually exclusive and __X__v~__Y__ is a
tautology. So by (2), (3), and (4), 1 = __Pr__(__X__v~__Y__) = __Pr__(__X__)
+ __Pr__(~__Y__) = __Pr__(__X__) + 1 - __Pr__(__Y__). So, 1 = __Pr__(__X__)
+ 1 - __Pr__(__Y__), and 0 = __Pr__(__X__) - __Pr__(__Y__), and thus __Pr__(__X__)
= __Pr__(__Y__).

(7) __Pr__(__X__) __≤__ __Pr__(__Y__), if __X__ logically implies __Y__.

(8) 0 __≤__ __Pr__(__X__) __≤__ 1, for all __X__.

(9) __Pr__(__X__v__Y__) = __Pr__(__X__) + __Pr__(__Y__) - __Pr__(__X__&__Y__),
for all __X__ and __Y__.
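Consequences (4), (5), and (9) can be checked directly in the same style of finite model (with weights again assumed for illustration):

```python
from fractions import Fraction

# Propositions as subsets of a four-point space; Pr sums assumed weights.
weight = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
          "w3": Fraction(1, 8), "w4": Fraction(1, 8)}
omega = set(weight)

def pr(event):
    return sum(weight[w] for w in event)

x = {"w1", "w2"}
y = {"w2", "w3"}

# (4): Pr(~X) = 1 - Pr(X); negation is set complement in this model.
assert pr(omega - x) == 1 - pr(x)
# (5): a contradiction (the empty set) has probability 0.
assert pr(set()) == 0
# (9): Pr(X v Y) = Pr(X) + Pr(Y) - Pr(X & Y).
assert pr(x | y) == pr(x) + pr(y) - pr(x & y)
```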

The probability of __Y__ __conditional on__ (__or given__) __X__, written __Pr__(__Y__/__X__),
is defined to be equal to __Pr__(__X__&__Y__)/__Pr__(__X__). Note that __Pr__(__Y__/__X__)
is defined only when __Pr__(__X__) > 0. Since for any __X__ and __Y__, __Pr__(__X__&__Y__)
= __Pr__(__Y__&__X__) (by (6) above), an immediate consequence of the
definition of conditional probability is what is often called the __multiplication rule__:

(10) __Pr__(__X__&__Y__) = __Pr__(__X__)__Pr__(__Y__/__X__)
= __Pr__(__Y__)__Pr__(__X__/__Y__), for all __X__ and __Y__ for which these
conditional probabilities are defined.

From (10) follows this simple version of __Bayes' theorem__:

__Pr__(__Y__/__X__) = __Pr__(__X__/__Y__)__Pr__(__Y__)/__Pr__(__X__),
for all __X__ and __Y__ with __Pr__(__X__) > 0 and __Pr__(__Y__) > 0.
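The definition of conditional probability, the multiplication rule, and Bayes' theorem can all be verified in the same kind of assumed finite model:

```python
from fractions import Fraction

# Propositions as subsets of a four-point space; weights are illustrative.
weight = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
          "w3": Fraction(1, 8), "w4": Fraction(1, 8)}

def pr(event):
    return sum(weight[w] for w in event)

def pr_given(y, x):
    """Pr(Y/X), defined as Pr(X & Y)/Pr(X); requires Pr(X) > 0."""
    return pr(x & y) / pr(x)

x = {"w1", "w2"}
y = {"w2", "w3"}

# Multiplication rule: Pr(X & Y) = Pr(X) Pr(Y/X) = Pr(Y) Pr(X/Y).
assert pr(x & y) == pr(x) * pr_given(y, x) == pr(y) * pr_given(x, y)
# Bayes' theorem: Pr(Y/X) = Pr(X/Y) Pr(Y) / Pr(X).
assert pr_given(y, x) == pr_given(x, y) * pr(y) / pr(x)
```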

A proposition __Y__ is said to be __probabilistically__ (or __statistically__)
__independent__ of a proposition __X__ if __Pr__(__Y__/__X__) = __Pr__(__Y__).
Alternatively, and equivalently (given that __Pr__(__X__) > 0, so that __Pr__(__Y__/__X__)
is defined), __Y__'s being probabilistically independent of __X__
can be defined as __Pr__(__X__&__Y__) = __Pr__(__X__)__Pr__(__Y__).
Thus, probabilistic independence is symmetric: if __Y__ is probabilistically
independent of __X__, then __X__ is probabilistically independent of __Y__, for
all __X__ and __Y__.
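The symmetry of independence can be seen in a two-coin sketch (a uniform four-point model, assumed here for illustration):

```python
from fractions import Fraction

# Two fair coins: each point is equally likely.
weight = dict.fromkeys(["w1", "w2", "w3", "w4"], Fraction(1, 4))

def pr(event):
    return sum(weight[w] for w in event)

x = {"w1", "w2"}   # "the first coin lands heads", say
y = {"w1", "w3"}   # "the second coin lands heads"

# Independence in product form: Pr(X & Y) = Pr(X) Pr(Y) ...
assert pr(x & y) == pr(x) * pr(y)
# ... and it holds symmetrically in conditional form:
assert pr(x & y) / pr(x) == pr(y)   # Pr(Y/X) = Pr(Y)
assert pr(x & y) / pr(y) == pr(x)   # Pr(X/Y) = Pr(X)
```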

If propositions __X__ and __Y__ are not probabilistically independent, then there
is said to be a __probabilistic__ (or __statistical__) __correlation__ (or __dependence__)
between __X__ and __Y__. The correlation is called __positive__ or __negative__
according to whether __Pr__(__Y__/__X__) is greater or less than __Pr__(__Y__).
This is sometimes described by saying that __X__ is __positively or negatively
probabilistically relevant to Y__, or that __X has positive or negative probabilistic
significance for Y__. It is easy to see that the following six probabilistic relations
are equivalent:

__Pr__(__Y__/__X__) > __Pr__(__Y__);

__Pr__(__X__/__Y__) > __Pr__(__X__);

__Pr__(__Y__) > __Pr__(__Y__/~__X__);

__Pr__(__X__) > __Pr__(__X__/~__Y__);

__Pr__(__Y__/__X__) > __Pr__(__Y__/~__X__);

__Pr__(__X__/__Y__) > __Pr__(__X__/~__Y__).

Also, these six relations would remain equivalent if the ">"'s were all
replaced with "<"'s, or with "="'s. Thus, the two kinds of
probabilistic correlation (positive and negative), as well as probabilistic independence,
are symmetric. If __Pr__(__Y__/__Z__&__X__) = __Pr__(__Y__/__Z__&~__X__),
then __Z__ is said to __screen off__ any probabilistic correlation of __Y__ with __X__.
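Both the correlation relations and screening off can be illustrated with a hypothetical common-cause model (the numbers are assumptions for the sketch): __Z__ raises the probability of each of __X__ and __Y__, and conditional on __Z__ (or on ~__Z__) the two are independent:

```python
from fractions import Fraction
from itertools import product

# Hypothetical common cause: Z occurs with probability 1/2; given Z,
# X and Y each occur with probability 4/5, independently; given ~Z,
# each occurs with probability 1/5, independently.
half, hi, lo = Fraction(1, 2), Fraction(4, 5), Fraction(1, 5)
weight = {}
for z, x, y in product([True, False], repeat=3):
    base = hi if z else lo
    weight[(z, x, y)] = (half
                         * (base if x else 1 - base)
                         * (base if y else 1 - base))

def pr(pred):
    return sum(w for point, w in weight.items() if pred(point))

def pr_given(pred, cond):
    return pr(lambda p: pred(p) and cond(p)) / pr(cond)

def Z(p): return p[0]
def X(p): return p[1]
def Y(p): return p[2]

# Positive correlation, in several of the equivalent forms:
assert pr_given(Y, X) > pr(Y)                          # Pr(Y/X) > Pr(Y)
assert pr_given(X, Y) > pr(X)                          # Pr(X/Y) > Pr(X)
assert pr_given(Y, X) > pr_given(Y, lambda p: not X(p))
# But Z screens off the correlation: Pr(Y/Z&X) = Pr(Y/Z&~X).
assert (pr_given(Y, lambda p: Z(p) and X(p))
        == pr_given(Y, lambda p: Z(p) and not X(p)))
```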

Two propositions __X__ and __Y__ are called __probabilistically equivalent__
if __Pr__((__X__&__Y__)v(~__X__&~__Y__)) = 1. Another way of
putting this is as follows. A common propositional connective, not mentioned in Appendix
1, is the __biconditional connective__, "<->". The biconditional of two
propositions __X__ and __Y__ is the proposition that is true just in case __X__
and __Y__ have the same __truth value__ -- that is, either they are both true or
they are both false. The __biconditional__ of __X__ and __Y__ is often expressed
as "__X__ if and only if __Y__", or, for short, "__X__ iff __Y__"
(__X__ __if__ __Y__, __and__ __X__ __only if__ __Y__). Then __X__
and __Y__ are probabilistically equivalent just when __Pr__(__X__<->__Y__)
= 1. When two propositions __X__ and __Y__ are probabilistically equivalent, then
they are "interchangeable in all probabilistic contexts". That is, given that __X__
and __Y__ are probabilistically equivalent, if (possibly truth-functionally complex)
propositions __Z__(__X__,__Y__) and __W__(__X__,__Y__) result from any
(possibly truth-functionally complex) propositions __Z__ and __W__, respectively, by
changing __X__'s to __Y__'s or __Y__'s to __X__'s, in any way, then __Pr__(__Z__/__W__)
= __Pr__(__Z__(__X__,__Y__)/__W__(__X__,__Y__)).
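The interchangeability point can be illustrated by a model (assumed here) in which __X__ and __Y__ differ only at a probability-zero point, so that they are probabilistically, but not logically, equivalent:

```python
from fractions import Fraction

# X and Y disagree only at w3, which carries zero weight.
weight = {"w1": Fraction(1, 2), "w2": Fraction(1, 2), "w3": Fraction(0)}
omega = set(weight)

def pr(event):
    return sum(weight[w] for w in event)

x = {"w1", "w3"}   # X is true at w1 and w3
y = {"w1"}         # Y is true at w1 only
z = {"w1", "w2"}   # an arbitrary further proposition

# Probabilistic equivalence: Pr((X & Y) v (~X & ~Y)) = 1.
assert pr((x & y) | ((omega - x) & (omega - y))) == 1
# So X and Y are interchangeable in probabilistic contexts, e.g.:
assert pr(x & z) / pr(x) == pr(y & z) / pr(y)   # Pr(Z/X) = Pr(Z/Y)
```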

A generalization of the common idea of an average is the statistical idea of
expectation, or expected value. Given a variable __N__ which can take on the possible
values __n___{1}, ..., __n___{s}, and a probability __Pr__ on
propositions of the form "__N__ = __n___{i}", the __expectation__,
or __expected value__, of __N__ (calculated in terms of the probability __Pr__)
is:

SUM_{i=1}^{s} __Pr__(__N__ = __n___{i})__n___{i}.

If the probabilities in terms of which an expectation is calculated are conditional
probabilities, then the expectation is a __conditional expectation__, or __conditional
expected value__. For example, if __R__ is a proposition that may be relevant to the
value of __N__, then

SUM_{i=1}^{s} __Pr__(__N__ = __n___{i}/__R__)__n___{i}

is a conditional expectation.
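For a concrete case (a fair die, an assumption made here for illustration), both the expectation and a conditional expectation can be computed directly from the definitions:

```python
from fractions import Fraction

# N is the result of a fair die roll: Pr(N = n) = 1/6 for n = 1, ..., 6.
values = range(1, 7)
prob = {n: Fraction(1, 6) for n in values}

# Expectation of N: the sum of Pr(N = n) * n over the possible values.
expectation = sum(prob[n] * n for n in values)
assert expectation == Fraction(7, 2)   # i.e., 3.5

# Conditional expectation of N given R = "the roll is even", using
# Pr(N = n / R) = Pr(N = n & R) / Pr(R).
pr_R = sum(prob[n] for n in values if n % 2 == 0)
cond_expectation = sum((prob[n] / pr_R) * n for n in values if n % 2 == 0)
assert cond_expectation == 4           # the average of 2, 4, and 6
```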