version of 3 June 1992

By POINCARÉ's theorem, the interesting property of recurrence is generic
for many dynamical systems, including the motions of the molecules of gas in an
insulated bottle (assuming the validity of classical physics), which has a
state space of huge dimensionality, say 10^{25} or much more. One of
the most basic and familiar physical facts is that when left alone, macroscopic
matter tends to thermodynamic equilibrium, and this happens largely
independently of initial conditions or even of the composition of the matter.
In the early days of the kinetic theory of gases, J. CLERK MAXWELL and L.
BOLTZMANN hoped to explain this striking phenomenon by appealing to recurrence.
BOLTZMANN enunciated the **ergodic hypothesis**, the original form of which
posits that the trajectory of every initial condition passes through
every point of the surface in phase space having the same total energy as the
initial condition. This portion of phase space is known as the **energy
shell**, and because of conservation of energy, we may treat the energy shell
as the state space. If the ergodic hypothesis were true, BOLTZMANN felt, then
the time averages of physical observables would tend, as t ->[[infinity]],
to a spatial average, the average of the observable over the energy shell.
That average is what we would recognize as the thermodynamic equilibrium for
the observable. (Caution: Equilibrium has a different meaning here than in
section II. Thermodynamic equilibrium certainly does not mean that molecules
are at rest, which would only happen at absolute zero temperature.) EHRENFEST
later realized that the original form of the ergodic hypothesis was probably
impossible for physical systems, and modified it to something more in the
spirit of POINCARÉ recurrence.

Recall one of our simplest models, the circle rotation [[phi]]([[theta]]) = [[theta]] + [[alpha]] mod 2[[pi]], with [[alpha]] an irrational multiple of 2[[pi]]. This model is known from a theorem due to P. BOHL, W. SIERPINSKI, and H. WEYL [ARNOLD-AVEZ, 1968] to have an equidistribution property of the form desired by BOLTZMANN:

**Theorem VII.1**. Consider the circle rotation with [[alpha]]/2[[pi]]
irrational. Given any arc A and any initial angle [[theta]]0, let N(A,k)
denote the number of iterates [[phi]]m([[theta]]0) lying in the arc A for m
<= k. Then

limk->[[infinity]](N(A,k)/k) = length(A)/2[[pi]]. (7.1)
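As a quick numerical illustration (not part of the proof), one can check (7.1) for a particular irrational rotation; the arc, the value of [[alpha]], and the function name below are arbitrary choices:

```python
import math

def visit_fraction(alpha, a, b, k):
    """Fraction of the first k iterates theta_0 + m*alpha (mod 2*pi)
    that land in the arc [a, b), starting from theta_0 = 0."""
    theta = 0.0
    hits = 0
    for _ in range(k):
        if a <= theta < b:
            hits += 1
        theta = (theta + alpha) % (2 * math.pi)
    return hits / k

alpha = math.sqrt(2) * math.pi        # alpha/(2*pi) = sqrt(2)/2, irrational
a, b = 1.0, 2.5                       # an arbitrary arc
freq = visit_fraction(alpha, a, b, 200000)
print(freq, (b - a) / (2 * math.pi))  # the two numbers should be close
```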

**Exercise** VII.1. The usual proof of this theorem is indirect, using
FOURIER series. Outline a direct proof of this theorem by deriving a
contradiction as follows: Show that if the theorem were false, then there
would be two arcs A1 and A2 with length(A1) < length (A2) but

limk->[[infinity]](N(A1,k)/k) > limk->[[infinity]](N(A2,k)/k).

Next use JACOBI's theorem (Exercise III.2) to show that for some fixed finite M, [[phi]]M(A1) ⊂ A2, and argue that this implies that

limk->[[infinity]](N(A1,k)/k) <= limk->[[infinity]](N(A2,k)/k).

It follows that the limit depends only on the length of the arc. Finish the proof by showing that the dependence is a strict proportionality.

**Exercise** VII.2. How does the equidistribution theorem extend to free
motion on a torus?

Now consider what happens when we average some continuous function f([[theta]]) over time, i.e., allowing [[theta]] to evolve into [[phi]]k([[theta]]). Because the sampling points [[phi]]k([[theta]]) are equidistributed, we discover that:

limN->[[infinity]] (1/N) [[Sigma]]k=0^{N-1} f([[phi]]k([[theta]])) = (1/(2[[pi]])) [[integral]]S^{1} f([[theta]]) d[[theta]]. (7.2)

In other words, the time-average equals the phase-average. This is the property that BOLTZMANN hoped to derive from his hypothesis for systems of many degrees of freedom.
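A sketch of the equality (7.2) for the rotation, with the observable f = cos chosen arbitrarily (its phase average over the circle is 0):

```python
import math

def time_average(f, alpha, theta0, N):
    """Birkhoff average (1/N) * sum of f(phi^k(theta0)) for rotation by alpha."""
    total, theta = 0.0, theta0
    for _ in range(N):
        total += f(theta)
        theta = (theta + alpha) % (2 * math.pi)
    return total / N

alpha = math.sqrt(2) * math.pi     # irrational rotation
t_avg = time_average(math.cos, alpha, 0.3, 100000)
# phase average: (1/(2*pi)) * integral of cos over the circle = 0
print(t_avg)                       # should be very small
```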

While initially attractive, the ergodic hypothesis leads to some troubling
paradoxes, notably **ZERMELO's paradox**: If the gas is initially released
from one corner of an insulated bottle, POINCARé's recurrence theorem
predicts that, according to classical physics, at some time all the molecules
will suddenly rush back into the corner of the bottle. According to the
ergodic hypothesis, some analogous phenomenon should eventually happen for
almost every trajectory. But this has never happened in all of history! The
difficulty is that the time required for recurrence in a typical dynamical
system of more than very low dimensionality is vast. Following [THIRRING,
1980], imagine ten clocks running at a frequency of roughly 1 HERTZ, i.e., 1
revolution per second, but suppose that the clocks are not quite accurate, and
the precise frequencies are incommensurate (have irrational ratios). This
dynamical system is mathematically identical to the free motion on a
ten-dimensional torus, with irrational speeds, and, as should have been
discovered in Exercise VII.2, every trajectory is recurrent and has an
equidistribution property. Let us estimate the time it takes for all the
clocks to return to the initial position in synchronism within 1% of a second.
The state space can be divided into (1%)^{-10} = 10^{20} cells
of this margin of error, so it should take 10^{20} seconds to sample
them all and return. The universe is believed to be less than 10^{18}
seconds old.
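THIRRING's estimate can be imitated on a small scale. The sketch below scans for the first simultaneous near-return of d clocks; the frequencies, tolerance, step size, and function name are illustrative assumptions, and the return time grows rapidly with d:

```python
from math import sqrt

def first_return(freqs, tol, dt=0.02, t_max=2000.0):
    """Scan forward in steps of dt and report the first time t (after an
    initial half second) at which every clock's phase is within `tol` of a
    whole number of cycles, i.e. back near its starting position."""
    k = int(0.5 / dt)
    while k * dt < t_max:
        t = k * dt
        if all(min((f * t) % 1.0, 1.0 - (f * t) % 1.0) < tol for f in freqs):
            return t
        k += 1
    return None

# pairwise-incommensurate frequencies (an arbitrary illustrative choice)
freqs = [1.0, sqrt(2), sqrt(3), sqrt(5)]
for d in (2, 3, 4):
    print(d, "clocks return near start at t =", first_return(freqs[:d], tol=0.1))
```

Since the set of admissible return times shrinks as clocks are added, the first return time can only move later with each additional clock.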

**Exercise** VII.3. Estimate the recurrence time for a mole of hydrogen
atoms at room temperature confined to a one-liter bottle. Feel free to make
simplifying assumptions and crude approximations.

Paradoxes like this have led to the recognition that the important fact is not the original form of BOLTZMANN's hypothesis, but rather that time-averages can be replaced by phase averages. Recurrence and ergodicity (the ability to equate time-averages and phase averages) depend on the existence of invariant measures. Normally, we shall assume that there is an invariant measure u on the state space S, i.e., as in (4.5),

u(U) = u([[phi]]t^{-1}(U)), (7.3)

and that u(S) < [[infinity]]. For convenience let us normalize the measure
so that u(S) = 1, i.e., u is a **probability measure**. Many state spaces
have infinite measure, but it may be possible to restrict to finite subregions;
for instance if conservation of energy applies, motions take place on
subsurfaces of fixed energy, which are typically of finite measure.

**Digression on invariant measures**. In the abstract, it is usually not
difficult to produce invariant measures. A theorem of KRYLOV (Крылов) and
BOGOLIUBOV (Боголюбов) states that if S is a compact metric space and F maps
S to itself continuously, then there exists an invariant measure on S for F.
Unfortunately, this measure can be unpleasantly
singular. A dynamical system may, and usually does, have more than one
invariant measure. For example, a point measure (= delta function) supported
at a fixed point of an iterated mapping is clearly invariant under the
dynamical system, as is a sum of equal point measures supported at the points
of an n-cycle. Thus, for the cat map, not only is dxdy an invariant
probability measure, so are [[delta]](x)[[delta]](y)dxdy and the measure assigning the weight
1/4 to each of the points (1/3,1/3), (2/3,0), (2/3,2/3), and (1/3,0),
*viz.*,

(1/4)([[delta]](x-1/3)[[delta]](y-1/3) + [[delta]](x-2/3)[[delta]](y) + [[delta]](x-2/3)[[delta]](y-2/3) + [[delta]](x-1/3)[[delta]](y)) dx dy.
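One can verify the 4-cycle exactly, assuming the cat map is taken in the form (x,y) -> (x+y, x+2y) mod 1, which is the form under which the four listed points are permuted cyclically:

```python
from fractions import Fraction as Fr

def cat(p):
    """One form of the cat map: (x, y) -> (x + y, x + 2y) mod 1."""
    x, y = p
    return ((x + y) % 1, (x + 2 * y) % 1)

cycle = [(Fr(1, 3), Fr(1, 3)), (Fr(2, 3), Fr(0)),
         (Fr(2, 3), Fr(2, 3)), (Fr(1, 3), Fr(0))]
for i, p in enumerate(cycle):
    # each point maps exactly onto the next one in the list, cyclically
    assert cat(p) == cycle[(i + 1) % 4]
print("4-cycle confirmed; the equal-weight measure on these points is invariant")
```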

Often, different invariant measures will be mutually singular, meaning that the
sets of positive measure according to one of them are sets of measure zero
according to the other, but there are degenerate cases where this is not true.
For the identity mapping, all measures are obviously preserved. For a less
trivial example, consider the simple harmonic oscillator. In appropriate units
its motion is a uniform rotation in the phase plane, so every measure that is
independent of the angular variable in the phase plane will be invariant. For
many purposes, it is desirable to know that the invariant measure is smooth at
least along the unstable directions. Such measures are called SRB measures
after SINAI (Синай), RUELLE, and BOWEN. Under many
circumstances, an SRB measure can be generated directly from the dynamics by
looking at the trajectories beginning in an initial open set (weighted
according to the LEBESGUE measure); the result is known as the **physical measure**.

For a differentiable dynamical system

dz/dt = V(z), (7.4)

a formal calculation in the spirit of LIOUVILLE's Theorem shows that the invariant measures of the form du = w(z)dz satisfy

div(wV) = 0. (7.5)

Of course the invariant measure might easily be singular. When the
coefficients are nice, equation (7.5) has many independent solutions for the
**density**, or **weight**, w(z). A partial differential equation of
this sort is usually solved by the method of characteristics, and, of course,
the characteristic equations for it are precisely (7.4). The invariant
measures are, roughly speaking, arbitrary transversely to the trajectories.

More abstractly, an absolutely integrable invariant density is a fixed point of a certain operator, called the FROBENIUS-PERRON operator P, defined by the condition that for a mapping F and any measurable set U,

[[integral]]U (Pw)(z) dz = [[integral]]F^{-1}(U) w(z) dz. (7.6)

The discussion above raises the hope that along trajectories, or in one dimension, the dynamics determines a natural invariant measure, and this is frequently the case.

**Example**. Let [[phi]]1 = F4(x) = 4x(1-x). Then the FROBENIUS-PERRON
operator can be calculated by taking the sets U to be intervals of the form
[0,x) and differentiating with respect to x:

(Pw)(x) = (1/(4 sqrt(1-x))) (w((1-sqrt(1-x))/2) + w((1+sqrt(1-x))/2)).

An invariant density must thus satisfy the functional equation w = Pw, which was analyzed by ULAM and VON NEUMANN, who found the solution w(x) = 1/([[pi]] sqrt(x(1-x))). This is the same invariant density that is obtained by transforming F4 to the dyadic map and then "pulling back" the LEBESGUE measure, which is invariant under the latter.
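A numerical sanity check of the ULAM-VON NEUMANN density (a sketch, not a proof): the fraction of an orbit of F4 falling below a level c should approach the cumulative distribution (2/[[pi]]) arcsin(sqrt(c)); the seed and sample size below are arbitrary choices.

```python
import math
import random

random.seed(1)
x = random.random()          # a random starting point in (0, 1)
N, below = 200000, 0
c = 0.25                     # (2/pi)*asin(sqrt(0.25)) = 1/3
for _ in range(N):
    x = 4 * x * (1 - x)      # iterate F4
    if x < c:
        below += 1
print(below / N, (2 / math.pi) * math.asin(math.sqrt(c)))  # both near 1/3
```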

In more general situations, (7.2) is replaced by

limN->[[infinity]] (1/N) [[Sigma]]k=0^{N-1} f([[phi]]k(z)) = (1/u(S)) [[integral]]S f(z) du(z), (7.7)

for a discrete dynamical system, and by

limT->[[infinity]] (1/T) [[integral]]0^{T} f([[phi]]t(z)) dt = (1/u(S)) [[integral]]S f(z) du(z), (7.8)

for a continuous dynamical system.

If f is a function of z ∈ S, let us refer to its **time mean**,
i.e., the left side of (7.7) or (7.8), as f*, and its **phase mean**, i.e.,
the right side of (7.7) or (7.8), as f^. Their general relationship is
described by a theorem of BIRKHOFF and KHINCHIN (Хинчин):

**Theorem VII.2**. Let S be a finite-measure space with a measure-preserving
dynamical system [[phi]]t, and assume that f is absolutely integrable. Then

a) f*(z) exists for a.e. z, and f*(z) is absolutely integrable;

b) For a.e. z, and all times t, f*([[phi]]t(z)) = f*(z), i.e., the time mean is time-invariant except on a null set of z.

c) (f*)^ = f^. (7.9)

In this theorem, the condition of absolute integrability can be replaced by
square-integrability. The variant of this theorem where the convergence of the
time-mean is asserted in the integrated (L^{1} or L^{2}) sense
is due to VON NEUMANN.

**Definition**. A dynamical system with u(S) < [[infinity]] is
**ergodic** iff f* = f^ a.e.

The use of the word ergodic has evolved somewhat since BOLTZMANN's day, and on occasion it is still used, for example, to refer to the density of a trajectory (cf. section V). In addition, several modern sources use apparently different definitions which are in fact equivalent to the one used here, according to the following theorem:

**Theorem VII.3**. A dynamical system with u(S) = 1 is ergodic if and
only if

a) the only invariant subsets I ⊂ S have measure u(I) = 0 or 1; or, equivalently,

b) the only invariant real-valued functions f on S are constant a.e.

The value of this theorem is that properties a) and b) are often easier to check than the equality of time and phase means. In this theorem, invariant means invariant in time n or t, while constant means constant in z. Think of real-valued functions on S as physical observables of the system.

**Proof**. a) implies b): Assume a) and that a function f is
time-invariant. If f is invariant, then the set U[[alpha]] = {z: f(z) >=
[[alpha]]} is invariant for any real number [[alpha]], and therefore has
measure 0 or 1. Now, as [[alpha]]->-[[infinity]], u(U[[alpha]]) = 1,
whereas as [[alpha]]->+[[infinity]], u(U[[alpha]]) = 0. Since
U[[alpha]] ⊂ U[[beta]] for [[alpha]]>[[beta]], there is a single
critical value [[alpha]]c where the measure changes from 1 to 0, and, clearly,
f(z) = [[alpha]]c a.e.

b) implies a): Assume b) and that a set U is time-invariant. Then the indicator function for U, i.e.,

[[chi]]U(x) := 1 if x ∈ U, 0 if x ∉ U,

must also be invariant. The only way this function can be constant a.e. is for u(U) to be 0 or 1.

Ergodic implies b): Suppose that f is invariant under an ergodic
transformation. Invariance implies that f(x) = f*(x), but ergodicity implies
that f*(x) is a constant, *viz.*, f^, a.e.

b) implies ergodic: By the BIRKHOFF-KHINCHIN theorem, for any f(x), f*(x) is time-invariant. Assuming b), f*(x) is constant a.e. This constant must be its average, i.e., f*^ = f^.


Properties a) and b) are pretty easy to see for simple models such as the circle rotations with [[alpha]]/2[[pi]] irrational. There are often nontrivial invariant sets of zero measure; for instance an n-cycle is a finite number of points, and thus normally has measure zero. We shall see that the cat map is ergodic, and, as we know, the points with rational coördinates are periodic. Although the rational points form a dense set, the measure of the rationals is zero, so that fact does not contradict the theorem.

An area with perhaps surprisingly many applications of ergodic dynamics is
number theory [FURSTENBERG, 1981]. To get the flavor of this, let's
investigate the distribution of first digits of the numbers 2^{n}, n =
0, 1, ...[following ARNOLD-AVEZ, 1968]. These digits are 1, 2, 4, 8, 1, 3, 6,
1, 2, 5, 1, 2, 4, 8, 1, .... For large n, what is the probability that the
first digit is, say, 7? Is it less than that of 8? The calculator reveals
that the first 7 occurs for 2^{46}. The condition that the first digit
of 2^{n} is k is that

k 10^{r} <= 2^{n} < (k+1)10^{r},

or, equivalently,

r + log10(k)< n[[alpha]] < r + log10(k+1), (7.10)

where [[alpha]] is the irrational number log10(2) [[congruent]] .30103. The interval [0,1) is the non-overlapping union of the intervals [log10(1), log10(2)), [log10(2), log10(3)), ..., [log10(9), log10(10)), so we can unambiguously calculate mod 1:

log10(k) < n[[alpha]] mod 1 < log10(k+1)

The numbers {n[[alpha]]} mod 1 correspond to irrational rotations of the point 0 on a circle, scaled to have circumference 1 rather than 2[[pi]], and are thus equidistributed in [0,1) by the theorem of BOHL et al. It follows that the limiting probability of a first digit being k is

log10(k+1) - log10(k) = log10((k+1)/k).
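This prediction is easy to test numerically; the sample size below is an arbitrary choice:

```python
import math
from collections import Counter

# First-digit statistics of 2^n for n = 0, ..., 9999,
# compared with the predicted frequencies log10((k+1)/k).
counts = Counter(int(str(2 ** n)[0]) for n in range(10000))
for k in range(1, 10):
    print(k, counts[k] / 10000, math.log10((k + 1) / k))

# the first power of 2 whose leading digit is 7 (the text notes 2^46)
print(min(n for n in range(60) if str(2 ** n)[0] == "7"))
```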

**Exercise** VII.4. Find the distribution of the first digits of
a^{n} in base b arithmetic, where a and b are fixed integers.

In 1938, BENFORD observed that the first digits of many disparate tables of data are distributed according to this same law. BENFORD checked the surface areas of rivers, street addresses of prominent citizens, etc. It is something of a mystery why BENFORD's rule should hold so universally, but ergodicity is probably a large part of the explanation.

Ergodicity has important consequences for the notion of a LYAPUNOV exponent. As we know, in one dimension, by Equation (6.2), the LYAPUNOV exponent is a time average,

[[lambda]](x) = limN->[[infinity]] (1/N) [[Sigma]]k=0^{N-1} ln|F'(xk)|.

**Exercise** VII.5. Show that the LYAPUNOV exponent for the quadratic
mapping F4 is ln 2, using a direct calculation with the invariant measure
du = dx/([[pi]] sqrt(x(1-x))).

If the derivative is bounded away from 0 and [[infinity]], then the BIRKHOFF-KHINCHIN theorem guarantees that the LYAPUNOV exponent is defined for a.e. x. If the dynamical system is ergodic, then [[lambda]](x) is independent of x a.e. These facts have been extended to the many-dimensional case by the multiplicative ergodic theorem of OSELEDEC (Оселедец):
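In one dimension this independence can be seen numerically. The sketch below estimates the exponent of F4 from several starting points (the starting points, burn-in, and sample size are arbitrary choices); each estimate comes out near ln 2:

```python
import math

def lyapunov_F4(x0, N=100000, burn=100):
    """Time average of ln|F4'(x)| = ln|4 - 8x| along the orbit of x0."""
    x = x0
    for _ in range(burn):          # discard a short transient
        x = 4 * x * (1 - x)
    total = 0.0
    for _ in range(N):
        total += math.log(abs(4 - 8 * x))
        x = 4 * x * (1 - x)
    return total / N

# several starting points give (nearly) the same exponent, close to ln 2
for x0 in (0.123, 0.456, 0.789):
    print(lyapunov_F4(x0))
```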

**Theorem VII.4**. Suppose that <S, [[phi]]t, u> is an ergodic
dynamical system, and T(z) is an nxn matrix-valued function of z satisfying

[[integral]]S log^{+}||T(z)|| du(z) < [[infinity]].

Let Tk(z) := T([[phi]]k-1(z))T([[phi]]k-2(z)) ... T(z). Then for a.e. z, the limit

limk->[[infinity]] (Tk(z)^{*}Tk(z))^{1/2k}

exists. The logarithms of its eigenvalues are the LYAPUNOV exponents.

In this theorem, log^{+}(x) = max(0, ln(x)). A variant of the theorem
states that for a.e. z, there is a nested sequence of subspaces R^{n} =
E^{(1)}(z) ⊃ E^{(2)}(z) ⊃ ... of the tangent space R^{n} at z,
such that if **v** ∈ E^{(j)}(z)\E^{(j+1)}(z), **v** != 0, then

limk->[[infinity]] (1/k) ln||Tk(z)**v**|| = [[lambda]]^{(j)}.

The notation is such that the LYAPUNOV exponents are enumerated in decreasing
order: [[lambda]]^{(1)} > [[lambda]]^{(2)} > ....

The example of the circle rotations shows that there is nothing particularly
chaotic about a merely ergodic motion, and since I criticized POINCARÉ
recurrence earlier because of the extraordinarily long times required for the
recurrence to take place, the alert reader will have been asking for some
comparable estimates in the context of ergodic theory. The notion of a
**correlation function** makes it possible to speak of a characteristic time
in this sense:

**Definition**. Given an ergodic probability measure u, the **correlation
function** for two sets of positive measure U and V is

C(t;U,V) := u([[phi]]t^{-1}(U)[[intersection]]V) - u(U)u(V).

The word **autocorrelation function** is often used when U=V. If a system
is topologically transitive, it often suffices to consider only the
autocorrelation functions. A related quantity is the **time autocorrelation
function**,

C(t,T;U) := u([[phi]]t+T(U)[[intersection]][[phi]]t(U)) -
u([[phi]]t(U))^{2}.

The term **power spectrum** refers to the FOURIER transform of a correlation
function.

We can speak of a characteristic time [[tau]] for the decay of correlations if, as is frequently the case, |C(t;U,V)| <= const. exp(-t/[[tau]]). The existence of a [[tau]] of the right order of magnitude is actually what is required for thermodynamic equilibrium. For the circle rotations, there is no exponential upper bound, so in effect [[tau]]=[[infinity]].

The minimal desideratum for a statistical understanding of a dynamical system
as t->[[infinity]] is that the correlations tend to 0 in that limit. This
is known as the **mixing** property, a term which also originated in
statistical mechanics, in the work of GIBBS. It can be thought of as a
measure-theoretic replacement for the concept of topological transitivity. We
suppose without loss of generality that u(S) = 1, in order to simplify matters
slightly:

**Definition**. A dynamical system is **mixing** if for any pair of
measurable sets U and V,

limt->[[infinity]] u([[phi]]t^{-1}(U) [[intersection]] V) = u(U)
u(V). (7.11)

Again, if the dynamical system is invertible, (7.11) can be replaced with

limt->[[infinity]] u(U [[intersection]] [[phi]]t(V)) = u(U) u(V). (7.12)

Mixing can be thought of as a statement of statistical independence after long times. The conditional probability of an event U occurring at time t given event V at time 0 is

u([[phi]]t^{-1}(U) [[intersection]] V) / u(V).

According to (7.11) this probability is asymptotically the same as u(U), the
probability of event U. Since the flow is measure-preserving, u(U) =
u([[phi]]t^{-1}(U)), and

u([[phi]]t^{-1}(U) [[intersection]] V) / u(V) -> u([[phi]]t^{-1}(U)) = u(U).

In other words, if a system is mixing, information about initial conditions becomes worthless for making predictions after long times.

Suppose that a dynamical system is mixing and U = V is invariant. Then we find
that u(U) = u(U)^{2}, so u(U) = 0 or 1. Theorem VII.3 then
shows that the dynamical system is ergodic. Mixing implies ergodic, but not
vice versa: For instance, circle rotations are not mixing.

**Exercise** VII.6. Show that the circle rotation is not mixing.

The cat map is mixing, as is visually demonstrated by looking at the image of some figure in the rectangle 0 <= x,y <= 1 after many iterations, and that fact can be proved by using the Markov partition and symbolic dynamics for the cat map [e.g., ARNOLD-AVEZ, 1968, section 10]. The situation is a bit easier to understand for some of our other models, such as the baker's transformation.

Let us begin with the dyadic mapping, F(x) = 2x mod 1. As we know, in binary
arithmetic this mapping is equivalent to the left shift on the digits of x.
The usual measure dx is invariant: If I is an interval in [0,1), then
F^{-1}(I) consists of two intervals, one to the left of 1/2 and one to
the right. Both intervals are exactly half as long as I. Here it is
important to use the definition with inverse images (7.3), since the dyadic map
is not uniquely invertible. We also observe that the set of
x ∈ [0,1) such that r of its binary digits have been specified (0
or 1) is a set of intervals of total length 2^{-r}.

**Exercise** VII.7. Prove this observation.

Suppose now we take two intervals A = [k2^{-n}, (k+1)2^{-n})
and B = [j2^{-m}, (j+1)2^{-m}). A represents a set of numbers
where the first n digits have been specified, while B represents a set of
numbers where the first m digits have been specified. If p>m, then
F^{-p}(A) (i.e., (F[[ring]]^{p})^{-1}(A)) is a set of
numbers with digits between p+1 and p+n specified, so
F^{-p}(A)[[intersection]]B has n+m digits specified, and thus has
measure 2^{-n-m} = u(A)u(B).
The definition of mixing is satisfied for such intervals. Since all BOREL
measurable sets can be generated from these basic intervals by complementation
and countable union and intersection, it is not hard to see that (7.11) must
hold for all A and B.
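Because all the intervals involved are dyadic, the measure of F^{-p}(A) [[intersection]] B can be computed exactly with rational arithmetic; the function below is an illustrative sketch:

```python
from fractions import Fraction as Fr

def preimage_measure_in_B(k, n, j, m, p):
    """Exact Lebesgue measure of F^{-p}(A) ∩ B for the dyadic map F(x) = 2x mod 1,
    with A = [k/2^n, (k+1)/2^n) and B = [j/2^m, (j+1)/2^m)."""
    total = Fr(0)
    b_lo, b_hi = Fr(j, 2 ** m), Fr(j + 1, 2 ** m)
    # F^{-p}(A) consists of 2^p shrunken translates of A, one per cell of size 2^{-p}
    for i in range(2 ** p):
        lo = Fr(i, 2 ** p) + Fr(k, 2 ** (n + p))
        hi = Fr(i, 2 ** p) + Fr(k + 1, 2 ** (n + p))
        total += max(Fr(0), min(hi, b_hi) - max(lo, b_lo))
    return total

# A has n = 2 specified digits, B has m = 3; for p >= m the product law is exact:
print(preimage_measure_in_B(k=1, n=2, j=5, m=3, p=4))   # 2^{-5} = u(A)u(B)
```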

**Example**. The transformation (x,y) -> (2x mod 1, 2y mod 1) on the
unit square is likewise mixing, by a similar argument.

**Example**. The mapping x -> frac(1/x) (except 0->0) on [0,1) is
mixing with respect to the invariant measure

du = (1/ln 2) dx/(1+x)

(cf. Exercise IV.11).

**Exercise** VII.8. Is the baker's transformation mixing? Answer the
question by considering the case of rectangles whose corners have
coördinates of the form k2^{-n}, and arguing that general sets can
be approximated by combinations of such rectangles.

**Hint**. Why would a baker use this transformation if it weren't mixing?
Consider the case of two rectangles of the form

A = {j 2^{-k} < x < (j+1) 2^{-k}, l 2^{-m} <
y < (l+1) 2^{-m}}

and

B = {n 2^{-p} < x < (n+1) 2^{-p}, q2^{-r} <
y < (q+1) 2^{-r}},

and think about intersections of B with [[phi]]N(A) for large N, which will
consist of a large number of horizontal bands of height 2^{-m-N}.

**Example**. Let us return to the BERNOULLI shift [[sigma]] of section V,
choosing for S the symbol sequences [[Sigma]]2 (with no restrictions), and
calling the two symbols H and T, for heads and tails. There is a natural
probability measure on U ⊂ [[Sigma]]2 defined as the probability that
flipping coins will produce a sequence in U. For example, if U is the set of
all sequences with s5 = H, s7 = H, and s9997354638 = T, then the probability is
2^{-3} = 1/8, since there were three specifications. This should be
reminiscent of Exercise VII.7, and in fact the proof that the system
<[[sigma]], [[Sigma]]2> is mixing is just the same as for the dyadic map.
It makes no difference, by the way, whether we work with the unilateral or
bilateral version of [[Sigma]]2.

Suppose, however, that the coin is biased, so that the probability of heads is p != .5 and the probability of tails is q = 1-p. Flipping this coin will also define a probability measure on [[Sigma]]2, and it is not hard to see that it is invariant, ergodic, and even mixing with respect to [[sigma]], by a slight modification of the argument given above. This is disturbing, since if f is any reasonable real-valued function on [[Sigma]]2, we expect the time average of f, which makes no reference to a measure, to be equal to all the different phase averages of f with different choices of p. Obviously those different phase averages cannot always be the same. For example, suppose that we equate H with the numerical value 1 and T with the value 0, and that f(s) = s0·s1 (the product of the values of s0 and s1). With the fair coin, p = .5, the expectation value of f is obtained by giving equal weights to the four values H·H = 1, H·T = 0, T·H = 0, and T·T = 0, i.e., the phase average is [[integral]] f(s) du(s) = .25. With a biased coin favoring H with p = .9, we get .81·1 + .09·0 + .09·0 + .01·0 = .81.
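The discrepancy is easy to exhibit numerically: along a sequence that is typical for the biased measure, the time average of f(s) = s0·s1 comes out near p^{2}, not .25. (The sample size and seed below are arbitrary choices.)

```python
import random

random.seed(7)
p = 0.9
# a sample sequence typical for the biased Bernoulli measure (1 = heads)
seq = [1 if random.random() < p else 0 for _ in range(200001)]
# shifting the sequence k times and evaluating f reads off s_k * s_{k+1},
# so the Birkhoff average of f along the shift orbit is:
avg = sum(seq[k] * seq[k + 1] for k in range(200000)) / 200000
print(avg)   # near p*p = 0.81 for this biased sequence, not 0.25
```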

**Exercise** VII.9. What's going on here?

A useful observation is that if two dynamical systems are topologically conjugate, and one of them has an invariant measure, then so does the other. If the one system is ergodic, or mixing, then so is the other, with respect to the right measure. Thus, suppose that h is a homeomorphism from S to T and the dynamical systems for x ∈ S and y ∈ T are related by

[[phi]]t(x) = h^{-1}([[psi]]t(h(x))).

If there is a measure d[[nu]] on T, it can be **pulled back** to define a
measure d[[rho]] on S by the formula

[[rho]](U) := [[nu]](h(U)). (7.13)

(The ability to pull back a measure does not actually require a homeomorphism.)

In particular, the quadratic mapping for coupling u > 2+sqrt(5) is mixing with respect to the invariant measure corresponding to assigning probability p to the interval IR to the right of x = 1/2 and the probability q := 1-p to the interval IL to the left of x = 1/2. The resulting measures are singular with respect to LEBESGUE measure and are supported on the fractal set [[Lambda]].

Another use of this observation is to point the way to a proof that the cat map is mixing. Once we have established a nontrivial symbolic dynamics for a system, it will be mixing with respect to any measure that is topologically conjugate to one of the mixing measures on the BERNOULLI system. As a general rule, in the absence of a symbolic dynamics it is extremely difficult to prove that a realistic system is mixing.

It is fairly easy, by the way, to generate an impression of measures like the balanced measure on [[Lambda]] for the quadratic mapping on a computer screen. The function ux(1-x) has two inverses, one assigning values in IR and the other values in IL to x ∈ [[Lambda]]. If one begins at a randomly chosen point and iterates the two inverses randomly with probabilities p and q, and plots the iterates, they will almost certainly distribute themselves according to the desired measure. Problems will obviously occur if one starts off on an n-cycle, but it is not actually necessary to start off in [[Lambda]], since the inverses will attract other points onto [[Lambda]]. In order to get a reasonable picture, one should not plot the first hundred iterates, or however many iterates it takes to get points within a small error of [[Lambda]].
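A minimal sketch of this inverse-iteration scheme, with the illustrative choices u = 4.5 and p = q = .5 (plotting is omitted; the code only generates the sample points):

```python
import math
import random

# Random inverse iteration for F_u(x) = u*x*(1-x), u = 4.5 > 2 + sqrt(5).
# Each point has two preimages, (1 - s)/2 in IL and (1 + s)/2 in IR;
# choosing them with probabilities q and p samples the corresponding
# invariant measure on the repeller Lambda.
random.seed(0)
u, p = 4.5, 0.5
x = 0.4                  # need not start on Lambda; the inverses attract to it
points = []
for i in range(2000):
    s = math.sqrt(1 - 4 * x / u)       # preimages of x are (1 ± s)/2
    x = (1 + s) / 2 if random.random() < p else (1 - s) / 2
    if i >= 100:                        # discard the transient
        points.append(x)
print(min(points), max(points))         # all samples lie in [0, 1]
```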

**Exercise** VII.10. Carry out this computer analysis.

Complicated dynamical systems typically have structures exhibiting one or more
of the following **signs of chaos** :

a) A positive LYAPUNOV exponent;

b) The mixing property;

c) Fractal geometry.

Ordinarily, a dynamical system regarded as chaotic will exhibit at least two of these signs, and different scientists often use the word "chaotic" in different senses. Examples of [GREBOGI et al., 1984] have convinced people that fractal geometry can occur in a non-chaotic setting, whereas many examples regarded as chaotic, such as the cat map, do not have obvious fractal structures. A working definition of chaos, in the spirit of DEVANEY's definition, might be a) and b) together. To see that a) is independent of b), and unconvincing by itself, observe that the dynamical system on R defined by the equation of motion dx/dt = 2x has a positive LYAPUNOV exponent (=2). An example of GIRSANOV (Гирсанов) [ARNOLD-AVEZ, 1968, p. 45] shows that a system can be mixing without having a positive LYAPUNOV exponent. One problem with using the mixing property as the criterion for chaos is that it is as a rule very difficult to prove for realistic systems.

**Exercise** VII.11. Consider the dynamical system on S = {0 <= z <
[[infinity]]} defined by

F(z) = 2z/(1-z^{2}) for 0 <= z < 1, and F(z) = (z^{2}-1)/(2z) for 1 <= z < [[infinity]].

This is ergodic with respect to the measure du = (2/[[pi]]) dz/(1+z^{2}). (Take this as given.)

a) Find the Lyapunov exponent for some point z0 != 0 of your own choosing.

b) Does almost every point have the same Lyapunov exponent? Justify your answer, and, if possible, find [[lambda]].

c) Verify or disprove that this transformation is chaotic (in the sense of Devaney).

**Hint**. tan(2[[theta]]) = 2tan[[theta]]/(1 - tan^{2}[[theta]]).

The **entropy** of a physical system is another notion having its origins in
thermodynamics and statistical mechanics, which was used as the central concept
of information theory by SHANNON and WEAVER [1949] and was brought into general
dynamical systems theory by KOLMOGOROV (Колмогоров). Suppose that we wish
to quantify the amount of information obtained by measuring a system.
Intuitively, information is additive; the information gained by making two
independent measurements is the sum of the information gained by making each
one. Moreover, if we make a measurement and are lucky enough to obtain an
improbable outcome, that means that we have gained more information than if we
obtain a more probable outcome; the police have an easier time if they are
certain the criminal is a seven foot tall left-handed woman with flaming red
hair than if the description matches dozens of people in any K-Mart. Suppose
now that we measure two independent events with probabilities P1 and P2, so
that their joint probability is P1P2. What familiar function could describe
the information gained by these measurements? The answer is -ln(P),
where P is the probability of the event measured. Since P <= 1,
this is positive, and since -ln(P1P2) = -ln(P1) + (-ln(P2)),
it is additive.

Now suppose that an experiment has k possible distinct outcomes, with probabilities p1...pk. Without knowing the outcomes of an experiment in advance, the expected increase of information would be -[[Sigma]]j pj ln(pj).

**Definition**. Let z(t) := - t ln(t) for 0 < t <= 1, and z(0) := 0,
and let [[alpha]] denote a finite measurable partition of S, i.e., a collection
of nonintersecting subsets Aj whose union is all of S. The **entropy**
**of the partition** is

H([[alpha]]) := [[Sigma]]j z(u(Aj)). (7.14)

**Remarks**. Think of the partition as an exhaustive specification of the
possible outcomes of an experiment. Many sources prefer to use log2 rather
than ln, so that the rate of information produced is measured in binary bits.
Historically, the entropy as defined in thermodynamics is an integral of the
reciprocal of the temperature with respect to the heat content, and appears to
have little to do with this function. When trying to justify thermodynamics on
the basis of the theory of statistical physics, however, BOLTZMANN showed that
the entropy was equivalent to his **H function**, which is an integral over
velocity space of f(**v**,t) ln(f(**v**,t)).

The definition given above has no dynamics in it. If we want to quantify the information gained by watching the outcome of an experiment again and again as the dynamical system runs, we define an entropy as follows:

If [[alpha]] and [[beta]] are two partitions of S, we let [[alpha]][[logicalor]][[beta]] be the partition consisting of all possible intersections of the form Aj[[intersection]]Bk, Aj ∈ [[alpha]], Bk ∈ [[beta]]. This is a refinement of either original partition. The information value of such a refined partition corresponding to multiple experiments is greater than that of either individual experiment.

**Definition**. The **entropy of the partition [[alpha]] with respect to
the dynamics** [[phi]]1 is defined as

h([[alpha]],[[phi]]1,u) := limn->[[infinity]] (1/n) H([[alpha]] [[logicalor]] [[phi]]1^{-1}[[alpha]] [[logicalor]] ... [[logicalor]] [[phi]]1^{-(n-1)}[[alpha]]), (7.15)

where H denotes the entropy (7.14) of a partition.

In words, this is the long-time average of the rate of increase of information
as one continues to observe the dynamics. A theorem in [ARNOLD-AVEZ, 1968]
guarantees the convergence of this limit. One final improvement can be made,
*viz.*, to make an optimal choice of the experimental set-up:

**Definition**. The **entropy** h([[phi]]1,u) **of the dynamical
system** is defined as the supremum of h([[alpha]],[[phi]]1,u) over all
finite measurable partitions.

In some cases the supremum may be avoided, if one can find a special sort of
partition, known as a **generating partition**.

**Example**. Consider the BERNOULLI system <[[sigma]],[[Sigma]]N, u>
where u is the measure independently assigning probability pk to the N
possible values of the symbol sk, p1+...+pN = 1. Let us call the N
symbols the numbers 1...N, and suppose that our initial partition {Ak}
consists of the sets of all elements of [[Sigma]]N with s1 = 1; with s1 = 2;
etc. The entropy of this partition is

H([[alpha]]) = [[Sigma]]k z(pk) = -[[Sigma]]k pk ln(pk),

whereas if the partition is refined by taking intersections with [[sigma]]^{-1}(Ak), we get

H([[alpha]][[logicalor]][[sigma]]^{-1}[[alpha]]) = -[[Sigma]]j,k pjpk ln(pjpk) = 2[[Sigma]]k z(pk).

When the partition is dynamically refined n times, we get exactly n times [[Sigma]]k z(pk), so the entropy of the partition with respect to the dynamics is -[[Sigma]]k pk ln(pk). It turns out that this partition is the optimal one, so the entropy of the system has this same value.

Notice that if all pk= 1/N, then h = ln(N), in particular the entropy of [[Sigma]]2 with the usual fair-coin measure is ln(2). Entropy is an invariant under topological conjugacy, provided that one measure is the pull-back of the other as in (7.13). One interesting consequence of this is that different BERNOULLI systems such as <[[sigma]],[[Sigma]]2> with the fair-coin measure and, say, <[[sigma]],[[Sigma]]3> with the three symbols likewise equally probable, are not topologically equivalent.
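The entropy formula for a BERNOULLI measure can also be checked against simulation: the empirical entropy of n-blocks of a biased coin sequence, divided by n, stays near -[[Sigma]]k pk ln(pk). The bias, sample size, and block lengths below are arbitrary choices:

```python
import math
import random
from collections import Counter

def block_entropy(seq, n):
    """Empirical entropy (in nats) of the n-blocks occurring in seq."""
    blocks = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log(c / total) for c in blocks.values())

random.seed(3)
p = 0.7
seq = [1 if random.random() < p else 0 for _ in range(100000)]
h_formula = -(p * math.log(p) + (1 - p) * math.log(1 - p))
for n in (1, 2, 4):
    # per-symbol block entropy vs. the Bernoulli entropy formula
    print(n, block_entropy(seq, n) / n, "vs", h_formula)
```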

We have seen that the quadratic mapping F4 is topologically conjugate to the dyadic map, with an invariant measure which is the pull-back of the usual LEBESGUE measure. From the measure-theoretic point of view, the dyadic map is just like <[[sigma]],[[Sigma]]2,ufair coin>, so the entropy is ln(2). This is, curiously, the same as the LYAPUNOV exponent. Indeed, a formula due to PESIN (Песин),

h([[phi]]1,u) = [[Sigma]]j:[[lambda]]^{(j)}>0 [[lambda]]^{(j)}, (7.16)

i.e., the entropy equals the sum of the positive LYAPUNOV exponents, has been proved to hold under wide circumstances. Specifically, suppose that u is an SRB measure:

**Definition**. Suppose that u is an ergodic measure and U ⊂ S with
u(U) > 0 can be written as a disjoint union of pieces U[[alpha]] of unstable
manifolds V[[alpha]]. If all of the conditional probability measures obtained
by restricting u to U[[alpha]] are absolutely continuous with respect to the
LEBESGUE measure for V[[alpha]], then u is an **SRB measure**.

If there is more than one choice of U, it turns out that this definition does not depend on the choice.

**Theorem VII.5** (LEDRAPPIER and YOUNG). If F is a smooth (C^{2})
diffeomorphism on a compact set S and u is an ergodic measure, then (7.16)
holds if and only if u is an SRB measure.

Unfortunately, there are some reasonable examples where no SRB measure exists.

In addition to being related to the LYAPUNOV exponents via (7.16), the entropy
of a system is also connected with the notion of fractal dimension. If u is an
invariant probability measure for a dynamical system, the **information
dimension** can be defined as the infimum of the HAUSDORFF dimensions of all
sets A of full measure, u(A) = 1.