## Measure theory, topology, and the role of examples

Bill Thurston once observed that topology and measure theory are very immiscible (i.e. they don’t mix easily); this statement has always resonated with me, and I thought I would try to explain some of the (personal, psychological, and mathematical) reasons why. On the face of it, topology and measure theory are very closely related. Both are concerned with spaces equipped with certain algebras of sets (open sets, measurable sets) and classes of functions (continuous functions, measurable functions). Continuous functions (on reasonable spaces) are measurable, and (some) measures can be integrated to define continuous functions. However, in my mind at least, they are very different in a psychological sense, and one of the most important ways in which they differ concerns the role of examples.

At the risk of oversimplifying, one might say that one modern mathematical tradition, perhaps exemplified by the Bourbakists, insists that examples are either irrelevant or misleading. There is a famous story about Grothendieck, retold in this article by Allyn Jackson, which goes as follows:

One striking characteristic of Grothendieck’s mode of thinking is that it seemed to rely so little on examples. This can be seen in the legend of the so-called “Grothendieck prime”. In a mathematical conversation, someone suggested to Grothendieck that they should consider a particular prime number. “You mean an actual number?” Grothendieck asked. The other person replied, yes, an actual prime number. Grothendieck suggested, “All right, take $57$”.

Leaving aside the “joke” of Grothendieck’s (supposed) inability to factor $57$, this anecdote has an instructive point. No doubt Grothendieck’s associate was expecting a small prime number such as $2$ or $3$. What would have been the reaction if Grothendieck had said “All right, take $2147483647$”? When one considers examples, one is prone to consider simple examples; of course this is natural, but one must be aware that such examples can be misleading. Morwen Thistlethwaite once made a similar observation about knot theory; from memory he said something like:

When someone asks you to think about a knot, you usually imagine a trefoil, or a figure $8$, or maybe a torus knot. But the right image to have in your mind is a room entirely filled with a long, tangled piece of string.

Note that there is another crucial function of examples, namely their role as counterexamples, which certify the invalidity of a general claim — such counterexamples should, of course, be as simple as possible (and even Grothendieck was capable of coming up with some); but I am concerned here and in the sequel with the role of “confirming” examples, so to speak.

At the other extreme(?), and again at the risk of oversimplifying, one might take the “$A=B$” (or Petkovšek-Wilf-Zeilberger) point of view, that sufficiently good/many examples are proofs. They give a beautifully simple but psychologically interesting example (Theorem 1.4.2 in $A=B$): to show that the angle bisectors of a triangle are concurrent, it suffices to verify this for a sufficiently large but finite (explicit) number of examples. The reason such a proof is valid is that the co-ordinates of the pairwise intersections of the angle bisectors are rational functions of (certain trigonometric functions of) the angles, of an explicit (and easily determined) degree, and to prove an identity between rational functions, it suffices to prove that it holds for enough values. Another aspect of the $A=B$ philosophy is that by the process of abstraction, a theorem in one context can become an example in another. For example, “even plus odd equals odd” might be a theorem over $\mathbb{Z}$, but an example over $\mathbb{Z}/2\mathbb{Z}$. One might say that the important thing about examples is that they should be sufficiently general that they exhibit all or enough of the complexity of the general case, and that if enough features of an example can be reimagined or abstracted as parameters, an example can become (or be translated into) a theorem.
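
To make the "enough examples are a proof" idea concrete, here is a minimal Python sketch for the simplest case: two polynomials of degree at most $d$ that agree at $d+1$ distinct points are equal (an identity between rational functions $P/Q = R/S$ reduces to the polynomial identity $PS = RQ$). The function names and the sample polynomials are my own illustrative choices; they are not taken from $A=B$.

```python
from fractions import Fraction

def agree_at(f, g, points):
    """Check whether f and g take the same value at every given point."""
    return all(f(x) == g(x) for x in points)

def proves_identity(f, g, degree_bound):
    """Assuming f and g are polynomials of degree <= degree_bound,
    agreement at degree_bound + 1 distinct points proves f == g."""
    points = [Fraction(k) for k in range(degree_bound + 1)]
    return agree_at(f, g, points)

# Four examples prove the cubic identity (x + 1)^3 = x^3 + 3x^2 + 3x + 1.
lhs = lambda x: (x + 1) ** 3
rhs = lambda x: x**3 + 3 * x**2 + 3 * x + 1
identity_holds = proves_identity(lhs, rhs, degree_bound=3)

# Too few examples prove nothing: x^3 - x vanishes at -1, 0, 1
# without being the zero polynomial.
cubic = lambda x: x**3 - x
zero = lambda x: 0 * x
fooled = agree_at(cubic, zero, [Fraction(-1), Fraction(0), Fraction(1)])
caught = proves_identity(cubic, zero, degree_bound=3)
```

The second half is the psychologically interesting part: three confirming examples are consistent with a false claim, and it is only the explicit degree bound that tells us how many examples are enough.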

In some fields of mathematics, one can make the idea of a “general example” rigorous. In algebraic geometry, one has the concept of a generic point on a scheme; in differential topology, one considers submanifolds in general position; in ergodic theory, one considers a normal number (or sequence in some fixed alphabet). In fact, it is not so clear whether a “formal” generic object in some domain should be thought of as the ultimate example, or as the ultimate rejection of the use of examples! In any case, in practice, when as mathematicians we select examples to test our ideas on, we rarely adhere to a rigorous procedure to ensure that our examples are good ones, and we are therefore susceptible to certain well-known psychological biases. The first is the availability heuristic, as defined by the psychologists Kahneman and Tversky, which says roughly that people tend to overemphasize the importance of examples that they can think of most easily. Why exactly is this bad? Well, because it interacts with another bias, that it is easier to think of an example which is more specific — e.g. it is easier to think of a fruit that starts with the letter “A” than just to think of a fruit. One might argue that this bias is unavoidable, given the nature of the task “think of an example of X” — e.g. it is much easier to find a unique solution (of a differential equation, of a system of linear equations, etc.) than to find a solution to an underdetermined problem. In fact, finding a unique solution is so much easier than solving an underdetermined problem, that one often tries to solve the underdetermined problem by adding constraints until the solution is unique and can be found (e.g. simplex method in linear programming). Conversely, this bias is also part of the explanation for why examples are so useful: the mind devotes more attention and mental resources to a more specific object. 
So even if one is interested in finding a rigorous and abstract proof, it is often easier to find a proof for a specific example, and then to “rewrite” the proof, replacing the specific example by the general case, and checking that no additional hypotheses are used.

The second psychological bias is that of framing. A frame consists of a collection of schemata and stereotypes that provide a context in which an event is interpreted. Many mathematical concepts or objects can be formulated in many different ways which are all logically equivalent, but which frame the concept or object quite differently. The word “bird” suggests (to most people) a schema which involves flight, wings, beaks, etc. The mental image it conjures up will almost never resemble a flightless bird like a penguin, or a kiwi, unless extra cues are given, like “a bird indigenous to New Zealand”. A statement about covering spaces might be equivalent to a statement in group theory, but the first might bring to mind topological ideas like paths, continuous maps, compact subsets etc. while the second might suggest homomorphisms, exact sequences, extensions etc., and the examples suggested by the frames might be substantially (mathematically) different, sometimes in crucial ways.

Back to measure theory and topology. In topology, one is frequently (always?) interested in a topological space. Here context is very important — a “topological space” could be a finite set, a graph, a solenoid, a manifold, a Cantor set, a sheaf, a CW complex, a lamination, a profinite group, a Banach space, the space of all compact metric spaces (up to isometry) in the Gromov-Hausdorff metric, etc. By contrast, a “measure space” is an interval, plus some countable (possibly empty) collection of atoms. Of course, one thinks of a measure space much more concretely, by adding some incidental extra structure which is irrelevant to the measure structure, but crucial to the mathematical (and psychological) interest of the space; hence a “measure space” could be the space of infinite sequences in a fixed alphabet, the Sierpinski gasket in the plane, the attractor of an Axiom A diffeomorphism, and so on. In other words, one is typically interested in measure theory as a tool to explore some mathematical object with additional structure, whereas one is frequently interested in topological spaces as objects of intrinsic interest in their own right. Many interesting classes of topological objects can be visualized in great detail — sometimes so much detail that in practice one generates proofs by examining sufficiently complicated examples, and building up clear and detailed mental models of the kinds of phenomena that can occur in a given context. Visualizing a “typical” measurable set (even a subset of $\mathbb{R}$ or the plane) or map is much more difficult, if it is even possible (or, for that matter, a non-measurable set). In fact, one tends routinely to bump up against important subtleties in mathematical logic (especially set theory) when trying even to define such elusive entities as a “typical” measurable subset of the interval. 
For instance, Solovay’s famous theorem says (amongst other things) that the statement “every set of real numbers is Lebesgue measurable” is consistent with Zermelo-Fraenkel set theory without the axiom of choice (in fact, Solovay’s result is relative to the existence of certain large cardinals, the so-called inaccessible cardinals). Solovay explicitly addresses in his paper the issue of explicitly describing a non-Lebesgue measurable set of reals:

Of course the axiom of choice is true, and so there are non-measurable sets. It is natural to ask if one can explicitly describe a non-Lebesgue measurable set.

Say that a set of reals $A$ is definable from a set $x_0$ if there is a formula $\Psi(x,y)$ having free only the variables $x$ and $y$, so that $A = \lbrace y : \Psi(x_0,y)\rbrace$. Solovay shows that (again, assuming the existence of an inaccessible cardinal), even if the axiom of choice is true, every set of reals definable from a countable sequence of ordinals is Lebesgue measurable (interestingly enough, one of the most important concepts introduced by Solovay in his paper is the notion of a random real, namely a real number that is not contained in any of a certain class of Borel sets of measure zero, namely those that are rational (i.e. those that can be encoded in a certain precise way); this resonates somewhat with the “generic points” and “normal numbers” mentioned earlier).

If imagining “good” examples in measure theory is hard, what is the alternative? Evidently, it is to imagine “bad” examples, or at least very non-generic ones. Under many circumstances, the “standard” mental image of a measurable map is a piecewise-constant map from the unit interval to a countable (even finite) set. This example rests on two approximations: the process of building up an arbitrary Borel set (in $\mathbb{R}$, say) from half-open intervals by complementing, intersections and unions; and the process of defining an arbitrary measurable function as a limit of a sequence of finite sums of multiples of indicator functions. Such a mental image certainly has its uses, but for my own part I think that if one is going to use such a mental model anyway, one should be aggressive about using one’s intuition about continuous functions and open sets to make the example as specific, as rich and as “generic” as possible, while understanding that the result is not the measurable function or set itself, but only an approximation to it, and one should try to keep in mind a sequence of such maps, with increasing complexity and richness (if possible).
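
The second approximation can be made quite explicit. Here is a minimal Python sketch of the standard dyadic construction for a nonnegative function $f$: the $n$-th simple function is $s_n(x) = \min(\lfloor 2^n f(x) \rfloor / 2^n, n)$, which takes finitely many values, increases with $n$, and lies within $2^{-n}$ of $f$ wherever $f < n$. The sample function $x \mapsto x^2$ on $[0,2]$ and the grid of test points are arbitrary choices of mine, made only for illustration.

```python
import math

def simple_approx(f, n):
    """Return the n-th dyadic simple function s_n below a nonnegative function f.

    s_n(x) = min(floor(2^n * f(x)) / 2^n, n) takes at most n * 2^n + 1 values.
    """
    def s_n(x):
        return min(math.floor(2**n * f(x)) / 2**n, n)
    return s_n

f = lambda x: x * x                   # sample function on [0, 2] (illustrative)
xs = [k / 100 for k in range(201)]    # grid of sample points in [0, 2]

# Largest gap between f and s_n over the sample points where f(x) < n.
errors = {}
for n in range(1, 6):
    s = simple_approx(f, n)
    errors[n] = max(f(x) - s(x) for x in xs if f(x) < n)

s3, s4 = simple_approx(f, 3), simple_approx(f, 4)
values_of_s3 = {s3(x) for x in xs}    # the finitely many values s_3 takes on the grid
```

Of course a continuous $f$ is a poor stand-in for a "typical" measurable function, which is rather the point: each $s_n$ is only a frame of the movie, and the object of interest is the limit.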

Of course, non-measurable sets do arise in practice. If one wants in the end to prove a theorem whose truth or falsity does not depend on the Axiom of choice, then by Solovay one could do without such sets if necessary. The fact that we do not must mean that the use of non-measurable sets (necessarily constructed using the Axiom of choice) leads to shorter/more findable proofs, or more understandable proofs, or both. Let me mention a few examples of situations in which the Axiom of choice is extremely useful:

1. The Hahn-Banach theorem in functional analysis
2. The existence of ultralimits of sequences of metric spaces (equivalently, the existence of Stone-Čech compactifications)
3. A group $G$ is said to be left orderable if there is a total order $<$ on $G$ such that $g < h$ implies $fg < fh$ for all $f,g,h \in G$. If $S$ is a finite subset of nontrivial elements of $G$, the order partitions $S$ into $S^+ \cup S^-$, where the superscript denotes the elements that are greater than, respectively less than the identity element. Suppose for some finite set $S$, and every partition into $S^+ \cup S^-$ some product of elements of one of the subsets (with repeats allowed) is equal to the identity. Then necessarily $G$ is not left orderable. In fact, the converse is also true: if no such “poison” subset exists, then $G$ is left orderable. This follows from the compactness of the set of partitions of $G-\text{id}$ into two subsets $G^+ \cup G^-$ (equivalently, the compactness of the set $\lbrace -1,1\rbrace^{G-\text{id}}$) which follows from Tychonoff’s theorem.
4. The existence of Galois automorphisms of $\mathbb{C}$ over $\mathbb{Q}$ (other than complex conjugation). Such automorphisms are necessarily non-measurable, and therefore (by Solovay) cannot be constructed without the axiom of choice. In fact, this follows from a theorem of Mackey, that any measurable homomorphism between (sufficiently nice, i.e. locally compact, second countable) topological groups is continuous. We give the sketch of a proof. Suppose $f:G \to H$ is given, and without loss of generality, assume it is surjective. Let $U$ be a neighborhood of the identity in $H$, and let $V$ be a symmetric open neighborhood of the identity with $V\cdot V \subset U$. The group $G$ is covered by countably many translates of $f^{-1}(V)$, and therefore the measure of $f^{-1}(V)$ is positive. Let $X \subset f^{-1}(V) \subset W$ where $X$ is compact, $W$ is open, and such that the (Haar) measure of $W$ is less than twice the Haar measure of $X$ (the existence of such an open set $W$ depends on the fact that measure agrees with outer measure for measurable sets). Since $W$ is open and $X$ is compact, there is an open neighborhood $T$ of the identity in $G$ so that $tX \subset W$ for all $t \in T$. But $tX$ and $X$ both have measure more than half the measure of $W$, so they intersect: there are $x,x' \in X$ with $tx' = x$, and hence $f(t) = f(x)f(x')^{-1} \in V\cdot V^{-1}$. Since $V$ is symmetric, $V \cdot V^{-1} = V\cdot V$, and therefore $T \subset f^{-1}(V\cdot V) \subset f^{-1}(U)$. This implies $f$ is continuous, as claimed. A continuous Galois automorphism of $\mathbb{C}$ is either the identity, or complex conjugation.
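
The “poison” subset criterion in item 3 is finite enough to test by brute force, at least up to a bound on word length. The following Python sketch assumes the group is given by a multiplication function and an identity element; the names `semigroup_hits_identity`, `is_poison` and the cutoff `max_len` are hypothetical choices of mine. A result of `True` certifies (by the criterion above) that the group is not left orderable; a result of `False` only says that no obstruction was found for this particular $S$ within the cutoff.

```python
from itertools import product

def semigroup_hits_identity(gens, mul, identity, max_len):
    """True if some product of at most max_len elements of gens
    (repeats allowed) equals the identity."""
    gens = list(gens)
    frontier = set(gens)   # products of the current length not seen at shorter length
    seen = set()
    for _ in range(max_len):
        if identity in frontier:
            return True
        seen |= frontier
        frontier = {mul(w, g) for w in frontier for g in gens} - seen
    return False

def is_poison(S, mul, identity, max_len):
    """True if for every partition of S into S_plus and S_minus, one of the
    two sides has a product (of length <= max_len) equal to the identity."""
    S = list(S)
    for signs in product([True, False], repeat=len(S)):
        s_plus = [g for g, positive in zip(S, signs) if positive]
        s_minus = [g for g, positive in zip(S, signs) if not positive]
        if not (semigroup_hits_identity(s_plus, mul, identity, max_len)
                or semigroup_hits_identity(s_minus, mul, identity, max_len)):
            return False   # this partition is not ruled out (within max_len)
    return True

# Z/3 has torsion, so it cannot be left orderable: S = {1, 2} is poison.
z3_poison = is_poison([1, 2], lambda a, b: (a + b) % 3, 0, max_len=5)

# Z is left orderable: the partition S_plus = {1}, S_minus = {-1} survives.
z_poison = is_poison([1, -1], lambda a, b: a + b, 0, max_len=20)
```

The interesting direction, that the absence of a poison subset implies orderability, is exactly the part that no finite search can see, and that is where Tychonoff’s theorem does the work.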

Personally, I think that one of the most compelling reasons to accept the Axiom of choice is psychological, and is related to the phenomenon of closure. If we see a fragment of a scene or a pattern, our mind fills in the rest of the scene or pattern for us. We have no photoreceptor cells in our eyes where the optic nerve passes through the retina, but instead of noticing this gap, we have an unperceived blind spot in our field of vision. If we can choose an element of a finite set whenever we want to, we feel as though nothing would stop us from making such a choice infinitely often. We are inclined to accept a formula with a “for all” quantifier ranging over an infinite set, if the formula holds every time we check it. We are inclined to see patterns, even where they don’t exist. This is the seductive and dangerous (?) side of examples, and maybe a reason to exercise a little caution.

In fact, this discussion barely scratches the surface (and does not really probe into either topology or measure theory in any deep way). I would be very curious to hear contrasting opinions.

Update (6/20): There are many other things that I could/should have mentioned about the interaction between measure theory and topology, and the difficulty of finding good generic examples in measure theory. For example, I definitely should have mentioned:

1. Lusin’s theorem, which says that a measurable function is continuous on almost all its domain; e.g. if $f$ is any measurable function on an interval $[a,b]$, then for any positive $\epsilon$ there is a compact subset $E \subset [a,b]$ so that the measure of $[a,b] - E$ is at most $\epsilon$, and $f$ is continuous on $E$.
2. von Neumann’s theorem, that a Borel probability measure $\mu$ on the unit cube $I^n$ in $\mathbb{R}^n$ is equivalent to Lebesgue measure (on the cube) by a self-homeomorphism of the cube (which can be taken to be the identity on the boundary) if and only if it is nonatomic, gives the boundary measure zero, and is positive on nonempty relatively open subsets.
3. Pairs of mutually singular measures of full support on simple sets. For example, let $C$ be the Cantor set of infinite strings in the alphabet $\lbrace 0,1\rbrace$ with its product topology, and define an infinite string $W_\infty$ in $C$ inductively as follows. For any string $w$, define the complement $w^c$ to be the string whose digits are obtained from $w$ by interchanging $0$ and $1$. Then define $W_1$ to be the string $1$, and inductively define $W_{n+1}= W_n W_n^c W_n^c \cdots W_n^c$ where there are $f(n)-1$ copies of $W_n^c$, and $f(n)$ is chosen so that $\prod_n (f(n)-1)/f(n) = r > 1/2$. Let $\Lambda$ be the set of accumulation points of $W_\infty$ under the shift map. Any finite string that appears in $W_\infty$ appears with definite density, so $\Lambda$ is invariant and minimal (i.e. every orbit is dense) under the shift map. However, the proportion of $1$’s in $W_n$ is at least $r$ for $n$ odd, and at most $1-r$ for $n$ even. Let $d_n$ denote the Dirac measure on the infinite string $W_nW_nW_n\cdots$, and let $e_n$ denote the average of $d_n$ over its (finite) orbit under the shift map. Define $\mu_0^i = \frac 1 i \sum_{j\le i} e_{2j}$ and $\mu_1^i = \frac 1 i \sum_{j\le i} e_{2j+1}$. These probability measures are shift-invariant, and have shift-invariant weak limits $\mu_0,\mu_1$ as $i \to \infty$ with support in $\Lambda$. Moreover, if $\Lambda_j$ denotes the strings in $\Lambda$ that start with $j$ for $j=0,1$, then $\mu_j(\Lambda_j) \ge r$. In particular, the space of shift invariant probability measures on $\Lambda$ is at least $1$-dimensional, and we may therefore obtain distinct mutually singular ergodic shift-invariant probability measures on $\Lambda$. Since $\Lambda$ is minimal, both measures have full support.
4. Shelah’s theorem that if one works in (ZF) plus the axiom of dependent choice, if there is an uncountable well-ordered set of reals, then there is a non-(Lebesgue) measurable set, which shows the necessity of Solovay’s use of inaccessible cardinals. (By Solovay, the axiom of dependent choice is consistent with the statement that every set of reals is Lebesgue-measurable).
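
The strings $W_n$ in item 3 are easy to generate, and it is reassuring to watch the proportion of $1$’s oscillate. The Python sketch below makes one assumption that is not in the text above: it takes $f(n) = (n+2)^2$, for which $\prod_n (f(n)-1)/f(n) = 2/3$, so that $r = 2/3$. Any other choice with $r > 1/2$ would serve equally well, and the function names are mine.

```python
from fractions import Fraction

def complement(w):
    """Interchange the digits 0 and 1 in a binary string."""
    return w.translate(str.maketrans("01", "10"))

def f(n):
    """Number of blocks at step n (assumed choice): prod (f(n)-1)/f(n) = 2/3."""
    return (n + 2) ** 2

def build_words(count):
    """Return [W_1, ..., W_count], where W_1 = "1" and W_{n+1} is W_n
    followed by f(n) - 1 copies of the complement of W_n."""
    words = ["1"]
    for n in range(1, count):
        w = words[-1]
        words.append(w + complement(w) * (f(n) - 1))
    return words

def proportion_of_ones(w):
    """Exact proportion of the digit 1 in the finite string w."""
    return Fraction(w.count("1"), len(w))

r = Fraction(2, 3)
words = build_words(5)                                # W_1, ..., W_5
proportions = [proportion_of_ones(w) for w in words]  # index k holds W_{k+1}
```

With this choice the lengths grow quickly ($W_5$ already has $129600$ digits), and the proportions of $1$’s in $W_1, \ldots, W_5$ stay above $r$ for odd $n$ and below $1-r$ for even $n$, as claimed.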

### 4 Responses to Measure theory, topology, and the role of examples

1. Andy P. says:

A very nice entry! One small correction : in example 4, the statement of the result should read “any measurable homomorphism between…topological groups is CONTINUOUS”.

2. Danny Calegari says:

Dear Andy – thanks for the correction! (I also neglected to mention that one must wedge $f^{-1}(V)$ between a compact set and an open set and then play the game with the compact set in place of $f^{-1}(V)$ – doh!)

3. Anonymous says:

Hahn Banach theorem implies the Axiom of Choice.