You are currently browsing the monthly archive for June 2009.

I recently made the final edits to my paper “Positivity of the universal pairing in 3 dimensions”, written jointly with Mike Freedman and Kevin Walker, to appear in Jour. AMS. This paper is inspired by questions that arise in the theory of unitary TQFT’s. An n+1-dimensional TQFT (“topological quantum field theory”) is a functor Z from the category of smooth oriented n-manifolds and smooth cobordisms between them, to the category of (usually complex) vector spaces and linear maps, that obeys the (so-called) monoidal axiom Z(A \coprod B) = Z(A) \otimes Z(B). The monoidal axiom implies that Z(\emptyset)=\mathbb{C}. Roughly speaking, the functor associates to a “spacelike slice” — i.e. to each n-manifold A — the vector space of “quantum states” on A (whatever they are), denoted Z(A). A cobordism stands in for the physical idea of the universe and its quantum state evolving in time. An n+1-manifold W bounding A can be thought of as a cobordism from the empty manifold to A, so Z(W) is a linear map from \mathbb{C} to Z(A), or equivalently, a vector in Z(A) (the image of 1 \in \mathbb{C}).

Note that as defined above, a TQFT is sensitive not just to the underlying topology of a manifold, but to its smooth structure. One can define variants of TQFTs by requiring more or less structure on the underlying manifolds and cobordisms. One can also consider “decorated” cobordism categories, such as those whose objects are pairs (A,K) where A is a manifold and K is a submanifold of some fixed codimension (usually 2), and whose morphisms are pairs of cobordisms (W,S), where W is a cobordism between the manifolds and S is a cobordism between the submanifolds, embedded in W (e.g. Wilson loops in a 2+1-dimensional TQFT).

In realistic physical theories, the space of quantum states is a Hilbert space, i.e. it is equipped with a positive definite Hermitian inner product. In particular, the result of pairing a nonzero vector with itself should be positive. One says that a TQFT with this property is unitary. In the TQFT, reversing the orientation of a manifold interchanges a vector space with its dual, and pairing is accomplished by gluing diffeomorphic manifolds with opposite orientations. It is interesting to note that many 3+1-dimensional TQFTs of interest to mathematicians are not unitary; e.g. Donaldson theory, Heegaard Floer homology, etc. These theories depend on a grading, which prevents attempts to unitarize them. It turns out that there is a good reason why this is true, discussed below.

Definition: For any n-manifold S, let \mathcal{M}(S) denote the complex vector space spanned by the set of n+1-manifolds bounding S, up to diffeomorphisms fixing S. There is a pairing on this vector space, the universal pairing, taking values in the complex vector space \mathcal{M} spanned by the set of closed n+1-manifolds up to diffeomorphism. If \sum_i a_iA_i and \sum_j b_jB_j are two vectors in \mathcal{M}(S), the pairing of these two vectors is equal to the formal sum \sum_{ij} a_i\overline{b}_j A_i\overline{B}_j where overline is complex conjugation on numbers, and orientation-reversal on manifolds, and A_i\overline{B}_j denotes the closed manifold obtained by gluing A_i to \overline{B}_j along S.

The point of making this definition is the following. If v \in \mathcal{M}(S) is a vector with the property that \langle v,v\rangle_S = 0 (i.e. the result of pairing v with itself is zero), then Z(v)=0 for any unitary TQFT Z. One says that the universal pairing is positive in n+1 dimensions if every nonzero vector v pairs nontrivially with itself.

Example: The Mazur manifold M is a smooth 4-manifold with boundary S. There is an involution \theta of S that does not extend over M, so M,\theta(M) denote distinct elements of \mathcal{M}(S). Let v = M - \theta(M), their formal difference. Then the result of pairing v with itself has four terms: \langle v,v\rangle_S = M\overline{M} - \theta(M)\overline{M} - M\overline{\theta(M)} + \theta(M)\overline{\theta(M)}. It turns out that all four terms are diffeomorphic to S^4, and therefore this formal sum is zero even though v is not zero, and the universal pairing is not positive in dimension 4.
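To make the sign bookkeeping concrete, here is a minimal Python sketch of the pairing on formal sums. Manifolds are opaque labels, and the gluing map `glue` is a hypothetical stand-in (here a lookup table) for the actual topological identification of A_i\overline{B}_j; the table for the Mazur example simply encodes the claim above that all four terms are diffeomorphic to S^4.

```python
from collections import defaultdict


def universal_pairing(v, w, glue):
    """Formal sum <v, w>_S = sum_{i,j} a_i conj(b_j) [A_i glued to reversed B_j].

    v, w: dicts from labels of manifolds bounding S to complex coefficients.
    glue(a, b): label of the closed manifold obtained by gluing a to the
        orientation reversal of b along S (a hypothetical stand-in).
    Returns a dict from closed-manifold labels to their nonzero coefficients.
    """
    total = defaultdict(complex)
    for a, coeff_a in v.items():
        for b, coeff_b in w.items():
            total[glue(a, b)] += coeff_a * complex(coeff_b).conjugate()
    # Drop terms whose coefficients cancelled.
    return {m: c for m, c in total.items() if c != 0}


# Mazur example: every gluing of M or theta(M) with a reversed copy is S^4.
MAZUR_TABLE = {
    ("M", "M"): "S^4",
    ("theta(M)", "M"): "S^4",
    ("M", "theta(M)"): "S^4",
    ("theta(M)", "theta(M)"): "S^4",
}

v = {"M": 1, "theta(M)": -1}
print(universal_pairing(v, v, lambda a, b: MAZUR_TABLE[(a, b)]))  # {}
```

If the off-diagonal gluings instead produced some other manifold X, the same vector would give a nonzero sum, with coefficient 2 on S^4 and -2 on X; the cancellation comes entirely from the coincidence of diffeomorphism types.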

More generally, it turns out that unitary TQFTs cannot distinguish s-cobordant 4-manifolds, and therefore they are insensitive to essentially all “interesting” smooth 4-manifold topology! This “explains” why interesting 3+1-dimensional TQFTs, such as Donaldson theory and Heegaard Floer homology (mentioned above), are necessarily not unitary.

One sees that cancellation arises, and a pairing may fail to be positive, if there are some unusual “coincidences” in the set of terms A_i\overline{B}_j arising in the pairing. One way to ensure that cancellation does not occur is to control the coefficients for the terms appearing in some fixed diffeomorphism type. Observe that the “diagonal” coefficients a_i\overline{a}_i = |a_i|^2 are all positive real numbers (we may assume every a_i is nonzero), and therefore cancellation can only occur if every manifold appearing as a diagonal term is diffeomorphic to some manifold appearing as an off-diagonal term. The way to ensure that this does not occur is to define some sort of ordering or complexity on terms in such a way that the term of greatest complexity can occur only on the diagonal. This property, diagonal dominance, can be expressed as follows:

Definition: A pairing \langle \cdot,\cdot \rangle_S as above satisfies the topological Cauchy-Schwarz inequality if there is a complexity function \mathcal{C} defined on all closed n+1-manifolds, so that if A,B are any two n+1-manifolds with boundary S, there is an inequality \mathcal{C}(A\overline{B}) \le \max(\mathcal{C}(A\overline{A}),\mathcal{C}(B\overline{B})) with equality if and only if A=B.

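The existence of such a complexity function ensures diagonal dominance, and therefore the positivity of the pairing \langle\cdot,\cdot\rangle_S: if A_i\overline{A}_i has the greatest complexity among the diagonal terms, then every off-diagonal term A_j\overline{A}_k with j \ne k has strictly smaller complexity, so the terms of greatest complexity in \langle v,v\rangle_S are all diagonal, and their positive coefficients cannot cancel.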

Example: Define a complexity function \mathcal{C} on closed 1-manifolds, by defining \mathcal{C}(M) to be equal to the number of components of M. This complexity function satisfies the topological Cauchy-Schwarz inequality, and proves positivity for the universal pairing in 1 dimension.
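The 1-dimensional example can be tested directly. In the toy model below (an assumption: orientations are ignored, and S is a set of 2k labeled points), a 1-manifold bounding S is recorded as a perfect matching of the points together with a count of closed circles. Gluing A to \overline{B} closes the arcs of the two matchings into cycles, which union-find counts. The helper names are hypothetical, and keys must be written in a canonical form so that distinct keys are distinct manifolds rel S.

```python
from collections import defaultdict
from itertools import product


def glued_components(a, b):
    """Number of components of the closed 1-manifold A glued to reversed B.

    a, b: pairs (matching, circles). `matching` is a tuple of pairs that
    partitions the boundary points; `circles` counts closed components.
    """
    (match_a, circles_a), (match_b, circles_b) = a, b
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each arc of A and each arc of B joins its two endpoints.
    for p, q in match_a + match_b:
        parent[find(p)] = find(q)
    cycles = len({find(x) for x in list(parent)})
    return cycles + circles_a + circles_b


def self_pairing_by_complexity(v):
    """<v, v>_S grouped by complexity C = number of components."""
    v = {key: c for key, c in v.items() if c != 0}
    total = defaultdict(complex)
    for (a, ca), (b, cb) in product(v.items(), repeat=2):
        total[glued_components(a, b)] += ca * complex(cb).conjugate()
    return dict(total)


def top_coefficient(v):
    """Coefficient of the highest-complexity part of <v, v>_S."""
    total = self_pairing_by_complexity(v)
    return total[max(total)]


# The three perfect matchings of 4 boundary points, in canonical form.
MATCHINGS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
v = {(MATCHINGS[0], 0): 1, (MATCHINGS[1], 0): -1}
print(self_pairing_by_complexity(v))
```

For this v, the two diagonal terms each have 2 components and contribute coefficient 2 at the top complexity, while the cross terms have a single component and carry coefficient -2, so nothing at the top level can cancel.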

Example: A suitable complexity function can also be found in 2 dimensions. The first term in the complexity is the number of components. The second is a lexicographically ordered list of the Euler characteristics of the resulting pieces (i.e. the complexity favors more components of bigger Euler characteristic). The first term is maximized if and only if the pieces of A and B are all glued up in pairs with the same number of boundary components in S; the second term is then maximized if and only if each piece of A is glued to a piece of B with the same Euler characteristic and number of boundary components — i.e. if and only if A=B.
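The ordering in this example is lexicographic, which Python's tuple and list comparison provides directly. The sketch below assumes the Euler characteristics are those of the connected components of the closed surface, listed in decreasing order; `surface_complexity` is a hypothetical helper that illustrates only the ordering, not a proof that it satisfies the inequality.

```python
def surface_complexity(genera):
    """Toy lexicographic complexity of a closed, possibly disconnected surface.

    genera: the genera of the connected components.
    Returns (number of components, Euler characteristics in decreasing order).
    Tuples and lists compare lexicographically, so more components win first,
    and ties are broken in favor of bigger Euler characteristics.
    """
    chis = sorted((2 - 2 * g for g in genera), reverse=True)
    return (len(genera), chis)


two_spheres = surface_complexity([0, 0])       # (2, [2, 2])
sphere_and_torus = surface_complexity([0, 1])  # (2, [2, 0])
one_torus = surface_complexity([1])            # (1, [0])
print(two_spheres > sphere_and_torus > one_torus)  # True
```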

Positivity holds in dimensions below 3, and fails in dimensions above 3. The main theorem we prove in our paper is that positivity holds in dimension 3, and we do this by constructing an explicit complexity function which satisfies the topological Cauchy-Schwarz inequality.

Unfortunately, the function itself is extremely complicated. At a first pass, it is a tuple c=(c_0,c_1,c_2,c_3) where c_0 treats number of components, c_1 treats the kernel of \pi_1(S) \to \pi_1(A) under inclusion, c_2 treats the essential 2-spheres, and c_3 treats prime factors arising in the decomposition.

The term c_1 is itself very interesting: for each finite group G, Witten and Dijkgraaf constructed a real unitary TQFT Z_G (i.e. one for which the resulting vector spaces are real), so that roughly speaking Z_G(S) is the vector space spanned by representations of \pi_1(S) into G up to conjugacy, and Z_G(A) is the vector that counts (in a suitable sense) the number of ways each such representation extends over A. The value of Z_G on a closed manifold is roughly just the number of representations of the fundamental group in G, up to conjugacy. The complexity c_1 is obtained by first enumerating all isomorphism classes of finite groups G_1,G_2,G_3, \cdots and then listing the values of Z_{G_i} in order. If the kernel of \pi_1(S) \to \pi_1(A) is different from the kernel of \pi_1(S) \to \pi_1(B), this difference can be detected by some finite group (this relies on the residual finiteness of 3-manifold groups, proved in this context by Hempel); so c_1 is diagonally dominant unless these two kernels are equal, i.e. unless the maximal compression bodies of S in A and B are diffeomorphic rel. S. It is essential to control these compression bodies before counting essential 2-spheres, so this term must come before c_2 in the complexity.
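As a small illustration of the raw counts that enter Z_G, here is a brute-force count of representations of \pi_1(T^3) = \mathbb{Z}^3 into G = S_3: a homomorphism from \mathbb{Z}^3 is just a triple of pairwise commuting elements. Dividing by |G| gives the invariant of T^3 under the usual normalization of the untwisted Dijkgraaf-Witten theory (we assume this normalization; the description above is only approximate). The choice of T^3 and S_3, and the helper names, are purely illustrative.

```python
from itertools import permutations, product


def compose(p, q):
    """Composition p∘q of permutations written as tuples (apply q, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def commute(p, q):
    return compose(p, q) == compose(q, p)


def count_hom_Z3(group):
    """|Hom(Z^3, G)|: the number of triples of pairwise commuting elements."""
    return sum(
        1
        for a, b, c in product(group, repeat=3)
        if commute(a, b) and commute(b, c) and commute(a, c)
    )


S3 = list(permutations(range(3)))
homs = count_hom_Z3(S3)
print(homs, homs // len(S3))  # 48 8
```

The count 48 can be checked by hand: summing |Hom(\mathbb{Z}^2, C(g))| = |C(g)| k(C(g)) over g \in S_3 gives 18 + 3 \cdot 4 + 2 \cdot 9.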

The term c_3 has a contribution c_p from each prime summand. The complexity c_p itself is a tuple c_p = (c_S,c_h,c_a) where c_S treats Seifert-fibered pieces, c_h treats hyperbolic pieces, and c_a treats the way in which these are assembled in the JSJ decomposition. The term c_h is quite interesting; evaluated on a finite volume hyperbolic 3-manifold M it gives as output the tuple c_h(M) = (-\text{vol}(M),\sigma(M)) where \text{vol}(M) denotes hyperbolic volume, and \sigma(M) is the complex geodesic length spectrum, or at least those terms in the spectrum with zero imaginary part. The choice of the first term depends on the following theorem:

Theorem: Let S be an orientable surface of finite type so that each component has negative Euler characteristic, and let A,B be irreducible, atoroidal and acylindrical, with boundary S. Then A\overline{A},A\overline{B},B\overline{B} admit unique complete hyperbolic structures, and either 2\text{vol}(A\overline{B}) > \text{vol}(A\overline{A})+\text{vol}(B\overline{B}) or else 2\text{vol}(A\overline{B}) = \text{vol}(A\overline{A}) + \text{vol}(B\overline{B}) and S is totally geodesic in A\overline{B}.
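Since c_h lists -\text{vol} first, the theorem is exactly what the topological Cauchy-Schwarz inequality needs for that coordinate. The short derivation below is our own unpacking of this step, not a quotation from the paper:

```latex
\begin{aligned}
2\,\mathrm{vol}(A\overline{B}) &\ge \mathrm{vol}(A\overline{A}) + \mathrm{vol}(B\overline{B})
  \ge 2\min\bigl(\mathrm{vol}(A\overline{A}),\,\mathrm{vol}(B\overline{B})\bigr),\\
\text{hence}\quad -\mathrm{vol}(A\overline{B}) &\le \max\bigl(-\mathrm{vol}(A\overline{A}),\,-\mathrm{vol}(B\overline{B})\bigr).
\end{aligned}
```

Equality in the last line forces equality throughout, so \text{vol}(A\overline{A}) = \text{vol}(B\overline{B}) = \text{vol}(A\overline{B}) and, by the theorem, S is totally geodesic in A\overline{B}; such ties are then passed to the second coordinate \sigma.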

This theorem is probably the most technically difficult part of the paper. Notice that even though in the end we are only interested in closed manifolds, we must prove this theorem for hyperbolic manifolds with cusps, since these are the pieces that arise in the JSJ decomposition. This theorem was proved for closed manifolds by Agol-Storm-Thurston, and our proof follows their argument in general terms, although there are more technical difficulties in the cusped case. One starts with the hyperbolic manifold A\overline{B}, and finds a least area representative of the surface S. Cut along this surface, and double each side (metrically) to get two singular metrics on the topological manifolds A\overline{A} and B\overline{B}. Since the volumes of these two singular metrics add up to 2\text{vol}(A\overline{B}), the theorem will be proved if we can show that the volume of each singular metric is at least the volume of the hyperbolic metric on the same manifold. Such comparison theorems for volume are widely studied in geometry; in many circumstances one defines a geometric invariant of a Riemannian metric, and then shows that it is minimized/maximized on a locally symmetric metric (which is usually unique in dimensions >2). For example, Besson-Courtois-Gallot famously proved that a negatively curved locally symmetric metric on a manifold uniquely minimizes the volume entropy (roughly, the entropy of the geodesic flow, at least when the curvature is negative) over all metrics with the same volume.
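The volume bookkeeping behind the doubling step can be written out as follows. Here A' and B' (our notation, not the paper's) denote the two sides into which the least area surface cuts A\overline{B}, and g_A, g_B denote the doubled singular metrics on A\overline{A} and B\overline{B}. The surface has measure zero, so volumes simply add:

```latex
\begin{aligned}
\mathrm{vol}(g_A) + \mathrm{vol}(g_B) &= 2\,\mathrm{vol}(A') + 2\,\mathrm{vol}(B') = 2\,\mathrm{vol}(A\overline{B}),\\
\mathrm{vol}(g_A) \ge \mathrm{vol}(A\overline{A}),\ \ \mathrm{vol}(g_B) \ge \mathrm{vol}(B\overline{B})
  &\;\Longrightarrow\; 2\,\mathrm{vol}(A\overline{B}) \ge \mathrm{vol}(A\overline{A}) + \mathrm{vol}(B\overline{B}).
\end{aligned}
```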

Hamilton proved that if one rescales Ricci flow to have constant volume, then the scalar curvature R satisfies \partial_t R = \Delta R + 2|\text{Ric}_0|^2 + \frac 2 3 R(R-r), where \text{Ric}_0 denotes the traceless Ricci tensor, and r denotes the spatial average of the scalar curvature R. If the spatial minimum of R is negative, then at a point achieving the minimum, \Delta R is non-negative, as are the other two terms (since R < 0 and R \le r there); in other words, if one does Ricci flow rescaled to have constant volume, the minimum of scalar curvature does not decrease (this fact remains true for noncompact manifolds, if one substitutes infimum for minimum). Conversely, if one rescales to keep the infimum of scalar curvature constant, volume decreases under the flow. For a hyperbolic 3-manifold, Perelman shows that Ricci flow with surgery converges to the hyperbolic metric. Surgery at finite times occurs when scalar curvature blows up to positive infinity, so surgery does not affect the infimum of scalar curvature, and only makes volume smaller (since things are being cut out). Consequently, Perelman’s work implies that of all metrics on a hyperbolic 3-manifold with the infimum of scalar curvature equal to -6, the constant curvature metric is the unique metric minimizing volume.
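The sign argument can be spelled out at a minimum point, and the normalization -6 comes from R = n(n-1)K for a metric of constant sectional curvature K, here with n = 3 and K = -1. This is a sketch in the spirit of the maximum principle (the derivative of R_{\min} should be read in the barrier sense), not a complete proof:

```latex
\begin{aligned}
&\text{At } x_t \text{ with } R(x_t,t) = R_{\min}(t) < 0:\quad
  \Delta R(x_t,t) \ge 0,\qquad 2|\mathrm{Ric}_0|^2 \ge 0,\qquad
  \tfrac{2}{3}\,\underbrace{R}_{<0}\,\underbrace{(R - r)}_{\le 0} \ge 0,\\
&\text{so}\quad \frac{d}{dt}R_{\min}(t) \ge 0;
  \qquad\text{for the hyperbolic metric,}\quad R = n(n-1)K = 3\cdot 2\cdot(-1) = -6.
\end{aligned}
```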

Now, the metric on A\overline{A} obtained by doubling along a minimal surface is not smooth, so one cannot even define the curvature tensor. However, if one interprets scalar curvature as an “average” of Ricci curvature, and observes that a minimal surface is flat “on average”, then one should expect that the distributional scalar curvature of the metric is equal to what it would be if one doubled along a totally geodesic surface, i.e. identically equal to -6. So Perelman’s inequality should apply, and prove the desired volume estimate.

To make this argument rigorous, one must show that the singular metric evolves under Ricci flow, and instantaneously becomes smooth, with R \ge -6. A theorem of Miles Simon says that this follows if one can find a smooth background metric with uniform bounds on the curvature and its first derivatives, and which is 1+\epsilon-bilipschitz to the singular metric. The existence of such a background metric is essentially trivial in the closed case, but becomes much more delicate in the cusped case. Basically, one needs to establish the following comparison lemma, stated somewhat informally:

Lemma: Least area surfaces in cusps of hyperbolic 3-manifolds become asymptotically flat faster than the thickness of the cusp goes to zero.

In other words, if one lifts a least area surface S to a surface \tilde{S} in the universal cover, there is a (unique) totally geodesic surface \pi (the “osculating plane”) asymptotic to \tilde{S} at the fixed point of the parabolic element corresponding to the cusp, and satisfying the following geometric estimate. If B_t is the horoball centered at the parabolic fixed point at height t (for some horofunction), then the Hausdorff distance between \tilde{S} \cap B_t and \pi \cap B_t is o(e^{-t}). One must further prove that if a surface S has multiple ends in a single cusp, these ends osculate distinct geodesic planes. Given this, it is not too hard to construct a suitable background metric. Between ends of S, the geometry looks more and more like a slab wedged between two totally geodesic planes. The double of this is a nonsingular hyperbolic manifold, so it certainly enjoys uniform control on the curvature and its first derivatives; this gives the background metric in the thin part. In the thick part, one can convolve the singular metric with a bump function to find a bilipschitz background metric; compactness of the thick part implies trivially that any smooth metric enjoys uniform bounds on the curvature and its first derivatives. Hence one may apply Simon, and then Perelman, and the volume estimate is proved.

The Seifert fibered case is very fiddly, but ultimately does not require many new ideas. The assembly complexity turns out to be surprisingly involved. Essentially, one thinks of the JSJ decomposition as defining a decorated graph, whose vertices correspond to the pieces in the decomposition, and whose edges control the gluing along tori. One must prove an analogue of the topological Cauchy-Schwarz inequality in the context of (decorated) graphs. This ends up looking much more like the familiar TQFT picture of tensor networks, but a more detailed discussion will have to wait for another post.

Mapping class groups (also called modular groups) are of central importance in many fields of geometry. If S is an oriented surface (i.e. a 2-manifold), the group \text{Homeo}^+(S) of orientation-preserving self-homeomorphisms of S is a topological group with the compact-open topology. The mapping class group of S, denoted \text{MCG}(S) (or \text{Mod}(S) by some people) is the group of path-components of \text{Homeo}^+(S), i.e. \pi_0(\text{Homeo}^+(S)), or equivalently \text{Homeo}^+(S)/\text{Homeo}_0(S) where \text{Homeo}_0(S) is the subgroup of homeomorphisms isotopic to the identity.

When S is a surface of finite type (i.e. a closed surface minus finitely many points), the group \text{MCG}(S) is finitely presented, and one knows a great deal about the algebra and geometry of this group. Less well-studied are groups of the form \text{MCG}(S) when S is of infinite type. However, such groups do arise naturally in dynamics.

Example: Let G be a group of (orientation-preserving) homeomorphisms of the plane, and suppose that G has a bounded orbit (i.e. there is some point p for which the orbit Gp is contained in a compact subset of the plane). The closure of such an orbit Gp is compact and G-invariant. Let K be the union of the closure of Gp with the set of bounded open complementary regions. Then K is compact, G-invariant, and has connected complement. Define an equivalence relation \sim on the plane whose equivalence classes are the points in the complement of K, and the connected components of K. The quotient of the plane by this equivalence relation is again homeomorphic to the plane (by a theorem of R. L. Moore), and the image of K is a totally disconnected set k. The original group G admits a natural homomorphism to the mapping class group of \mathbb{R}^2 - k. After passing to a G-invariant closed subset of k if necessary, we may assume that k is minimal (i.e. every orbit is dense). Since k is compact, totally disconnected and minimal, it is either a finite discrete set, or it is a Cantor set.

The mapping class group of \mathbb{R}^2 - \text{finite set} contains a subgroup of finite index fixing the end of \mathbb{R}^2; this subgroup is the quotient of a braid group by its center. There are many tools that show that certain groups G cannot have a big image in such a mapping class group.

Much less studied is the case that k is a Cantor set. In the remainder of this post, we will abbreviate \text{MCG}(\mathbb{R}^2 - \text{Cantor set}) by \Gamma. Notice that any homeomorphism of \mathbb{R}^2 - \text{Cantor set} extends in a unique way to a homeomorphism of S^2, fixing the point at infinity, and permuting the points of the Cantor set (this can be seen by thinking of the “missing points” intrinsically as the space of ends of the surface). Let \Gamma' denote the mapping class group of S^2 - \text{Cantor set}. Then there is a natural surjection \Gamma \to \Gamma' whose kernel is \pi_1(S^2 - \text{Cantor set}) (this is just the familiar Birman exact sequence).
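Writing C for the Cantor set, the Birman exact sequence mentioned above reads as follows; the kernel is the point-pushing subgroup, consisting of mapping classes that drag the marked point at infinity around loops in S^2 - C:

```latex
1 \longrightarrow \pi_1(S^2 - C,\ \infty) \longrightarrow \Gamma \longrightarrow \Gamma' \longrightarrow 1
```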

The following is proved in the first section of my paper “Circular groups, planar groups and the Euler class”. This is the first step to showing that any group G of orientation-preserving diffeomorphisms of the plane with a bounded orbit is circularly orderable:

Proposition: There is an injective homomorphism \Gamma \to \text{Homeo}^+(S^1).

Sketch of Proof: Choose a complete hyperbolic structure on S^2 - \text{Cantor set}. The Birman exact sequence exhibits \Gamma as a group of (equivalence classes of) homeomorphisms of the universal cover of this hyperbolic surface which commute with the deck group. Each such homeomorphism extends in a unique way to a homeomorphism of the circle at infinity. This extension does not depend on the choice of a representative in an equivalence class, and one can check that the extension of a nontrivial mapping class is nontrivial at infinity. qed.

This property of the mapping class group \Gamma does not distinguish it from mapping class groups of surfaces of finite type (with punctures); in fact, the argument is barely sensitive to the topology of the surface at all. By contrast, the next theorem demonstrates a significant difference between mapping class groups of surfaces of finite type, and \Gamma. Recall that for a surface S of finite type, the group \text{MCG}(S) acts simplicially on the complex of curves \mathcal{C}(S), a simplicial complex whose simplices are the sets of isotopy classes of essential simple closed curves in S that can be realized mutually disjointly. A fundamental theorem of Masur-Minsky says that \mathcal{C}(S) (with its natural simplicial path metric) is \delta-hyperbolic (though it is not locally finite). Bestvina-Fujiwara show that any reasonably big subgroup of \text{MCG}(S) contains lots of elements that act on \mathcal{C}(S) weakly properly discontinuously (WPD), and therefore such groups admit many nontrivial quasimorphisms. This has many important consequences, and shows that for many interesting classes of groups, every homomorphism to a mapping class group (of finite type) factors through a finite group. In view of the potential applications to dynamics as above, one would like to be able to construct quasimorphisms on mapping class groups of infinite type.

Unfortunately, this does not seem so easy.

Proposition: The group \Gamma' is uniformly perfect.

Proof: Remember that \Gamma' denotes the mapping class group of S^2 - \text{Cantor set}. We denote the Cantor set in the sequel by C.

A closed disk D is a dividing disk if its boundary is disjoint from C, and separates C into two nonempty subsets (both necessarily Cantor sets). An element g \in \Gamma' is said to be local if it has a representative whose support is contained in a dividing disk. Note that the closure of the complement of a dividing disk is also a dividing disk. Let g be local, with support in a dividing disk D. There is a homeomorphism \varphi of the sphere permuting C that takes D off itself, so that the disks \varphi^n(D) for n \ge 0 are pairwise disjoint, and converge to a limiting point x \in C. Define h to be the infinite product h = \prod_{i \ge 0} \varphi^i g \varphi^{-i}. Since the factors have disjoint supports shrinking to x, h is a well-defined homeomorphism of the sphere permuting C. Moreover \varphi h \varphi^{-1} = \prod_{i \ge 1} \varphi^i g \varphi^{-i}, so h\varphi h^{-1}\varphi^{-1} = g, thereby exhibiting g as a commutator. The proposition will therefore be proved if we can exhibit any element of \Gamma' as a bounded product of local elements.
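The commutator trick in this proof is an infinite swindle, and its algebra can be checked in a toy combinatorial model. The sketch below is an analogy under stated assumptions, not a model of homeomorphisms of the sphere: the disks \varphi^n(D) are replaced by blocks \{n\} \times \{0,\dots,m-1\} indexed by n \in \mathbb{Z}, \varphi is the shift n \mapsto n+1, the local element g acts by a permutation `sigma` on block 0 only, and h applies `sigma` on every block with n \ge 0. With maps composed right to left, it checks h\varphi h^{-1}\varphi^{-1} = g on a finite window of blocks. The helper names are hypothetical.

```python
def make_maps(sigma):
    """Build phi, phi^-1, g, h, h^-1 on points (block index, position)."""
    inv = [0] * len(sigma)
    for x, y in enumerate(sigma):
        inv[y] = x

    def phi(p):  # shift every block one step: the map taking D off itself
        i, x = p
        return (i + 1, x)

    def phi_inv(p):
        i, x = p
        return (i - 1, x)

    def g(p):  # local element: sigma on block 0 only
        i, x = p
        return (i, sigma[x]) if i == 0 else p

    def h(p):  # product of the conjugates phi^i g phi^-i over i >= 0
        i, x = p
        return (i, sigma[x]) if i >= 0 else p

    def h_inv(p):
        i, x = p
        return (i, inv[x]) if i >= 0 else p

    return phi, phi_inv, g, h, h_inv


def commutator_equals_g(sigma, window=range(-5, 6)):
    """Check h∘phi∘h^-1∘phi^-1 == g on all points in the given blocks."""
    phi, phi_inv, g, h, h_inv = make_maps(sigma)

    def comm(p):
        return h(phi(h_inv(phi_inv(p))))

    return all(
        comm((i, x)) == g((i, x)) for i in window for x in range(len(sigma))
    )


print(commutator_equals_g([1, 2, 0]))  # True
```

The order of the factors matters: in this model h^{-1}\varphi h\varphi^{-1} produces g^{-1} instead of g, which is still a commutator but shows why a convention has to be fixed.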

Now, let g be an arbitrary homeomorphism of the sphere permuting C. Pick an arbitrary p \in C. If g(p)=p then let h be a local homeomorphism taking p to a different point q, and define g' = hg. So without loss of generality, we can find g' = hg where h is local (possibly trivial), and g'(p) = q \ne p. Let E be a sufficiently small dividing disk containing p so that g'(E) is disjoint from E, and their union does not contain every point of C. Join E to g'(E) by a path in the complement of C, and let D be a regular neighborhood of the union, which by construction is a dividing disk. Let f be a local homeomorphism, supported in D, that interchanges E and g'(E), and so that fg' is the identity on E. Then fg' is itself local, because its support lies in the complement of the interior of the dividing disk E, which is also a dividing disk. Hence g = h^{-1}f^{-1}(fg') is a product of at most three local homeomorphisms. Since each local element is a commutator, the commutator length of g is at most 3, and since g was arbitrary, we are done. qed.

The same argument just barely fails to work with \Gamma in place of \Gamma'. One can also define dividing disks and local homeomorphisms in \Gamma, with the following important difference. One can show by the same argument that local homeomorphisms in \Gamma are commutators, and that for an arbitrary element g \in \Gamma there are local elements h,f so that fhg is the identity on a dividing disk; i.e. this composition is anti-local. However, the complement of the interior of a dividing disk in the plane is not a dividing disk; the difference can be measured by keeping track of the point at infinity. This is a restatement of the Birman exact sequence; at the level of quasimorphisms, one has the following exact sequence: Q(\Gamma') \to Q(\Gamma) \to Q(\pi_1(S^2 - C))^{\Gamma'}.

The so-called “point-pushing” subgroup \pi_1(S^2 - C) can be understood geometrically by tracking the image of a proper ray from C to infinity. We are therefore motivated to consider the following object:

Definition: The ray graph R is the graph whose vertex set is the set of isotopy classes of proper rays r, with interior in the complement of C, from a point in C to infinity, and whose edges are the pairs of such rays that can be realized disjointly.

One can verify that the graph R is connected, and that the group \Gamma acts simplicially on R by automorphisms, and transitively on vertices.

Lemma: Let g \in \Gamma and suppose there is a vertex v \in R such that v and g(v) are adjacent. Then g is a product of at most two local homeomorphisms.

Sketch of proof: Let r be a ray representing v. After adjusting g by an isotopy, assume that r and g(r) are actually disjoint. Let E,g(E) be sufficiently small disjoint disks about the endpoints of r and g(r), and \alpha an arc from E to g(E) disjoint from r and g(r), so that the union r \cup E \cup \alpha \cup g(E) \cup g(r) does not separate the part of C outside E \cup g(E). Then this union can be engulfed in a punctured disk D' containing infinity, whose complement contains some of C. There is a local h supported in a neighborhood of E \cup \alpha \cup g(E) such that hg is supported (after isotopy) in the complement of D' (i.e. it is also local). Hence g = h^{-1}(hg) is a product of two local elements. qed.

It follows that if g \in\Gamma has a bounded orbit in R, then the commutator lengths of the powers of g are bounded, and therefore the stable commutator length \text{scl}(g) = \lim_{n \to \infty} \text{cl}(g^n)/n vanishes. If this is true for every g \in \Gamma, then Bavard duality implies that \Gamma admits no nontrivial homogeneous quasimorphisms. This motivates the following questions:

Question: Is the diameter of R infinite? (Exercise: show \text{diam}(R)\ge 3)

Question: Does any element of \Gamma act on R with positive translation length?

Question: Can one use this action to construct nontrivial quasimorphisms on \Gamma?

Bill Thurston once observed that topology and measure theory are very immiscible (i.e. they don’t mix easily); this statement has always resonated with me, and I thought I would try to explain some of the (personal, psychological, and mathematical) reasons why. On the face of it, topology and measure theory are very closely related. Both are concerned with spaces equipped with certain algebras of sets (open sets, measurable sets) and classes of functions (continuous functions, measurable functions). Continuous functions (on reasonable spaces) are measurable, and (some) measures can be integrated to define continuous functions. However, in my mind at least, they are very different in a psychological sense, and one of the most important ways in which they differ concerns the role of examples.

At the risk of oversimplifying, one might say that one modern mathematical tradition, perhaps exemplified by the Bourbakists, insists that examples are either irrelevant or misleading. There is a famous story about Grothendieck, retold in this article by Allyn Jackson, which goes as follows:

One striking characteristic of Grothendieck’s mode of thinking is that it seemed to rely so little on examples. This can be seen in the legend of the so-called “Grothendieck prime”. In a mathematical conversation, someone suggested to Grothendieck that they should consider a particular prime number. “You mean an actual number?” Grothendieck asked. The other person replied, yes, an actual prime number. Grothendieck suggested, “All right, take 57”.

Leaving aside the “joke” of Grothendieck’s (supposed) inability to factor 57, this anecdote has an instructive point. No doubt Grothendieck’s associate was expecting a small prime number such as 2 or 3. What would have been the reaction if Grothendieck had said “All right, take 2147483647“? When one considers examples, one is prone to consider simple examples; of course this is natural, but one must be aware that such examples can be misleading. Morwen Thistlethwaite once made a similar observation about knot theory; from memory he said something like:

When someone asks you to think about a knot, you usually imagine a trefoil, or a figure 8, or maybe a torus knot. But the right image to have in your mind is a room entirely filled with a long, tangled piece of string.

Note that there is another crucial function of examples, namely their role as counterexamples, which certify the invalidity of a general claim — such counterexamples should, of course, be as simple as possible (and even Grothendieck was capable of coming up with some); but I am concerned here and in the sequel with the role of “confirming” examples, so to speak.

At the other extreme(?), and again at the risk of oversimplifying, one might take the “A=B” (or Petkovsek-Wilf-Zeilberger) point of view, that sufficiently good/many examples are proofs. They give a beautifully simple but psychologically interesting example (Theorem 1.4.2 in A=B): to show that the angle bisectors of a triangle are concurrent, it suffices to verify this for a sufficiently large but finite (explicit) number of examples. The reason such a proof is valid is that the co-ordinates of the pairwise intersections of the angle bisectors are rational functions of (certain trigonometric functions of) the angles, of an explicit (and easily determined) degree, and to prove an identity between rational functions, it suffices to prove that it holds for enough values. Another aspect of the A=B philosophy is that by the process of abstraction, a theorem in one context can become an example in another. For example, “even plus odd equals odd” might be a theorem over \mathbb{Z}, but an example over \mathbb{Z}/2\mathbb{Z}. One might say that the important thing about examples is that they should be sufficiently general that they exhibit all or enough of the complexity of the general case, and that if enough features of an example can be reimagined or abstracted as parameters, an example can become (or be translated into) a theorem.
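The principle behind such proofs, that an identity between polynomials of known bounded degree follows from enough numerical checks, fits in a few lines of Python. As an illustration of our own choosing (not the triangle example from A=B), take the classical identity 1^3 + \cdots + n^3 = (n(n+1)/2)^2: both sides are polynomials of degree 4 in n, so agreement at the five values n = 0,\dots,4 proves it for all n.

```python
def agree_at_enough_points(p, q, degree_bound):
    """Check p(n) == q(n) for n = 0, 1, ..., degree_bound.

    This is a proof only under the assumption that p and q are polynomials in n
    of degree <= degree_bound: then p - q has degree <= degree_bound but
    degree_bound + 1 roots, so it is identically zero.
    """
    return all(p(n) == q(n) for n in range(degree_bound + 1))


def sum_of_cubes(n):
    # A polynomial of degree 4 in n: summing a degree-3 polynomial raises the degree by one.
    return sum(k ** 3 for k in range(n + 1))


def triangular_squared(n):
    # (n(n+1)/2)^2, also a polynomial of degree 4 in n.
    return (n * (n + 1) // 2) ** 2


print(agree_at_enough_points(sum_of_cubes, triangular_squared, 4))  # True
```

The degree bound is what turns the check into a proof: n^2 and n agree at n = 0 and n = 1, so `agree_at_enough_points(lambda n: n * n, lambda n: n, 1)` also returns True even though the identity is false; with the correct bound 2 the check fails at n = 2.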

In some fields of mathematics, one can make the idea of a “general example” rigorous. In algebraic geometry, one has the concept of a generic point on a scheme; in differential topology, one considers submanifolds in general position; in ergodic theory, one considers a normal number (or sequence in some fixed alphabet). In fact, it is not so clear whether a “formal” generic object in some domain should be thought of as the ultimate example, or as the ultimate rejection of the use of examples! In any case, in practice, when as mathematicians we select examples to test our ideas on, we rarely adhere to a rigorous procedure to ensure that our examples are good ones, and we are therefore susceptible to certain well-known psychological biases. The first is the availability heuristic, as defined by the psychologists Kahneman and Tversky, which says roughly that people tend to overemphasize the importance of examples that they can think of most easily. Why exactly is this bad? Well, because it interacts with another bias, that it is easier to think of an example which is more specific — e.g. it is easier to think of a fruit that starts with the letter “A” than just to think of a fruit. One might argue that this bias is unavoidable, given the nature of the task “think of an example of X” — e.g. it is much easier to find a unique solution (of a differential equation, of a system of linear equations, etc.) than to find a solution to an underdetermined problem. In fact, finding a unique solution is so much easier than solving an underdetermined problem, that one often tries to solve the underdetermined problem by adding constraints until the solution is unique and can be found (e.g. simplex method in linear programming). Conversely, this bias is also part of the explanation for why examples are so useful: the mind devotes more attention and mental resources to a more specific object. 
So even if one is interested in finding a rigorous and abstract proof, it is often easier to find a proof for a specific example, and then to “rewrite” the proof, replacing the specific example by the general case, and checking that no additional hypotheses are used. The second psychological bias is that of framing. A frame consists of a collection of schemata and stereotypes that provide a context in which an event is interpreted. Many mathematical concepts or objects can be formulated in many different ways which are all logically equivalent, but which frame the concept or object quite differently. The word “bird” suggests (to most people) a schema which involves flight, wings, beaks, etc. The mental image it conjures up will almost never resemble a flightless bird like a penguin, or a kiwi, unless extra cues are given, like “a bird indigenous to New Zealand”. A statement about covering spaces might be equivalent to a statement in group theory, but the first might bring to mind topological ideas like paths, continuous maps, compact subsets etc., while the second might suggest homomorphisms, exact sequences, extensions etc., and the examples suggested by the frames might be substantially (mathematically) different, sometimes in crucial ways.

Back to measure theory and topology. In topology, one is frequently (always?) interested in a topological space. Here context is very important: a “topological space” could be a finite set, a graph, a solenoid, a manifold, a Cantor set, a sheaf, a CW complex, a lamination, a profinite group, a Banach space, the space of all compact metric spaces (up to isometry) in the Gromov-Hausdorff metric, etc. By contrast, up to isomorphism, a (standard) “measure space” is just an interval, plus some countable (possibly empty) collection of atoms. Of course, one thinks of a measure space much more concretely, by adding some incidental extra structure which is irrelevant to the measure structure, but crucial to the mathematical (and psychological) interest of the space; hence a “measure space” could be the space of infinite sequences in a fixed alphabet, the Sierpinski gasket in the plane, the attractor of an Axiom A diffeomorphism, and so on. In other words, one is typically interested in measure theory as a tool to explore some mathematical object with additional structure, whereas one is frequently interested in topological spaces as objects of intrinsic interest in their own right. Many interesting classes of topological objects can be visualized in great detail, sometimes in so much detail that in practice one generates proofs by examining sufficiently complicated examples, and building up clear and detailed mental models of the kinds of phenomena that can occur in a given context. By contrast, visualizing a “typical” measurable set or map (even a subset of \mathbb{R} or the plane), let alone a non-measurable set, is much more difficult, if it is even possible. In fact, one tends routinely to bump up against important subtleties in mathematical logic (especially set theory) when trying even to define such elusive entities as a “typical” measurable subset of the interval.
For instance, Solovay’s famous theorem says (amongst other things) that the statement “every set of real numbers is Lebesgue measurable” is consistent with Zermelo-Fraenkel set theory without the axiom of choice (in fact, Solovay’s result is relative to the existence of certain large cardinals, so-called inaccessible cardinals). Solovay addresses in his paper the issue of explicitly describing a non-Lebesgue-measurable set of reals:

Of course the axiom of choice is true, and so there are non-measurable sets. It is natural to ask if one can explicitly describe a non-Lebesgue measurable set.

Say that a set of reals A is definable from a set x_0 if there is a formula \Psi(x,y) whose only free variables are x and y, so that A = \lbrace y : \Psi(x_0,y)\rbrace. Solovay shows that (again, assuming the existence of an inaccessible cardinal) it is consistent with ZF plus the axiom of choice that every set of reals definable from a countable sequence of ordinals is Lebesgue measurable. Interestingly enough, one of the most important concepts introduced by Solovay in his paper is the notion of a random real: a real number that is not contained in any of a certain class of Borel sets of measure zero, namely those that are rational (i.e. those that can be encoded in a certain precise way). This resonates somewhat with the “generic points” and “normal numbers” mentioned earlier.

If imagining “good” examples in measure theory is hard, what is the alternative? Evidently, it is to imagine “bad” examples, or at least very non-generic ones. Under many circumstances, the “standard” mental image of a measurable map is a piecewise-constant map from the unit interval to a countable (even finite) set. This example rests on two approximations: the process of building up an arbitrary Borel set (in \mathbb{R}, say) from half-open intervals by taking complements and countable intersections and unions; and the process of defining an arbitrary measurable function as a limit of a sequence of finite sums of multiples of indicator functions. Such a mental image certainly has its uses. For my own part, I think that if one is going to use such a mental model anyway, one should be aggressive about using one’s intuition about continuous functions and open sets to make the example as specific, as rich and as “generic” as possible, while understanding that the result is not the measurable function or set itself, but only an approximation to it; and one should try to keep in mind a sequence of such maps, of increasing complexity and richness (if possible).
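The second approximation process can be made concrete with the standard dyadic construction. What follows is a minimal sketch (the name `simple_approximation` is hypothetical): for a nonnegative function f, truncating at height n and rounding down to a multiple of 2^{-n} gives a function with finitely many values, i.e. a finite sum of multiples of indicator functions (of measurable sets, when f is measurable), and these functions increase pointwise to f.

```python
import math
from typing import Callable


def simple_approximation(f: Callable[[float], float], n: int) -> Callable[[float], float]:
    """n-th dyadic simple-function approximation of a nonnegative function f.

    f_n(x) = n if f(x) >= n, and floor(2**n * f(x)) / 2**n otherwise.
    Each f_n takes finitely many values (multiples of 2**-n in [0, n]),
    f_1 <= f_2 <= ... pointwise, and f_n(x) -> f(x) for every x.
    """
    scale = 2 ** n

    def f_n(x: float) -> float:
        value = f(x)
        if value >= n:
            return float(n)  # truncate at height n
        return math.floor(scale * value) / scale  # round down to the dyadic grid

    return f_n


def square(x: float) -> float:
    return x * x


# Usage: approximate f(x) = x^2 at x = 0.7 from below, for n = 1, ..., 11.
approximations = [simple_approximation(square, n)(0.7) for n in range(1, 12)]
```

Once n exceeds f(x), the error at x is less than 2^{-n}; before that, the truncation at height n dominates.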

Of course, non-measurable sets do arise in practice. If one wants in the end to prove a theorem whose truth or falsity does not depend on the Axiom of choice, then by Solovay one could do without such sets if necessary. The fact that we do not must mean that the use of non-measurable sets (necessarily constructed using the Axiom of choice) leads to shorter/more findable proofs, or more understandable proofs, or both. Let me mention a few examples of situations in which the Axiom of choice is extremely useful:

  1. The Hahn-Banach theorem in functional analysis
  2. The existence of ultralimits of sequences of metric spaces (equivalently, the existence of Stone-Cech compactifications)
  3. A group G is said to be left orderable if there is a total order < on G such that g<h implies fg < fh for all f,g,h \in G. If S is a finite subset of nontrivial elements of G, the order partitions S into S^+ \cup S^-, where S^+ and S^- consist of the elements that are greater than, respectively less than, the identity element. The elements of S^+ and the inverses of the elements of S^- are all greater than the identity, and so is any product of them. So suppose that for some finite set S, and every partition into S^+ \cup S^-, some nonempty product of elements of S^+ and inverses of elements of S^- (with repeats allowed) is equal to the identity. Then necessarily G is not left orderable. In fact, the converse is also true: if no such “poison” subset exists, then G is left orderable. This follows from the compactness of the set of partitions of G-\text{id} into two subsets G^+ \cup G^- (equivalently, the compactness of the set \lbrace -1,1\rbrace^{G-\text{id}}), which follows from Tychonoff’s theorem.
  4. The existence of Galois automorphisms of \mathbb{C} over \mathbb{Q} other than the identity and complex conjugation. Such automorphisms are necessarily non-measurable, and therefore (by Solovay) cannot be constructed without the axiom of choice. In fact, this follows from a theorem of Mackey, that any measurable homomorphism between (sufficiently nice, i.e. locally compact, second countable) topological groups is continuous. We give the sketch of a proof. Suppose f:G \to H is given, and without loss of generality, assume it is surjective. Let U be a neighborhood of the identity in H, and let V be a symmetric open neighborhood of the identity with V\cdot V \subset U. Since H is second countable, it is covered by countably many translates of V, so G is covered by countably many translates of f^{-1}(V), and therefore the (Haar) measure of f^{-1}(V) is positive. Let X \subset f^{-1}(V) \subset W where X is compact of positive measure, W is open, and the Haar measure of W is less than twice the Haar measure of X (the existence of such X and W follows from the inner and outer regularity of Haar measure on measurable sets). Since X is compact and W is open, there is an open neighborhood T of the identity in G so that tX \subset W for all t \in T. But tX and X both have measure more than half the measure of W, so they intersect: tx = x' for some x,x' \in X. Hence t = x'x^{-1}; since V is symmetric, so is f^{-1}(V), and therefore t \in f^{-1}(V)\cdot f^{-1}(V) \subset f^{-1}(V\cdot V) \subset f^{-1}(U). So T \subset f^{-1}(U), i.e. f is continuous at the identity, and since f is a homomorphism, it is continuous everywhere, as claimed. A continuous Galois automorphism of \mathbb{C} is either the identity, or complex conjugation.

Personally, I think that one of the most compelling reasons to accept the Axiom of choice is psychological, and is related to the phenomenon of closure. If we see a fragment of a scene or a pattern, our mind fills in the rest of the scene or pattern for us. We have no photoreceptor cells in our eyes where the optic nerve passes through the retina, but instead of noticing this gap, we have an unperceived blind spot in our field of vision. If we can choose an element of a finite set whenever we want to, we feel as though nothing would stop us from making such a choice infinitely often. We are inclined to accept a formula with a “for all” quantifier ranging over an infinite set, if the formula holds every time we check it. We are inclined to see patterns, even where they don’t exist. This is the seductive and dangerous (?) side of examples, and maybe a reason to exercise a little caution.

In fact, this discussion barely scratches the surface (and does not really probe into either topology or measure theory in any deep way). I would be very curious to hear contrasting opinions.

Update (6/20): There are many other things that I could/should have mentioned about the interaction between measure theory and topology, and the difficulty of finding good generic examples in measure theory. For example, I definitely should have mentioned:

  1. Lusin’s theorem, which says that a measurable function is continuous on a subset of almost full measure of its domain; e.g. if f is any measurable function on an interval [a,b], then for any positive \epsilon there is a compact subset E \subset [a,b] so that the measure of [a,b] - E is at most \epsilon, and the restriction of f to E is continuous.
  2. von Neumann’s theorem, that a Borel probability measure \mu on the unit cube I^n in \mathbb{R}^n is carried to Lebesgue measure (on the cube) by a self-homeomorphism of the cube (which can be taken to be the identity on the boundary) if and only if it is nonatomic, gives the boundary measure zero, and is positive on nonempty relatively open subsets.
  3. Pairs of mutually singular measures of full support on simple sets. For example, let C be the Cantor set of infinite strings in the alphabet \lbrace 0,1\rbrace with its product topology, and define an infinite string W_\infty in C inductively as follows. For any string w, define the complement w^c to be the string whose digits are obtained from w by interchanging 0 and 1. Then define W_1 to be the string 1, and inductively define W_{n+1}= W_n W_n^c W_n^c \cdots W_n^c where there are f(n)-1 copies of W_n^c, and f(n) is chosen so that \prod_n (f(n)-1)/f(n) = r > 1/2. Each W_n is a prefix of W_{n+1}, and W_\infty is their limit. Let \Lambda be the set of accumulation points of W_\infty under the shift map. Any finite string that appears in W_\infty appears with bounded gaps, so \Lambda is invariant and minimal (i.e. every orbit is dense) under the shift map. However, the proportion of 1’s in W_n is at least r for n odd, and at most 1-r for n even. Let d_n denote the Dirac measure on the infinite string W_nW_nW_n\cdots, and let e_n denote the average of d_n over its (finite) orbit under the shift map. Define \mu_0^i = \frac 1 i \sum_{j\le i} e_{2j} and \mu_1^i = \frac 1 i \sum_{j\le i} e_{2j+1}. These probability measures are shift-invariant, and (after passing to a subsequence) have shift-invariant weak limits \mu_0,\mu_1 as i \to \infty with support in \Lambda. Moreover, if \Lambda_j denotes the strings in \Lambda that start with j for j=0,1, then \mu_j(\Lambda_j) \ge r. Since r > 1/2, the measures \mu_0 and \mu_1 are distinct, so the space of shift-invariant probability measures on \Lambda is at least 1-dimensional, and (by the ergodic decomposition) we may therefore obtain two distinct ergodic shift-invariant probability measures on \Lambda; distinct ergodic measures are automatically mutually singular. Since \Lambda is minimal, both measures have full support.
  4. Shelah’s theorem that, in ZF plus the axiom of dependent choice, if there is an uncountable well-ordered set of reals then there is a non-(Lebesgue) measurable set; this shows the necessity of Solovay’s use of an inaccessible cardinal. (By Solovay, relative to the existence of an inaccessible cardinal, the axiom of dependent choice is consistent with the statement that every set of reals is Lebesgue measurable.)
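The combinatorics in item 3 are easy to check by machine. The sketch below is a minimal illustration under an assumption of our own: it uses the hypothetical choice f(n) = 2^{n+2}, for which \prod_n (f(n)-1)/f(n) is about 0.77 > 1/2. It builds W_1, \dots, W_5 and checks that the proportion of 1’s alternates between at least r and at most 1-r. It says nothing, of course, about the measures themselves.

```python
from fractions import Fraction


def complement(w: str) -> str:
    """w^c: interchange the digits 0 and 1."""
    return w.translate(str.maketrans("01", "10"))


def f(n: int) -> int:
    """Hypothetical choice of f(n); any f with prod (f(n)-1)/f(n) > 1/2 would do."""
    return 2 ** (n + 2)


def build_words(N: int) -> list:
    """Return [W_1, ..., W_N], where W_{n+1} = W_n followed by f(n)-1 copies of W_n^c."""
    words = ["1"]
    for n in range(1, N):
        w = words[-1]
        words.append(w + complement(w) * (f(n) - 1))
    return words


def proportion_of_ones(w: str) -> Fraction:
    return Fraction(w.count("1"), len(w))


def r_estimate(terms: int = 200) -> float:
    """Partial product of (f(n)-1)/f(n) for n <= terms; the omitted tail is
    below double-precision resolution, so this is r to machine accuracy."""
    prod = 1.0
    for n in range(1, terms + 1):
        prod *= (f(n) - 1) / f(n)
    return prod


words = build_words(5)  # lengths 1, 8, 128, 4096, 262144
r = r_estimate()
```

The inequality has a one-line explanation: if q_n is the proportion of the majority digit in W_n, then q_{n+1} \ge (1 - 1/f(n))\,q_n, since W_{n+1} contains f(n)-1 copies of W_n^c out of f(n) blocks.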

A regular tetrahedron (in \mathbb{R}^3) can be thought of as the convex hull of four pairwise non-adjacent vertices of a cube. A bisecting plane parallel to a face of the cube intersects the tetrahedron in a square (one can think of this as the product of two intervals, contained as the middle slice of the join of two intervals). The plane perpendicular to a long diagonal of the cube through its midpoint intersects the cube in a regular hexagon. In each case, the “slice” one obtains is “rounder” (in some sense) than the original pointy object.
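Both slices can be checked numerically. In this minimal sketch (the function names are illustrative, not from any library), we use the fact that when a plane passes through no vertex of a convex polytope, the vertices of the slice are exactly the points where the plane crosses an edge. So it suffices to intersect the plane with each edge and compare pairwise distances.

```python
import itertools
import math


def slice_polytope(vertices, edges, normal, offset):
    """Vertices of the slice of a convex polytope by the plane normal . x = offset.

    Assumes the plane contains no vertex of the polytope; then the vertices of
    the slice are exactly the points where the plane crosses an edge.
    """
    def height(p):
        return sum(a * b for a, b in zip(normal, p)) - offset

    points = []
    for i, j in edges:
        p, q = vertices[i], vertices[j]
        hp, hq = height(p), height(q)
        if hp * hq < 0:  # p and q lie on opposite sides of the plane
            t = hp / (hp - hq)
            points.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return points


def pairwise_distances(points):
    return sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))


# Unit cube: edges join vertices that differ in exactly one co-ordinate.
CUBE = list(itertools.product([0, 1], repeat=3))
CUBE_EDGES = [(i, j) for i, j in itertools.combinations(range(8), 2)
              if sum(a != b for a, b in zip(CUBE[i], CUBE[j])) == 1]
# Regular tetrahedron on four pairwise non-adjacent cube vertices; all 6 pairs are edges.
TETRA = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
TETRA_EDGES = list(itertools.combinations(range(4), 2))

square = slice_polytope(TETRA, TETRA_EDGES, normal=(0, 0, 1), offset=0.5)
hexagon = slice_polytope(CUBE, CUBE_EDGES, normal=(1, 1, 1), offset=1.5)
```

The square has four sides of length \sqrt{1/2} and two diagonals of length 1. The hexagon has six sides equal to its circumradius \sqrt{1/2}, as a regular hexagon must.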

The unit ball in the L_1 norm on \mathbb{R}^n is a “diamond”, the dual polyhedron to an n-cube (which is the unit ball in the L_\infty norm). In three dimensions, the L_1 unit ball is an octahedron, the dual of an (ordinary) cube. This is certainly a very pointy object; in fact, for very large n, almost all the mass of such an object is arbitrarily close to the origin (in the ordinary Euclidean norm), even though its vertices are at Euclidean distance 1. Suppose one intersects such a diamond with a “random” m-dimensional linear subspace V. The intersection is a polyhedron, which is the unit ball in the restriction of the L_1 norm to the subspace V. A somewhat surprising phenomenon is that when n is very big compared to m, and V is chosen “randomly”, the intersection of V with this diamond is very round; i.e. a “random” small dimensional slice of L_1 looks like (a scaled copy of) L_2. In fact, one can replace L_1 by L_p here for any p (though of course, one must be a bit more precise about what one means by “random”).

We can think of obtaining a “random” m-dimensional subspace of n-dimensional space by choosing n linear maps L_i \in (\mathbb{R}^m)^*:= \text{Hom}(\mathbb{R}^m,\mathbb{R}) and using them as the co-ordinates of a linear map L = \oplus_i L_i:\mathbb{R}^m \to \mathbb{R}^n. For n \ge m and a generic choice of the L_i, the map L has full rank, and its image is an m-dimensional subspace. So let \mu be a probability measure on (\mathbb{R}^m)^*, and let L define a random embedding of \mathbb{R}^m into \mathbb{R}^n (with the L_i chosen independently with distribution \mu). The co-ordinates of L determine a finite subset of (\mathbb{R}^m)^* of cardinality n; the uniform probability measure with this subset as support is itself a measure \nu, and we can easily compute that \|L(v)\|_p = n^{1/p}\left( \int_{\pi \in (\mathbb{R}^m)^*} |\pi(v)|^p d\nu \right)^{1/p}. For n big compared to m, the measure \nu is with high probability very close (in the weak sense) to \mu. If we choose \mu to be \text{O}(m)-invariant, it follows that the pullback of the L_p norm on \mathbb{R}^n to \mathbb{R}^m under a random L is itself almost \text{O}(m)-invariant, and is therefore very nearly proportional to the L_2 norm. In particular, the pullback of the L_2 norm on \mathbb{R}^n is very nearly equal to (a multiple of) the L_2 norm on \mathbb{R}^m, so (after rescaling), L is very close to an isometry, and the intersection of L(\mathbb{R}^m) with the unit ball in \mathbb{R}^n in the L_p norm is very nearly round.
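The argument can be tested numerically. This is a minimal sketch, assuming NumPy and taking \mu to be the standard Gaussian on (\mathbb{R}^m)^* (which is \text{O}(m)-invariant). It draws a random L:\mathbb{R}^m \to \mathbb{R}^n and compares \|L(v)\|_1 over many random Euclidean unit vectors v. Because only finitely many directions are sampled, the ratio it reports is a lower bound for the true distortion of the slice, not the distortion itself.

```python
import numpy as np


def l1_distortion(n: int, m: int, num_directions: int = 200, seed: int = 0) -> float:
    """Ratio max/min of ||L v||_1 over sampled Euclidean unit vectors v in R^m.

    L: R^m -> R^n has i.i.d. standard Gaussian rows L_i (an O(m)-invariant mu).
    A ratio close to 1 means the slice L(R^m) of the L_1 unit ball looks round
    in the sampled directions.
    """
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((n, m))
    v = rng.standard_normal((m, num_directions))
    v /= np.linalg.norm(v, axis=0)       # normalize each column to a unit vector
    norms = np.abs(L @ v).sum(axis=0)    # ||L v||_1, one value per sampled direction
    return float(norms.max() / norms.min())
```

For comparison, on \mathbb{R}^m itself the L_1 norm of a Euclidean unit vector ranges between 1 and \sqrt{m}. By contrast, for n = 20000 and m = 3 the sampled ratio above is within a few percent of 1.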

Dvoretzky’s theorem says that any infinite dimensional Banach space contains, for each m and each \epsilon > 0, an m-dimensional subspace which is within \epsilon of being isometric to m-dimensional L_2. In fact, for n sufficiently large depending only on m and \epsilon, any symmetric convex body in \mathbb{R}^n admits an m-dimensional slice which is within \epsilon of being spherical. On the other hand, Pelczynski showed that any infinite dimensional subspace of \ell_1 contains a further subspace which is isomorphic to \ell_1, and is complemented in \ell_1; in particular, \ell_1 does not contain an isometric copy of \ell_2, or in fact of any infinite dimensional Banach space with a separable dual (I learned these facts from Assaf Naor).

As many readers are no doubt aware, the title of this blog comes from the famous book Geometry and the Imagination by Hilbert and Cohn-Vossen (based on lectures given by Hilbert). One of the first things discussed in that book is the geometry of conics, especially in two and three dimensions. An ellipsoid is a certain kind of (real) quadric hypersurface, i.e. a hypersurface in \mathbb{R}^n defined by a single quadratic equation in the co-ordinates. It may also be defined as the image of the unit (n-1)-dimensional sphere under an invertible affine self-map of \mathbb{R}^n. After composing with a translation, one may imagine an ellipsoid centered at the origin, and think of it as the image of the unit sphere under a linear automorphism of \mathbb{R}^n, i.e. transformation by a nonsingular matrix M.

A (generic) ellipsoid has n axes; in dimension three, these are the “major axis”, the “minor axis” and the “mean axis”. Distance to the origin is a Morse function on a generic ellipsoid; the symmetry of an ellipsoid under the antipodal map means that critical points occur in antipodal pairs (the endpoints of the axes). There is a pair of critical points of each index between 0 and n-1. There is a gradient flow line of this Morse function between each pair of critical points whose index differs by 1, and the union of these flowlines is the (planar) ellipse obtained by intersecting the ellipsoid with the plane spanned by the pair of axes in question. This shows that these axes are mutually perpendicular.

One may use this geometric picture to “see” the KAK decomposition of \text{GL}(n,\mathbb{R}) as follows, where K denotes the orthogonal subgroup \text{O}(n,\mathbb{R}), and A denotes the subgroup of diagonal matrices with positive entries. Let M be an invertible linear map of \mathbb{R}^n, and let E be the ellipsoid which is the image of the unit sphere under M. Let \xi_i be the axes of E of index i. There is an orthogonal matrix O (unique up to signs) taking the \xi_i to the co-ordinate axes. There is a unique diagonal matrix D with positive entries taking O(E) to the round sphere. Hence the composition DOM takes the unit sphere to itself, so it is orthogonal, and M = O^{-1}D^{-1}(DOM) expresses M as a product of an orthogonal matrix, a diagonal matrix, and another orthogonal matrix.
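Numerically, this picture is the singular value decomposition. This is a minimal sketch assuming NumPy. `numpy.linalg.svd` returns M = U\,\text{diag}(s)\,V^T with U, V orthogonal and s > 0 for invertible M. The columns of U scaled by s are the semi-axes of E, so in the notation above O = U^T and D = \text{diag}(1/s), and then DOM = V^T.

```python
import numpy as np


def kak(M: np.ndarray):
    """Return (K1, A, K2) with M = K1 @ A @ K2, K1 and K2 orthogonal, A positive diagonal."""
    U, s, Vt = np.linalg.svd(M)
    return U, np.diag(s), Vt


def semi_axes(M: np.ndarray) -> np.ndarray:
    """Columns are the semi-axes of the ellipsoid E = M(unit sphere), longest first."""
    U, s, _ = np.linalg.svd(M)
    return U * s  # scales column i of U by s[i]


rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))          # almost surely invertible
K1, A, K2 = kak(M)
O = K1.T                                 # rotates the axes of E to the co-ordinate axes
D = np.diag(1.0 / np.diag(A))            # rescales O(E) to the round sphere
```

The check `D @ O @ M == K2` (up to rounding) is exactly the statement that DOM is orthogonal.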

One can use ellipsoids to visualize another less standard matrix decomposition as follows. For simplicity we concentrate on the case of dimension 3. The minor and mean axis span a plane \pi which intersects the ellipsoid in the “smallest” possible ellipse. Rotate this plane by keeping the mean axis fixed, and tilting the minor axis towards the major axis. At some unique point one obtains a plane \pi' that intersects the ellipsoid in a round circle. One may shear the ellipsoid, keeping this plane fixed, into an ellipsoid of rotation. This describes a way to factorize M as a product of a shear, a diagonal matrix with two equal eigenvalues, and a rotation.
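The tilt angle of the circular section can be computed explicitly. As a minimal sketch, put the ellipsoid in the form x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 with a > b > c, so that the major axis is along x, the mean axis along y and the minor axis along z. Tilting the minor-mean plane about the y-axis by \theta replaces e_z by u = (\sin\theta, 0, \cos\theta). In the orthonormal basis (e_y, u) of the tilted plane the section has quadratic form s^2/b^2 + t^2(\sin^2\theta/a^2 + \cos^2\theta/c^2), with no cross term. So the section is round exactly when \sin^2\theta = (c^{-2} - b^{-2})/(c^{-2} - a^{-2}). The coefficient of t^2 decreases monotonically from 1/c^2 to 1/a^2, which explains the uniqueness. The function names are illustrative.

```python
import math


def circular_section_tilt(a: float, b: float, c: float) -> float:
    """Angle theta in (0, pi/2) by which the minor-mean plane of the ellipsoid
    x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 (a > b > c) must be tilted about the
    mean (y) axis towards the major (x) axis to cut out a circle of radius b."""
    if not a > b > c > 0:
        raise ValueError("expected a > b > c > 0")
    sin2 = (1 / c**2 - 1 / b**2) / (1 / c**2 - 1 / a**2)
    return math.asin(math.sqrt(sin2))


def section_form(a: float, b: float, c: float, theta: float):
    """Coefficients (s^2, st, t^2) of the section's quadratic form in the
    orthonormal basis (e_y, u), u = (sin theta, 0, cos theta)."""
    t2 = math.sin(theta) ** 2 / a**2 + math.cos(theta) ** 2 / c**2
    return (1 / b**2, 0.0, t2)  # no cross term: u has no y-component


theta = circular_section_tilt(3.0, 2.0, 1.0)
```

Tilting by -\theta instead gives the other family of circular sections.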

Question: What is the generalization of the “shear, dilate, rotate” factorization in higher dimensions?

Question: Is there a way to see the Iwasawa (KAN) decomposition geometrically, by using ellipsoids?

The purpose of this post is to discuss my recent paper with Koji Fujiwara, which will shortly appear in Ergodic Theory and Dynamical Systems, both for its own sake, and in order to motivate some open questions that I find very intriguing. The content of the paper is a mixture of ergodic theory, geometric group theory, and computer science, and was partly inspired by a paper of Jean-Claude Picaud. To state the results of the paper, I must first introduce a few definitions and some background.

Let \Gamma be a finite directed graph (hereafter a digraph) with an initial vertex, and edges labeled by elements of a finite set S in such a way that each vertex has at most one outgoing edge with any given label. A finite directed path in \Gamma starting at the initial vertex determines a word in the alphabet S, by reading the labels on the edges traversed (in order). The set L \subset S^* of words obtained in this way is an example of what is called a regular language, and is said to be parameterized by \Gamma. Note that this is not the most general kind of regular language; in particular, any language L of this kind will necessarily be prefix-closed (i.e. if w \in L then every prefix of w is also in L). Note also that different digraphs might parameterize the same (prefix-closed) regular language L.
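A small example may help. In this minimal sketch (the names `FREE_GROUP` and `words_up_to` are hypothetical), the digraph is represented as a dictionary from vertices to \lbrace\text{label}: \text{target}\rbrace, which automatically gives at most one outgoing edge with each label. The example parameterizes the reduced words in the free group F_2 = \langle a,b \rangle, writing A = a^{-1} and B = b^{-1}. The vertex remembers the last letter read, and a letter is allowed unless it would cancel it. Since every element of F_2 has a unique reduced word, and that word is a geodesic, this language is also a combing in the sense defined below.

```python
from typing import Dict, List, Tuple

Digraph = Dict[str, Dict[str, str]]  # vertex -> {label: target vertex}

INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}  # A = a^{-1}, B = b^{-1}

# Hypothetical example: the last-letter digraph of reduced words in F_2.
FREE_GROUP: Digraph = {"start": {x: x for x in "aAbB"}}
for v in "aAbB":
    FREE_GROUP[v] = {x: x for x in "aAbB" if x != INVERSE[v]}


def words_up_to(graph: Digraph, initial: str, n: int) -> List[str]:
    """All words read along directed paths of length <= n from the initial vertex."""
    words = [""]
    frontier: List[Tuple[str, str]] = [("", initial)]  # (word read so far, current vertex)
    for _ in range(n):
        frontier = [(word + label, target)
                    for word, vertex in frontier
                    for label, target in graph[vertex].items()]
        words.extend(word for word, _ in frontier)
    return words


words = words_up_to(FREE_GROUP, "start", 3)
```

The resulting language is prefix-closed by construction, since every prefix of a path from the initial vertex is again such a path.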

If S is a set of generators for a group G, there is an obvious map L \to G called the evaluation map that takes a word w to the element of G represented by that word.

Definition: Let G be a group, and S a finite generating set. A combing of G is a (prefix-closed) regular language L for which the evaluation map L \to G is a bijection, and such that every w \in L represents a geodesic in G.

The intuition behind this definition is that the set of words in L determines a directed spanning tree in the Cayley graph C_S(G) starting at \text{id}, and such that every directed path in the tree is a geodesic in C_S(G). Note that there are other definitions of combing in the literature; for example, some authors do not require the evaluation map to be a bijection, but only a coarse bijection.

Fundamental to the theory of combings is the following theorem, which paraphrases one of the main results of a paper of Cannon:

Theorem: (Cannon) Let G be a hyperbolic group, and let S be a finite generating set. Choose a total order on the elements of S. Then the language L of lexicographically first geodesics in G is a combing.

The language L described in this theorem is obviously geodesic and prefix-closed, and the evaluation map is bijective; the content of the theorem is that L is regular, and parameterized by some finite digraph \Gamma. In the sequel, we restrict attention exclusively to hyperbolic groups G.

Given a (hyperbolic) group G, a generating set S, and a combing L, one makes the following definition:

Definition: A function \phi:G \to \mathbb{Z} is weakly combable (with respect to S,L) if there is a digraph \Gamma parameterizing L and a function d\phi from the vertices of \Gamma to \mathbb{Z} so that for any w \in L, corresponding to a path \gamma in \Gamma, there is an equality \phi(w) = \sum_i d\phi(\gamma(i)).

In other words, a function \phi is weakly combable if it can be obtained by “integrating” a function d\phi along the paths of a combing. One furthermore says that a weakly combable function is combable if it changes by a bounded amount under right-multiplication by an element of S, and bicombable if it changes by a bounded amount under either left or right multiplication by an element of S. The property of being (bi-)combable does not depend on the choice of a generating set S or a combing L.
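To make “integrating along a combing” concrete, here is a minimal sketch for F_2 with its reduced-word combing, under the hypothetical convention that the vertex reached after reading a letter is that letter and d\phi(\text{start}) = 0. Two weakly combable functions then come from explicit functions d\phi on vertices: the exponent sum of a (a homomorphism F_2 \to \mathbb{Z}) and word length.

```python
from typing import Dict, List

INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}  # A = a^{-1}, B = b^{-1}


def path(word: str) -> List[str]:
    """Vertices gamma(0), ..., gamma(n) visited when reading a reduced word in the
    (hypothetical) last-letter digraph for F_2: gamma(0) = 'start', gamma(i) = i-th letter."""
    for x, y in zip(word, word[1:]):
        if y == INVERSE[x]:
            raise ValueError(f"{word!r} is not reduced, so it is not in the combing")
    return ["start", *word]


def integrate(dphi: Dict[str, int], word: str) -> int:
    """phi(w) = sum over i of dphi(gamma(i)) along the path that reads w."""
    return sum(dphi[v] for v in path(word))


# Exponent sum of a: +1 on entering vertex a, -1 on entering vertex A.
D_EXPONENT_A = {"start": 0, "a": 1, "A": -1, "b": 0, "B": 0}
# Word length: +1 on every non-initial vertex.
D_LENGTH = {"start": 0, "a": 1, "A": 1, "b": 1, "B": 1}
```

Both functions change by at most 1 under left or right multiplication by a generator, so both are in fact bicombable, consistent with the examples that follow.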

Example: Word length (with respect to a given generating set S) is bicombable.

Example: Let \phi:G \to \mathbb{Z} be a homomorphism. Then \phi is bicombable.

Example: The Brooks counting quasimorphisms (on a free group) and the Epstein-Fujiwara counting quasimorphisms are bicombable.

Example: The sum or difference of two (bi-)combable functions is (bi-)combable.

A particularly interesting example is the following:

Example: Let S be a finite set which generates G as a semigroup. Let \phi_S denote word length with respect to S, and \phi_{S^{-1}} denote word length with respect to S^{-1} (which also generates G as a semigroup). Then the difference \psi_S:= \phi_S - \phi_{S^{-1}} is a bicombable quasimorphism.

The main theorem proved in the paper concerns the statistical distribution of values of a bicombable function.

Theorem: Let G be a hyperbolic group, and let \phi be a bicombable function on G. Let \overline{\phi}_n be the value of \phi on a random word in G of length n (with respect to a certain measure \widehat{\nu} depending on a choice of generating set). Then there are algebraic numbers E and \sigma so that as distributions, n^{-1/2}(\overline{\phi}_n - nE) converges to a normal distribution with standard deviation \sigma.
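In the simplest case the statement can be checked exactly. This minimal sketch makes two assumptions of our own: G = F_2 with the standard generators, and the uniform measure on elements of word length n standing in for \widehat{\nu}. Take \phi to be the exponent sum of a. A dynamic program over the last letter computes the exact distribution of \overline{\phi}_n. Reduced words are produced by a stationary Markov chain whose increments have correlations decaying like 3^{-k}, and summing these gives mean 0 and variance exactly n - 3/4 + (3/4)3^{-n}. So E = 0 and \sigma = 1 in this example, and the test below checks this.

```python
from collections import Counter
from fractions import Fraction

INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}  # A = a^{-1}, B = b^{-1}
D_PHI = {"a": 1, "A": -1, "b": 0, "B": 0}           # phi = exponent sum of a


def phi_distribution(n: int) -> Counter:
    """Map each value v to the number of reduced words of length n >= 1 with phi = v."""
    # by_last[x] counts the phi-values of reduced words ending in the letter x.
    by_last = {x: Counter({D_PHI[x]: 1}) for x in "aAbB"}
    for _ in range(n - 1):
        new = {x: Counter() for x in "aAbB"}
        for last, counts in by_last.items():
            for y in "aAbB":
                if y == INVERSE[last]:
                    continue  # appending y would cancel the last letter
                for value, c in counts.items():
                    new[y][value + D_PHI[y]] += c
        by_last = new
    total = Counter()
    for counts in by_last.values():
        total.update(counts)  # Counter.update adds counts
    return total


def mean_and_variance(dist: Counter):
    """Exact mean and variance of the uniform distribution over words."""
    total = sum(dist.values())
    mean = Fraction(sum(v * c for v, c in dist.items()), total)
    second = Fraction(sum(v * v * c for v, c in dist.items()), total)
    return mean, second - mean ** 2
```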

One interesting corollary concerns the length of typical words in one generating set versus another. The first thing that every geometric group theorist learns is that if S_1, S_2 are two finite generating sets for a group G, then there is a constant K so that every word of length n in one generating set has length at most nK and at least n/K in the other generating set. If one considers an example like \mathbb{Z}^2, one sees that this is the best possible estimate, even statistically. However, if one restricts attention to a hyperbolic group G, then one can do much better for typical words:

Corollary: Let G be hyperbolic, and let S_1,S_2 be two finite generating sets. There is an algebraic number \lambda_{1,2} so that almost all words of length n with respect to the S_1 generating set have length almost equal to n\lambda_{1,2} with respect to the S_2 generating set, with error of size O(\sqrt{n}).

Let me indicate very briefly how the proof of the theorem goes.

Sketch of Proof: Let \phi be bicombable, and let d\phi be a function from the vertices of \Gamma to \mathbb{Z}, where \Gamma is a digraph parameterizing L. There is a bijection between the set of elements in G of word length n and the set of directed paths in \Gamma of length n that start at the initial vertex. So to understand the distribution of \phi, we need to understand the behaviour of a typical long path in \Gamma.

Define a component of \Gamma to be a maximal subgraph with the property that there is a directed path (in the component) from any vertex to any other vertex. One can define a new digraph C(\Gamma) without directed cycles, with one vertex for each component of \Gamma, in an obvious way. Each component C determines an adjacency matrix M_C, with ij-entry equal to 1 if there is a directed edge from vertex i to vertex j, and equal to 0 otherwise. A component C is big if the biggest real eigenvalue \lambda of M_C is at least as big as the biggest real eigenvalue of the matrices associated to every other component. A random long walk in \Gamma will spend most of its time entirely in big components, so these are the only components we need to consider to understand the statistical distribution of \phi.
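The component analysis can be sketched for the last-letter digraph of F_2. This is a small hypothetical example, and the helper names are illustrative. Components are computed by mutual reachability. The biggest real eigenvalue of each M_C is computed with NumPy; for a nonnegative matrix this is its Perron-Frobenius root. Here the component \lbrace a,A,b,B\rbrace has \lambda = 3, matching the 4 \cdot 3^{n-1} elements of length n, and it is the unique big component.

```python
import numpy as np

INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}

# Hypothetical example: last-letter digraph of reduced words in F_2,
# as vertex -> list of successor vertices.
GRAPH = {"start": list("aAbB")}
for v in "aAbB":
    GRAPH[v] = [x for x in "aAbB" if x != INVERSE[v]]


def reachable(graph, v):
    """Vertices reachable from v by a directed path (including v itself)."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in graph[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def components(graph):
    """Maximal sets of mutually reachable vertices (strongly connected components)."""
    reach = {v: reachable(graph, v) for v in graph}
    comps = []
    for v in graph:
        comp = frozenset(w for w in reach[v] if v in reach[w])
        if comp not in comps:
            comps.append(comp)
    return comps


def biggest_real_eigenvalue(graph, comp):
    """Largest real eigenvalue of the adjacency matrix M_C of a component."""
    verts = sorted(comp)
    index = {v: i for i, v in enumerate(verts)}
    M = np.zeros((len(verts), len(verts)))
    for v in verts:
        for w in graph[v]:
            if w in index:
                M[index[v], index[w]] = 1.0
    # For a nonnegative matrix the Perron-Frobenius root is real and is at
    # least the real part of every eigenvalue.
    return float(np.max(np.linalg.eigvals(M).real))


comps = components(GRAPH)
lam = {c: biggest_real_eigenvalue(GRAPH, c) for c in comps}
top = max(lam.values())
big = [c for c in comps if np.isclose(lam[c], top)]
```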

A theorem of Coornaert implies that there are no big components of C(\Gamma) in series; i.e. there are no directed paths in C(\Gamma) from one big component to another (one also says that the big components do not communicate). This means that a typical long walk in \Gamma is entirely contained in a single big component, except for a (relatively short) path at the start and the end of the walk. So the distribution of \phi gets independent contributions, one from each big component.

The contribution from an individual big component is not hard to understand: the central limit theorem for stationary Markov chains says that for elements of G corresponding to paths that spend almost all their time in a given big component C there is a central limit theorem  n^{-1/2}(\overline{\phi}_n - nE_C) \to N(0,\sigma_C) where the mean E_C and standard deviation \sigma_C depend only on C. The problem is to show that the means and standard deviations associated to different big components are the same. Everything up to this point only depends on weak combability of \phi; to finish the proof one must use bicombability.

It is not hard to show that if \gamma is a typical infinite walk in a component C, then the subpaths of \gamma of length n are distributed like random walks of length n in C. What this means is that the mean and standard deviation E_C,\sigma_C associated to a big component C can be recovered from the distribution of \phi on a single infinite “typical” path in C. Such an infinite path corresponds to an infinite geodesic in G, converging to a definite point in the Gromov boundary \partial G. Another theorem of Coornaert (from the same paper) says that the action of G on its boundary \partial G is ergodic with respect to a certain natural measure called a Patterson-Sullivan measure (see Coornaert’s paper for details). This means that there are typical infinite geodesics \gamma,\gamma' associated to components C and C' for which some g \in G takes \gamma to a geodesic g\gamma ending at the same point in \partial G as \gamma'. Bicombability implies that the values of \phi on \gamma and g\gamma differ by a bounded amount. Moreover, since g\gamma and \gamma' are asymptotic to the same point at infinity, combability implies that the values of \phi on g\gamma and \gamma' also differ by a bounded amount. This is enough to deduce that E_C = E_{C'} and \sigma_C = \sigma_{C'}, and one obtains a (global) central limit theorem for \phi on G. qed.

This obviously raises several questions, some of which seem very hard, including:

Question 1: Let \phi be an arbitrary quasimorphism on a hyperbolic group G (even the case G is free is interesting). Does \phi satisfy a central limit theorem?

Question 2: Let \phi be an arbitrary quasimorphism on a hyperbolic group G. Does \phi satisfy a central limit theorem with respect to a random walk on G? (i.e. one considers the distribution of values of \phi not on the set of elements of G of word length n, but on the set of elements obtained by a random walk on G of length n, and lets n go to infinity)

All bicombable quasimorphisms satisfy an important property which is essential to our proof of the central limit theorem: they are local, which is to say, they are defined as a sum of local contributions. In the continuous world, they are the analogue of the so-called de Rham quasimorphisms on \pi_1(M) where M is a closed negatively curved Riemannian manifold; such quasimorphisms are defined by choosing a 1-form \alpha, and defining \phi_\alpha(g) to be equal to the integral \int_{\gamma_g} \alpha, where \gamma_g is the closed oriented based geodesic in M in the homotopy class of g. De Rham quasimorphisms, being local, also satisfy a central limit theorem.

This locality manifests itself in another way, in terms of defects. Let \phi be a quasimorphism on a hyperbolic group G. Recall that the defect D(\phi) is the supremum of |\phi(gh) - \phi(g) -\phi(h)| over all pairs of elements g,h \in G. A quasimorphism is further said to be homogeneous if \phi(g^n) = n\phi(g) for all integers n. If \phi is an arbitrary quasimorphism, one may homogenize it by taking a limit \psi(g) = \lim_{n \to \infty} \phi(g^n)/n; one says that \psi is the homogenization of \phi in this case. Homogenization typically does not preserve defects; however, there is an inequality D(\psi) \le 2D(\phi). If \phi is local, one expects this inequality to be an equality. For, in a hyperbolic group, the contribution to the defect of a local quasimorphism all arises from the interaction of the suffix of (a geodesic word representing the element) g with the prefix of h (with notation as above). When one homogenizes, one picks up another contribution to the defect from the interaction of the prefix of g with the suffix of h; since these two contributions are essentially independent, one expects that homogenizing a local quasimorphism should exactly double the defect. This is the case for bicombable and de Rham quasimorphisms, and can perhaps be used to define locality for a quasimorphism on an arbitrary group.
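For Brooks counting functions on F_2 the homogenization is easy to compute. This minimal sketch uses one common convention: h_w(g) is the number of possibly overlapping copies of w in the reduced word g, minus the same count for w^{-1}. If c is the cyclic reduction of g, then the copies of w in c^k are, up to a bounded error, k times the cyclic copies of w in c. So \psi(g) is a difference of cyclic counts. The sketch illustrates \psi(g) = \lim_{k} \phi(g^k)/k and the fact that homogenization can change values; it does not compute defects.

```python
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}  # A = a^{-1}, B = b^{-1}


def reduce_word(g: str) -> str:
    """Freely reduce a word in a, A, b, B."""
    stack = []
    for x in g:
        if stack and stack[-1] == INVERSE[x]:
            stack.pop()
        else:
            stack.append(x)
    return "".join(stack)


def invert(g: str) -> str:
    return "".join(INVERSE[x] for x in reversed(g))


def count(w: str, g: str) -> int:
    """Possibly overlapping occurrences of w as a subword of g."""
    return sum(1 for i in range(len(g) - len(w) + 1) if g[i:i + len(w)] == w)


def brooks(w: str, g: str) -> int:
    """Brooks counting function h_w(g) = c_w(g) - c_{w^{-1}}(g) on the reduced word g."""
    g = reduce_word(g)
    return count(w, g) - count(invert(w), g)


def cyclic_reduce(g: str) -> str:
    g = reduce_word(g)
    while len(g) >= 2 and g[-1] == INVERSE[g[0]]:
        g = g[1:-1]
    return g


def cyclic_count(w: str, c: str) -> int:
    """Occurrences of w in the cyclic word c, one for each starting position in c."""
    if not c:
        return 0
    s = c * (len(w) // len(c) + 2)  # long enough to read w starting anywhere in c
    return sum(1 for i in range(len(c)) if s[i:i + len(w)] == w)


def homogenized_brooks(w: str, g: str) -> int:
    """psi(g) = lim_k h_w(g^k) / k, computed from the cyclic reduction of g."""
    c = cyclic_reduce(g)
    return cyclic_count(w, c) - cyclic_count(invert(w), c)
```

For example, with w = ab and g = aba^{-1}, the powers g^{10} reduce to ab^{10}a^{-1}, so h_w(g^{10}) = 1 while \psi(g) = 0.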

This discussion provokes the following key question:

Question 3: Let G be a group, and let \psi be a homogeneous quasimorphism. Is there a quasimorphism \phi with homogenization \psi, satisfying D(\psi) = 2D(\phi)?

Example: The answer to question 3 is “yes” if \psi is the rotation quasimorphism associated to an action of G on S^1 by orientation-preserving homeomorphisms (this is nontrivial; see Proposition 4.70 from my monograph).

Example: Let C be any homologically trivial group 1-boundary. Then there is some extremal homogeneous quasimorphism \psi for C (i.e. a quasimorphism achieving equality \text{scl}(C) = \psi(C)/2D(\psi) under generalized Bavard duality; see this post) for which there is \phi with homogenization \psi satisfying D(\psi) = 2D(\phi). Consequently, if every point in the boundary of the unit ball in the \text{scl} norm is contained in a unique supporting hyperplane, the answer to question 3 is “yes” for any quasimorphism on G.

Any quasimorphism on G can be pulled back to a quasimorphism on a free group, but this does not seem to make anything easier. In particular, question 3 is completely open (as far as I know) when G is a free group. An interesting test case might be the homogenization of an infinite sum of Brooks functions \sum_w h_w for some infinite non-nested family of words \lbrace w \rbrace.  

If the answer to this question is negative, and one can find a homogeneous quasimorphism \psi which is not the homogenization of any “local” quasimorphism, then perhaps \psi does not satisfy a central limit theorem. One can try to approach this problem from the other direction:

Question 4: Given a function f defined on the ball of radius n in a free group F, one defines the defect D(f) in the usual way, restricted to pairs of elements g,h for which g,h,gh are all of length at most n. Under what conditions can f be extended to a function on the ball of radius n+1 without increasing the defect?

If one had a good procedure for building a quasimorphism “by hand” (so to speak), one could try to build a quasimorphism that failed to satisfy a central limit theorem, or perhaps find reasons why this was impossible.

A basic reference for the background to this post is my monograph.

Let G be a group, and let [G,G] denote the commutator subgroup. Every element of [G,G] can be expressed as a product of commutators; the commutator length of an element g is the minimum number of commutators necessary, and is denoted \text{cl}(g). The stable commutator length is the growth rate of the commutator lengths of powers of an element; i.e. \text{scl}(g) = \lim_{n \to \infty} \text{cl}(g^n)/n. Recall that a group G is said to satisfy a law if there is a nontrivial word w in a free group F for which every homomorphism from F to G sends w to \text{id}.

The purpose of this post is to give a very short proof of the following proposition (modulo some background that I wanted to talk about anyway):

Proposition: Suppose G obeys a law. Then the stable commutator length vanishes identically on [G,G].

The proof depends on a duality between stable commutator length and a certain class of functions, called homogeneous quasimorphisms.

Definition: A function \phi:G \to \mathbb{R} is a quasimorphism if there is some least number D(\phi)\ge 0 (called the defect) so that for any pair of elements g,h \in G there is an inequality |\phi(g) + \phi(h) - \phi(gh)| \le D(\phi). A quasimorphism is homogeneous if it satisfies \phi(g^n) = n\phi(g) for all integers n.

Note that a homogeneous quasimorphism with defect zero is a homomorphism (to \mathbb{R}). The defect satisfies the following formula:

Lemma: Let \phi be a homogeneous quasimorphism. Then D(\phi) = \sup_{g,h} \phi([g,h]).

A fundamental theorem, due to Bavard, is the following:

Theorem: (Bavard duality) There is an equality \text{scl}(g) = \sup_\phi \frac {\phi(g)} {2D(\phi)} where the supremum is taken over all homogeneous quasimorphisms with nonzero defect.

In particular, \text{scl} vanishes identically on [G,G] if and only if every homogeneous quasimorphism on G is a homomorphism.

One final ingredient is another geometric definition of \text{scl} in terms of Euler characteristic. Let X be a space with \pi_1(X) = G, and let \gamma:S^1 \to X be a loop whose free homotopy class represents a given conjugacy class g. If S is a compact, oriented surface without sphere or disk components, a map f:S \to X is admissible if the map on \partial S factors through \partial f:\partial S \to S^1 \to X, where the second map is \gamma. For an admissible map, define n(S) by the equality \partial f_*[\partial S] = n(S) [S^1] in H_1(S^1;\mathbb{Z}) (i.e. n(S) is the degree with which \partial S wraps around \gamma). With this notation, one has the following:

Lemma: There is an equality \text{scl}(g) = \inf_S \frac {-\chi^-(S)} {2n(S)}, where the infimum is taken over all admissible surfaces with n(S) > 0.

Note: the function -\chi^- is the sum of -\chi over non-disk and non-sphere components of S. By hypothesis, there are none, so we could just write -\chi. However, it is worth writing -\chi^- and observing that for more general (orientable) surfaces, this function is equal to the function \rho defined in a previous post.
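For instance (a worked example, not in the original post): if g,h \in G are arbitrary and S is a once-punctured torus whose two free generators map to g and h, then S is admissible for [g,h] with n(S) = 1 and \chi(S) = 2 - 2\cdot 1 - 1 = -1, so the Lemma gives the universal bound

```latex
\text{scl}([g,h]) \;\le\; \frac{-\chi^-(S)}{2n(S)} \;=\; \frac{1}{2\cdot 1} \;=\; \frac{1}{2}.
```

The same bound also follows from Bavard duality together with the formula D(\phi) = \sup_{g,h} \phi([g,h]), since \phi([g,h])/2D(\phi) \le 1/2 for every homogeneous quasimorphism \phi with nonzero defect.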

We now give the proof of the Proposition.

Proof. Suppose to the contrary that stable commutator length does not vanish identically on [G,G]. By Bavard duality, there is a homogeneous quasimorphism \phi with nonzero defect. Rescale \phi to have defect 1. Then, by the Lemma, for any \epsilon > 0 there are elements g,h with \phi([g,h]) \ge 1-\epsilon, and consequently \text{scl}([g,h]) \ge 1/2 - \epsilon/2 by Bavard duality. On the other hand, if X is a space with \pi_1(X)=G, and \gamma:S^1 \to X is a loop representing the conjugacy class of [g,h], there is a map f:S \to X from a once-punctured torus S to X whose boundary represents \gamma. The fundamental group of S is free on two generators x,y which map to g,h respectively. We may take the law to be a nontrivial word w in two letters (every free group of finite rank embeds in F_2, so a law in more letters yields one in two letters); then w(x,y) maps to the identity in G, so there is an essential loop \alpha in S that maps inessentially to X. There is a finite cover \widetilde{S} of S, of degree d depending only on w (indeed on its word length) and not on g or h, for which \alpha lifts to an embedded loop. This can be compressed to give a surface S' with -\chi^-(S') \le -\chi^-(\widetilde{S})-2. Moreover, Euler characteristic is multiplicative under coverings, so -\chi^-(\widetilde{S}) = -\chi^-(S)\cdot d = d. Since n(S') = n(\widetilde{S})=d, it follows that \text{scl}([g,h]) \le (d-2)/2d = 1/2 - 1/d. But d is fixed by the law, while \epsilon can be made arbitrarily small; taking \epsilon < 2/d contradicts the lower bound \text{scl}([g,h]) \ge 1/2 - \epsilon/2. qed.
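The two estimates in the proof can be combined into a single chain of inequalities (a restatement, using \chi(S) = -1 for the once-punctured torus S):

```latex
\frac{1-\epsilon}{2} \;\le\; \text{scl}([g,h]) \;\le\; \frac{-\chi^-(S')}{2n(S')} \;\le\; \frac{d\cdot(-\chi^-(S)) - 2}{2d} \;=\; \frac{d-2}{2d} \;=\; \frac{1}{2} - \frac{1}{d},
```

which is impossible as soon as \epsilon < 2/d.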

In a previous post, I discussed some methods for showing that a given group contains a (nonabelian) free subgroup. The methods were analytic and/or dynamical, and phrased in terms of the existence (or nonexistence) of certain functions on G or on spaces derived from G, or in terms of actions of G on certain spaces. Dually, one can try to find a free group in G by finding a homomorphism \rho: F \to G and looking for circumstances under which \rho is injective.

For concreteness, let G = \pi_1(X) for some (given) space X. If F is a free group, a representation \rho:F \to G up to conjugation determines a homotopy class of map f: S \to X where S is a K(F,1). The most natural K(F,1)s to consider are graphs and surfaces (with boundary). It is generally not easy to tell whether a map of a graph or a surface to a topological space is \pi_1-injective at the topological level, but it might be easier if one can use some geometry.

Example: Let X be a complete Riemannian manifold with sectional curvature bounded above by some negative constant K < 0. Convexity of the distance function in a negatively curved space means that given any map of a graph f:\Gamma \to X one can flow f by the negative gradient of total length until it undergoes some topology change (e.g. some edge shrinks to zero length) or it (asymptotically) achieves a local minimum (the adjective “asymptotically” here just means that the flow takes infinite time to reach the minimum, because the size of the gradient is small when the map is almost minimal; there are no analytic difficulties to overcome when taking the limit). A typical topological change might be some loop shrinking to a point, thereby certifying that a free summand of \pi_1(\Gamma) mapped trivially to G and should have been discarded. Technically, one probably wants to choose \Gamma to be a trivalent graph, and when some interior edge collapses (so that four points come together) to let the 4-valent vertex resolve itself into a pair of 3-valent vertices in whichever of the three combinatorial possibilities is locally most efficient. The limiting graph, if nonempty, will be trivalent, with geodesic edges, and vertices at which the three edges are all (tangentially) coplanar and meet at angles of 2\pi/3. Such a graph can be certified as \pi_1-injective provided the edges are sufficiently long (depending on the curvature K). After rescaling the metric on X so that the supremum of the curvatures is -1, a trivalent geodesic graph with angles 2\pi/3 at the vertices and edges of length at least 2\tanh^{-1}(1/2) = 1.0986\cdots is \pi_1-injective. To see this, lift to maps between universal covers, i.e. consider an equivariant map from a tree \widetilde{\Gamma} to \widetilde{X}. Let \ell be an embedded arc in \widetilde{\Gamma}, and consider its image in \widetilde{X}. Using Toponogov’s theorem, one can compare with a piecewise isometric map from \ell to \mathbb{H}^n.
The worst case is when all the edges are contained in a single \mathbb{H}^2, and all corners “bend” the same way. Provided the image does not bend as much as a horocycle, the endpoints of the image of \ell stay far apart in \mathbb{H}^2. An infinite-sided convex polygon in \mathbb{H}^2 with all edges of length 2\tanh^{-1}(1/2) and all angles 2\pi/3 osculates a horocycle, so we are done.
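The value of the constant can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it uses the standard relation \cosh(L/2) = \cos(\pi/n)/\sin(\alpha/2) between the side length L and the interior angle \alpha of a regular n-gon in \mathbb{H}^2 (obtained by cutting the polygon into right triangles), and lets n \to \infty to get the infinite polygon whose vertices lie on a horocycle. The function name is hypothetical, and the check covers only this polygon constant, not the injectivity argument itself.

```python
import math


def regular_polygon_side(n_sides: float, interior_angle: float) -> float:
    """Side length of a regular polygon in H^2 (curvature -1), from the
    right-triangle relation cosh(L/2) = cos(pi/n) / sin(angle/2).
    Passing n_sides = math.inf gives the horocyclic (n -> infinity) limit."""
    cos_term = 1.0 if math.isinf(n_sides) else math.cos(math.pi / n_sides)
    return 2.0 * math.acosh(cos_term / math.sin(interior_angle / 2.0))


angle = 2.0 * math.pi / 3.0
horocyclic = regular_polygon_side(math.inf, angle)
print(horocyclic, 2.0 * math.atanh(0.5), math.log(3.0))
# Finite regular n-gons with angle 2*pi/3 (n >= 7) have shorter sides,
# increasing toward the horocyclic limit.
print([regular_polygon_side(n, angle) for n in (7, 20, 100)])
```

All three printed values in the first line agree, since 2\tanh^{-1}(1/2) = \log 3.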

Remark: The fundamental group of a negatively curved manifold is word-hyperbolic, and therefore contains many nonabelian free groups, which may be certified by pingpong applied to the action of the group on its Gromov boundary. The point of the previous example is therefore to certify that a certain subgroup is free in terms of local geometric data, rather than global dynamical data (so to speak). Incidentally, I would not swear to the correctness of the constants above.

Example: A given free group is the fundamental group of a surface with boundary in many different ways (this difference is one of the reasons that a group like \text{Out}(F_n) is so much more complicated than the mapping class group of a surface). Pick a realization F = \pi_1(S). Then a homomorphism \rho:F \to G up to conjugacy determines a homotopy class of map from S to X as above. If X is negatively curved as before, each boundary loop is homotopic to a unique geodesic, and we may try to find a “good” map f:S \to X with boundary on these geodesics. There are many possible classes of good maps to consider:

  1. Fix a conformal structure on S and pick a harmonic map in the homotopy class of f. Such a map exists since the target is nonpositively curved, by the famous theorem of Eells-Sampson. The image is real analytic if X is, and is at least as negatively curved as the target, and therefore there is an a priori upper bound on the intrinsic curvature of the image; if the supremum of the curvature on X is normalized to be -1, then the image surface is \text{CAT}(-1), which just means that pointwise it is at least as negatively curved as hyperbolic space. By Gauss-Bonnet, one obtains an a priori bound on the area of the image of S in terms of the Euler characteristic (which just depends on the rank of F). On the other hand, this map depends on a choice of marked conformal structure on S, and the space of such structures is noncompact.
  2. Vary over all conformal structures on S and choose a harmonic map of least energy (if one exists) or find a sequence of maps that undergo a “neck pinch” as a sequence of conformal structures on S degenerates. Such a neck pinch exhibits a simple curve in S that is essential in S but whose image is inessential in X; such a curve can be compressed, and the topology of S simplified. Since each compression increases \chi, after finitely many steps the process terminates, and one obtains the desired map. This is Schoen-Yau‘s method to construct a stable minimal surface representative of S. When the target is 3-dimensional, the surface may be assumed to be unbranched, by a trick due to Osserman. 
  3. Following Thurston, pick an ideal triangulation of S (i.e. a geodesic lamination of S whose complementary regions are all ideal triangles); since S has boundary, we may choose such a lamination by first picking a triangulation (in the ordinary sense) with all vertices on \partial S and then “spinning” the vertices to infinity. Unless \rho factors through a cyclic group, there is some choice of lamination so that the image of f can be straightened along the lamination, and then the image spanned with CAT(-1) ideal triangles to produce a pleated surface in X representing f (note: if X has constant negative curvature, these ideal triangles can be taken to be totally geodesic). The space of pleated surfaces in fixed (closed) X of given genus is compact, so this is a reasonable class of maps to work with.
  4. If G is merely a hyperbolic group, one can still construct pleated surfaces, not quite in X, but equivariantly in Mineyev’s flow space associated to \widetilde{X}. Here we are not really thinking of the triangles themselves, but the geodesic laminations they bound (which carry the same information). 
  5. If X is complete and 3-dimensional but noncompact, the space of pleated surfaces of given genus is generally not compact, and it is not always easy to find a pleated surface where you want it. This can sometimes be remedied by shrinkwrapping; one looks for a minimal/pleated/harmonic surface subject to the constraint that it cannot pass through some prescribed set of geodesics in X (which act as “barriers” or “obstacles”, and force the resulting surface to end up roughly where one wants it to).

Anyway, one way or another, one can usually find a map of a surface, or a space of maps of surfaces, representing a given homomorphism, with some kind of a priori control of the geometry. Usually, this control is not enough to certify that a given map is \pi_1-injective, but sometimes it might be. For instance, a totally geodesic (immersed) surface in a complete manifold of constant negative curvature is always \pi_1-injective, and any surface whose extrinsic curvature is small enough will also be \pi_1-injective.

Geometric methods to certify injectivity of free or surface groups are very useful and flexible, as far as they go. Unfortunately, I know of very few topological methods to certify injectivity. By far the most important exception is the following:

Example: In 3-dimensions, one should look for properly embedded surfaces. If M is a 3-manifold (possibly with boundary), and S is a two-sided properly embedded surface, the famous Dehn’s Lemma (proved by Papakyriakopoulos) implies that either S is \pi_1-injective, or there is an embedded essential loop in S that bounds an embedded disk in M on one side of S. Such a loop may be compressed (i.e. S may be cut open along the loop, and two copies of the compressing disk sewn in) preserving the property of embeddedness, but increasing \chi. After finitely many steps, either S compresses away entirely, or one obtains a \pi_1-injective surface. One way to ensure that S does not compress away entirely is to start with a surface that is essential in (relative) homology; another way is to look for a surface dual to an action (of \pi_1(M)) on a tree. In the latter case, one can often construct quite different free subgroups in \pi_1(M) by pingpong on the ends of the tree. Note by the way that this method produces closed surface subgroups as well as free subgroups. Note too that two-sidedness is essential to apply Dehn’s Lemma.

Remark: Modern 3-manifold topologists are sometimes unreasonably indifferent to the power of Dehn’s Lemma (probably because this tool has been incorporated so fully into their subconscious?); it is worth reading Ralph Fox’s review of Papakyriakopoulos’s paper (linked above). Of this paper, he writes:

. . . it has already led to renewed attack on the problem of classifying the 3-dimensional manifolds; significant results have been and are being obtained. A complete solution has suddenly become a definite possibility. 

Remember this was written more than 50 years ago — before the geometrization conjecture, before the JSJ decomposition, before the Scott core theorem, before Haken manifolds. The only reasonable reaction to this is: !!!

Example: The construction of injective surfaces by Dehn’s Lemma may be abstracted in the following way. Given a target space X, and a class of maps \mathcal{F} of surfaces into X (in some category; e.g. homotopy classes of maps, pleated surfaces, \text{CAT}(-1) surfaces, etc.) suppose one can find a complexity c:\mathcal{F} \to \mathcal{O} with values in some ordered set, such that if f \in \mathcal{F} is not injective, one can find f' \in \mathcal{F} of smaller complexity. Then if \mathcal{O} is well-ordered, an injective surface may be found. If \mathcal{O} is not well-ordered, one may ask at least that c is upper semi-continuous on \mathcal{F}, and hope to extend it upper semi-continuously to some suitable compactification of \mathcal{F}. Even if \mathcal{O} is not well-ordered, one can at least certify that a map is injective, by showing that it minimizes c. Here are some potential examples (none of them entirely satisfactory).

  1. Given a (homologically trivial) homotopy class of loop \gamma in X, one can look at all maps of orientable surfaces S to X with boundary factoring through \gamma. For such a surface, let n(S) denote the degree with which the (possibly multiple) boundary (components) of S wrap homologically around \gamma, and let -\chi^-(S) denote the sum of -\chi over the non-disk and non-sphere components of S. For each surface S, one considers the quantity -\chi^-(S)/2n(S) (the factor of 2 can be ignored if desired). The important feature of this quantity is that it does not change if S is replaced by a finite cover. If \pi_1(S) \to \pi_1(X) is not injective, let \alpha be an essential loop on S whose image in X is inessential. Peter Scott showed that any essential loop on a surface lifts to an embedded loop in some finite cover. Hence, after passing to such a cover, a lift of \alpha may be compressed, and the resulting surface S' satisfies -\chi^-(S')/2n(S') < -\chi^-(S)/2n(S). In other words, a global minimizer of this quantity is injective. Such a surface is called extremal. The problem is that extremal surfaces do not always exist; but this construction motivates one to look for them.
  2. Given a \text{CAT}(-1) surface S with geodesic boundary in X, one can retract S to a geodesic spine, and encode the surface by the resulting fatgraph, with edges labelled by homotopy classes in X. Since Euler characteristic is local, one does not really care precisely how the pieces of the fatgraph are assembled, but only how many pieces of what kinds are needed for a given boundary. So if only finitely many such pieces appear in some infinite family of surfaces, one can in fact construct an extremal surface as above, which is necessarily injective (more technically, one reduces the computation of Euler characteristic to a linear programming problem, finds a rational extremal solution (which corresponds to a weighted sum of pieces of fatgraph), and glues together the pieces to construct the extremal surface; one situation in which this scheme can be made to work is explained in this paper of mine). Edges can be subdivided into a finite number of possibilities, so one just needs to ensure finiteness of the number of vertex types. One condition that ensures finiteness of vertex types is the existence of a uniform constant C>0 so that for each surface S in the given family, and for each point p \in S, there is an estimate \text{dist}(p,\partial S) \le C. If this condition is violated, one finds pairs p_i,S_i which converge in the geometric topology to a point in a complete (i.e. without boundary, but probably noncompact) surface.
  3. Given S \to X, either compress an embedded essential loop, or realize S by a least area surface. If S is not injective, pass to a cover, compress a loop, and realize the result by a least area surface. Repeat this process. One obtains in this way a sequence of least area surfaces in X (typically of bigger and bigger genus) and there is no reason to expect the process to terminate. If X is a 3-manifold, the curvature of a least area surface admits two-sided bounds away from the boundary, by a theorem of Schoen (near the boundary, the negative curvature might blow up, but only in controlled ways; e.g. after rescaling about a sequence of points with the most negative curvature, one may obtain in the limit a helicoid). Away from the boundary, the family of surfaces one obtains varies precompactly in the C^\infty topology, and one may obtain a complete locally least area lamination \Lambda in the limit. If \pi_1(\Lambda) is not injective, one can continue to pass to covers (applying a version of Scott’s theorem for infinite surfaces) and compress, and by transfinite induction, eventually arrive at a locally least area lamination with injective \pi_1. Of course, such a limit might well be a lamination by planes. However, the lamination one obtains is not completely arbitrary: since it is a limit of limits of . . . compact surfaces, one can choose a limit that admits a nontrivial invariant transverse measure (one must be careful here, since the lamination will typically have boundary). Or, as in bullet 2. above, one may insist that this limit lamination is complete (i.e. without boundary).
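The bookkeeping behind item 1 can be modeled in a few lines of Python. This is a hypothetical toy model, not code from the post: a surface is reduced to the two integers -\chi^-(S) and n(S), a degree-d cover multiplies both by d, and a compression is assumed (as in the proof of the Proposition above) to lower -\chi^- by exactly 2 without creating disk or sphere components. The class name SurfaceData is illustrative only.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class SurfaceData:
    """Toy record of an admissible surface: only -chi^-(S) and n(S) are kept."""
    minus_chi: int  # -chi^-(S)
    degree: int     # n(S), the degree with which the boundary covers gamma

    def ratio(self) -> Fraction:
        """The quantity -chi^-(S) / 2n(S), an upper bound for scl(gamma)."""
        return Fraction(self.minus_chi, 2 * self.degree)

    def cover(self, d: int) -> "SurfaceData":
        # Euler characteristic and boundary degree are both multiplied by d,
        # so the ratio is unchanged.
        return SurfaceData(self.minus_chi * d, self.degree * d)

    def compress(self) -> "SurfaceData":
        # Assumption of this model: one compression raises chi by 2, creates
        # no disk or sphere components, and leaves the boundary untouched.
        return SurfaceData(self.minus_chi - 2, self.degree)


punctured_torus = SurfaceData(minus_chi=1, degree=1)  # bounds a commutator
for d in (2, 3, 4):
    improved = punctured_torus.cover(d).compress()
    print(d, punctured_torus.cover(d).ratio(), improved.ratio())
```

Passing to a cover leaves the ratio at 1/2, and only the compression lowers it, to (d-2)/2d = 1/2 - 1/d; this is the bound used in the proof of the Proposition.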

It is more tricky to find a limit lamination as in 3. without boundary and admitting an invariant transverse measure; in any case, this motivates the following:  

Question: Is there a closed hyperbolic 3-manifold M which admits a locally least area transversely measured complete immersed lamination \Lambda, all of whose leaves are disks? (note that the answer is negative if one asks for the lamination to be embedded (there are several easy proofs of this fact)).

Secretly, the function that assigns \inf_S -\chi^-(S)/2n(S) to a homologically trivial loop \gamma is the stable commutator length of the conjugacy class in \pi_1(X) represented by \gamma. Extremal surfaces can sometimes be certified by constructing certain functions on \pi_1(X) called homogeneous quasimorphisms, but a discussion of such functions will have to wait for another post.

