van Kampen soup and thermodynamics of DNA

The development and scope of modern biology are often held out as a fantastic opportunity for mathematicians. The accumulation of vast amounts of biological data, and the development of new tools for the manipulation of biological organisms at microscopic levels and with unprecedented accuracy, invite the development of new mathematical tools for their analysis and exploitation. I know of several examples of mathematicians who have dipped a toe, or sometimes some more substantial organ, into the water. But it has struck me that I know (personally) few mathematicians who believe they have something substantial to learn from the biologists, despite the existence of several famous historical examples. This strikes me as odd; my instinctive feeling has always been that intellectual ruts develop so easily, so deeply, and so invisibly, that continual cross-fertilization of ideas is essential to escape ossification (if I may mix biological metaphors . . .).

It is not necessarily easy to come up with profound examples of biological ideas or principles that can be easily translated into mathematical ones, but it is sometimes possible to come up with suggestive ones. Let me try to give a tentative example.

Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic blueprint for all known living things. This blueprint takes the form of a code — a molecule of DNA is a long polymer strand composed of simple units called nucleotides; such a molecule is typically imagined as a string in a four-character alphabet \lbrace A,T,G,C \rbrace, which stand for the nucleotides Adenine, Thymine, Guanine, and Cytosine. These molecular strands like to arrange themselves in tightly bound oppositely aligned pairs, matching up nucleotides in one string with complementary nucleotides in the other, so that A matches with T, and C with G.

The geometry of a strand of DNA is very complicated — strands can be tangled, knotted, linked in complicated ways, and the fundamental interactions between strands (e.g. transcription, recombination) are facilitated or obstructed by mechanical processes depending on this geometry. Topology, especially knot theory, has been used in the study of some of these processes; the value of topological methods in this context includes their robustness (fault-tolerance) and the discreteness of their invariants (similar virtues motivate some efforts to build topological quantum computers). A complete mathematical description of the salient biochemistry, mechanics, and semantic content of a configuration of DNA in a single cell is an unrealistic goal for the foreseeable future, and therefore attempts to model such systems depend on ignoring, or treating statistically, certain features of the system. One such framework ignores the ambient geometry entirely, and treats the system using symbolic, or combinatorial methods which have some of the flavor of geometric group theory.

One interesting approach is to consider a mapping from the alphabet of nucleotides to a standard generating set for F_2, the free group on two generators; for example, one can take the mapping T \to a, A \to A, C \to b, G \to B where a,b are free generators for F_2, and A,B denote their inverses. Then a pair of oppositely aligned strands of DNA translates into an edge of a van Kampen diagram — the “words” obtained by reading the letters along an edge on either side are inverse in F_2.
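The dictionary between strands and words can be made concrete with a small sketch. The function names below are hypothetical (they do not come from any library), and the convention that the partner strand is read in the opposite direction is my reading of "oppositely aligned". Under those assumptions, the word of the partner strand is the inverse of the word of the original strand.

```python
# Hypothetical helper names; only the letter mapping T->a, A->A, C->b, G->B
# is taken from the text. Uppercase letters stand for inverse generators.
NUCLEOTIDE_TO_GENERATOR = {"T": "a", "A": "A", "C": "b", "G": "B"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_word(strand):
    """Translate a DNA string into a (possibly unreduced) word in F_2."""
    return "".join(NUCLEOTIDE_TO_GENERATOR[n] for n in strand)


def inverse(word):
    """Formal inverse of a word: reverse it and invert each letter."""
    return word[::-1].swapcase()


def reduce_word(word):
    """Freely reduce a word by cancelling adjacent inverse pairs."""
    out = []
    for letter in word:
        if out and out[-1] == letter.swapcase():
            out.pop()
        else:
            out.append(letter)
    return "".join(out)


def partner_strand(strand):
    """Complementary strand, read in the opposite direction (assumed convention)."""
    return "".join(COMPLEMENT[n] for n in reversed(strand))


strand = "GATTACA"
word = to_word(strand)
print(word, to_word(partner_strand(strand)), reduce_word(word + inverse(word)))
```

Note that the image of a strand need not be a reduced word: an A followed by a T maps to a generator next to its inverse.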

Strands of DNA in a configuration are not always paired along their lengths; sometimes junctions of three or more strands can form; certain mobile four-strand junctions, so-called “Holliday junctions”, perform important functions in the process of genetic recombination, and are found in a wide variety of organisms. A configuration of several strands with junctions of varying valences corresponds in the language of van Kampen diagrams to a fatgraph — i.e. a graph together with a choice of cyclic ordering of edges at each vertex — with edges labeled by inverse pairs of words in F_2 (note that this is quite different from the fatgraph model of proteins developed by Penner-Knudsen-Wiuf-Andersen). The energy landscape for branch migration (i.e. the process by which DNA strands separate or join along some segment) is very complicated, and it is challenging to model it thermodynamically. It is therefore not easy to predict in advance what kinds of fatgraphs are more or less likely to arise spontaneously in a prepared “soup” of free DNA strands.

As a thought experiment, consider the following “toy” model, which I do not suggest is physically realistic. We make the assumption that the energy cost of forming a junction of valence v is c(v-2) for some fixed constant c. Consequently, the energy of a configuration is proportional to -\chi, i.e. the negative of the Euler characteristic of the underlying graph (summing v-2 over the vertices of a graph with n vertices and e edges gives 2e-2n = -2\chi). Let w be a reduced word, representing an element of F_2, and imagine a soup containing some large number of copies of the strand of DNA corresponding to the string \dot{w}:=\cdots www \cdots. In thermodynamic equilibrium, the partition function has the form Z = \sum_i e^{-E_i/k_BT} where k_B is Boltzmann’s constant, T is temperature, and E_i is the energy of a configuration (which by hypothesis is proportional to -\chi). At low temperature, minimal energy configurations tend to dominate; these are those that minimize -\chi per unit “volume”. Topologically, a fatgraph corresponding to such a configuration can be thickened to a surface with boundary. The words along the edges determine a homotopy class of map from such a surface to a K(F_2,1) (e.g. a once-punctured torus) whose boundary components wrap multiply around the free homotopy class corresponding to the conjugacy class of w. The infimum of -\chi/2d where d is the winding degree on the boundary, taken over all configurations, is precisely the stable commutator length of w; see e.g. here for a definition.
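As a sanity check on the toy model, here is a minimal sketch that computes the junction energy of a graph from its list of vertex valences and compares it with -2c\chi. The function names and the default value c = 1 are illustrative assumptions, as is the normalization of the Boltzmann weights; nothing here models real DNA.

```python
import math


def euler_characteristic(valences):
    """chi = (#vertices) - (#edges); the valences sum to twice the edge count."""
    total = sum(valences)
    if total % 2:
        raise ValueError("valences of a graph must sum to an even number")
    return len(valences) - total // 2


def junction_energy(valences, c=1.0):
    """Toy energy: each junction of valence v costs c * (v - 2)."""
    return c * sum(v - 2 for v in valences)


def boltzmann_weights(energies, kT):
    """Normalized weights e^{-E/kT} / Z for a finite list of configurations."""
    raw = [math.exp(-e / kT) for e in energies]
    z = sum(raw)
    return [r / z for r in raw]


theta_graph = [3, 3]          # two trivalent junctions joined by three edges
print(euler_characteristic(theta_graph), junction_energy(theta_graph))
print(boltzmann_weights([2.0, 4.0], kT=0.1))
```

At small kT almost all of the weight sits on the configuration of lower energy, which is the sense in which minimal energy configurations dominate at low temperature.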

Anyway, this example is perhaps a bit strained (and maybe it owes more to thermodynamics than to biology), but already it suggests a new mathematical object of study, namely the partition function Z as above, and one is already inclined to look for examples for which the partition function obeys a symmetry like that enjoyed by the Riemann zeta function, or to specialize temperature to other values, as in random matrix theory. The introduction of new methods into the study of a classical object — for example, the decision to use thermodynamic methods to organize the study of van Kampen diagrams — bends the focus of the investigation towards those examples and contexts where the methods and tools are most informative. Phenomena familiar in one context (power laws, frequency locking, phase transitions etc.) suggest new questions and modes of enquiry in another. Uninspired or predictable research programs can benefit tremendously from such infusions, whether the new methods are borrowed from other intellectual disciplines (biology, physics), or depend on new technology (computers), or new methods of indexing (google) or collaboration (polymath).

One of my intellectual heroes — Wolfgang Haken — worked for eight years in R+D for Siemens in Munich after completing his PhD. I have a conceit (unsubstantiated as far as I know by biographical facts) that his experience working for a big engineering firm colored his approach to mathematics, and made it possible for him to imagine using industrial-scale “engineering” tools (e.g. integer programming, exhaustive computer search of combinatorial possibilities) to solve two of the most significant “pure” mathematical open problems in topology at the time — the knot recognition problem, and the four-color theorem. It is an interesting exercise to try to imagine (fantastic) variations. If I sit down and decide to try to prove (for example) Cannon’s conjecture, I am liable to try minor variations on things I have tried before, appeal for my intuition to examples that I understand well, read papers by others working in similar ways on the problem, etc. If I imagine that I have been given a billion dollars to prove the conjecture, I am almost certain to prioritize the task in different ways, and to entertain (and perhaps create) much more ambitious or innovative research programs to tackle the task. This is the way in which I understand the following quote by John Dewey, which I used as the colophon of my first book:

Every great advance in science has issued from a new audacity of the imagination.


Amenability of Thompson’s group F?

Geometric group theory is not a coherent and unified field of enquiry so much as a collection of overlapping methods, examples, and contexts. The most important examples of groups are those that arise in nature: free groups and fundamental groups of surfaces, the automorphism groups of the same, lattices, Coxeter and Artin groups, and so on; whereas the most important properties of groups are those that lend themselves to applications or can be used in certain proof templates: linearity, hyperbolicity, orderability, property (T), coherence, amenability, etc. It is natural to confront examples arising in one context with properties that arise in the other, and this is the source of a wealth of (usually very difficult) problems; e.g. do mapping class groups have property (T)? (no, by Andersen) or: is every lattice in \text{PSL}(2,\mathbb{C}) virtually orderable?

As remarked above, it is natural to formulate these questions, but not necessarily productive. Gromov, in his essay Spaces and Questions, remarks that

often the mirage of naturality lures us into featureless desert with no clear perspective where the solution, even if found, does not quench our thirst for structural mathematics . . . Another approach . . . has a better chance for a successful outcome with questions following (rather than preceding) construction of new objects.

A famous question of the kind Gromov warns against is the following:

Question: Is Thompson’s group F amenable?

Recall that Thompson’s group is the group of (orientation-preserving) PL homeomorphisms of the unit interval with breakpoints at dyadic rationals (i.e. rational numbers of the form p/2^q for integers p,q) and derivatives all powers of 2. This group is a rich source of examples/counterexamples in geometric group theory: it is finitely presented (in fact FP_\infty) but “looks like” a transformation group; it contains no nonabelian free group (by Brin-Squier), but obeys no law. It is not elementary amenable (i.e. it cannot be built up from finite or abelian groups by elementary operations — subgroups, quotients, extensions, directed unions), so it is “natural” to wonder whether it is amenable at all, or whether it is one of the rare examples of nonamenable groups without nonabelian free subgroups (see this post for a discussion of amenability versus the existence of free subgroups, and von Neumann’s conjecture). This question has attracted a great deal of attention, possibly because of its long historical pedigree, rather than because of the potential applications of a positive (or negative) answer.
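The definition of F is easy to test mechanically. The sketch below encodes a PL map by its list of breakpoints (x, f(x)) and checks the three conditions: it fixes 0 and 1 and is increasing, the breakpoints are dyadic, and the slopes are powers of 2. The encoding and the function names are my own choices for illustration.

```python
from fractions import Fraction as Fr


def is_power_of_two(n):
    """True for the positive integers 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0


def is_dyadic(q):
    """True if the rational q has the form p / 2^k."""
    return is_power_of_two(Fr(q).denominator)


def is_power_of_two_slope(q):
    """True if q = 2^k for some integer k (positive, negative or zero)."""
    q = Fr(q)
    return q > 0 and is_power_of_two(q.numerator) and is_power_of_two(q.denominator)


def in_thompson_F(points):
    """Check a list of breakpoints (x, f(x)) against the definition of F."""
    points = [(Fr(x), Fr(y)) for x, y in points]
    if points[0] != (0, 0) or points[-1] != (1, 1):
        return False
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 <= x0 or y1 <= y0:            # must be strictly increasing
            return False
        if not (is_dyadic(x1) and is_dyadic(y1)):
            return False
        if not is_power_of_two_slope((y1 - y0) / (x1 - x0)):
            return False
    return True


# A map with slopes 1/2, 1, 2 on [0,1/2], [1/2,3/4], [3/4,1].
x0 = [(0, 0), (Fr(1, 2), Fr(1, 4)), (Fr(3, 4), Fr(1, 2)), (1, 1)]
print(in_thompson_F(x0))
```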

Recently, two papers were posted on the arXiv, promising competing resolutions of the question. In February, Azer Akhmedov posted a preprint claiming to show that the group F is not amenable. This preprint was revised, withdrawn, then revised again, and as of the end of April, Akhmedov continues to press his claim. Akhmedov’s argument depends on a new geometric criterion for nonamenability: roughly speaking, the existence of a 2-generator subgroup and a subadditive non-negative function on the group whose values grow at a definite rate on words in the subgroup whose exponents satisfy suitable parity conditions and inequalities. The non-negative function (Akhmedov calls it a “height function”) certifies the existence of a sufficiently “bushy” subset of the group to violate Følner’s criterion for amenability. Akhmedov’s paper reads like a “conventional” paper in geometric group theory, using methods from coarse geometry, careful combinatorial and counting arguments to establish the existence of a geometric object with certain large-scale properties, and an appeal to a standard geometric criterion to obtain the desired result. Akhmedov’s paper is part of a series, relating (non)amenability to certain other interesting geometric properties, some related to the so-called “traveling salesman” property, introduced earlier by Akhmedov.

On the other hand, in May, E. Shavgulidze posted a preprint claiming to show that the group F is amenable. Interestingly enough, Shavgulidze’s argument does not apply to the slightly more general class of Stein-Thompson groups in which slopes and denominators of break points can be divisible by an arbitrary (but prescribed) finite set of prime numbers. Moreover, his methods are very unlike any that one would expect to find in the typical geometric group theory paper. The argument depends on the construction, going back (in some sense) to a paper of Shavgulidze from 1978, of a measure on the space C(I) of continuous functions on the interval which is quasi-invariant under the natural action of the group of diffeomorphisms of the interval of regularity at least C^3. In more detail, let D^n denote the group of diffeomorphisms of the interval of regularity at least C^n for each n, and let C denote the Banach space of continuous functions on the interval that vanish at the origin. Define A:D^1 \to C by the formula A(f)(t) = \log(f'(t)) - \log(f'(0)). The space C can be equipped with a natural measure — the Wiener measure w_\sigma of variance \sigma, and this measure can be pulled back by A to D^1, where D^1 is thought of as a topological space with the C^1 topology. Shavgulidze shows that the left action of D^3 on D^1 quasi-preserves this measure. Here the Wiener measure on C is the probability measure associated to Brownian motion (with given variance). A “sample” trajectory W_t from C is characterized by three properties: that it starts at the origin (i.e. W_0=0), that it is continuous almost surely (this is already implicit in the fact that the measure is supported on the space C and not some more general space), and that increments are independent, with the distribution of W_t - W_s equal to a Gaussian with mean zero and variance (t-s)\sigma.
Shavgulidze’s argument depends first on an argument of Ghys-Sergiescu that shows Thompson’s group is conjugate (by a homeomorphism) to a discrete subgroup of the group of C^\infty diffeomorphisms of the interval. A bounded function f on F determines a continuous bounded function \pi_\delta(f) on D^{1+\delta} (for \delta<1/2) by a certain convolution trick, using both the group structure of F, and its discreteness in D^3. Roughly, given an element g \in D^{1+\delta}, the set of elements of F whose (group) composition with g is uniformly bounded in the C^{1+\delta} norm is finite; so the value of \pi_\delta(f) is obtained by taking a suitable average of the value of f on this finite subset of F. This reduces the problem of the amenability of F to the existence of a suitable functional on the space of bounded continuous functions on D^{1+\delta}, which is constructed via the pulled back Wiener measure as above.
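To get a feel for the pulled back Wiener measure, one can sample a discretized Brownian path W and build a diffeomorphism f of the interval with A(f) = W, by integrating e^{W_t} and normalizing so that f(1) = 1. The sketch below does this on a uniform grid. The grid size, the trapezoid rule and the function names are my assumptions for illustration, and the discretization only approximates the measure described above.

```python
import math
import random


def sample_wiener_path(n_steps, sigma, rng):
    """Discretized Brownian path on [0,1]: W_0 = 0, independent Gaussian
    increments with mean zero and variance sigma * dt."""
    dt = 1.0 / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(sigma * dt)))
    return path


def diffeo_from_path(path):
    """Return grid values of (f, f') where f' is proportional to exp(W) and
    f(0) = 0, f(1) = 1. Integration uses the trapezoid rule."""
    n = len(path) - 1
    dt = 1.0 / n
    density = [math.exp(w) for w in path]
    f = [0.0]
    for k in range(n):
        f.append(f[-1] + 0.5 * (density[k] + density[k + 1]) * dt)
    total = f[-1]
    return [v / total for v in f], [d / total for d in density]


def A(derivative):
    """The map A(f)(t) = log f'(t) - log f'(0), evaluated on the grid."""
    return [math.log(d) - math.log(derivative[0]) for d in derivative]


rng = random.Random(0)
W = sample_wiener_path(1000, sigma=1.0, rng=rng)
f, df = diffeo_from_path(W)
print(f[0], f[-1], max(abs(a - w) for a, w in zip(A(df), W)))
```

A typical sample has a derivative that is continuous but nowhere smoother than a Brownian path, which is consistent with the measure living on D^1 rather than on a group of smoother diffeomorphisms.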

There are several distinctive features of Shavgulidze’s preprint. One of the most striking is that it depends on very delicate analytic features of the Wiener measure, and the way it transforms under the action of D^3 on D^1 — a transformation law involving the Schwarzian derivative — which suggests that certain parts of the argument could be clarified (at least from the point of view of a topologist?) by using projective geometry and Sturm-Liouville theory. Another is that the crucial analytic quality — namely differentiability of class C^{1+1/2} — is also crucial for many other natural problems in 1-dimensional analysis and geometry, from regularity estimates in the thin obstacle problem, to Navas’ work on actions of property (T) groups on the circle. At least one of the preprints by Akhmedov and Shavgulidze must be in error (in fact, a real skeptic’s skeptic such as Michael Aschbacher is not even willing to concede that much . . .) but even if wrong, it is possible that they contain things more valuable than a resolution of the question that prompted them.

Update (7/6): Azer Akhmedov sent me a construction of a (nonabelian) free subgroup of D^1 that is discrete in the C^1 topology. This is not quite enough regularity to intersect with Shavgulidze’s program, but it is interesting, and worth explaining. This is my (minor) modification of Azer’s construction (any errors are due to me):

Proposition: The group D^1 contains a discrete nonabelian free subgroup.

Sketch of Proof: First, decompose the interval [0,1] into countably many disjoint subintervals accumulating only at the endpoints. Choose a free action on two generators by doing something generic on each subinterval, in such a way that the derivative is equal to 1 at the endpoints. This can certainly be accomplished; for concreteness, choose the action so that for each subinterval I_i there is a point x_i in the interior of I_i whose stabilizer is trivial.

Second, for each pair of distinct words in the generators, choose a subinterval and modify the action there so that the derivatives of those words in that subinterval differ by at least some definite constant C at some point. In more detail: enumerate the pairs of words somehow as p_1, p_2, p_3, \ldots where each p_i is a pair of words (w_{i1}, w_{i2}) in the generators, and modify the action on the subinterval I_i so the words in p_i differ by at least C in the C^1 norm on the interval I_i. Since we are modifying the generators infinitely many times, but in such a way that the support of the modification exits any compact subset of the interior, we just need to check that the modifications are C^1. Since there are only finitely many pairs of words, both of which are of bounded length (for any given bound), when i is sufficiently big, one of the words w_{i1}, w_{i2} has length at least n(i) where n(i) goes to infinity as i goes to infinity. Without loss of generality, we can order the pairs so that w_{i1} is the “long” word.

Now this is how we modify the action in I_i. Recall that the point x_i has trivial stabilizer, so the translates y_{ij} of x_i under the suffixes of w_{i1} are distinct. Take disjoint intervals about the y_{ij} and observe that each y_{ij} is taken to y_{i,j+1} by one of the generators. Modify this generator inside this disjoint neighborhood so that y_{ij} is still taken to y_{i,j+1}, but the derivative at that point is multiplied by 1+C/n(i), and the derivative at nearby points is not multiplied by more than 1+C/n(i). Since the neighborhoods of the y_{ij} are disjoint, these modifications are all compatible, and the derivative of the generators does not change by more than a factor of 1+C/n(i) at any point. Since n(i) goes to infinity as i goes to infinity, we can perform such modifications for each i, and the resulting action is still C^1. But now the derivative of w_{i1} at x_i has been multiplied by at least (1+C/n(i))^{n(i)} \ge 1+C, so w_{i1} and w_{i2} differ by at least C in the C^1 norm. qed.

It is interesting to observe that this construction, while C^1, is not C^{1+\epsilon} for any \epsilon>0. For big i, we have n(i) \sim \log(i) whereas |I_i| = o(1/i). Introducing a “bump” which modifies the derivative by 1/\log(i) in a subinterval of size o(1/i) will blow up every Hölder norm.
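The blow-up can be spelled out in one line, under some simplifying assumptions of mine that are not in the sketch above: write \delta_i for the change in the derivative of a generator on I_i, suppose it is supported in a neighborhood U_i \subset I_i of some y_{ij}, and let m > 0 be a lower bound for the unmodified derivative. Comparing the value of \delta_i at y_{ij} with its value 0 at an endpoint of U_i gives a rough estimate along the following lines.

```latex
% Assumed notation: \delta_i = change in the derivative on I_i, supported in U_i,
% m = lower bound for the unmodified derivative, [\,\cdot\,]_\epsilon = Hoelder seminorm.
\[
  [\delta_i]_{\epsilon}
  \;=\; \sup_{s \ne t} \frac{|\delta_i(s) - \delta_i(t)|}{|s-t|^{\epsilon}}
  \;\ge\; \frac{m\,C/n(i)}{|U_i|^{\epsilon}}
  \;\ge\; \frac{m\,C}{n(i)\,|I_i|^{\epsilon}}
  \;\ge\; \frac{m\,C\, i^{\epsilon}}{n(i)}
  \quad \text{once } |I_i| \le 1/i ,
\]
\[
  \text{and } \frac{i^{\epsilon}}{n(i)} \to \infty
  \text{ as } i \to \infty \text{ when } n(i) \sim \log(i).
\]
```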

(Update 8/10): Mark Sapir has created a webpage to discuss Shavgulidze’s paper here. Also, Matt Brin has posted notes on Shavgulidze’s paper here. The notes are very nice, and go into great detail, as far as they go. Matt promises to update the notes periodically.

(Update 11/18): Matt Brin has let me know by email that a significant gap has emerged in Shavgulidze’s argument. He writes:

Lemma 5 is still unproven. It claims a property about the distributions u_n on the simplexes D_n that is needed for the second part of the paper. The main result does not need the particular distributions u_n given in the paper, but does need distributions on the D_n that satisfy the properties claimed by Lemmas 5, 6 and that cooperate with Lemma 9. Uffe Haagerup claims an argument that the u_n in the paper does not satisfy the conclusion of Lemma 5. Another distribution was said to be suggested by Shavgulidze, but at last report, it did not seem to be working out.

In light of this, it would seem to be reasonable to consider the question of whether F is amenable as wide open.

(Update 9/21/2012): Justin Moore has posted a preprint on the arXiv claiming to prove amenability of F. It is too early to suggest that there is expert consensus on the correctness of the proof, but certainly everything I have heard is promising. I have not had time to look carefully at the argument yet, but hope to get a chance to do so before too long.

(Update 10/2/2012): Justin has withdrawn his claim of a proof. A gap was found by Akhmedov.


Orderability, and groups of homeomorphisms of the disk

I have struggled for a long time (and I continue to struggle) with the following question:

Question: Is the group of self-homeomorphisms of the unit disk (in the plane) that fix the boundary pointwise a left-orderable group?

Recall that a group G is left-orderable if there is a total order < on the elements satisfying g<h if and only if fg < fh for all f,g,h \in G. For a countable group, the property of being left orderable is equivalent to the property that the group admits a faithful action on the interval by orientation-preserving homeomorphisms; however, this equivalence is not “natural” in the sense that there is no natural way to extract an ordering from an action, or vice-versa. This formulation of the question suggests that one is trying to embed the group of homeomorphisms of the disk into the group of homeomorphisms of the interval, an unlikely proposition, made even more unlikely by the following famous theorem of Filipkiewicz:

Theorem: (Filipkiewicz) Let M_1,M_2 be two compact manifolds, and r_1,r_2 two non-negative integers or infinity. Suppose the connected components of the identity of \text{Diff}^{r_1}(M_1) and \text{Diff}^{r_2}(M_2) are isomorphic as abstract groups. Then r_1=r_2 and the isomorphism is induced by some diffeomorphism.

The hard(est?) part of the argument is to identify a subgroup stabilizing a point in purely algebraic terms. It is a fundamental and well-studied problem, in some ways a natural outgrowth of Klein’s Erlanger programme, to perceive the geometric structure on a space in terms of algebraic properties of its automorphism group. The book by Banyaga is the best reference I know for this material, in the context of “flexible” geometric structures, with big transformation groups (it is furthermore the only math book I know with a pink cover).
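Returning for a moment to the dynamical characterization of left orderability recalled before the theorem: one standard way to pass from a faithful action on the interval to a left ordering is to fix a sequence of test points and compare two elements at the first test point where they differ. The sketch below illustrates this for PL homeomorphisms with a finite list of dyadic test points; the choice of sequence is arbitrary (which is one sense in which the ordering is not natural), and the function names are my own.

```python
from fractions import Fraction as Fr


def pl_map(points):
    """Increasing PL homeomorphism of [0,1] through the given breakpoints."""
    pts = [(Fr(x), Fr(y)) for x, y in points]

    def f(x):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("point outside the unit interval")

    return f


def compose(f, g):
    """The composition f o g (apply g first)."""
    return lambda x: f(g(x))


def dyadic_points(depth):
    """Test points 1/2, 1/4, 3/4, 1/8, 3/8, ... up to denominator 2^depth."""
    for k in range(1, depth + 1):
        for p in range(1, 2 ** k, 2):
            yield Fr(p, 2 ** k)


def less_than(g, h, depth=8):
    """g < h if g(x) < h(x) at the first test point x where they differ."""
    for x in dyadic_points(depth):
        gx, hx = g(x), h(x)
        if gx != hx:
            return gx < hx
    return False  # g and h agree on every test point considered


identity = pl_map([(0, 0), (1, 1)])
g = pl_map([(0, 0), (Fr(1, 2), Fr(1, 4)), (Fr(3, 4), Fr(1, 2)), (1, 1)])
f = pl_map([(0, 0), (Fr(1, 4), Fr(1, 2)), (Fr(1, 2), Fr(3, 4)), (1, 1)])
print(less_than(g, identity), less_than(compose(f, g), compose(f, identity)))
```

Left invariance holds because f is injective and increasing: the first test point where g and h differ is also the first where fg and fh differ, and f preserves the inequality there.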

Left orderability is inherited under extensions. I.e. if K \to G \to H is a short exact sequence, and both K and H are left orderable, then so is G. Furthermore, it is a simple but useful theorem of Burns and Hale that a group G is left orderable if and only if for every finitely generated subgroup H there is a left orderable group H' and a surjective homomorphism H \to H'. The necessity of this condition is obvious: a subgroup of a left orderable group is left orderable (by restricting the order), so one can take H' to be H and the surjection to be the identity. One can exploit this strategy to show that certain transformation groups are left orderable, as follows:

Example: Suppose G is a group of homeomorphisms of some space X, with a nonempty fixed point set. If H is a finitely generated subgroup of G, then there is a point y in the frontier of \text{fix}(H) so that H has a nontrivial image in the group of germs of homeomorphisms of X at y. If this group of germs is left-orderable for all y, then so is G by Burns-Hale.

Example: (Rolfsen-Wiest) Let G be the group of PL homeomorphisms of the unit disk (thought of as a PL square in the plane) fixed on the boundary. If H is a finitely generated subgroup, there is a point p in the frontier of \text{fix}(H). Note that H has a nontrivial image in the group of piecewise linear homeomorphisms of the projective space of lines through p. Since the fixed point set of a finitely generated subgroup is equal to the intersection of the fixed point sets of a finite generating set, it is itself a polyhedron. Hence H fixes some line through p, and therefore has a nontrivial image in the group of homeomorphisms of an interval. By Burns-Hale, G is left orderable.

Example: Let G be the group of diffeomorphisms of the unit disk, fixed on the boundary. If H is a finitely generated subgroup, then at a non-isolated point p in \text{fix}(H) the group H fixes some tangent vector at p (a limit of short straight lines from p to nearby fixed points). Consequently the image of H in \text{GL}(T_p) is reducible, and is conjugate into an affine subgroup, which is left orderable. If the image is nontrivial, we are done by Burns-Hale. If it is trivial, then the linear part of H at p is trivial, and therefore by the Thurston stability theorem, there is a nontrivial homomorphism from H to the (orderable) group of translations of the plane. By Burns-Hale, we conclude that G is left orderable.

The last example does not require infinite differentiability, just C^1, the necessary hypothesis to apply the Thurston stability theorem. This is such a beautiful and powerful theorem that it is worth making an aside to discuss it. Thurston’s theorem says that if H is a finitely generated group of germs of diffeomorphisms of a manifold fixing a common point, with trivial linear part there, then a suitable sequence of rescaled actions of the group near the fixed point converges to a nontrivial action by translations. One way to think of this is in terms of power series: if H is a group of real analytic diffeomorphisms of the line, fixing the point 0, then every h \in H can be expanded as a power series: h(x) = c_1(h)x + c_2(h)x^2 + \cdots. The function h \to c_1(h) is a multiplicative homomorphism; however, if the logarithm of c_1 is identically zero, and i \ge 2 is the first index for which some c_i(h) is nonzero, then h \to c_i(h) is an additive homomorphism. The choice of coefficient i is a “gauge”, adapted to H, that sees the most significant nontrivial part; this leading term is a character (i.e. a homomorphism to an abelian group), since the nonabelian corrections have strictly higher degree. Thurston’s insight was to realize that for a finitely generated group of germs of C^1 diffeomorphisms with trivial linear part, one can find some gauge that sees the most significant nontrivial part of the action of the group, and at this gauge, the action looks abelian. There is a subtlety, that one must restrict attention to finitely generated groups of homeomorphisms: on each scale of a sequence of finer and finer scales, one element of a finite generating set differs the most from the identity; one must pass to a subsequence of scales for which this one element is constant (this is where the finite generation is used).
The necessity of this condition is demonstrated by a theorem of Sergeraert: the group of germs of (C^\infty) diffeomorphisms of the unit interval, infinitely tangent to the identity at both endpoints (i.e. with trivial power series at each endpoint) is perfect, and therefore admits no nontrivial homomorphism to an abelian group.
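The power series picture in the aside on Thurston stability can be checked by hand on truncated series. The sketch below composes two germs tangent to the identity to order 3 and confirms that the coefficient c_3 adds under composition, while c_1 multiplies. Representing a germ by its list of coefficients (indexed by exponent) and the function names are my own conventions, and truncation at a fixed degree is an assumption of the sketch.

```python
from fractions import Fraction as Fr


def multiply(p, q, degree):
    """Product of two polynomials (coefficient lists indexed by exponent),
    truncated above the given degree."""
    out = [Fr(0)] * (degree + 1)
    for i, a in enumerate(p):
        if a == 0:
            continue
        for j, b in enumerate(q):
            if i + j > degree:
                break
            out[i + j] += a * b
    return out


def compose(g, h, degree):
    """Coefficients of g(h(x)) up to the given degree; both series must have
    zero constant term (they fix the point 0)."""
    result = [Fr(0)] * (degree + 1)
    power = [Fr(1)] + [Fr(0)] * degree       # h^0
    for k in range(1, degree + 1):
        power = multiply(power, h, degree)   # now h^k, truncated
        for e in range(degree + 1):
            result[e] += g[k] * power[e]
    return result


g = [0, 1, 0, 3, 5]     # g(x) = x + 3x^3 + 5x^4
h = [0, 1, 0, -2, 1]    # h(x) = x - 2x^3 + x^4
print(compose(g, h, 4))
```

Here the first index with a nonzero coefficient is i = 3, and the composite has c_3 = 3 + (-2) = 1.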

Let us now return to the original question. The examples above suggest that it might be possible to find a left ordering on the group of homeomorphisms of the disk, fixed on the boundary. However, I think this is misleading. The construction of a left ordering in either category (PL or smooth) was ad hoc, and depended on locality in two different ways: the locality of the property of left orderability (i.e. Burns-Hale) and the tameness of groups of PL or smooth homeomorphisms blown up near a common fixed point. Rescaling an arbitrary homeomorphism about a fixed point does not make things any less complicated. Burns-Hale and Filipkiewicz together suggest that one should look for a structural dissimilarity between the group of homeomorphisms of the disk and of the interval that persists in finitely generated subgroups. The simplest way to distinguish between the two spaces algebraically is in terms of their lattices of closed (or equivalently, open) subsets. To a topological space X, one can associate the lattice \Lambda(X) of (nonempty, for the sake of argument) closed subsets of X, ordered by inclusion. One can reconstruct the space X from this lattice, since points in X correspond to minimal elements. However, any surjective map X \to Y defines an embedding \Lambda(Y) \to \Lambda(X), so there are many structure-preserving morphisms between such lattices. The lattice \Lambda(X) is an \text{Aut}(X)-space in an obvious way, and one can study algebraic maps \Lambda(Y) \to \Lambda(X) together with homomorphisms \rho:\text{Aut}(Y) \to \text{Aut}(X) for which the algebraic maps respect the induced \text{Aut}(Y)-structures. A weaker “localization” of this condition asks merely that for points (i.e. minimal elements) p,p' \in \Lambda(Y) in the same \text{Aut}(Y)-orbit, their images in \Lambda(X) are in the same \text{Aut}(X)-orbit. This motivates the following:

Proposition: There is a surjective map from the unit interval to the unit disk so that the preimages of any two points are homeomorphic.

Sketch of Proof: This proposition follows from two simpler propositions. The first is that there is a surjective map from the unit interval to itself so that every point preimage is a Cantor set. The second is that there is a surjective map from the unit interval to the unit disk so that the preimage of any point is finite. A composition of these two maps gives the desired map, since a finite union of Cantor sets is itself a Cantor set.

There are many surjective maps from the unit interval to the unit disk so that the preimage of any point is finite. For example, if M is a hyperbolic three-manifold fibering over the circle with fiber S, then the universal cover of a fiber \widetilde{S} is properly embedded in hyperbolic 3-space, and its ideal boundary (a circle) maps surjectively and finitely-to-one to the sphere at infinity of hyperbolic 3-space. Restricting to a suitable subinterval gives the desired map.

To obtain the first proposition, one builds a surjective map from the interval to itself inductively; there are many possible ways to do this, and details are left to the reader. qed.

It is not clear how much insight such a construction gives.

Another approach to the original question involves trying to construct an explicit (finitely generated) subgroup of the group of homeomorphisms of the disk that is not left orderable. There is a “cheap” method to produce finitely presented groups with no left-orderable quotients. Let G = \langle x,y \; | \; w_1, w_2 \rangle be a group defined by a presentation, where w_1 is a nonempty word in the letters x and y (with no inverses), and w_2 is a nonempty word in the letters x and y^{-1}. In any left-orderable quotient in which both x and y are nontrivial, after reversing the orientation if necessary, we can assume that x > \text{id}. If further y>\text{id} then w_1 >\text{id}, contrary to the fact that w_1 = \text{id}. If y^{-1} >\text{id}, then w_2 >\text{id}, contrary to the fact that w_2=\text{id}. In either case we get a contradiction. One can try to build by hand nontrivial homeomorphisms x,y of the unit disk, fixed on the boundary, that satisfy w_1,w_2 =\text{id}. Some evidence that this will be hard to do comes from the fact that the groups of smooth and of PL homeomorphisms of the disk are in fact left-orderable: any such x,y can be arbitrarily well-approximated by smooth x',y'; nevertheless at least one of the words w_1,w_2 evaluated on any smooth x',y' will be nontrivial. Other examples of finitely presented groups that are not left orderable include higher Q-rank lattices (e.g. subgroups of finite index in \text{SL}(n,\mathbb{Z}) when n\ge 3), by a result of Dave Witte-Morris. Suppose such a group admits a faithful action by homeomorphisms on some closed surface of genus at least 1. Since such groups do not admit homogeneous quasimorphisms, their image in the mapping class group of the surface is finite, so after passing to a subgroup of finite index, one obtains a (lifted) action on the universal cover.
If the genus of the surface is at least 2, this action can be compactified to an action by homeomorphisms on the unit disk (thought of as the universal cover of a hyperbolic surface) fixed pointwise on the boundary. Fortunately or unfortunately, it is already known by Franks-Handel (see also Polterovich) that such groups admit no area-preserving actions on closed oriented surfaces (other than those factoring through a finite group), and it is consistent with the so-called “Zimmer program” that they should admit no actions even without the area-preserving hypothesis when the genus is positive (of course, \text{SL}(3,\mathbb{R}) admits a projective action on S^2). Actually, higher rank lattices are very fragile, because of Margulis’ normal subgroup theorem. Every normal subgroup of such a lattice is either finite or finite index, so to prove the results of Franks-Handel and Polterovich, it suffices to find a single element in the group of infinite order that acts trivially. Unipotent elements are exponentially distorted in the word metric (i.e. the cyclic subgroups they generate are not quasi-isometrically embedded), so one “just” needs to show that groups of area-preserving diffeomorphisms of closed surfaces (of genus at least 1) do not contain such distorted elements. Some naturally occurring non-left orderable groups include some (rare) hyperbolic 3-manifold groups, amenable but not locally indicable groups, and a few others. It is hard to construct actions of such groups on a disk, although certain flows on 3-manifolds give rise to actions of the fundamental group on a plane.


The topological Cauchy-Schwarz inequality

I recently made the final edits to my paper “Positivity of the universal pairing in 3 dimensions”, written jointly with Mike Freedman and Kevin Walker, to appear in Jour. AMS. This paper is inspired by questions that arise in the theory of unitary TQFT’s. An n+1-dimensional TQFT (“topological quantum field theory”) is a functor Z from the category of smooth oriented n-manifolds and smooth cobordisms between them, to the category of (usually complex) vector spaces and linear maps, that obeys the (so-called) monoidal axiom Z(A \coprod B) = Z(A) \otimes Z(B). The monoidal axiom implies that Z(\emptyset)=\mathbb{C}. Roughly speaking, the functor associates to a “spacelike slice” — i.e. to each n-manifold A — the vector space of “quantum states” on A (whatever they are), denoted Z(A). A cobordism stands in for the physical idea of the universe and its quantum state evolving in time. An n+1-manifold W bounding A can be thought of as a cobordism from the empty manifold to A, so Z(W) is a linear map from \mathbb{C} to Z(A), or equivalently, a vector in Z(A) (the image of 1 \in \mathbb{C}).

Note that as defined above, a TQFT is sensitive not just to the underlying topology of a manifold, but to its smooth structure. One can define variants of TQFTs by requiring more or less structure on the underlying manifolds and cobordisms. One can also consider “decorated” cobordism categories, such as those whose objects are pairs (A,K) where A is a manifold and K is a submanifold of some fixed codimension (usually 2) and whose morphisms are pairs of cobordisms (W,S) (e.g.  Wilson loops in a 2+1-dimensional TQFT).

In realistic physical theories, the space of quantum states is a Hilbert space — i.e. it is equipped with a nondegenerate inner product. In particular, the result of pairing a vector with itself should be positive. One says that a TQFT with this property is unitary. In the TQFT, reversing the orientation of a manifold interchanges a vector space with its dual, and pairing is accomplished by gluing diffeomorphic manifolds with opposite orientations. It is interesting to note that many 3+1-dimensional TQFTs of interest to mathematicians are not unitary; e.g. Donaldson theory, Heegaard Floer homology, etc. These theories depend on a grading, which prevents attempts to unitarize them. It turns out that there is a good reason why this is true, discussed below.

Definition: For any n-manifold S, let \mathcal{M}(S) denote the complex vector space spanned by the set of n+1-manifolds bounding S, up to a diffeomorphism fixed on S. There is a pairing on this vector space, the universal pairing, taking values in the complex vector space \mathcal{M} spanned by the set of closed n+1-manifolds up to diffeomorphism. If \sum_i a_iA_i and \sum_j b_jB_j are two vectors in \mathcal{M}(S), the pairing of these two vectors is equal to the formal sum \sum_{ij} a_i\overline{b}_j A_i\overline{B}_j where overline is complex conjugation on numbers, and orientation-reversal on manifolds, and A_i\overline{B}_j denotes the closed manifold obtained by gluing A_i to \overline{B}_j along S.

The point of making this definition is the following. If v \in \mathcal{M}(S) is a vector with the property that \langle v,v\rangle_S = 0 (i.e. the result of pairing v with itself is zero), then Z(v)=0 for any unitary TQFT Z. One says that the universal pairing is positive in n+1 dimensions if every nonzero vector v pairs nontrivially with itself.

Example: The Mazur manifold M is a smooth 4-manifold with boundary S. There is an involution \theta of S that does not extend over M, so M,\theta(M) denote distinct elements of \mathcal{M}(S). Let v = M - \theta(M), their formal difference. Then the result of pairing v with itself has four terms: \langle v,v\rangle_S = M\overline{M} - \theta(M)\overline{M} - M\overline{\theta(M)} + \theta(M)\overline{\theta(M)}. It turns out that all four terms are diffeomorphic to S^4, and therefore this formal sum is zero even though v is not zero, and the universal pairing is not positive in dimension 4.

More generally, it turns out that unitary TQFTs cannot distinguish s-cobordant 4-manifolds, and therefore they are insensitive to essentially all “interesting” smooth 4-manifold topology! This “explains” why interesting 3+1-dimensional TQFTs, such as Donaldson theory and Heegaard Floer homology (mentioned above) are necessarily not unitary.

One sees that cancellation arises, and a pairing may fail to be positive, if there are some unusual “coincidences” in the set of terms A_i\overline{B}_j arising in the pairing. One way to ensure that cancellation does not occur is to control the coefficients for the terms appearing in some fixed diffeomorphism type. Observe that the “diagonal” coefficients a_i\overline{a}_i are all positive real numbers, and therefore cancellation can only occur if every manifold appearing as a diagonal term is diffeomorphic to some manifold appearing as an off-diagonal term. The way to ensure that this does not occur is to define some sort of ordering or complexity on terms in such a way that the term of greatest complexity can occur only on the diagonal. This property — diagonal dominance — can be expressed in the following way:

Definition: A pairing \langle \cdot,\cdot \rangle_S as above satisfies the topological Cauchy-Schwarz inequality if there is a complexity function \mathcal{C} defined on all closed n+1-manifolds, so that if A,B are any two n+1-manifolds with boundary S, there is an inequality \mathcal{C}(A\overline{B}) \le \max(\mathcal{C}(A\overline{A}),\mathcal{C}(B\overline{B})) with equality if and only if A=B.

The existence of such a complexity function ensures diagonal dominance, and therefore the positivity of the pairing \langle\cdot,\cdot\rangle_S.

Example: Define a complexity function \mathcal{C} on closed 1-manifolds, by defining \mathcal{C}(M) to be equal to the number of components of M. This complexity function satisfies the topological Cauchy-Schwarz inequality, and proves positivity for the universal pairing in 1 dimension.
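The 1-dimensional case is small enough to check by brute force. The following is a minimal Python sketch under simplifying assumptions that are mine, not the paper's: S consists of 2n points, A and B have no closed components (so each is a perfect matching of the points of S), and orientations are ignored. The names `matchings`, `components` and `check_cauchy_schwarz` are illustrative only. Gluing A to \overline{B} along S produces one circle for each cycle in the union of the two matchings.

```python
def matchings(points):
    """Yield every perfect matching of an even-length list of points."""
    if not points:
        yield frozenset()
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for m in matchings(remaining):
            yield m | {frozenset((first, partner))}


def partner_map(matching):
    """Map each endpoint to the other endpoint of its arc."""
    partner = {}
    for arc in matching:
        p, q = tuple(arc)
        partner[p] = q
        partner[q] = p
    return partner


def components(a, b):
    """Number of circles in the closed 1-manifold obtained by gluing a to b."""
    pa, pb = partner_map(a), partner_map(b)
    seen, count = set(), 0
    for start in pa:
        if start in seen:
            continue
        count += 1
        p = start
        while p not in seen:
            seen.add(p)
            q = pa[p]   # travel along an arc of A
            seen.add(q)
            p = pb[q]   # come back along an arc of B
    return count


def check_cauchy_schwarz(n):
    """Verify the inequality, and its equality case, for all pairs of matchings."""
    ms = list(matchings(list(range(2 * n))))
    for a in ms:
        for b in ms:
            lhs = components(a, b)
            rhs = max(components(a, a), components(b, b))
            assert lhs <= rhs
            assert (lhs == rhs) == (a == b)
    return len(ms)
```

In this toy model the diagonal terms always have n components, and an off-diagonal term has n components only if every cycle has length 2, which forces the two matchings to coincide.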

Example: A suitable complexity function can also be found in 2 dimensions. The first term in the complexity is number of components. The second is a lexicographic list of the Euler characteristics of the resulting pieces (i.e. the complexity favors more components of bigger Euler characteristic). The first term is maximized if and only if the pieces of A and B are all glued up in pairs with the same number of boundary components in S; the second term is then maximized if and only if each piece of A is glued to a piece of B with the same Euler characteristic and number of boundary components — i.e. if and only if A=B.

Positivity holds in dimensions below 3, and fails in dimensions above 3. The main theorem we prove in our paper is that positivity holds in dimension 3, and we do this by constructing an explicit complexity function which satisfies the topological Cauchy-Schwarz inequality.

Unfortunately, the function itself is extremely complicated. At a first pass, it is a tuple c=(c_0,c_1,c_2,c_3) where c_0 treats number of components, c_1 treats the kernel of \pi_1(S) \to \pi_1(A) under inclusion, c_2 treats the essential 2-spheres, and c_3 treats prime factors arising in the decomposition.

The term c_1 is itself very interesting: for each finite group G Witten and Dijkgraaf constructed a real unitary TQFT Z_G (i.e. one for which the resulting vector spaces are real), so that roughly speaking Z_G(S) is the vector space spanned by representations of \pi_1(S) into G up to conjugacy, and Z_G(A) is the vector that counts (in a suitable sense) the number of ways each such representation extends over \pi_1(A). The value of Z_G on a closed manifold is roughly just the number of representations of the fundamental group in G, up to conjugacy. The complexity c_1 is obtained by first enumerating all isomorphism classes of finite groups G_1,G_2,G_3 \cdots and then listing the values of Z_{G_i} in order. If the kernel of \pi_1(S) \to \pi_1(A) is different from the kernel of \pi_1(S) \to \pi_1(B), this difference can be detected by some finite group (this fact depends on the fact that 3-manifold groups are residually finite, proved in this context by Hempel); so c_1 is diagonal dominant unless these two kernels are equal; equivalently, if the maximal compression bodies of S in A and B are diffeomorphic rel. S. It is essential to control these compression bodies before counting essential 2-spheres, so this term must come before c_2 in the complexity.
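To get a feel for the kind of number Z_G assigns to a closed manifold, here is a rough Python sketch that counts homomorphisms from \pi_1 of the 3-torus, which is \mathbb{Z}^3, into the symmetric group S_3. Such a homomorphism is the same thing as a triple of pairwise commuting elements. The choice of manifold and group, the helper names, and the normalization by |G| (the usual one for the untwisted finite group theory, which is how I read “roughly” and “up to conjugacy” above) are assumptions made for illustration.

```python
from itertools import permutations, product


def compose(p, q):
    """Composite permutation: apply q first, then p."""
    return tuple(p[i] for i in q)


def commuting_triples(group):
    """Count triples of pairwise commuting elements, i.e. |Hom(Z^3, G)|."""
    def commute(x, y):
        return compose(x, y) == compose(y, x)

    return sum(
        1
        for a, b, c in product(group, repeat=3)
        if commute(a, b) and commute(a, c) and commute(b, c)
    )


S3 = list(permutations(range(3)))   # the six permutations of {0, 1, 2}
homs = commuting_triples(S3)        # |Hom(pi_1(T^3), S_3)|
value = homs // len(S3)             # normalized count
```

For S_3 there are 48 commuting triples, so the normalized count is 8. Listing such values over an enumeration G_1, G_2, G_3, \cdots of finite groups is the shape of data that enters c_1.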

The term c_3 has a contribution c_p from each prime summand. The complexity c_p itself is a tuple c_p = (c_S,c_h,c_a) where c_S treats Seifert-fibered pieces, c_h treats hyperbolic pieces, and c_a treats the way in which these are assembled in the JSJ decomposition. The term c_h is quite interesting; evaluated on a finite volume hyperbolic 3-manifold M it gives as output the tuple c_h(M) = (-\text{vol}(M),\sigma(M)) where \text{vol}(M) denotes hyperbolic volume, and \sigma(M) is the geodesic length spectrum, or at least those terms in the spectrum with zero imaginary part. The choice of the first term depends on the following theorem:

Theorem: Let S be an orientable surface of finite type so that each component has negative Euler characteristic, and let A,B be irreducible, atoroidal and acylindrical, with boundary S. Then A\overline{A},A\overline{B},B\overline{B} admit unique complete hyperbolic structures, and either 2\text{vol}(A\overline{B}) > \text{vol}(A\overline{A})+\text{vol}(B\overline{B}) or else 2\text{vol}(A\overline{B}) = \text{vol}(A\overline{A}) + \text{vol}(B\overline{B}) and S is totally geodesic in A\overline{B}.

This theorem is probably the most technically difficult part of the paper. Notice that even though in the end we are only interested in closed manifolds, we must prove this theorem for hyperbolic manifolds with cusps, since these are the pieces that arise in the JSJ decomposition. This theorem was proved for closed manifolds by Agol-Storm-Thurston, and our proof follows their argument in general terms, although there are more technical difficulties in the cusped case. One starts with the hyperbolic manifold A\overline{B}, and finds a least area representative of the surface S. Cut along this surface, and double (metrically) to get two singular metrics on the topological manifolds A\overline{A} and B\overline{B}. The theorem will be proved if we can show the volume of this singular metric is bigger than the volume of the hyperbolic metric. Such comparison theorems for volume are widely studied in geometry; in many circumstances one defines a geometric invariant of a Riemannian metric, and then shows that it is minimized/maximized on a locally symmetric metric (which is usually unique in dimensions >2). For example, Besson-Courtois-Gallot famously proved that a negatively curved locally symmetric metric on a manifold uniquely minimizes the volume entropy over all metrics with fixed volume (roughly, the entropy of the geodesic flow, at least when the curvature is negative).

Hamilton proved that if one rescales Ricci flow to have constant volume, then scalar curvature R satisfies R' = \Delta R + 2|\text{Ric}_0|^2 + \frac 2 3 R(R-r) where \text{Ric}_0 denotes the traceless Ricci tensor, and r denotes the spatial average of the scalar curvature R. If the spatial minimum of R is negative, then at a point achieving the minimum, \Delta R is non-negative, as are the other two terms; in other words, if one does Ricci flow rescaled to have constant volume, the minimum of scalar curvature increases (this fact remains true for noncompact manifolds, if one substitutes infimum for minimum). Conversely, if one rescales to keep the infimum of scalar curvature constant, volume decreases under flow. In 3 dimensions, Perelman shows that Ricci flow with surgery converges to the hyperbolic metric. Surgery at finite times occurs when scalar curvature blows up to positive infinity, so surgery does not affect the infimum of scalar curvature, and only makes volume smaller (since things are being cut out). Consequently, Perelman’s work implies that of all metrics on a hyperbolic 3-manifold with the infimum of scalar curvature equal to -6, the constant curvature metric is the unique metric minimizing volume.
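The sign check behind the monotonicity of the minimum of scalar curvature can be spelled out term by term. This is only a sketch of the compact case using the evolution equation quoted above; the symbols x and R_{\min} are notation introduced here.

```latex
% Let x be a point where R(\cdot,t) attains its spatial minimum R_{\min} < 0.
\begin{align*}
\Delta R(x) &\ge 0
  && \text{(second derivative test at a minimum)} \\
2|\text{Ric}_0|^2(x) &\ge 0
  && \text{(a squared norm)} \\
R_{\min} - r &\le 0
  && \text{(a minimum is at most the spatial average } r\text{)} \\
\tfrac{2}{3}\, R_{\min}\,(R_{\min} - r) &\ge 0
  && \text{(product of two non-positive numbers)}
\end{align*}
% Summing the three terms gives R'(x) \ge 0, so the minimum does not decrease.
```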

Now, the metric on A\overline{A} obtained by doubling along a minimal surface is not smooth, so one cannot even define the curvature tensor. However, if one interprets scalar curvature as an “average” of Ricci curvature, and observes that a minimal surface is flat “on average”, then one should expect that the distributional scalar curvature of the metric is equal to what it would be if one doubled along a totally geodesic surface, i.e. identically equal to -6. So Perelman’s inequality should apply, and prove the desired volume estimate.

To make this argument rigorous, one must show that the singular metric evolves under Ricci flow, and instantaneously becomes smooth, with R \ge -6. A theorem of Miles Simon says that this follows if one can find a smooth background metric with uniform bounds on the curvature and its first derivatives, and which is 1+\epsilon-bilipschitz to the singular metric. The existence of such a background metric is essentially trivial in the closed case, but becomes much more delicate in the cusped case. Basically, one needs to establish the following comparison lemma, stated somewhat informally:

Lemma: Least area surfaces in cusps of hyperbolic 3-manifolds become asymptotically flat faster than the thickness of the cusp goes to zero.

In other words, if one lifts a least area surface S to a surface \tilde{S} in the universal cover, there is a (unique) totally geodesic surface \pi (the “osculating plane”) asymptotic to \tilde{S} at the fixed point of the parabolic element corresponding to the cusp, and satisfying the following geometric estimate. If B_t is the horoball centered at the parabolic fixed point at height t (for some horofunction), then the Hausdorff distance between \tilde{S} \cap B_t and \pi \cap B_t is o(e^{-t}). One must further prove that if a surface S has multiple ends in a single cusp, these ends osculate distinct geodesic planes. Given this, it is not too hard to construct a suitable background metric. Between ends of S, the geometry looks more and more like a slab wedged between two totally geodesic planes. The double of this is a nonsingular hyperbolic manifold, so it certainly enjoys uniform control on the curvature and its first derivatives; this gives the background metric in the thin part. In the thick part, one can convolve the singular metric with a bump function to find a bilipschitz background metric; compactness of the thick part implies trivially that any smooth metric enjoys uniform bounds on the curvature and its first derivatives. Hence one may apply Simon, and then Perelman, and the volume estimate is proved.

The Seifert fibered case is very fiddly, but ultimately does not require many new ideas. The assembly complexity turns out to be surprisingly involved. Essentially, one thinks of the JSJ decomposition as defining a decorated graph, whose vertices correspond to the pieces in the decomposition, and whose edges control the gluing along tori. One must prove an analogue of the topological Cauchy-Schwarz inequality in the context of (decorated) graphs. This ends up looking much more like the familiar TQFT picture of tensor networks, but a more detailed discussion will have to wait for another post.


Big mapping class groups and dynamics

Mapping class groups (also called modular groups) are of central importance in many fields of geometry. If S is an oriented surface (i.e. a 2-manifold), the group \text{Homeo}^+(S) of orientation-preserving self-homeomorphisms of S is a topological group with the compact-open topology. The mapping class group of S, denoted \text{MCG}(S) (or \text{Mod}(S) by some people) is the group of path-components of \text{Homeo}^+(S), i.e. \pi_0(\text{Homeo}^+(S)), or equivalently \text{Homeo}^+(S)/\text{Homeo}_0(S) where \text{Homeo}_0(S) is the subgroup of homeomorphisms isotopic to the identity.

When S is a surface of finite type (i.e. a closed surface minus finitely many points), the group \text{MCG}(S) is finitely presented, and one knows a great deal about the algebra and geometry of this group. Less well-studied are groups of the form \text{MCG}(S) when S is of infinite type. However, such groups do arise naturally in dynamics.

Example: Let G be a group of (orientation-preserving) homeomorphisms of the plane, and suppose that G has a bounded orbit (i.e. there is some point p for which the orbit Gp is contained in a compact subset of the plane). The closure of such an orbit Gp is compact and G-invariant. Let K be the union of the closure of Gp with the set of bounded open complementary regions. Then K is compact, G-invariant, and has connected complement. Define an equivalence relation \sim on the plane whose equivalence classes are the points in the complement of K, and the connected components of K. The quotient of the plane by this equivalence relation is again homeomorphic to the plane (by a theorem of R. L. Moore), and the image of K is a totally disconnected set k. The original group G admits a natural homomorphism to the mapping class group of \mathbb{R}^2 - k. After passing to a G-invariant closed subset of k if necessary, we may assume that k is minimal (i.e. every orbit is dense). Since k is compact, it is either a finite discrete set, or it is a Cantor set.

The mapping class group of \mathbb{R}^2 - \text{finite set} contains a subgroup of finite index fixing the end of \mathbb{R}^2; this subgroup is the quotient of a braid group by its center. There are many tools that show that certain groups G cannot have a big image in such a mapping class group.

Much less studied is the case that k is a Cantor set. In the remainder of this post, we will abbreviate \text{MCG}(\mathbb{R}^2 - \text{Cantor set}) by \Gamma. Notice that any homeomorphism of \mathbb{R}^2 - \text{Cantor set} extends in a unique way to a homeomorphism of S^2, fixing the point at infinity, and permuting the points of the Cantor set (this can be seen by thinking of the “missing points” intrinsically as the space of ends of the surface). Let \Gamma' denote the mapping class group of S^2 - \text{Cantor set}. Then there is a natural surjection \Gamma \to \Gamma' whose kernel is \pi_1(S^2 - \text{Cantor set}) (this is just the familiar Birman exact sequence).

The following is proved in the first section of my paper “Circular groups, planar groups and the Euler class”. This is the first step to showing that any group G of orientation-preserving diffeomorphisms of the plane with a bounded orbit is circularly orderable:

Proposition: There is an injective homomorphism \Gamma \to \text{Homeo}^+(S^1).

Sketch of Proof: Choose a complete hyperbolic structure on S^2 - \text{Cantor set}. The Birman exact sequence exhibits \Gamma as a group of (equivalence classes of) homeomorphisms of the universal cover of this hyperbolic surface which commute with the deck group. Each such homeomorphism extends in a unique way to a homeomorphism of the circle at infinity. This extension does not depend on the choice of a representative in an equivalence class, and one can check that the extension of a nontrivial mapping class is nontrivial at infinity. qed.

This property of the mapping class group \Gamma does not distinguish it from mapping class groups of surfaces of finite type (with punctures); in fact, the argument is barely sensitive to the topology of the surface at all. By contrast, the next theorem demonstrates a significant difference between mapping class groups of surfaces of finite type, and \Gamma. Recall that for a surface S of finite type, the group \text{MCG}(S) acts simplicially on the complex of curves \mathcal{C}(S), a simplicial complex whose simplices are the sets of isotopy classes of essential simple closed curves in S that can be realized mutually disjointly. A fundamental theorem of Masur-Minsky says that \mathcal{C}(S) (with its natural simplicial path metric) is \delta-hyperbolic (though it is not locally finite). Bestvina-Fujiwara show that any reasonably big subgroup of \text{MCG}(S) contains lots of elements that act on \mathcal{C}(S) weakly properly, and therefore such groups admit many nontrivial quasimorphisms. This has many important consequences, and shows that for many interesting classes of groups, every homomorphism to a mapping class group (of finite type) factors through a finite group. In view of the potential applications to dynamics as above, one would like to be able to construct quasimorphisms on mapping class groups of infinite type.

Unfortunately, this does not seem so easy.

Proposition: The group \Gamma' is uniformly perfect.

Proof: Remember that \Gamma' denotes the mapping class group of S^2 - \text{Cantor set}. We denote the Cantor set in the sequel by C.

A closed disk D is a dividing disk if its boundary is disjoint from C, and separates C into two components (both necessarily Cantor sets). An element g \in \Gamma' is said to be local if it has a representative whose support is contained in a dividing disk. Note that the closure of the complement of a dividing disk is also a dividing disk. Given a local element g supported in a dividing disk D, there is a homeomorphism of the sphere \varphi permuting C, that takes D off itself, and so that the disks \varphi^n(D) for n \ge 0 are pairwise disjoint, and converge to a limiting point x \in C. Define h to be the infinite product h = \prod_{i \ge 0} \varphi^i g \varphi^{-i}. Notice that h is a well-defined homeomorphism of the sphere permuting C. Moreover, there is an identity [h^{-1},\varphi] = g, thereby exhibiting g as a commutator. The proposition will therefore be proved if we can exhibit any element of \Gamma' as a bounded product of local elements.

Now, let g be an arbitrary homeomorphism of the sphere permuting C. Pick an arbitrary p \in C. If g(p)=p then let h be a local homeomorphism taking p to a different point q \in C, and define g' = hg. So without loss of generality, we can find g' = hg where h is local (possibly trivial), and g'(p) = q \ne p. Let E be a sufficiently small dividing disk containing p so that g'(E) is disjoint from E, and their union does not contain every point of C. Join E to g'(E) by a path in the complement of C, and let D be a regular neighborhood of the union of E, g'(E) and this path, which by construction is a dividing disk. Let f be a local homeomorphism, supported in D, that interchanges E and g'(E), and so that f g' is the identity on E. Then fg' is itself local, because the complement of the interior of a dividing disk is also a dividing disk, and we have expressed g as a product of at most three local homeomorphisms. This shows that the commutator length of g is at most 3, and since g was arbitrary, we are done. qed.
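The infinite-product trick that turns a local element into a commutator can be tested in a toy combinatorial model. The sketch below is an assumption-laden analogue, not the actual mapping class group: the space is \mathbb{Z} \times \{0,1\}, the “column” n plays the role of the disk \varphi^n(D), \varphi shifts columns by one, and g swaps the two points of column 0. The commutator convention [a,b] = aba^{-1}b^{-1}, with maps composed right to left, is also my choice; in this model h is an involution, so h^{-1} = h.

```python
def phi(pt):
    """Shift every column one step to the right."""
    n, e = pt
    return (n + 1, e)


def phi_inv(pt):
    """Inverse shift."""
    n, e = pt
    return (n - 1, e)


def g(pt):
    """'Local' element: swap the two points in column 0, fix everything else."""
    n, e = pt
    return (n, 1 - e) if n == 0 else pt


def h(pt):
    """Product of phi^i g phi^{-i} over i >= 0: swap in every column n >= 0."""
    n, e = pt
    return (n, 1 - e) if n >= 0 else pt


def commutator(pt):
    """h phi h^{-1} phi^{-1} applied right to left (h is its own inverse)."""
    return h(phi(h(phi_inv(pt))))


# Compare the commutator with g on a finite window of the infinite space.
window = [(n, e) for n in range(-50, 51) for e in (0, 1)]
agrees = all(commutator(pt) == g(pt) for pt in window)
```

The two swaps cancel in every column n \ge 1, and only column 0 is swapped exactly once, which is why the commutator reproduces g.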

The same argument just barely fails to work with \Gamma in place of \Gamma'. One can also define dividing disks and local homeomorphisms in \Gamma, with the following important difference. One can show by the same argument that local homeomorphisms in \Gamma are commutators, and that for an arbitrary element g \in \Gamma there are local elements h,f so that fhg is the identity on a dividing disk; i.e. this composition is anti-local. However, the complement of the interior of a dividing disk in the plane is not a dividing disk; the difference can be measured by keeping track of the point at infinity. This is a restatement of the Birman exact sequence; at the level of quasimorphisms, one has the following exact sequence: Q(\Gamma') \to Q(\Gamma) \to Q(\pi_1(S^2 - C))^{\Gamma'}.

The so-called “point-pushing” subgroup \pi_1(S^2 - C) can be understood geometrically by tracking the image of a proper ray from C to infinity. We are therefore motivated to consider the following object:

Definition: The ray graph R is the graph whose vertex set is the set of isotopy classes of proper rays r, with interior in the complement of C, from a point in C to infinity, and whose edges are the pairs of such rays that can be realized disjointly.

One can verify that the graph R is connected, and that the group \Gamma acts simplicially on R by automorphisms, and transitively on vertices.

Lemma: Let g \in \Gamma and suppose there is a vertex v \in R such that v,g(v) share an edge. Then g is a product of at most two local homeomorphisms.

Sketch of proof: Let r be a ray representing v. After adjusting g by an isotopy, assume that r and g(r) are actually disjoint. Let E,g(E) be sufficiently small disjoint disks about the endpoint of r and g(r), and \alpha an arc from E to g(E) disjoint from r and g(r), so that the union r \cup E \cup \alpha \cup g(E) \cup g(r) does not separate the part of C outside E \cup g(E). Then this union can be engulfed in a punctured disk D' containing infinity, whose complement contains some of C. There is a local h supported in a neighborhood of E \cup \alpha \cup g(E) such that hg is supported (after isotopy) in the complement of D' (i.e. it is also local). qed.

It follows that if g \in\Gamma has a bounded orbit in R, then the commutator lengths of the powers of g are bounded, and therefore \text{scl}(g) vanishes. If this is true for every g \in \Gamma, then Bavard duality implies that \Gamma admits no nontrivial homogeneous quasimorphisms. This motivates the following questions:

Question: Is the diameter of R infinite? (Exercise: show \text{diam}(R)\ge 3)

Question: Does any element of \Gamma act on R with positive translation length?

Question: Can one use this action to construct nontrivial quasimorphisms on \Gamma?


Measure theory, topology, and the role of examples

Bill Thurston once observed that topology and measure theory are very immiscible (i.e. they don’t mix easily); this statement has always resonated with me, and I thought I would try to explain some of the (personal, psychological, and mathematical) reasons why. On the face of it, topology and measure theory are very closely related. Both are concerned with spaces equipped with certain algebras of sets (open sets, measurable sets) and classes of functions (continuous functions, measurable functions). Continuous functions (on reasonable spaces) are measurable, and (some) measures can be integrated to define continuous functions. However, in my mind at least, they are very different in a psychological sense, and one of the most important ways in which they differ concerns the role of examples.

At the risk of oversimplifying, one might say that one modern mathematical tradition, perhaps exemplified by the Bourbakists, insists that examples are either irrelevant or misleading. There is a famous story about Grothendieck, retold in this article by Allyn Jackson, which goes as follows:

One striking characteristic of Grothendieck’s mode of thinking is that it seemed to rely so little on examples. This can be seen in the legend of the so-called “Grothendieck prime”. In a mathematical conversation, someone suggested to Grothendieck that they should consider a particular prime number. “You mean an actual number?” Grothendieck asked. The other person replied, yes, an actual prime number. Grothendieck suggested, “All right, take 57”.

Leaving aside the “joke” of Grothendieck’s (supposed) inability to factor 57, this anecdote has an instructive point. No doubt Grothendieck’s associate was expecting a small prime number such as 2 or 3. What would have been the reaction if Grothendieck had said “All right, take 2147483647”? When one considers examples, one is prone to consider simple examples; of course this is natural, but one must be aware that such examples can be misleading. Morwen Thistlethwaite once made a similar observation about knot theory; from memory he said something like:

When someone asks you to think about a knot, you usually imagine a trefoil, or a figure 8, or maybe a torus knot. But the right image to have in your mind is a room entirely filled with a long, tangled piece of string.

Note that there is another crucial function of examples, namely their role as counterexamples, which certify the invalidity of a general claim — such counterexamples should, of course, be as simple as possible (and even Grothendieck was capable of coming up with some); but I am concerned here and in the sequel with the role of “confirming” examples, so to speak.

At the other extreme(?), and again at the risk of oversimplifying, one might take the “A=B” (or Petkovsek-Wilf-Zeilberger) point of view, that sufficiently good/many examples are proofs. They give a beautifully simple but psychologically interesting example (Theorem 1.4.2 in A=B): to show that the angle bisectors of a triangle are coincident, it suffices to verify this for a sufficiently large but finite (explicit) number of examples. The reason such a proof is valid is that the co-ordinates of the pairwise intersections of the angle bisectors are rational functions of (certain trigonometric functions of) the angles, of an explicit (and easily determined) degree, and to prove an identity between rational functions, it suffices to prove that it holds for enough values. Another aspect of the A=B philosophy is that by the process of abstraction, a theorem in one context can become an example in another. For example, “even plus odd equals odd” might be a theorem over \mathbb{Z}, but an example over \mathbb{Z}/2\mathbb{Z}. One might say that the important thing about examples is that they should be sufficiently general that they exhibit all or enough of the complexity of the general case, and that if enough features of an example can be reimagined or abstracted as parameters, an example can become (or be translated into) a theorem.
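The principle that enough examples amount to a proof is easiest to see for polynomials in one variable: two polynomials of degree at most d that agree at d+1 distinct points are identical. The Python sketch below is a toy illustration of that principle only, not the angle-bisector computation from A=B; the function name and the sample identities are made up for illustration.

```python
from fractions import Fraction


def identity_holds(lhs, rhs, degree_bound):
    """Decide lhs == rhs as polynomials of degree <= degree_bound by sampling.

    A nonzero polynomial of degree <= d has at most d roots, so agreement at
    d + 1 distinct points forces the difference to vanish identically.
    """
    samples = [Fraction(k) for k in range(degree_bound + 1)]
    return all(lhs(x) == rhs(x) for x in samples)


# A true identity of degree 3, certified by four examples.
cube_ok = identity_holds(
    lambda x: (x + 1) ** 3,
    lambda x: x ** 3 + 3 * x ** 2 + 3 * x + 1,
    degree_bound=3,
)

# A false identity: x^3 and x agree at 0 and 1 but differ at 2.
false_caught = identity_holds(lambda x: x ** 3, lambda x: x, degree_bound=3)

# With too small a degree bound, the same false identity slips through.
false_missed = identity_holds(lambda x: x ** 3, lambda x: x, degree_bound=1)
```

The last call shows why the degree bound must be explicit and correct: with too few examples the check is fooled, which is the “simple examples can mislead” point in miniature.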

In some fields of mathematics, one can make the idea of a “general example” rigorous. In algebraic geometry, one has the concept of a generic point on a scheme; in differential topology, one considers submanifolds in general position; in ergodic theory, one considers a normal number (or sequence in some fixed alphabet). In fact, it is not so clear whether a “formal” generic object in some domain should be thought of as the ultimate example, or as the ultimate rejection of the use of examples! In any case, in practice, when as mathematicians we select examples to test our ideas on, we rarely adhere to a rigorous procedure to ensure that our examples are good ones, and we are therefore susceptible to certain well-known psychological biases. The first is the availability heuristic, as defined by the psychologists Kahneman and Tversky, which says roughly that people tend to overemphasize the importance of examples that they can think of most easily. Why exactly is this bad? Well, because it interacts with another bias, that it is easier to think of an example which is more specific — e.g. it is easier to think of a fruit that starts with the letter “A” than just to think of a fruit. One might argue that this bias is unavoidable, given the nature of the task “think of an example of X” — e.g. it is much easier to find a unique solution (of a differential equation, of a system of linear equations, etc.) than to find a solution to an underdetermined problem. In fact, finding a unique solution is so much easier than solving an underdetermined problem, that one often tries to solve the underdetermined problem by adding constraints until the solution is unique and can be found (e.g. simplex method in linear programming). Conversely, this bias is also part of the explanation for why examples are so useful: the mind devotes more attention and mental resources to a more specific object. 
So even if one is interested in finding a rigorous and abstract proof, it is often easier to find a proof for a specific example, and then to “rewrite” the proof, replacing the specific example by the general case, and checking that no additional hypotheses are used. The second psychological bias is that of framing. A frame consists of a collection of schemata and stereotypes that provide a context in which an event is interpreted. Many mathematical concepts or objects can be formulated in many different ways which are all logically equivalent, but which frame the concept or object quite differently. The word “bird” suggests (to most people) a schema which involves flight, wings, beaks, etc. The mental image it conjures up will almost never resemble a flightless bird like a penguin or a kiwi, unless extra cues are given, like “a bird indigenous to New Zealand”. A statement about covering spaces might be equivalent to a statement in group theory, but the first might bring to mind topological ideas like paths, continuous maps, compact subsets, etc., while the second might suggest homomorphisms, exact sequences, extensions, etc., and the examples suggested by the frames might be substantially (mathematically) different, sometimes in crucial ways.

Back to measure theory and topology. In topology, one is frequently (always?) interested in a topological space. Here context is very important — a “topological space” could be a finite set, a graph, a solenoid, a manifold, a Cantor set, a sheaf, a CW complex, a lamination, a profinite group, a Banach space, the space of all compact metric spaces (up to isometry) in the Gromov-Hausdorff metric, etc. By contrast, a “measure space” is an interval, plus some countable (possibly empty) collection of atoms. Of course, one thinks of a measure space much more concretely, by adding some incidental extra structure which is irrelevant to the measure structure, but crucial to the mathematical (and psychological) interest of the space; hence a “measure space” could be the space of infinite sequences in a fixed alphabet, the Sierpinski gasket in the plane, the attractor of an Axiom A diffeomorphism, and so on. In other words, one is typically interested in measure theory as a tool to explore some mathematical object with additional structure, whereas one is frequently interested in topological spaces as objects of intrinsic interest in their own right. Many interesting classes of topological objects can be visualized in great detail — sometimes so much detail that in practice one generates proofs by examining sufficiently complicated examples, and building up clear and detailed mental models of the kinds of phenomena that can occur in a given context. Visualizing a “typical” measurable set (even a subset of \mathbb{R} or the plane) or map is much more difficult, if it is even possible (or, for that matter, a non-measurable set). In fact, one tends routinely to bump up against important subtleties in mathematical logic (especially set theory) when trying even to define such elusive entities as a “typical” measurable subset of the interval. 
For instance, Solovay’s famous theorem says (amongst other things) that the statement “every set of real numbers is Lebesgue measurable” is consistent with Zermelo-Fraenkel set theory without the axiom of choice (in fact, Solovay’s result is relative to the existence of certain large cardinals, the so-called inaccessible cardinals). In his paper, Solovay explicitly addresses the issue of describing a non-Lebesgue measurable set of reals:

Of course the axiom of choice is true, and so there are non-measurable sets. It is natural to ask if one can explicitly describe a non-Lebesgue measurable set.

Say that a set of reals A is definable from a set x_0 if there is a formula \Psi(x,y) having free only the variables x and y, so that A = \lbrace y : \Psi(x_0,y)\rbrace. Solovay shows that (again, assuming the existence of an inaccessible cardinal), even if the axiom of choice is true, every set of reals definable from a countable sequence of ordinals is Lebesgue measurable (interestingly enough, one of the most important concepts introduced by Solovay in his paper is the notion of a random real, namely a real number that is not contained in any of a certain class of Borel sets of measure zero, namely those that are rational (i.e. those that can be encoded in a certain precise way); this resonates somewhat with the “generic points” and “normal numbers” mentioned earlier).

If imagining “good” examples in measure theory is hard, what is the alternative? Evidently, it is to imagine “bad” examples, or at least very non-generic ones. Under many circumstances, the “standard” mental image of a measurable map is a piecewise-constant map from the unit interval to a countable (even finite) set. This example rests on two approximations: the process of building up an arbitrary Borel set (in \mathbb{R}, say) from half-open intervals by complementing, intersections and unions; and the process of defining an arbitrary measurable function as a limit of a sequence of finite sums of multiples of indicator functions. Such a mental image certainly has its uses, but for my own part I think that if one is going to use such a mental model anyway, one should be aggressive about using one’s intuition about continuous functions and open sets to make the example as specific, as rich and as “generic” as possible, while understanding that the result is not the measurable function or set itself, but only an approximation to it, and one should try to keep in mind a sequence of such maps, with increasing complexity and richness (if possible).
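The second approximation mentioned above can be sketched numerically. The following is a minimal illustration, assuming the standard dyadic construction for a nonnegative function: round the values of f down to multiples of 2^{-n} and cap them at n, so that the n-th approximation takes only finitely many values. The function name and the sample f(x) = x^2 on the unit interval are my own choices, and sampling on a grid is of course only a stand-in for a statement about all points.

```python
import math


def simple_approximation(f, n):
    """Return the n-th dyadic simple-function approximation of a nonnegative f.

    Values are rounded down to multiples of 2**-n and capped at n, so the
    result takes only finitely many values.
    """
    def f_n(x):
        return min(math.floor(f(x) * 2**n) / 2**n, n)
    return f_n


f = lambda x: x * x
samples = [i / 1000 for i in range(1001)]

for n in (1, 2, 4, 8):
    f_n = simple_approximation(f, n)
    # Largest gap between f and its approximation on the sample grid.
    worst_gap = max(f(x) - f_n(x) for x in samples)
    print(n, worst_gap)
```

Each level set of f_n is the preimage under f of a half-open interval, which for this continuous f is a finite union of intervals; for a general measurable f those preimages are exactly the sets that are hard to picture.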

Of course, non-measurable sets do arise in practice. If one wants in the end to prove a theorem whose truth or falsity does not depend on the Axiom of choice, then by Solovay one could do without such sets if necessary. The fact that we do not must mean that the use of non-measurable sets (necessarily constructed using the Axiom of choice) leads to shorter/more findable proofs, or more understandable proofs, or both. Let me mention a few examples of situations in which the Axiom of choice is extremely useful:

  1. The Hahn-Banach theorem in functional analysis
  2. The existence of ultralimits of sequences of metric spaces (equivalently, the existence of Stone-Čech compactifications)
  3. A group G is said to be left orderable if there is a total order < on G such that g<h implies fg < fh for all f,g,h \in G. If S is a finite subset of nontrivial elements of G, the order partitions S into S^+ \cup S^-, where the superscript denotes the elements that are greater than, respectively less than, the identity element. Suppose that for some finite set S and every partition of S into S^+ \cup S^-, some product of elements of one of the two subsets (with repeats allowed) is equal to the identity. Then necessarily G is not left orderable. In fact, the converse is also true: if no such “poison” subset exists, then G is left orderable. This follows from the compactness of the set of partitions of G-\text{id} into two subsets G^+ \cup G^- (equivalently, the compactness of the set \lbrace -1,1\rbrace^{G-\text{id}}), which follows from Tychonoff’s theorem.
  4. The existence of Galois automorphisms of \mathbb{C} over \mathbb{Q} (other than complex conjugation). Such automorphisms are necessarily non-measurable, and therefore (by Solovay) cannot be constructed without the axiom of choice. In fact, this follows from a theorem of Mackey, that any measurable homomorphism between (sufficiently nice, i.e. locally compact, second countable) topological groups is continuous. We give the sketch of a proof. Suppose f:G \to H is given, and without loss of generality, assume it is surjective. Let U be a neighborhood of the identity in H, and let V be a symmetric open neighborhood of the identity with V\cdot V \subset U. The group G is covered by countably many translates of f^{-1}(V), and therefore the measure of f^{-1}(V) is positive. Let X \subset f^{-1}(V) \subset W where X is compact, W is open, and such that the (Haar) measure of W is less than twice the Haar measure of X (the existence of such an open set W depends on the fact that measure agrees with outer measure for measurable sets). Since W is open, there is an open neighborhood T of the identity in G so that tX \subset W for all t \in T. But tX and X both have measure more than half the measure of W, so they intersect. Since V is symmetric, so is f^{-1}(V), and therefore T \subset f^{-1}(V\cdot V). This implies f is continuous, as claimed. A continuous Galois automorphism of \mathbb{C} is either the identity, or complex conjugation.

Personally, I think that one of the most compelling reasons to accept the Axiom of choice is psychological, and is related to the phenomenon of closure. If we see a fragment of a scene or a pattern, our mind fills in the rest of the scene or pattern for us. We have no photoreceptor cells in our eyes where the optic nerve passes through the retina, but instead of noticing this gap, we have an unperceived blind spot in our field of vision. If we can choose an element of a finite set whenever we want to, we feel as though nothing would stop us from making such a choice infinitely often. We are inclined to accept a formula with a “for all” quantifier  ranging over an infinite set, if the formula holds every time we check it. We are inclined to see patterns — even where they don’t exist. This is the seductive and dangerous (?) side of examples, and maybe a reason to exercise a little caution.

In fact, this discussion barely scratches the surface (and does not really probe into either topology or measure theory in any deep way). I would be very curious to hear contrasting opinions.

Update (6/20): There are many other things that I could/should have mentioned about the interaction between measure theory and topology, and the difficulty of finding good generic examples in measure theory. For example, I definitely should have mentioned:

  1. Lusin’s theorem, which says that a measurable function is continuous on almost all its domain; e.g. if f is any measurable function on an interval [a,b], then for any positive \epsilon there is a compact subset E \subset [a,b] so that the measure of [a,b] - E is at most \epsilon, and f is continuous on E.
  2. von Neumann’s theorem, that a Borel probability measure \mu on the unit cube I^n in \mathbb{R}^n is equivalent to Lebesgue measure (on the cube) by a self-homeomorphism of the cube (which can be taken to be the identity on the boundary) if and only if it is nonatomic, gives the boundary measure zero, and is positive on nonempty relatively open subsets.
  3. Pairs of mutually singular measures of full support on simple sets. For example, let C be the Cantor set of infinite strings in the alphabet \lbrace 0,1\rbrace with its product topology, and define an infinite string W_\infty in C inductively as follows. For any string w, define the complement w^c to be the string whose digits are obtained from w by interchanging 0 and 1. Then define W_1 to be the string 1, and inductively define W_{n+1}= W_n W_n^c W_n^c \cdots W_n^c where there are f(n)-1 copies of W_n^c, and f(n) is chosen so that \prod_n (f(n)-1)/f(n) = r > 1/2. Each W_n is a prefix of W_{n+1}, so the W_n converge to an infinite string W_\infty. Let \Lambda be the set of accumulation points of W_\infty under the shift map. Any finite string that appears in W_\infty appears with definite density, so \Lambda is invariant and minimal (i.e. every orbit is dense) under the shift map. However, the proportion of 1’s in W_n is at least r for n odd, and at most 1-r for n even. Let d_n denote the Dirac measure on the infinite string W_nW_nW_n\cdots, and let e_n denote the average of d_n over its (finite) orbit under the shift map. Define \mu_0^i = \frac 1 i \sum_{j\le i} e_{2j} and \mu_1^i = \frac 1 i \sum_{j\le i} e_{2j+1}. These probability measures are shift-invariant, and have shift-invariant weak limits \mu_0,\mu_1 as i \to \infty with support in \Lambda. Moreover, if \Lambda_j denotes the strings in \Lambda that start with j for j=0,1, then \mu_j(\Lambda_j) \ge r. In particular, the space of shift invariant probability measures on \Lambda is at least 1-dimensional, and we may therefore obtain distinct mutually singular ergodic shift-invariant probability measures on \Lambda. Since \Lambda is minimal, both measures have full support.
  4. Shelah’s theorem that if one works in (ZF) plus the axiom of dependent choice, if there is an uncountable well-ordered set of reals, then there is a non-(Lebesgue) measurable set, which shows the necessity of Solovay’s use of inaccessible cardinals. (By Solovay, the axiom of dependent choice is consistent with the statement that every set of reals is Lebesgue-measurable).
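
The inductive construction of the strings W_n in item 3 is easy to experiment with. Here is a minimal sketch, assuming the particular choice f(n) = (n+2)^2, which is mine and not part of the construction above; for this choice the product \prod_n (f(n)-1)/f(n) telescopes to r = 2/3 > 1/2. The code builds the first few W_n and prints the exact proportion of 1's in each, which should be at least r for n odd and at most 1-r for n even.

```python
from fractions import Fraction

# Translation table that interchanges the digits 0 and 1.
FLIP = str.maketrans("01", "10")


def f(n):
    """Illustrative choice of f; the infinite product of (f(n)-1)/f(n) is 2/3."""
    return (n + 2) ** 2


def build_words(count):
    """Return [W_1, ..., W_count], where W_{n+1} is W_n followed by
    f(n) - 1 copies of the complement of W_n."""
    words = ["1"]
    for n in range(1, count):
        w = words[-1]
        words.append(w + w.translate(FLIP) * (f(n) - 1))
    return words


def density_of_ones(word):
    """Exact proportion of 1's in a finite string."""
    return Fraction(word.count("1"), len(word))


r = Fraction(2, 3)
for n, word in enumerate(build_words(5), start=1):
    print(n, len(word), density_of_ones(word))
```

The lengths grow like the product of the f(n), so only the first handful of words are practical to build this way.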

Round slices of pointy objects

A regular tetrahedron (in \mathbb{R}^3) can be thought of as the convex hull of four pairwise non-adjacent vertices of a regular cube. A bisecting plane parallel to a face of the cube intersects the tetrahedron in a square (one can think of this as the product of two intervals, contained as the middle slice of the join of two intervals). A plane bisecting the long diagonal of a regular cube intersects the cube in a regular hexagon. In each case, the “slice”  one obtains is “rounder” (in some sense) than the original pointy object.

The unit ball in the L_1 norm on \mathbb{R}^n is a “diamond”, the dual polyhedron to an n-cube (which is the unit ball in the L_\infty norm). In three dimensions, this diamond is an octahedron, the dual of an (ordinary) cube. This is certainly a very pointy object; in fact, for very large n, almost all the mass of such an object is arbitrarily close to the origin (in the ordinary Euclidean norm). Suppose one intersects such a diamond with a “random” m-dimensional linear subspace V. The intersection is a polyhedron, which is the unit ball in the restriction of the L_1 norm to the subspace V. A somewhat surprising phenomenon is that when n is very big compared to m, and V is chosen “randomly”, the intersection of V with this diamond is very round, i.e. a “random” small dimensional slice of L_1 looks like (a scaled copy of) L_2. In fact, one can replace L_1 by L_p here for any p (though of course, one must be a bit more precise about what one means by “random”).

We can think of obtaining a “random” m-dimensional subspace of n-dimensional space by choosing n linear maps L_i \in (\mathbb{R}^m)^*:= \text{Hom}(\mathbb{R}^m,\mathbb{R}) and using them as the co-ordinates of a linear map L = \oplus_i L_i:\mathbb{R}^m \to \mathbb{R}^n. For a generic choice of the L_i, the image has full rank, and defines an m-dimensional subspace. So let \mu be a probability measure on (\mathbb{R}^m)^*, choose the L_i independently at random with respect to \mu, and let L be the resulting random embedding of \mathbb{R}^m into \mathbb{R}^n. The co-ordinates of L determine a finite subset of (\mathbb{R}^m)^* of cardinality n; the uniform probability measure with this subset as support is itself a measure \nu, and we can easily compute that \|L(v)\|_p = n^{1/p}\left( \int_{\pi \in (\mathbb{R}^m)^*} |\pi(v)|^p d\nu \right)^{1/p}. For n big compared to m, the measure \nu is almost surely very close (in the weak sense) to \mu. If we choose \mu to be \text{O}(m)-invariant, it follows that the pullback of the L_p norm on \mathbb{R}^n to \mathbb{R}^m under a random L is itself almost \text{O}(m)-invariant, and is therefore very nearly proportional to the L_2 norm. In particular, the pullback of the L_2 norm on \mathbb{R}^n is very nearly equal to (a multiple of) the L_2 norm on \mathbb{R}^m, so (after rescaling), L is very close to an isometry, and the intersection of L(\mathbb{R}^m) with the unit ball in \mathbb{R}^n in the L_p norm is very nearly round.
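
This argument is easy to test numerically in the smallest interesting case. The sketch below assumes m = 2, p = 1, and takes \mu to be the standard Gaussian measure on (\mathbb{R}^2)^*, which is \text{O}(2)-invariant; these choices, the function names, and the fixed random seed are all mine. It computes \|L(v)\|_1 / n for unit vectors v at equally spaced angles and reports the ratio of the largest to the smallest value, which should approach 1 as n grows.

```python
import math
import random


def l1_norm_profile(n, directions=16, seed=0):
    """Sample n Gaussian functionals on R^2 and return ||L(v)||_1 / n
    for unit vectors v at equally spaced angles in [0, pi)."""
    rng = random.Random(seed)
    functionals = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    profile = []
    for k in range(directions):
        # The norm is even in v, so half a turn covers every direction.
        theta = math.pi * k / directions
        c, s = math.cos(theta), math.sin(theta)
        profile.append(sum(abs(a * c + b * s) for a, b in functionals) / n)
    return profile


def roundness(profile):
    """Ratio of the largest to the smallest value; equals 1 for a round slice."""
    return max(profile) / min(profile)


for n in (4, 100, 20000):
    print(n, roundness(l1_norm_profile(n)))
```

For large n each entry of the profile should be close to the mean absolute value of a standard Gaussian, \sqrt{2/\pi}, which is the constant of proportionality between the pulled-back L_1 norm (divided by n) and the L_2 norm in this set-up.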

Dvoretzky’s theorem says that any infinite dimensional Banach space contains finite dimensional subspaces that are arbitrarily close to L_2 in any given finite dimension m. In fact, for any m and \epsilon, any symmetric convex body in \mathbb{R}^n with n sufficiently large (depending only on m and \epsilon) admits an m-dimensional slice which is within \epsilon of being spherical. On the other hand, Pelczynski showed that any infinite dimensional subspace of \ell_1 contains a further subspace which is isomorphic to \ell_1, and is complemented in \ell_1; in particular, \ell_1 does not contain an isometric copy of \ell_2, or in fact of any infinite dimensional Banach space with a separable dual (I learned these facts from Assaf Naor).
