
A few weeks ago, Ian Agol, Vlad Markovic, Ursula Hamenstadt and I organized a “hot topics” workshop at MSRI with the title Surface subgroups and cube complexes. The conference was pretty well attended, and (I believe) was a big success; the organizers clearly deserve a great deal of credit. The talks were excellent, and touched on a wide range of subjects, and to those of us who are mid-career or older it was a bit shocking to see how quickly the landscape of low-dimensional geometry/topology and geometric group theory has been transformed by the recent breakthrough work of (Kahn-Markovic-Haglund-Wise-Groves-Manning-etc.-) Agol. Incidentally, when I first started as a graduate student, I had a vague sense that I had somehow “missed the boat” — all the exciting developments in geometry due to Thurston, Sullivan, Gromov, Freedman, Donaldson, Eliashberg etc. had taken place 10-20 years earlier, and the subject now seemed to be a matter of fleshing out the consequences of these big breakthroughs. 20 years and several revolutions later, I no longer feel this way. (Another slightly shocking aspect of the workshop was for me to realize that I am older or about as old as 75% of the speakers . . .)

The rationale for the workshop (which I had some hand in drafting, and therefore feel comfortable quoting here) was the following:

Recently there has been substantial progress in our understanding of the related questions of which hyperbolic groups are cubulated on the one hand, and which contain a surface subgroup on the other. The most spectacular combination of these two ideas has been in 3-manifold topology, which has seen the resolution of many long-standing conjectures. In turn, the resolution of these conjectures has led to a new point of view in geometric group theory, and the introduction of powerful new tools and structures. The goal of this conference will be to explore the further potential of these new tools and perspectives, and to encourage communication between researchers working in various related fields.

I have blogged a bit about cubulated groups and surface subgroups previously, and I even began this blog (almost 4 years ago now) initially with the idea of chronicling my efforts to attack Gromov’s surface subgroup question. This question asks the following:

Gromov’s Surface Subgroup Question: Does every one-ended hyperbolic group contain a subgroup which is isomorphic to the fundamental group of a closed surface of genus at least 2?

The restriction to one-ended groups is just meant to rule out silly examples, like finite or virtually cyclic groups (i.e. “elementary” hyperbolic groups), or free products of simpler hyperbolic groups. Asking for the genus of the closed surface to be at least 2 rules out the sphere (whose fundamental group is trivial) and the torus (whose fundamental group \mathbb{Z}^2 cannot be a subgroup of a hyperbolic group). It is the purpose of this blog post to say that Alden Walker and I have managed to show that Gromov’s question has a positive answer for “most” hyperbolic groups; more precisely, we show that a random group (in the sense of Gromov) contains a surface subgroup (in fact, many surface subgroups) with probability going to 1 as a certain natural parameter (the “length” n of the random relators) goes to infinity. (update April 8: the preprint is available from the arXiv here.)


Let F=\langle a,b\rangle be the free group on two generators, and let \phi:F \to F be the endomorphism defined on generators by \phi(a)=ab and \phi(b)=ba. We define Sapir’s group C to be the ascending HNN extension

F*_\phi:=\langle a,b,t\; | \; a^t=ab,b^t=ba\rangle

This group was studied by Crisp-Sageev-Sapir in the context of their work on right-angled Artin groups, and independently by Feighn (according to Mark Sapir); both sought (unsuccessfully) to determine whether C contains a subgroup isomorphic to the fundamental group of a closed, oriented surface of genus at least 2. Sapir has conjectured in personal communication that C does not contain a surface subgroup, and explicitly posed this question as Problem 8.1 in his problem list.

After three years of thinking about this question on and off, Alden Walker and I have recently succeeded in finding a surface subgroup of C, and it is the purpose of this blog post to describe this surface, how it was found, and some related observations. By pushing the technique further, Alden and I managed to prove that for a fixed free group F of finite rank, and for a random endomorphism \phi of length n (i.e. one taking the generators to random words of length n), the associated HNN extension contains a closed surface subgroup with probability going to 1 as n \to \infty. This result is part of a larger project which we expect to post to the arXiv soon.
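To make the endomorphism \phi and the notion of a "random endomorphism of length n" concrete, here is a minimal Python sketch. The function names and the encoding of inverse letters as upper-case characters are assumptions made for illustration only; nothing here is taken from the paper, and it says nothing about how the surface subgroup is actually found.

```python
import random

# Upper-case letters stand for inverses: A = a^{-1}, B = b^{-1} (a convention chosen here).
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}

def reduce_word(word):
    """Freely reduce a word by cancelling adjacent inverse pairs."""
    out = []
    for letter in word:
        if out and out[-1] == INVERSE[letter]:
            out.pop()
        else:
            out.append(letter)
    return "".join(out)

def inverse_word(word):
    """Inverse of a word: reverse it and invert each letter."""
    return "".join(INVERSE[c] for c in reversed(word))

def apply_endomorphism(images, word):
    """Apply the endomorphism sending each generator g to images[g]."""
    pieces = []
    for letter in word:
        if letter in images:
            pieces.append(images[letter])
        else:
            pieces.append(inverse_word(images[INVERSE[letter]]))
    return reduce_word("".join(pieces))

def iterate(images, word, k):
    """Apply the endomorphism k times."""
    for _ in range(k):
        word = apply_endomorphism(images, word)
    return word

def random_reduced_word(n, rng):
    """Uniformly random reduced word of length n >= 1 in a, b and their inverses."""
    word = [rng.choice("aAbB")]
    while len(word) < n:
        word.append(rng.choice([c for c in "aAbB" if c != INVERSE[word[-1]]]))
    return "".join(word)

PHI = {"a": "ab", "b": "ba"}               # the endomorphism defining C
third_iterate = iterate(PHI, "a", 3)       # a -> ab -> abba -> abbabaab

rng = random.Random(0)
random_phi = {g: random_reduced_word(10, rng) for g in "ab"}   # a "random endomorphism" with n = 10
```

Since \phi takes positive words to positive words, no cancellation occurs when iterating it on a; the iterates are the successive prefixes of the Thue-Morse word.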


Ian gave his second and third talks this afternoon, completing his (quite detailed) sketch of the proof of the Virtual Haken Theorem. Recall that after work of Kahn-Markovic, Wise, Haglund-Wise and Bergeron-Wise, the proof reduces to showing the following:

Theorem (Agol): Let G be a hyperbolic group acting properly discontinuously and cocompactly on a CAT(0) cube complex X. Then there is a finite index subgroup G’ so that X/G’ is special; in other words, G is virtually special.


This morning I was playing trains with my son Felix. At the moment he is much more interested in laying the tracks than putting the trains on and moving them around, but he doesn’t tend to get concerned about whether the track closes up to make a loop. The pieces of track are all roughly the following shape:


I am in Kyoto right now, attending the twenty-first Nevanlinna colloquium (update: took a while to write this post – now I’m in Sydney for the Clay lectures). Yesterday, Junjiro Noguchi gave a plenary talk on Nevanlinna theory in higher dimensions and related Diophantine problems. The talk was quite technical, and I did not understand it very well; however, he said a few suggestive things early on which struck a chord.

The talk started quite accessibly, being concerned with the fundamental equation

a +b = c

where a,b,c are coprime positive integers. The abc conjecture, formulated by Oesterlé and Masser, says that for any positive real number \epsilon, there is a constant C_\epsilon so that

\max(a,b,c) \le C_\epsilon\text{rad}(abc)^{1+\epsilon}

where \text{rad}(abc) is the product of the distinct primes appearing in the product abc. Informally, this conjecture says that for triples a,b,c satisfying the fundamental equation, the numbers a,b,c are not divisible by “too high” powers of a prime. The abc conjecture is known to imply many interesting number theoretic statements, including (famously) Fermat’s Last Theorem (for sufficiently large exponents), and Roth’s theorem on diophantine approximation (as observed by Bombieri).
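For readers who want to experiment, the following Python sketch computes \text{rad}(abc) and the ratio \log c / \log \text{rad}(abc), often called the quality of the triple; the conjecture is equivalent to saying that for each \epsilon > 0 only finitely many coprime triples have quality bigger than 1+\epsilon. The function names are illustrative, and trial division is only sensible for small inputs.

```python
from math import gcd, log

def rad(n):
    """Product of the distinct primes dividing n (plain trial division)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:            # whatever is left is a single prime factor
        result *= n
    return result

def quality(a, b):
    """log(c) / log(rad(abc)) for the triple a + b = c with a, b coprime."""
    if gcd(a, b) != 1:
        raise ValueError("a and b must be coprime")
    c = a + b
    return log(c) / log(rad(a * b * c))

# Two small examples: 1 + 8 = 9 and 3 + 125 = 128.
q_small = quality(1, 8)      # rad(72) = 6
q_better = quality(3, 125)   # rad(3 * 125 * 128) = 30
```

Triples with quality noticeably bigger than 1, like 3 + 125 = 128, are exactly the ones in which a,b,c are divisible by high powers of small primes.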

Roth’s theorem is the following statement:

Theorem (Roth, 1955): Let \alpha be a real algebraic number. Then for any \epsilon>0, the inequality |\alpha - p/q| < q^{-(2+\epsilon)} has only finitely many solutions in coprime integers p,q.

This inequality is best possible, in the sense that every irrational number can be approximated by infinitely many rationals p/q to within 1/2q^2. In fact, the rationals appearing in the continued fraction approximation to \alpha have this property. There is a very short and illuminating geometric proof of this fact.

In the plane, construct a circle packing with a circle of radius 1/2q^2 with center (p/q, 1/2q^2) for each coprime pair p,q of integers.

[Figure: circles_1]

This circle packing nests down on the x-axis, and any vertical line (with irrational x-co-ordinate) intersects infinitely many circles. If the x co-ordinate of a vertical line is \alpha, every circle the line intersects gives a rational p/q which approximates \alpha to within 1/2q^2. qed.
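The picture can be checked by hand in exact arithmetic. The Python sketch below (function names are illustrative) tests whether two of these circles are tangent, which for fractions in lowest terms happens precisely when |pq' - p'q| = 1, and lists the fractions p/q with q up to some bound whose circle meets the vertical line x = \alpha. A rational number very close to \sqrt{2} stands in for \alpha, which is an assumption made so that everything stays exact.

```python
from fractions import Fraction
from math import gcd, isqrt

def circle(p, q):
    """Centre (x, y) and radius of the circle attached to p/q (in lowest terms)."""
    r = Fraction(1, 2 * q * q)
    return Fraction(p, q), r, r

def tangent(p1, q1, p2, q2):
    """True when the two circles touch externally (exact comparison)."""
    x1, y1, r1 = circle(p1, q1)
    x2, y2, r2 = circle(p2, q2)
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 == (r1 + r2) ** 2

def hits(alpha, q_max):
    """Fractions p/q with q <= q_max whose circle meets the line x = alpha."""
    found = []
    for q in range(1, q_max + 1):
        p = round(alpha * q)     # the only candidate numerator for this q
        if gcd(p, q) == 1 and abs(alpha - Fraction(p, q)) <= Fraction(1, 2 * q * q):
            found.append(Fraction(p, q))
    return found

# Rational stand-in for sqrt(2), accurate to about 20 decimal places.
SQRT2 = Fraction(isqrt(2 * 10 ** 40), 10 ** 20)
approximations = hits(SQRT2, 100)
```

For \sqrt{2} and q at most 100 the list consists of the continued fraction convergents 1, 3/2, 7/5, 17/12, 41/29 and 99/70.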

On the other hand, consider the corresponding collection of circles with radius 1/2q^{2+\epsilon}. Some “space” appears between neighboring circles, and they no longer pack tightly (the following picture shows \epsilon = 0.2).

[Figure: circles_2]

The total cross-sectional width of these circles, restricted to pairs p/q in the interval [0,1), can be estimated as follows. Each p/q contributes a width of 1/2q^{2+\epsilon}. Ignoring the coprime condition, there are q fractions of the form p/q in the interval [0,1), so the total width is less than \frac 1 2 \sum_q q^{-1-\epsilon} which converges for positive \epsilon. In other words, the total cross-sectional width of all circles is finite. It follows that almost every vertical line intersects only finitely many circles.

Some vertical lines do, in fact, intersect infinitely many circles; i.e. some real numbers are approximated by infinitely many rationals to better than quadratic accuracy; for example, a Liouville number like \sum_{n=1}^\infty 10^{-n!}.
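The Liouville example is easy to check with exact rational arithmetic. In the Python sketch below (names are illustrative) a long partial sum is used as a stand-in for the Liouville number itself, which is an assumption of the sketch; the k-th partial sum p/q, with q = 10^{k!}, then approximates it to within q^{-k}, far better than quadratically once k is bigger than 2.

```python
from fractions import Fraction
from math import factorial

def partial_sum(k):
    """k-th partial sum of sum_{n >= 1} 10^(-n!), as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(n)) for n in range(1, k + 1))

# A long partial sum stands in for the Liouville number (an approximation, not the number itself).
PROXY = partial_sum(6)

def beats_power(k, exponent):
    """Is |PROXY - p/q| < q^(-exponent), where p/q is the k-th partial sum?"""
    approx = partial_sum(k)
    q = approx.denominator            # equals 10^(k!)
    return abs(PROXY - approx) < Fraction(1, q ** exponent)

# The k-th partial sum is accurate to within q^(-k) for k = 1, ..., 4.
results = [beats_power(k, k) for k in range(1, 5)]
```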

Some special cases of Roth’s theorem are much easier than others. For instance, it is very easy to give a proof when \alpha is a quadratic irrational; i.e. an element of \mathbb{Q}(\sqrt{d}) for some integer d. Quadratic irrationals are characterized by the fact that their continued fraction expansions are eventually periodic. One can think of this geometrically as follows. The group \text{PSL}(2,\mathbb{Z}) acts on the upper half-plane, which we think of now as the complex numbers with non-negative imaginary part, by fractional linear transformations z \to (az+b)/(cz+d). The quotient is a hyperbolic triangle orbifold, with a cusp. A vertical line in the plane ending at a point \alpha on the x-axis projects to a geodesic ray in the triangle orbifold. A rational number p/q approximating \alpha to within 1/2q^2 is detected by the geodesic entering a horoball centered at the cusp. If \alpha is a quadratic irrational, the corresponding geodesic ray eventually winds around a periodic geodesic (this is the periodicity of the continued fraction expansion), so it never gets too deep into the cusp, and the rational approximations to \alpha never get better than C/2q^2 for some constant C depending on \alpha, as required. A different vertical line intersecting the x-axis at some \beta corresponds to a different geodesic ray; the existence of good rational approximations to \beta corresponds to the condition that the corresponding geodesic goes deeper and deeper into the cusp infinitely often at a definite rate (i.e. at a distance which is at least some fixed (fractional) power of time). A “random” geodesic on a cusped hyperbolic surface takes time n to go distance \log{n} out the cusp (this is a kind of equidistribution fact – the thickness of the cusp goes to zero like e^{-t}, so if one chooses a sequence of points in a hyperbolic surface at random with respect to the uniform (area) measure, it takes about n points to find one that is distance \log{n} out the cusp). 
If one expects that every geodesic ray corresponding to an algebraic number looks like a “typical” random geodesic, one would conjecture (and in fact, Lang did conjecture) that there are only finitely many p/q for which |p/q - \alpha| < q^{-2}(\log{q})^{-1-\epsilon} for any \epsilon > 0.

A slightly different (though related) geometric way to see the periodicity of the continued fraction expansion of a quadratic irrational is to use diophantine geometry. This is best illustrated with an example. Consider the golden number \alpha = (1+\sqrt{5})/2. The matrix A=\left( \begin{smallmatrix} 2 & 1 \\ 1 & 1 \end{smallmatrix} \right) has \left( \begin{smallmatrix} \alpha \\ 1 \end{smallmatrix} \right) and \left( \begin{smallmatrix} \bar{\alpha} \\ 1 \end{smallmatrix} \right) as eigenvectors (here \bar{\alpha} denotes the “conjugate” 1-\alpha), and thus preserves a “wedge” in \mathbb{R}^2 bounded by lines with slopes \alpha and \bar{\alpha}. The set of integer lattice points in this wedge is permuted by A, and therefore so is the boundary of the convex hull of this set (the sail of the cone). Lattice points on the sail correspond to rational approximations to the boundary slopes; the fact that A permutes this set corresponds to the periodicity of the continued fraction expansion of \alpha (and certifies the fact that \alpha cannot be approximated better than quadratically by rational numbers).
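Here is a small numerical illustration of the golden number example, as a sketch rather than a proof. Starting from the lattice point (1,1) and applying A repeatedly produces lattice points (p,q) for which p/q is a ratio of Fibonacci numbers, and the quantity q^2|\alpha - p/q| stays bounded away from zero (it tends to 1/\sqrt{5}). The function names and the choice of starting point are assumptions made for illustration, and floating point is used for \alpha.

```python
from math import sqrt

ALPHA = (1 + sqrt(5)) / 2

def apply_A(v):
    """Apply A = [[2, 1], [1, 1]] to an integer vector (x, y)."""
    x, y = v
    return (2 * x + y, x + y)

def orbit(v, steps):
    """The points v, Av, A^2 v, ..., A^steps v."""
    points = [v]
    for _ in range(steps):
        points.append(apply_A(points[-1]))
    return points

def defect(v):
    """q^2 * |alpha - p/q| for the lattice point (p, q)."""
    p, q = v
    return q * q * abs(ALPHA - p / q)

points = orbit((1, 1), 6)              # (1,1), (3,2), (8,5), (21,13), ...
defects = [defect(v) for v in points]  # decreases towards 1/sqrt(5) = 0.447...
```

If \alpha could be approximated better than quadratically, the defects would have to come arbitrarily close to zero along some sequence of lattice points; along this orbit they do not.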

There is an analogue of this construction in higher dimensions: let A be an n\times n integer matrix whose eigenvalues are all real, positive, irrational and distinct. A collection of n suitable eigenvectors spans a polyhedral cone which is invariant under A. The convex hull of the set of integer lattice points in this cone is a polyhedron, and the vertices of this polyhedron (the vertices on the sail) are the “best” integral approximations to the eigenvectors. In fact, there is a \mathbb{Z}^{n-1} subgroup of \text{SL}(n,\mathbb{Z}) consisting of matrices with the same set of eigenvectors (this is a consequence of Dirichlet’s theorem on the structure of the group of units in the integers in a number field). Hence there is a group that acts discretely and co-compactly on the vertices of the sail, and one gets a priori estimates on how well the eigenvectors can be approximated by integral vectors. It is interesting to ask whether one can give a proof of Roth’s theorem along these lines, at least for algebraic numbers in totally real fields, but I don’t know the answer.

Jeremy Kahn kindly sent me a more detailed overview of his argument with Vlad Markovic, that I blogged earlier about here (also see Jesse Johnson’s blog for other commentary). With his permission, this is reproduced below in its entirety.

Editorial note: I have latexified Jeremy’s email; hence “dhat-mu” becomes \hat{d}\mu, “boundary-hat” becomes \hat{d}, and “boundary-tilde” becomes \tilde{d}. I also linkified the link to Caroline Series’ paper.


Hi Danny,


I was busy with the conference on Thursday and Friday, and taking a break on Saturday, and now I’ve finally had a chance to read your blog, and reply to your message. I decided (especially as Jesse had requested it) to write out a complete outline of the theorem. I’m sending a copy of this message to you, Jesse Johnson, Ian Agol, and Francois Labourie: you are all welcome to reproduce it, as long as it is reproduced in its entirety, and states clearly that this is joint work with Vladimir Markovic. Of course, time and energy permitting, I’ll be happy to answer any questions.

Here is an outline of the argument, working backwards to make it clearer:

1. We want to construct a surface made out of skew pants, each of which has complex half-length close to R, and which are joined together so that the complex twist-bends are within o(1/R) of 1. Using a paper of Caroline Series (published in the Pacific J. of Mathematics) we show that these surfaces are quasi-isometrically embedded in the universal cover of the three-manifold.

2. Consider the following two conditions on two Borel measures \mu and \nu on a metric space X with the same (finite) total measure:

A. For every Borel subset A of X, \mu(A) is less than or equal to the \nu-measure of an \epsilon neighborhood of A.

B. There is a measure space (Y, \eta) and functions f: Y \to X and g: Y \to X such that \mu and \nu are the push-forwards by f and g respectively of the measure \eta, and the distance in X between f(y) and g(y) is less than \epsilon for almost every y \in Y.

It is easy to show that B implies A (also that A is symmetric in \mu and \nu!). In the case where \mu and \nu are discrete and integral measures (the measure of every point is a non-negative integer), we can show that A implies B (and Y will be a finite set with the counting measure) using Hall’s marriage theorem. In fact, the statement that A implies B for discrete and integral measures is easily shown to be equivalent to Hall’s marriage theorem. I don’t know if A implies B in general because I don’t know how to replace the inductive algorithm for Hall’s marriage theorem with a method that works for a relation between two general measure spaces.

We call \mu and \nu \epsilon-equivalent if they satisfy condition A, and note that the condition is additively transitive: if \mu is \epsilon-equivalent to \nu, and \nu is \delta-equivalent to \rho, then \mu and \rho are (\epsilon+\delta)-equivalent.

3. Suppose that \gamma is one boundary component of a pair of skew pants P. We can form the common orthogonals in P from \gamma to each of the other two cuffs. For each common orthogonal, at the point where it meets \gamma, we can find a unit normal vector to \gamma that points along this common orthogonal. The two resulting normal vectors are related by a translation along the half-length of \gamma (the suitable square root of the loxodromic element for \gamma), so we will call them a pair of opposite unit normal vectors (or pounv for short) and they live in the bundle of pounv’s which is conformally equivalent to the complex plane mod the lattice generated by the half-length of \gamma and 2\pi i. We give the bundle of pounv’s the Euclidean metric inherited from the complex plane, and also the Lebesgue measure.

4. Given a measure on pants we can produce a measure on the union of the pounv bundles of the boundary geodesics as follows: if the measure is a unit atom on one pair of skew pants, the resulting measure on pounv bundles is a unit atom on the pounv bundle of each of the cuffs, at the pounv described in step 3. We extend to a general measure by linearity. This produces a linear operator we will call the \hat{d} operator.

If we are given a positive integral formal sum of pants (or a multi-set of pants) we can think of it as an integral measure on the space of pants.

5. On the pounv bundle for each closed geodesic we can apply a translation of 1 + i \pi; we will call this translation \tau. We can think of \tau as a map from the union of the pounv bundles to itself.

6. Let \mu be an integral measure on pants with cuff half-lengths close to R. We can apply the \hat{d} operator described in step 4 to obtain a measure on the union of pounv bundles of all the boundary geodesics; we will call the measure \hat{d}\mu. If \hat{d}\mu and the translation of \hat{d}\mu by \tau are \epsilon/R equivalent, then we can take two oriented pants for each pair of pants in our multi-set (taking each of the two possible orientations) and then fit all of these oriented pants into an oriented surface of the type described in step 1. We use Hall’s marriage theorem as described in step 2, and a very small amount of combinatorics.

If the measure \hat{d}\mu, restricted to a given pounv bundle, is \epsilon/R equivalent to a rescaling of Lebesgue measure on that torus, then \hat{d}\mu and \tau of \hat{d}\mu are 2\epsilon/R-equivalent, which is what we wanted.


This is as far as I got in the first talk at Utah, so it would be best to stop and take a breath for a moment. We haven’t really done anything, but we’ve reformulated the problem: the type of surface we want has been well-defined, and the problem of finding this surface has been reformulated as finding a measure on pairs of pants that satisfies a given criterion.


7. A two-frame for M will comprise a tangent vector and a normal vector both at the same point, unit length and orthogonal. Given a two-frame we can rotate the tangent vector 120 degrees around the normal vector, using the right-hand rule; the orbit of this action is an ordered triple of two-frames, which we will call a tripod. We can also rotate 120 degrees in the opposite direction, and obtain an anti-tripod.

8. A connected pair of two-frames is a pair of two-frames along with a geodesic segment connecting them. Given \epsilon and r, with r large in terms of \epsilon, we can find a weighting function on connected two-frames such that the following properties hold whenever the weight is non-zero:

A. The length of the connecting segment is within \epsilon of r.

B. If the normal vector of one two-frame is parallel translated along the connecting segment, then it forms an angle of less than \epsilon with the normal vector of the other two-frame.

C. The angle between the tangent vector of the two-frame and (the tangent vector to) the connecting geodesic segment is exponentially small in r.


D. Given a pair of two-frames, the sum of the weights of the connecting geodesic segments is exponentially close (in r) to 1.

E. The weighting is geometrically natural, in that it depends only on the length of the connecting segment, the angle between the parallel translated normal vectors, and the angles between the connecting segment and the tangent vectors.

We will describe the (relatively simple) weighting function in the end; we will use the exponential mixing of geodesic flow to obtain property D.

9. Given a tripod and an anti-tripod, we can form three pairs of two-frames by pairing the frames in order, and then we can take the measures (or weightings) on the connected pairs of two-frames, and form the product measure (or weighting) by multiplying the weights of the three connections. This gives us a weighting on “connected pairs of tripods” (really a tripod and an anti-tripod) that is supported on connections that satisfy properties A, B, and C.

10. We call a perfect connection between two two-frames a geodesic segment that has a length of r, an angle of zero between the segment and the tangent vectors, and translates one normal vector to the other. If a tripod and an anti-tripod were connected by three perfect connections, then they would be a 1-dimensional retract of a flat pair of pants with three cuffs of equal length R, where R is approximately r + \log \cos \pi/6 when r is large. If the tripod and anti-tripod are connected by arcs that satisfy properties A and B, then the connected pair of tripods is still a retract of a skew pair of pants, whose cuffs have half-length within \epsilon (or 10\epsilon) of R. Thus there is a map from good connected pairs of tripods to good pairs of pants, which we will denote by \pi.

11. We can let \tilde{\mu} be the measure on connected pairs of tripods, given by integrating the weighting of steps 8 and 9 with respect to the Liouville measure on pairs of tripods (or pairs of two-frames). We then push this measure forward by \pi to obtain a measure \mu on pairs of pants; after finding a rational approximation and clearing denominators, it will be the \mu that was asked for in step 6. We will show that \hat{d}\mu (taking the original irrational \mu) is \epsilon/R-equivalent to a rescaling of Lebesgue measure on each pounv bundle and thereby complete the proof.

12. A partially connected pair of tripods T is a pair of tripods where we have connected two out of the three pairs of two-frames. To a partially connected pair of tripods we can assign a single closed geodesic \gamma that is homotopic to the concatenation (at both ends) of the two connecting segments. If we connect the third pair of two-frames and apply \pi we obtain a pair of pants P, and we can then find a pair of opposite unit normal vectors for \gamma pointing to the two cuffs of P (as described in step 3). We will describe a method for predicting the pounv for \gamma and P knowing only the partially connected tripod T: First, lift T to the solid torus cover of M determined by \gamma, and then follow geodesic segments from the tangent vectors of the two unconnected two-frames of (the lift of) T to the ideal boundary of this \gamma-cover. We can connect these two points in the boundary by two geodesics, each of which goes about half-way around this solid torus cover. We can then find the common orthogonals from each of these geodesics to (the lift of) \gamma, and then obtain two normal vectors to \gamma pointing along these common orthogonals; it is easy to verify that these are half-way along \gamma from each other (in the complex sense) and hence form a pounv. Property C of the connections between two-frames (and hence tripods) implies that this predicted pounv will be exponentially close (in r) to the actual pounv of any pair of pants P.

To summarize: given a good connected pair of tripods, we get a good pair of pants P, and taking one cuff \gamma of P, we get a pounv for \gamma as described in step 3. But we only need two out of the three connecting segments to get \gamma, and using the third pair of two-frames, without even knowing the third connecting segment, we can predict the pounv for \gamma and P to very high accuracy.

13. We can then define the \tilde{d} operator from measures on partially connected pairs of tripods to measures on the pounv bundles for the associated geodesics; this operator is just the linear extension of the operation in step 12. Given a connected pair of tripods, we can get three partially connected pairs of tripods in the obvious way; we can thereby extend \tilde{d} to map measures on connected pairs of tripods to measures on the bundles of pounv’s; because the predicted pounv described in step 12 is exponentially close to the actual pounv described in step 3, the two measures \tilde{d} \tilde{\mu} and \hat{d}\mu are \exp(-\alpha r)-equivalent, by the B => A of step 2.

14. For each closed geodesic \gamma, we can lift all the partially connected tripods that give \gamma to the \gamma cover of M described in step 12. There is a natural torus action on the normal bundle of \gamma, and this extends to an action on all of the solid torus cover associated to \gamma. Moreover, it acts on the (lifts of) partially connected tripods, and it does not change the weightings of the two established connecting segments, because of property E of the weighting function.

This is the crucial point: the effective weighting on a partially connected pair of tripods is not just the product of the weights of the two established connections, but that product times the sum of the weights of all possible third connections. By property D of the weighting function, this sum, while not constant, is exponentially close to being constant, so the effective weighting is exponentially close to being invariant under the torus action. Because the predicted pounv for a partially connected pair of tripods is equivariant for the torus action, the measure \tilde{d} \tilde{\mu} is exponentially close to a torus invariant measure on the pounv bundle (which is necessarily a rescaling of Lebesgue measure), in the sense that the Radon-Nikodym derivative is exponentially close to 1. It is then an easy lemma that the two measures are exponentially close in the sense of step 2. And then we’re finished: \hat{d}\mu is exponentially close to \tilde{d} \tilde{\mu}, which is exponentially close to a rescaling of Lebesgue measure, which is what we wanted (with overkill) in step 6.

15. It remains only to define the weighting function described in step 8, which is surprisingly simple: We take some left-invariant metric on \text{PSL}_2(\mathbf{C}), and hence on the two-frame bundle for M and its universal cover. Given a connected pair of two-frames in M, we lift to the universal cover, to obtain two two-frames v and w. We then flow v and w forward by the frame flow for time r/4 to obtain v' and w'. We let V be the \epsilon neighborhood of v', and W be the \epsilon neighborhood of w', with the tangent vector of w' replaced by its negation. Then the weighting of the connection is the volume of the intersection of W with the image of V under the frame flow for time r/2.

Properties A, B, and C are not difficult to verify. Property D follows immediately from exponential mixing: If we have v and w downstairs without any connection, and similarly define v', w', V and W, then the sum of the weights of the possible connections will just be the volume of the intersection of the downstairs W with the frame flow of V. By exponential mixing, this converges at the rate \exp(-\alpha r) to the square of the volume of an \epsilon neighborhood, divided by the volume of M.

We can normalize the weights by dividing by this constant.



I will try to add comments as they occur to me.


One obvious comment to make is that the argument is remarkably short, and does not depend on any very delicate or complicated analytic estimates (maybe the argument that the glued up surfaces are quasi-geodesic is the most delicate part). It is fair to say that it defies the conventional wisdom in that respect — I was personally very surprised that the general method could be made to work, especially in light of the failure of Bowen’s program. Kudos to Jeremy and Vlad for their boldness and ingenuity.

Another comment to make is that the matching argument is surprisingly robust and general, and I expect it to have many broader applications. One thing I was confused about in my last post seems to be resolved by Jeremy’s sketch above — if I understand it correctly, one first (almost) pairs continuous measures, and only then approximates them by discrete integral measures (with a little bit of combinatorics at the end). And one really does need exponential mixing rather than just mixing.

Incidentally, apropos the matching argument, there are some interesting and well-known variations where things go haywire. For example, papers by Burago-Kleiner and (Curt) McMullen show that there are examples of separated nets in Euclidean space which are not bilipschitz to a lattice (though, interestingly, Curt shows that they are Holder equivalent). No such examples exist in hyperbolic space, because of — nonamenability and Hall’s marriage theorem! Roughly, when trying to match up points in two nets in hyperbolic space, one doesn’t need to look very far because the number of options grows exponentially. This is one reason why Kahn-Markovic need to control the matchings of their measures carefully, because it must be done on a very small scale (where the exponential growth does not kick in).
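The role of Hall’s marriage theorem in the matching argument can be seen in a toy model. In the Python sketch below, two integral measures on the real line are given as lists of unit atoms (repeated points allowed); the code looks for a pairing in which matched atoms are closer than \epsilon (condition B of step 2), using augmenting paths, and separately checks condition A by brute force over subsets. The setting (points on a line), the function names and the choice of algorithm are assumptions made for illustration; this is not the construction of Kahn and Markovic, only the combinatorial statement that A implies B for discrete integral measures.

```python
from itertools import combinations

def close_pairs(xs, ys, eps):
    """adj[i] lists the indices j with |xs[i] - ys[j]| < eps."""
    return [[j for j, y in enumerate(ys) if abs(x - y) < eps] for x in xs]

def perfect_matching(xs, ys, eps):
    """Return match with xs[i] paired to ys[match[i]], or None if no pairing exists.

    Uses augmenting paths (Kuhn's algorithm) on the 'closer than eps' graph.
    """
    if len(xs) != len(ys):
        return None
    adj = close_pairs(xs, ys, eps)
    owner = [None] * len(ys)      # owner[j] = index i currently matched to ys[j]

    def augment(i, seen):
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                if owner[j] is None or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(len(xs)):
        if not augment(i, set()):
            return None
    match = [None] * len(xs)
    for j, i in enumerate(owner):
        match[i] = j
    return match

def hall_condition(xs, ys, eps):
    """Condition A by brute force: every set of atoms has at least as many eps-neighbours."""
    adj = close_pairs(xs, ys, eps)
    for size in range(1, len(xs) + 1):
        for subset in combinations(range(len(xs)), size):
            neighbours = set().union(*(adj[i] for i in subset))
            if len(neighbours) < size:
                return False
    return True

good = perfect_matching([0.0, 0.1, 1.0], [0.05, 0.95, 0.12], 0.1)
bad = perfect_matching([0.0, 0.0, 1.0], [0.05, 0.95, 1.0], 0.1)
```

In the second example two atoms at 0 compete for the single atom at 0.05, so condition A fails and no pairing exists. The brute-force check is exponential in the number of atoms and is only meant for tiny examples.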

I thought I would also mention that in case my previous comments lead one to believe otherwise, exponential mixing of the geodesic flow on a hyperbolic manifold is somewhat delicate. Exponential mixing under a flow g_t on a space X preserving a probability measure \mu means that for all (sufficiently nice) functions f and h on X, the correlations \rho(h,f,t):= \int_X h(x)f(g_tx) d\mu - \int_X h(x) d\mu \int_X f(x) d\mu are bounded in absolute value by an expression of the form C_1e^{-tC_2} for suitable constants C_1,C_2 (which might depend on the analytic quality of f and h). For example, one takes X to be the unit tangent bundle of a hyperbolic manifold, and g_t the geodesic flow (i.e. the flow which pushes vectors along the geodesics they are tangent to, at constant speed). Exponential mixing should be contrasted with the much slower mixing of the horocycle flow on a hyperbolic surface, for which the correlation is bounded by an expression like C_1(\log t)^{C_2}t^{-1}. The geodesic flow on a hyperbolic manifold is an example of what is called an Anosov flow; i.e. the tangent bundle TM splits equivariantly under the flow into three subbundles E^0, E^s, E^u where E^0 is 1-dimensional and tangent to the flow, E^s is contracted uniformly exponentially by the flow, and E^u is expanded uniformly exponentially by the flow. The best one knows for (certain) Anosov flows (by Chernov) is that the flow is stretched exponentially mixing, i.e. with an estimate of the form C_1e^{-\sqrt{t}C_2}. One knows exponential mixing for the geodesic flow on variable negative curvature surfaces by Dolgopyat, and on certain locally symmetric spaces, using representation theory. See Pollicott’s lecture notes here for more details. I don’t know if exponential mixing for geodesic flows is known on manifolds of variable negative curvature in high dimensions. Also I’d appreciate it if any reader who knows some ergodic theory can confirm/deny/clarify this paragraph . . .
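For a feel of what exponential decay of correlations looks like, here is a toy computation in Python. It uses the doubling map T(x) = 2x mod 1 on the unit interval with Lebesgue measure, which is a discrete-time stand-in chosen for simplicity and not a geodesic flow; the observables h(x) = f(x) = x, the grid size and the function name are likewise assumptions of the sketch. For this example a direct computation gives the correlation \int_0^1 x \cdot T^n(x) dx - 1/4 = 2^{-n}/12, and the code estimates the same quantity with the midpoint rule.

```python
def doubling_correlation(n, grid=2 ** 16):
    """Midpoint-rule estimate of int h(x) f(T^n x) dx - (int h)(int f)
    for h(x) = f(x) = x and T(x) = 2x mod 1 on [0, 1)."""
    scale = 2 ** n
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) / grid              # midpoint of the i-th cell
        total += x * ((scale * x) % 1.0)  # h(x) * f(T^n x)
    integral = total / grid
    return integral - 0.25                # (int h)(int f) = 1/2 * 1/2

# Correlations for n = 0, ..., 8; each is about half of the previous one.
correlations = [doubling_correlation(n) for n in range(9)]
```

The grid is a power of 2, so for the values of n used here the grid cells line up with the breakpoints of T^n and the midpoint rule is accurate to several more digits than the quantities being measured.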

(Update 8/12): Jeremy tells me that he and Vladimir only need “sufficiently high degree polynomial” mixing, so perhaps there is a decent chance the methods can be extended to variable negative curvature.

(Update 10/29): The paper is now available from the arXiv.

I just learned from Jesse Johnson’s blog that Vlad Markovic and Jeremy Kahn have announced a proof of the surface subgroup conjecture, that every complete hyperbolic 3-manifold M contains a closed \pi_1-injective surface. Equivalently, \pi_1(M) contains a closed surface subgroup. Apparently, Jeremy made the announcement at an FRG conference in Utah. This answers a long-standing question in 3-manifold topology, which is a variation on some problems originally posed by Waldhausen. If one further knew that hyperbolic 3-manifold groups were LERF, one would be able to deduce that all hyperbolic 3-manifolds are virtually Haken, and (by a recent theorem of Agol), virtually fibered. Dani Wise (and others) have programs to show that hyperbolic 3-manifold groups are LERF; if successful, this would therefore resolve some of the most important outstanding problems in 3-manifold topology (in fact, I would say: the most important outstanding problems, by a substantial margin).

In fact, the argument appears to work for hyperbolic manifolds of every dimension \ge 3, and possibly more generally still. Details on the argument of Markovic-Kahn are scarce (Vlad informs me that they expect to have a preprint in a few weeks) but the sketch of the argument presented by Kahn is compelling. Roughly speaking, the argument (as summarized by Ian Agol in a comment at Jesse’s blog) takes the following form:

  1. Given M, for a sufficiently big constant R, one can find “many” immersed, almost totally-geodesic pairs of pants (i.e. thrice-punctured spheres) with geodesic boundary components (i.e. “cuffs”) of length very close to 2R. In fact, one can further insist that the complex length of the boundary geodesic is very close to 2R (i.e. holonomy transport around this geodesic does not rotate the normal bundle very much).
  2. Conversely, given any geodesic of complex length very close to 2R, one can find many such pairs of pants that it bounds, and moreover one can find them so that the normal to the geodesic pointing in to the surface is prescribed.
  3. If one takes a sufficiently big collection of such geodesic pairs of pants, one has enough of them in oppositely-aligned pairs along each boundary component, that they can be matched up (by some version of Hall’s marriage theorem), and furthermore, matched up with a definite prescribed “twist” along the boundary components.
  4. One checks that the resulting (closed) surface is sufficiently close to totally geodesic that the ambient negative curvature certifies it is \pi_1-injective.

Many aspects of this argument have a lot in common with some previous attempts on the surface subgroup conjecture, including one recent approach by Bowen (note: Bowen’s approach is known to have some fatal difficulties; the “twist” in 3. above specifically addresses some of them). All of these points deserve some comments.

First, where do the pairs of pants come from? If P is a totally geodesic pair of pants with boundary components of length close to 2R, the pants P retract onto a geodesic spine, i.e. an immersed totally geodesic theta graph, whose edges all have length close to R (each cuff runs along two of the three edges), and which meet at angles very close to 120 degrees. One can cut this spine up into two pieces, which are obtained by exponentiating the edges of an infinitesimal (almost)-planar tripod for length R/2.

Given a tripod T in some plane in the tangent space at some point of M, one can exponentiate the edges for length R/2 to construct such a half-spine; if T and T' are a pair of tripods for which the exponentiated endpoints nearly match up, with almost opposite tangent vectors, then the resulting half-spines can be glued up to make a spine, and thickened to make a pair of pants. One key idea is to use the exponential mixing property of the geodesic flow on a hyperbolic manifold, e.g. as proved by Pollicott. Given some tolerance \epsilon, once R is sufficiently large, the mixing result shows that the set of such pairs of tripods for which such a matching occurs has a definite density in the space of all pairs (and in fact, is more and more equidistributed in this space, in probability). In fact, one may even insist that two of the pairs of prongs join up to make some specific closed geodesic of length almost 2R, and vary the pair of third prongs a very small amount so that they glue up. This takes care of the first two points; this seems quite uncontroversial (exponential mixing comes in, I suspect, to know that one doesn’t need to wiggle the pair of third prongs much, having paired the first two pairs).

The matching (i.e. the gluing up of opposite pant cuffs) apparently is done by some variant of Hall’s marriage theorem. One needs to know (I think) that for any finite set of cuffs to be glued, the set of other cuffs that they could potentially be glued to is at least as big in cardinality. This probably needs some thought, but it is plausibly true: given a cuff, it can be glued to any cuff which is almost oppositely aligned to it, and since there is some tolerance in the angle of gluing — this is where dimension at least 3 is necessary — and moreover, since oriented cuffs are almost equidistributed, one can always find “more” cuffs that are opposite, up to a bit of tolerance, to any given subset of cuffs (of course, more details are necessary here). There is an extra wrinkle to the argument, which is that the gluing must be done with a “twist” of a definite amount, so that cuffs are not glued up in such a way that the perpendicular geodesic arcs joining pairs of cuffs match up.
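To make the role of Hall’s marriage theorem concrete, here is a minimal sketch in Python. It is only an illustration of the combinatorial statement, not the Kahn-Markovic argument: the cuff names and the compatibility lists are hypothetical toy data, and the matching is found by the standard augmenting path method. Hall’s condition (every finite set of cuffs has at least as many potential partners) is checked by brute force, which is only feasible for tiny examples.

```python
from itertools import combinations


def max_matching(compat):
    """Maximum bipartite matching by augmenting paths.

    compat maps each left cuff to the list of right cuffs it may be glued to.
    Returns a dict sending each matched right cuff to its left partner.
    """
    match = {}  # right cuff -> left cuff

    def try_assign(u, seen):
        for v in compat[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-routed elsewhere
            if v not in match or try_assign(match[v], seen):
                match[v] = u
                return True
        return False

    for u in compat:
        try_assign(u, set())
    return match


def hall_condition(compat):
    """Brute-force check that every set A of left cuffs has >= |A| neighbours."""
    left = list(compat)
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            neighbours = set()
            for u in subset:
                neighbours.update(compat[u])
            if len(neighbours) < k:
                return False
    return True


# Toy data (hypothetical): each cuff is compatible with a few opposite cuffs.
good = {"a": ["x", "y"], "b": ["x"], "c": ["y", "z"]}
# Here a and b compete for the single cuff x, so Hall's condition fails.
bad = {"a": ["x"], "b": ["x"], "c": ["y", "z"]}
```

In the first example Hall’s condition holds and every cuff gets a partner; in the second it fails and one cuff is necessarily left unglued.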

(Update 8/8: I think there must necessarily be more details to the matching argument, as very loosely described above. There are at least two additional issues that must be dealt with in order to perform a matching: a parity issue (since each pants has an odd number of cuffs) and a homology issue (if the argument relativizes, so that one fixes some collection of cuffs in advance and glues up everything else, one concludes a posteriori that the union of the unglued cuffs is homologically inessential). Probably the parity issue (and more subtle divisibility issues) can be solved by gluing with real-valued weights, then approximating a real solution by a rational solution, and multiplying through to clear denominators. Maybe the homology issue does not arise, if in fact the argument doesn’t relativize. Both these issues suggest that one does not specify in advance a collection of pants to be glued up, but rather wants to glue up a definite number of pants from some subset.)
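The “clear denominators” step can be sketched as follows. This is a toy illustration under my own assumptions: the weights are hypothetical, and the function name clear_denominators is made up for the example. Note that rounding each weight independently need not preserve the gluing equations; one would really want to choose a rational point of the solution set, which exists because the equations have integer coefficients.

```python
from fractions import Fraction
from math import gcd


def clear_denominators(weights, max_denominator=1000):
    """Approximate real weights by rationals, then scale to integers.

    Returns (integer_weights, scale), where integer_weights[i] / scale
    is the rational approximation of weights[i].
    """
    rationals = [Fraction(w).limit_denominator(max_denominator) for w in weights]
    scale = 1
    for q in rationals:
        # running least common multiple of the denominators
        scale = scale * q.denominator // gcd(scale, q.denominator)
    return [int(q * scale) for q in rationals], scale


# Hypothetical weights on three types of pants.
counts, scale = clear_denominators([0.5, 0.25, 0.75])
```

Here the weights 1/2, 1/4, 3/4 become 2, 1, 3 copies of the respective pants after multiplying through by 4.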

This issue of a twist is important for the 4th point, which is perhaps the most delicate. In order to know that the resulting surface is \pi_1-injective, one must use geometry. A closed (immersed) surface in a hyperbolic manifold which is (locally) very close to being totally geodesic is \pi_1-injective. One way to see this is to observe that a geodesic loop in the surface is almost geodesic in the manifold; the ambient negative curvature means that the geodesic can be shrunk (by the negative of the gradient of length in the space of loops) to become geodesic in the ambient manifold; if it is close to being geodesic at the start, it very quickly becomes totally geodesic, without getting much shorter. Any closed geodesic in a hyperbolic manifold is essential.

If one builds a surface by gluing up almost totally geodesic pieces in such a way that there is almost no angle along the gluing, the resulting surface is almost geodesic, and therefore injective. However, one must be very careful to control the geometry of the pieces that are glued, and this is hard to do if the injectivity radius is very small. A geodesic pair of pants has area 2\pi no matter how long its boundary components are. So if the boundary components have length 2R, then at the points where they are thinnest, they are only e^{-R} across. If cuffs are glued where the pants are thinnest, even if the gluing angle is very small, the surfaces themselves might twist through a big angle in a very short time. So one needs to make sure that the thinnest part of one pants is glued up to a thicker part of the next, which is glued to a thicker part of the next . . . and so on. This is the point of introducing the twist before gluing: the twists accumulate, and before one has glued R pieces together, one has entered the thick part of some pants, where the injectivity radius is bounded below by some universal constant.
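For the record, the area claim is just Gauss-Bonnet. For a pair of pants P with curvature K = -1, geodesic boundary (so the geodesic curvature k_g vanishes) and Euler characteristic \chi(P) = -1:

```latex
\int_P K \, dA + \int_{\partial P} k_g \, ds = 2\pi \chi(P)
\quad\Longrightarrow\quad
-\mathrm{area}(P) + 0 = 2\pi \cdot (-1)
\quad\Longrightarrow\quad
\mathrm{area}(P) = 2\pi .
```

So as the total boundary length 6R grows, the fixed area 2\pi has to be spread along longer and longer cuffs, which is why the pants become thin.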

Anyway, this seems like a really spectacular development, with an excellent chance of working out. Some of the ingredients — e.g. the exponential mixing of the geodesic flow — work just as well in variable negative curvature. In fact, some version of it should work for arbitrary hyperbolic groups (using Mineyev’s flow space). Without knowing more details of the argument, one can’t say how delicate the last part of the argument is, and how far it generalizes (but readers are invited to speculate . . .)

The purpose of this post is to discuss my recent paper with Koji Fujiwara, which will shortly appear in Ergodic Theory and Dynamical Systems, both for its own sake, and in order to motivate some open questions that I find very intriguing. The content of the paper is a mixture of ergodic theory, geometric group theory, and computer science, and was partly inspired by a paper of Jean-Claude Picaud. To state the results of the paper, I must first introduce a few definitions and some background.

Let \Gamma be a finite directed graph (hereafter a digraph) with an initial vertex, and edges labeled by elements of a finite set S in such a way that each vertex has at most one outgoing edge with any given label. A finite directed path in \Gamma starting at the initial vertex determines a word in the alphabet S, by reading the labels on the edges traversed (in order). The set L \subset S^* of words obtained in this way is an example of what is called a regular language, and is said to be parameterized by \Gamma. Note that this is not the most general kind of regular language; in particular, any language L of this kind will necessarily be prefix-closed (i.e. if w \in L then every prefix of w is also in L). Note also that different digraphs might parameterize the same (prefix-closed) regular language L.
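As a concrete illustration, here is a minimal Python sketch of a digraph parameterizing a prefix-closed regular language. The example is my own choice: the language of reduced words in the free group on a, b (writing A, B for the inverses), with one vertex per “last letter read” plus an initial vertex. The names GRAPH and words_of_length are hypothetical.

```python
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}

# GRAPH[vertex][label] = target vertex. Using a dict per vertex enforces
# "at most one outgoing edge with any given label".
GRAPH = {"start": {s: s for s in INVERSE}}
for last in INVERSE:
    # after reading `last`, any letter except its inverse may follow
    GRAPH[last] = {s: s for s in INVERSE if s != INVERSE[last]}


def words_of_length(graph, start, n):
    """All words read off directed paths of length n from the initial vertex."""
    paths = [("", start)]
    for _ in range(n):
        paths = [(word + label, target)
                 for word, vertex in paths
                 for label, target in sorted(graph[vertex].items())]
    return [word for word, _ in paths]
```

For instance, words_of_length(GRAPH, "start", 2) lists the 12 reduced words of length 2, and every prefix of a word in the language is again in the language, as it must be.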

If S is a set of generators for a group G, there is an obvious map L \to G called the evaluation map that takes a word w to the element of G represented by that word.

Definition: Let G be a group, and S a finite generating set. A combing of G is a (prefix-closed) regular language L for which the evaluation map L \to G is a bijection, and such that every w \in L represents a geodesic in G.

The intuition behind this definition is that the set of words in L determines a directed spanning tree in the Cayley graph C_S(G) starting at \text{id}, and such that every directed path in the tree is a geodesic in C_S(G). Note that there are other definitions of combing in the literature; for example, some authors do not require the evaluation map to be a bijection, but only a coarse bijection.

Fundamental to the theory of combings is the following Theorem, which paraphrases one of the main results of this paper:

Theorem: (Cannon) Let G be a hyperbolic group, and let S be a finite generating set. Choose a total order on the elements of S. Then the language L of lexicographically first geodesics in G is a combing.

The language L described in this theorem is obviously geodesic and prefix-closed, and the evaluation map is bijective; the content of the theorem is that L is regular, and parameterized by some finite digraph \Gamma. In the sequel, we restrict attention exclusively to hyperbolic groups G.

Given a (hyperbolic) group G, a generating set S, and a combing L, one makes the following definition:

Definition: A function \phi:G \to \mathbb{Z} is weakly combable (with respect to S,L) if there is a digraph \Gamma parameterizing L and a function d\phi from the vertices of \Gamma to \mathbb{Z} so that for any w \in L, corresponding to a path \gamma in \Gamma, there is an equality \phi(w) = \sum_i d\phi(\gamma(i)).

In other words, a function \phi is weakly combable if it can be obtained by “integrating” a function d\phi along the paths of a combing. One furthermore says that a function is combable if it changes by a bounded amount under right-multiplication by an element of S, and bicombable if it changes by a bounded amount under either left or right multiplication by an element of S. The property of being (bi-)combable does not depend on the choice of a generating set S or a combing L.
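Here is a small Python sketch of “integrating” d\phi along a path, again in the free group on a, b with the combing by reduced words (my own toy example; the names GRAPH, integrate, WORD_LENGTH and A_EXPONENT are hypothetical). I take d\phi to vanish on the initial vertex and sum over the vertices reached after each edge; this is one convention among several.

```python
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}

# Digraph for reduced words: the vertex remembers the last letter read.
GRAPH = {"start": {s: s for s in INVERSE}}
for last in INVERSE:
    GRAPH[last] = {s: s for s in INVERSE if s != INVERSE[last]}


def integrate(d_phi, word, graph=GRAPH, start="start"):
    """Sum d_phi over the vertices visited by the path reading `word`.

    Raises KeyError if `word` is not in the language of the digraph.
    Vertices missing from d_phi contribute 0.
    """
    vertex = start
    total = d_phi.get(vertex, 0)
    for letter in word:
        vertex = graph[vertex][letter]
        total += d_phi.get(vertex, 0)
    return total


# Word length: every edge contributes 1.
WORD_LENGTH = {s: 1 for s in INVERSE}
# The homomorphism to Z counting the exponent sum of a.
A_EXPONENT = {"a": 1, "A": -1}
```

So integrate(WORD_LENGTH, "abAB") recovers the word length 4, and integrate(A_EXPONENT, "aabA") recovers the exponent sum 1.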

Example: Word length (with respect to a given generating set S) is bicombable.

Example: Let \phi:G \to \mathbb{Z} be a homomorphism. Then \phi is bicombable.

Example: The Brooks counting quasimorphisms (on a free group) and the Epstein-Fujiwara counting quasimorphisms are bicombable.

Example: The sum or difference of two (bi-)combable functions is (bi-)combable.

A particularly interesting example is the following:

Example: Let S be a finite set which generates G as a semigroup. Let \phi_S denote word length with respect to S, and \phi_{S^{-1}} denote word length with respect to S^{-1} (which also generates G as a semigroup). Then the difference \psi_S:= \phi_S - \phi_{S^{-1}} is a bicombable quasimorphism.

The main theorem proved in the paper concerns the statistical distribution of values of a bicombable function.

Theorem: Let G be a hyperbolic group, and let \phi be a bicombable function on G. Let \overline{\phi}_n be the value of \phi on a random word in G of length n (with respect to a certain measure \widehat{\nu} depending on a choice of generating set). Then there are algebraic numbers E and \sigma so that as distributions, n^{-1/2}(\overline{\phi}_n - nE) converges to a normal distribution with standard deviation \sigma.

One interesting corollary concerns the length of typical words in one generating set versus another. The first thing that every geometric group theorist learns is that if S_1, S_2 are two finite generating sets for a group G, then there is a constant K so that every word of length n in one generating set has length at most nK and at least n/K in the other generating set. If one considers an example like \mathbb{Z}^2, one sees that this is the best possible estimate, even statistically. However, if one restricts attention to a hyperbolic group G, then one can do much better for typical words:

Corollary: Let G be hyperbolic, and let S_1,S_2 be two finite generating sets. There is an algebraic number \lambda_{1,2} so that almost all words of length n with respect to the S_1 generating set have length almost equal to n\lambda_{1,2} with respect to the S_2 generating set, with error of size O(\sqrt{n}).

Let me indicate very briefly how the proof of the theorem goes.

Sketch of Proof: Let \phi be bicombable, and let d\phi be a function from the vertices of \Gamma to \mathbb{Z}, where \Gamma is a digraph parameterizing L. There is a bijection between the set of elements in G of word length n and the set of directed paths in \Gamma of length n that start at the initial vertex. So to understand the distribution of \phi, we need to understand the behaviour of a typical long path in \Gamma.

Define a component of \Gamma to be a maximal subgraph with the property that there is a directed path (in the component) from any vertex to any other vertex. One can define a new digraph C(\Gamma) without loops, with one vertex for each component of \Gamma, in an obvious way. Each component C determines an adjacency matrix M_C, with ij-entry equal to 1 if there is a directed edge from vertex i to vertex j, and equal to 0 otherwise. A component C is big if the biggest real eigenvalue \lambda of M_C is at least as big as the biggest real eigenvalue of the matrices associated to every other component. A random long walk in \Gamma will spend most of its time entirely in big components, so these are the only components we need to consider to understand the statistical distribution of \phi.
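The definitions in this paragraph can be sketched in a few lines of Python. This is a toy illustration, not the code used in the paper: components are computed by Kosaraju’s algorithm, and the biggest eigenvalue of M_C is estimated by power iteration on M_C + I (adding the identity makes the iteration converge even when the component is periodic). The example digraph and the function names are hypothetical.

```python
def components(graph):
    """Strongly connected components; graph maps vertex -> list of successors."""
    order, seen = [], set()

    def visit(v):  # recursive depth-first search; fine for small digraphs
        seen.add(v)
        for w in graph[v]:
            if w not in seen:
                visit(w)
        order.append(v)

    for v in graph:
        if v not in seen:
            visit(v)
    reverse = {v: [] for v in graph}
    for v in graph:
        for w in graph[v]:
            reverse[w].append(v)
    comps, assigned = [], set()
    for v in reversed(order):
        if v in assigned:
            continue
        comp, stack = [], [v]
        assigned.add(v)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in reverse[u]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps


def top_eigenvalue(graph, comp, iterations=200):
    """Biggest real eigenvalue of the adjacency matrix of one component."""
    inside = set(comp)
    x = {v: 1.0 for v in comp}
    growth = 1.0
    for _ in range(iterations):
        # apply M + I, keeping only edges that stay inside the component
        y = {v: x[v] + sum(x[w] for w in graph[v] if w in inside) for v in comp}
        growth = max(y.values())
        x = {v: y[v] / growth for v in comp}
    return growth - 1.0


def big_components(graph, tol=1e-9):
    comps = components(graph)
    values = [top_eigenvalue(graph, c) for c in comps]
    return [c for c, val in zip(comps, values) if val >= max(values) - tol]


# Hypothetical digraph: {1, 2} has eigenvalue (1 + sqrt 5)/2, {3} has eigenvalue 1.
EXAMPLE = {1: [1, 2], 2: [1, 3], 3: [3]}
```

In this example big_components(EXAMPLE) returns only the component {1, 2}, so a long random path spends almost all its time there.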

A theorem of Coornaert implies that there are no big components of C(\Gamma) in series; i.e. there are no directed paths in C(\Gamma) from one big component to another (one also says that the big components do not communicate). This means that a typical long walk in \Gamma is entirely contained in a single big component, except for a (relatively short) path at the start and the end of the walk. So the distribution of \phi gets independent contributions, one from each big component.

The contribution from an individual big component is not hard to understand: the central limit theorem for stationary Markov chains says that for elements of G corresponding to paths that spend almost all their time in a given big component C there is a central limit theorem n^{-1/2}(\overline{\phi}_n - nE_C) \to N(0,\sigma_C) where the mean E_C and standard deviation \sigma_C depend only on C. The problem is to show that the means and standard deviations associated to different big components are the same. Everything up to this point only depends on weak combability of \phi; to finish the proof one must use bicombability.

It is not hard to show that if \gamma is a typical infinite walk in a component C, then the subpaths of \gamma of length n are distributed like random walks of length n in C. What this means is that the mean and standard deviation E_C,\sigma_C associated to a big component C can be recovered from the distribution of \phi on a single infinite “typical” path in C. Such an infinite path corresponds to an infinite geodesic in G, converging to a definite point in the Gromov boundary \partial G. Another theorem of Coornaert (from the same paper) says that the action of G on its boundary \partial G is ergodic with respect to a certain natural measure called a Patterson-Sullivan measure (see Coornaert’s paper for details). This means that there are typical infinite geodesics \gamma,\gamma' associated to components C and C' for which some g \in G takes \gamma to a geodesic g\gamma ending at the same point in \partial G as \gamma'. Bicombability implies that the values of \phi on \gamma and g\gamma differ by a bounded amount. Moreover, since g\gamma and \gamma' are asymptotic to the same point at infinity, combability implies that the values of \phi on g\gamma and \gamma' also differ by a bounded amount. This is enough to deduce that E_C = E_{C'} and \sigma_C = \sigma_{C'}, and one obtains a (global) central limit theorem for \phi on G. qed.

This obviously raises several questions, some of which seem very hard, including:

Question 1: Let \phi be an arbitrary quasimorphism on a hyperbolic group G (even the case G is free is interesting). Does \phi satisfy a central limit theorem?

Question 2: Let \phi be an arbitrary quasimorphism on a hyperbolic group G. Does \phi satisfy a central limit theorem with respect to a random walk on G? (i.e. one considers the distribution of values of \phi not on the set of elements of G of word length n, but on the set of elements obtained by a random walk on G of length n, and lets n go to infinity)

All bicombable quasimorphisms satisfy an important property which is essential to our proof of the central limit theorem: they are local, which is to say, they are defined as a sum of local contributions. In the continuous world, they are the analogue of the so-called de Rham quasimorphisms on \pi_1(M) where M is a closed negatively curved Riemannian manifold; such quasimorphisms are defined by choosing a 1-form \alpha, and defining \phi_\alpha(g) to be equal to the integral \int_{\gamma_g} \alpha, where \gamma_g is the closed oriented based geodesic in M in the homotopy class of g. De Rham quasimorphisms, being local, also satisfy a central limit theorem.

This locality manifests itself in another way, in terms of defects. Let \phi be a quasimorphism on a hyperbolic group G. Recall that the defect D(\phi) is the supremum of |\phi(gh) - \phi(g) -\phi(h)| over all pairs of elements g,h \in G. A quasimorphism is further said to be homogeneous if \phi(g^n) = n\phi(g) for all integers n. If \phi is an arbitrary quasimorphism, one may homogenize it by taking a limit \psi(g) = \lim_{n \to \infty} \phi(g^n)/n; one says that \psi is the homogenization of \phi in this case. Homogenization typically does not preserve defects; however, there is an inequality D(\psi) \le 2D(\phi). If \phi is local, one expects this inequality to be an equality. For, in a hyperbolic group, the contribution to the defect of a local quasimorphism all arises from the interaction of the suffix of (a geodesic word representing the element) g with the prefix of h (with notation as above). When one homogenizes, one picks up another contribution to the defect from the interaction of the prefix of g with the suffix of h; since these two contributions are essentially independent, one expects that homogenizing a local quasimorphism should exactly double the defect. This is the case for bicombable and de Rham quasimorphisms, and can perhaps be used to define locality for a quasimorphism on an arbitrary group.
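To experiment with defects and homogenization, here is a minimal Python sketch for a counting quasimorphism on the free group on a, b (with A, B the inverses). Everything here is a toy under my own assumptions: I count all (possibly overlapping) copies of a word, the homogenization is only approximated by a finite power, and the defect is only computed over pairs g, h in a small ball, which gives a lower bound for D(\phi) rather than its value.

```python
from itertools import product

INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}


def reduce_word(word):
    """Freely reduce a word by cancelling adjacent inverse letters."""
    out = []
    for s in word:
        if out and out[-1] == INVERSE[s]:
            out.pop()
        else:
            out.append(s)
    return "".join(out)


def inverse(word):
    return "".join(INVERSE[s] for s in reversed(word))


def count(w, g):
    """Number of (possibly overlapping) occurrences of w as a subword of g."""
    return sum(g.startswith(w, i) for i in range(len(g) - len(w) + 1))


def counting(w, g):
    """Copies of w minus copies of w^{-1} in the reduced form of g."""
    g = reduce_word(g)
    return count(w, g) - count(inverse(w), g)


def ball(n):
    """All reduced words of length at most n."""
    words = set()
    for k in range(n + 1):
        for letters in product(INVERSE, repeat=k):
            words.add(reduce_word("".join(letters)))
    return sorted(words)


def defect_on_ball(phi, n):
    """Largest |phi(gh) - phi(g) - phi(h)| over g, h in the ball of radius n."""
    elements = ball(n)
    return max(abs(phi(g + h) - phi(g) - phi(h))
               for g in elements for h in elements)


def homogenize(phi, g, n=50):
    """Finite approximation phi(g^n)/n to the homogenization."""
    return phi(g * n) / n


def phi(g):
    return counting("ab", g)
```

For example, homogenize(phi, "ab") gives 1.0, and defect_on_ball(phi, 2) gives a lower bound for D(\phi) that can be compared with estimates for the defect of the homogenization.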

This discussion provokes the following key question:

Question 3: Let G be a group, and let \psi be a homogeneous quasimorphism. Is there a quasimorphism \phi with homogenization \psi, satisfying D(\psi) = 2D(\phi)?

Example: The answer to question 3 is “yes” if \psi is the rotation quasimorphism associated to an action of G on S^1 by orientation-preserving homeomorphisms (this is nontrivial; see Proposition 4.70 from my monograph).

Example: Let C be any homologically trivial group 1-boundary. Then there is some extremal homogeneous quasimorphism \psi for C (i.e. a quasimorphism achieving equality \text{scl}(C) = \psi(C)/2D(\psi) under generalized Bavard duality; see this post) for which there is \phi with homogenization \psi satisfying D(\psi) = 2D(\phi). Consequently, if every point in the boundary of the unit ball in the \text{scl} norm is contained in a unique supporting hyperplane, the answer to question 3 is “yes” for any quasimorphism on G.

Any quasimorphism on G can be pulled back to a quasimorphism on a free group, but this does not seem to make anything easier. In particular, question 3 is completely open (as far as I know) when G is a free group. An interesting test case might be the homogenization of an infinite sum of Brooks functions \sum_w h_w for some infinite non-nested family of words \lbrace w \rbrace.  

If the answer to this question is “no”, and one can find a homogeneous quasimorphism \psi which is not the homogenization of any “local” quasimorphism, then perhaps \psi does not satisfy a central limit theorem. One can try to approach this problem from the other direction:

Question 4: Given a function f defined on the ball of radius n in a free group F, one defines the defect D(f) in the usual way, restricted to pairs of elements g,h for which g,h,gh are all of length at most n. Under what conditions can f be extended to a function on the ball of radius n+1 without increasing the defect?

If one had a good procedure for building a quasimorphism “by hand” (so to speak), one could try to build a quasimorphism that failed to satisfy a central limit theorem, or perhaps find reasons why this was impossible.

