The purpose of this post is to discuss my recent paper with Koji Fujiwara, which will shortly appear in Ergodic Theory and Dynamical Systems, both for its own sake, and in order to motivate some open questions that I find very intriguing. The content of the paper is a mixture of ergodic theory, geometric group theory, and computer science, and was partly inspired by a paper of Jean-Claude Picaud. To state the results of the paper, I must first introduce a few definitions and some background.
Let
be a finite directed graph (hereafter a digraph) with an initial vertex, and edges labeled by elements of a finite set
in such a way that each vertex has at most one outgoing edge with any given label. A finite directed path in
starting at the initial vertex determines a word in the alphabet
, by reading the labels on the edges traversed (in order). The set
of words obtained in this way is an example of what is called a regular language, and is said to be parameterized by
. Note that this is not the most general kind of regular language; in particular, any language
of this kind will necessarily be prefix-closed (i.e. if
then every prefix of
is also in
). Note also that different digraphs might parameterize the same (prefix-closed) regular language
.
If
is a set of generators for a group
, there is an obvious map
called the evaluation map that takes a word
to the element of
represented by that word.
Definition: Let
be a group, and
a finite generating set. A combing of
is a (prefix-closed) regular language
for which the evaluation map
is a bijection, and such that every
represents a geodesic in
.
The intuition behind this definition is that the set of words in
determines a directed spanning tree in the Cayley graph
starting at
, and such that every directed path in the tree is a geodesic in
. Note that there are other definitions of combing in the literature; for example, some authors do not require the evaluation map to be a bijection, but only a coarse bijection.
Fundamental to the theory of combings is the following Theorem, which paraphrases one of the main results of this paper:
Theorem: (Cannon) Let
be a hyperbolic group, and let
be a finite generating set. Choose a total order on the elements of
. Then the language
of lexicographically first geodesics in
is a combing.
The language
described in this theorem is obviously geodesic and prefix-closed, and the evaluation map is bijective; the content of the theorem is that
is regular, and parameterized by some finite digraph
. In the sequel, we restrict attention exclusively to hyperbolic groups
.
Given a (hyperbolic) group
, a generating set
, a combing
, one makes the following definition:
Definition: A function
is weakly combable (with respect to
) if there is a digraph
parameterizing
and a function
from the vertices of
to
so that for any
, corresponding to a path
in
, there is an equality
.
In other words, a function
is weakly combable if it can be obtained by “integrating” a function
along the paths of a combing. One furthermore says that a function is combable if it changes by a bounded amount under right-multiplication by an element of
, and bicombable if it changes by a bounded amount under either left or right multiplication by an element of
. The property of being (bi-)combable does not depend on the choice of a generating set
or a combing
.
Example: Word length (with respect to a given generating set
) is bicombable.
Example: Let
be a homomorphism. Then
is bicombable.
Example: The Brooks counting quasimorphisms (on a free group) and the Epstein-Fujiwara counting quasimorphisms are bicombable.
Example: The sum or difference of two (bi-)combable functions is (bi-)combable.
A particularly interesting example is the following:
Example: Let
be a finite set which generates
as a semigroup. Let
denote word length with respect to
, and
denote word length with respect to
(which also generates
as a semigroup). Then the difference
is a bicombable quasimorphism.
The main theorem proved in the paper concerns the statistical distribution of values of a bicombable function.
Theorem: Let
be a hyperbolic group, and let
be a bicombable function on
. Let
be the value of
on a random word in
of length
(with respect to a certain measure
depending on a choice of generating set). Then there are algebraic numbers
and
so that as distributions,
converges to a normal distribution with standard deviation
.
One interesting corollary concerns the length of typical words in one generating set versus another. The first thing that every geometric group theorist learns is that if
are two finite generating sets for a group
, then there is a constant
so that every word of length
in one generating set has length at most
and at least
in the other generating set. If one considers an example like
, one sees that this is the best possible estimate, even statistically. However, if one restricts attention to a hyperbolic group
, then one can do much better for typical words:
Corollary: Let
be hyperbolic, and let
be two finite generating sets. There is an algebraic number
so that almost all words of length
with respect to the
generating set have length almost equal to
with respect to the
generating set, with error of size
.
Let me indicate very briefly how the proof of the theorem goes.
Sketch of Proof: Let
be bicombable, and let
be a function from the vertices of
to
, where
is a digraph parameterizing
. There is a bijection between the set of elements in
of word length
and the set of directed paths in
of length
that start at the initial vertex. So to understand the distribution of
, we need to understand the behaviour of a typical long path in
.
Define a component of
to be a maximal subgraph with the property that there is a directed path (in the component) from any vertex to any other vertex. One can define a new digraph
without loops, with one vertex for each component of
, in an obvious way. Each component
determines an adjacency matrix
, with
-entry equal to
if there is a directed edge from vertex
to vertex
, and equal to
otherwise. A component
is big if the biggest real eigenvalue
of
is at least as big as the biggest real eigenvalue of the matrices associated to every other component. A random long walk in
will spend most of its time entirely in big components, so these are the only components we need to consider to understand the statistical distribution of
.
A theorem of Coornaert implies that there are no big components of
in series; i.e. there are no directed paths in
from one big component to another (one also says that the big components do not communicate). This means that a typical long walk in
is entirely contained in a single big component, except for a (relatively short) path at the start and the end of the walk. So the distribution of
gets independent contributions, one from each big component.
The contribution from an individual big component is not hard to understand: the central limit theorem for stationary Markov chains says that for elements of
corresponding to paths that spend almost all their time in a given big component
there is a central limit theorem
where the mean
and standard deviation
depend only on
. The problem is to show that the means and standard deviations associated to different big components are the same. Everything up to this point only depends on weak combability of
; to finish the proof one must use bicombability.
It is not hard to show that if
is a typical infinite walk in a component
, then the subpaths of
of length
are distributed like random walks of length
in
. What this means is that the mean and standard deviation
associated to a big component
can be recovered from the distribution of
on a single infinite “typical” path in
. Such an infinite path corresponds to an infinite geodesic in
, converging to a definite point in the Gromov boundary
. Another theorem of Coornaert (from the same paper) says that the action of
on its boundary
is ergodic with respect to a certain natural measure called a Patterson-Sullivan measure (see Coornaert’s paper for details). This means that there are typical infinite geodesics
associated to components
and
for which some
takes
to a geodesic
ending at the same point in
as
. Bicombability implies that the values of
on
and
differ by a bounded amount. Moreover, since
and
are asymptotic to the same point at infinity, combability implies that the values of
on
and
also differ by a bounded amount. This is enough to deduce that
and
, and one obtains a (global) central limit theorem for
on
. qed.
This obviously raises several questions, some of which seem very hard, including:
Question 1: Let
be an arbitrary quasimorphism on a hyperbolic group
(even the case
is free is interesting). Does
satisfy a central limit theorem?
Question 2: Let
be an arbitrary quasimorphism on a hyperbolic group
. Does
satisfy a central limit theorem with respect to a random walk on
? (i.e. one considers the distribution of values of
not on the set of elements of
of word length
, but on the set of elements obtained by a random walk on
of length
, and lets
go to infinity)
All bicombable quasimorphisms satisfy an important property which is essential to our proof of the central limit theorem: they are local, which is to say, they are defined as a sum of local contributions. In the continuous world, they are the analogue of the so-called de Rham quasimorphisms on
where
is a closed negatively curved Riemannian manifold; such quasimorphisms are defined by choosing a
-form
, and defining
to be equal to the integral
, where
is the closed oriented based geodesic in
in the homotopy class of
. De Rham quasimorphisms, being local, also satisfy a central limit theorem.
This locality manifests itself in another way, in terms of defects. Let
be a quasimorphism on a hyperbolic group
. Recall that the defect
is the supremum of
over all pairs of elements
. A quasimorphism is further said to be homogeneous if
for all integers
. If
is an arbitrary quasimorphism, one may homogenize it by taking a limit
; one says that
is the homogenization of
in this case. Homogenization typically does not preserve defects; however, there is an inequality
. If
is local, one expects this inequality to be an equality. For, in a hyperbolic group, the contribution to the defect of a local quasimorphism all arises from the interaction of the suffix of (a geodesic word representing the element)
with the prefix of
(with notation as above). When one homogenizes, one picks up another contribution to the defect from the interaction of the prefix of
with the suffix of
; since these two contributions are essentially independent, one expects that homogenizing a local quasimorphism should exactly double the defect. This is the case for bicombable and de Rham quasimorphisms, and can perhaps be used to define locality for a quasimorphism on an arbitrary group.
This discussion provokes the following key question:
Question 3: Let
be a group, and let
be a homogeneous quasimorphism. Is there a quasimorphism
with homogenization
, satisfying
?
Example: The answer to question 3 is “yes” if
is the rotation quasimorphism associated to an action of
on
by orientation-preserving homeomorphisms (this is nontrivial; see Proposition 4.70 from my monograph).
Example: Let
be any homologically trivial group
-boundary. Then there is some extremal homogeneous quasimorphism
for
(i.e. a quasimorphism achieving equality
under generalized Bavard duality; see this post) for which there is
with homogenization
satisfying
. Consequently, if every point in the boundary of the unit ball in the
norm is contained in a unique supporting hyperplane, the answer to question 3 is “yes” for any quasimorphism on
.
Any quasimorphism on
can be pulled back to a quasimorphism on a free group, but this does not seem to make anything easier. In particular, question 3 is completely open (as far as I know) when
is a free group. An interesting test case might be the homogenization of an infinite sum of Brooks functions
for some infinite non-nested family of words
.
If the answer to this question is false, and one can find a homogeneous quasimorphism
which is not the homogenization of any “local” quasimorphism, then perhaps
does not satisfy a central limit theorem. One can try to approach this problem from the other direction:
Question 4: Given a function
defined on the ball of radius
in a free group
, one defines the defect
in the usual way, restricted to pairs of elements
for which
are all of length at most
. Under what conditions can
be extended to a function on the ball of radius
without increasing the defect?
If one had a good procedure for building a quasimorphism “by hand” (so to speak), one could try to build a quasimorphism that failed to satisfy a central limit theorem, or perhaps find reasons why this was impossible.
The (strengthened) Hanna Neumann Conjecture
A few days ago, Joel Friedman posted a paper on the arXiv purporting to give a proof of the (strengthened) Hanna Neumann conjecture, a well-known problem in geometric group theory.
Simply stated, the problem is as follows.
Conjecture (Hanna Neumann): Let
be a free group, and let
and
be finitely generated subgroups. For a subgroup
of
, let
. Then there is an inequality
.
This conjecture was further strengthened by Walter Neumann (her son):
Conjecture (strengthened Hanna Neumann): With notation above, there is an inequality
where the sum is taken over
, i.e. the double coset representatives.
Notice by the way that since any free group embeds into
, the free group of rank
, one can assume that
has rank
above. This fact is implicit in the discussion below.
Friedman’s paper seems to be very carefully written, and contains some new ideas (which I do not yet really understand), namely an approach using sheaf theory. But in this post I want to restrict myself to some simple (and probably well-known) geometric observations.
The first step is to reduce the problem to a completely graph-theoretic one, following Stallings; in fact, Benson Farb tells me that he thinks this reduction was known to Stallings, or at least to Dicks/Formanek (and in any case is very close to some ideas Stallings and Gersten introduced to study the problem; more on that in a later post). Friedman makes the following definition:
Definition: Let
be a finite group and
be two elements (that do not necessarily generate
). The directed Cayley graph
is the graph with vertex set
and with a directed edge from
to
labeled
for each
and
.
In other words,
is a graph whose edges are oriented and labeled with either
or
in such a way that each vertex has at most one outgoing and one incoming edge with each label, and such that there is a transitive (on the vertices) free action of a group
on
. (Note: for some reason, Friedman wants his group to act on the right, and therefore has directed edges from
to
, but this is just a matter of convention).
For any finite graph
, not necessarily connected, let
; i.e.
where the sum is taken over the connected components
of
. Friedman shows (but this reduction is well-known) that the SHNC is equivalent to the following graph-theoretic inequality:
Theorem: The SHNC is equivalent to the following statement. For any graph
as above, and any two subgraphs
we have
.
The purpose of this blog entry is to show that there is a very simple proof of this inequality when
is replaced with
. This is not such a strange thing to do, since
and
are equal for graphs without acyclic components (i.e. without components that are trees), and for “random” graphs
one does not expect the difference between
and
to be very big. The argument proceeds as follows. Suppose
has
vertices and
edges of kind
respectively, and define
similarly for
. Then
On the other hand, since Euler characteristic is local, we just need to count how many vertices and edges of each kind turn up in each
. But this is easy: every vertex of
is equal to exactly one translate of every vertex of
, and similarly for edges of each kind. Hence
So the inequality one wants to show is
which simplifies to
On the other hand, each graph
has at most two edges at any vertex with either label, and therefore we have inequalities
. Subject to these constraints, the inequality above is straightforward to prove. To see this, first fix some non-negative values of
and let
be the four-dimensional cube of possible values of
. Since both sides of the inequality are linear as a function of each
or
, if the inequality is violated at any point in
one may draw a straight line in
corresponding to varying one of the co-ordinates (e.g.
) while keeping the others fixed, and deduce that the inequality must be violated on one of the faces of
. Inductively, if the inequality is violated at all, it is violated at a vertex of
, which may be ruled out by inspection; qed.
This argument shows that the whole game is to understand the acyclic components of
; i.e. those which are topologically trees, and therefore contribute
to
, but
to
.
Incidentally, for all I know, this simple argument is explicitly contained in either Stallings’ or Gersten’s paper (it is surely not original in any case). If a reader can verify this, please let me know!
Update: Walter Neumann informs me that this observation (that the inequality is true with
in place of
) is in his paper in which he introduces the SHNC! He further shows in that paper that for “most”
, the SHNC is true for all
.
Update (6/29): Warren Dicks informs me that he was not aware of the reduction of SHNC to the graph-theoretic formulation described above. Friedman’s webpage acknowledges the existence of an error in the paper, and says that he is working to correct it. One problem that I know of (discovered mostly by my student Steven Frankel) concerns the commutativity of the diagram on page 10.
Update (10/22): It has been a few months since I last edited this page, and Joel Friedman has not updated either the arXiv paper, or the statement on his webpage that he is “trying to fix the error”. Since wikipedia mentions Friedman’s announcement, I thought it would be worth going on record at this point to say that Friedman’s arXiv paper (version 1 — the only version at the point I write this) is definitely in error, and that I believe the error is fundamental, and cannot be repaired (this is not to say that the paper does not contain some things of interest (it does), or that Friedman does not acknowledge the error (he does), just that it is worth clearing up any possible ambiguity about the situation for readers who are wondering about the status of the SHNC). The problem is the “not entirely standard” (quote from Friedman’s paper) diagrams, like the one on page 10. In particular, the claimed proof of Theorem 5.6, that the projections constructed in Lemma 5.5 (by a very general dimension counting argument) fit into a diagram with the desired properties is false. Any construction of projections satisfying the desired properties must be quite special. Nevertheless, one can certainly still define Friedman’s sheaf
, and ask whether it has
(in Friedman’s sense); this would, as far as I can tell, prove SHNC; however, I do not know of any reason why it should hold (or whether there are any counterexamples, which might exist even if SHNC is true).