The other day at lunch, one of my colleagues — let’s call her “Wendy Hilton” to preserve her anonymity (OK, this is pretty bad, but perhaps not quite as bad as Clive James’s use of “Romaine Rand” as a pseudonym for “Germaine Greer” in Unreliable Memoirs . . .) — expressed some skepticism about a somewhat unusual assertion that I make at the start of my scl monograph. Since it is my monograph, I feel free to quote the offending paragraphs:
It is unfortunate in some ways that the standard way to refer to the plane emphasizes its product structure. This product structure is topologically unnatural, since it is defined in a way which breaks the natural topological symmetries of the object in question. This fact is thrown more sharply into focus when one discusses more rigid topologies.
At this point I give an example, namely that of the Zariski topology, pointing out that the product topology of two copies of the affine line with the Zariski topology is not the same as the Zariski topology on the affine plane. All well and good. I then go on to claim that part of the bias is biological in origin, citing the following example as evidence:
Example 1.2 (Primary visual cortex). The primary visual cortex of mammals (including humans), located at the posterior pole of the occipital cortex, contains neurons hardwired to fire when exposed to certain spatial and temporal patterns. Certain specific neurons are sensitive to stimulus along specific orientations, but in primates, more cortical machinery is devoted to representing vertical and horizontal than oblique orientations (see for example  for a discussion of this effect).
(Note:  is a reference to the paper “The distribution of oriented contours in the real world” by David Coppola, Harriett Purves, Allison McCoy, and Dale Purves, Proc. Natl. Acad. Sci. USA 95 (1998), no. 7, 4002–4006)
I think Wendy took this to be some kind of poetic license or conceit, and perhaps even felt that it was a bit out of place in a serious research monograph. On balance, I think I agree that it comes across as somewhat jarring and unexpected to the reader, and the tone and focus are somewhat inconsistent with those of the rest of the book. But I also think that in certain subjects in mathematics (and I would put low-dimensional geometry/topology in this category) we are often not aware of the extent to which our patterns of reasoning and imagination are shaped, limited, or (mis)directed by our psychological, and especially psychophysical, natures.
The particular question of how the mind conceives of, imagines, or perceives any mathematical object is complicated and multidimensional, and colored by historical, social, and psychological (not to mention mathematical) forces. It is generally a vain endeavor to find precise physical correlates of complicated mental objects, but in the case of the plane (or at least one cognitive surrogate, the subjective visual field) there is a natural candidate for such a correlate. Cells on the rear of the occipital lobe are arranged in a “map” in the region of the occipital lobe known as the “primary visual cortex”, or V1. There is a precise geometric relationship between the location of neurons in V1 and the points in the subjective visual field they correspond to. Further visual processing is done by other areas V2, V3, V4, V5 of the visual cortex. Information is fed forward from Vi to Vj with j > i, but also backward from Vj to Vi, so that visual information is processed at several levels of abstraction simultaneously, and the results of this processing compared and refined in a complicated synthesis (this tends to make me think of the parallel terraced scan model of analogical reasoning put forward by Douglas Hofstadter and Melanie Mitchell; see Fluid concepts and creative analogies, Chapter 5).
The initial processing done by the V1 area is quite low-level; individual neurons are sensitive to certain kinds of stimuli, e.g. color, spatial periodicity (on various scales), motion, orientation, etc. As remarked earlier, more neurons are devoted to detecting horizontally or vertically aligned stimuli; in other words, our brains literally devote more hardware to perceiving or imagining vertical and horizontal lines than to lines with an oblique orientation. This is not to say that at some higher, more integrated level, our perception is not sensitive to other symmetries that our hardware does not respect, just as a random walk on a square lattice in the plane converges (after a parabolic rescaling) to Brownian motion (which is not just rotationally but conformally invariant). However, the fact is that humans perform statistically better on cognitive tasks that involve the perception of figures that are aligned along the horizontal and vertical axes than on similar tasks that differ only by a rotation of the figures.
It is perhaps interesting therefore that the earliest (?) mathematical conception of the plane, due to the Greeks, did not give a privileged place to the horizontal or vertical directions, but treated all orientations on an equal footing. In other words, in Greek (Euclidean) geometry, the definitions respect the underlying symmetries of the objects. Of course, from our modern perspective we would not say that the Greeks gave a definition of the plane at all, or at best, that the definition is woefully inadequate. According to one well-known translation, the plane is introduced as a special kind of surface as follows:
A surface is that which has length and breadth.
When a surface is such that the right line joining any two arbitrary points in it lies wholly in the surface, it is called a plane.
This definition of a surface looks as though it is introducing coordinates, but in fact one might just as well interpret it as defining a surface in terms of its dimension; having defined a surface (presumably thought of as being contained in some ambient undefined three-dimensional space) one defines a plane to be a certain kind of surface, namely one that is convex. Horizontal and vertical axes are never introduced. Perpendicularity is singled out as important, but the perpendicularity of two lines is a relative notion, whereas horizontality and verticality are absolute. In the end, Euclidean geometry is defined implicitly by its properties, most importantly isotropy (i.e. all right angles are equal to one another) and the parallel postulate, which singles it out from among several alternatives (elliptic geometry, hyperbolic geometry). In my opinion, Euclidean geometry is imprecise but natural (in the sense of category theory), because objects are defined in terms of the natural transformations they admit, and in a way that respects their underlying symmetries.
In the 15th century, the Italian artists of the Renaissance developed the precise geometric method of perspective painting (although the technique of representing more distant objects by smaller figures is extremely ancient). Its invention is typically credited to the architect and engineer Filippo Brunelleschi; one may speculate that the demands of architecture (i.e. the representation of precise three-dimensional geometric objects in two-dimensional diagrams) were one of the stimuli that led to this invention (perhaps this suggestion is anachronistic?). Mathematically, this gives rise to the geometry of the projective plane, i.e. the space of lines through the origin (the “eye” of the viewer of a scene). In principle, one could develop projective geometry without introducing “special” directions or families of lines. However, in one, two, or three point perspective, families of lines parallel to one or several “special” coordinate axes (along which significant objects in the painting are aligned) appear to converge to one of the vanishing points of the painting. In his treatise “De pictura” (on painting), Leon Battista Alberti (a friend of Brunelleschi) explicitly described the geometry of vision in terms of projections onto a (visual) plane. Amusingly (in the context of this blog post), he explicitly distinguishes between the mathematical and the visual plane:
In all this discussion, I beg you to consider me not as a mathematician but as a painter writing of these things.
Mathematicians measure with their minds alone the forms of things separated from all matter. Since we wish the object to be seen, we will use a more sensate wisdom.
I beg to differ: similar parts of the brain are used for imagining a triangle and for looking at a painting. Alberti’s claim sounds a bit too much like Gould’s “non-overlapping magisteria”, and in a way it is disheartening that it was made at a place and point in history at which mathematics and the visual arts were perhaps at their closest.
In the 17th century René Descartes introduced his coordinate system and thereby invented “analytic geometry”. To us it might not seem like such a big leap to go from a checkerboard floor in a perspective painting (or a grid of squares to break up the visual field) to the introduction of numerical coordinates to specify a geometrical figure, but Descartes’s ideas for the first time allowed mathematicians to prove theorems in geometry by algebraic methods. Analytic geometry is contrasted with “synthetic geometry”, in which theorems are deduced logically from primitive axioms and rules of inference. In some abstract sense, this is not a clear distinction, since algebra and analysis also rest on primitive axioms and rules of deduction. In my opinion, this terminology reflects a psychological distinction between “analytic methods” in which one computes blindly and then thinks about what the results of the calculation mean afterwards, and “synthetic methods” in which one has a mental model of the objects one is manipulating, and directly intuits the “meaning” of the operations one performs. Philosophically speaking, the first is formal, the second is platonic. Biologically speaking, the first does not make use of the primary visual cortex, the second does.
As significant as Descartes’s ideas were, mathematicians were slow to take real advantage of them. Complex numbers were invented by Cardano in the mid 16th century, but the idea of representing complex numbers geometrically, by taking the real and imaginary parts as Cartesian coordinates, had to wait until Argand in the early 19th.
Incidentally, I have heard it said that the Greeks did not introduce coordinates because they drew their figures on the ground and looked at them from all sides, whereas Descartes and his contemporaries drew figures in books. Whether this has any truth to it or not, I do sometimes find it useful to rotate a mathematical figure I am looking at, in order to stimulate my imagination.
After Poincaré’s invention of topology in the late 19th century, there was a new kind of model of the plane to be (re)imagined, namely the plane as a topological space. One of the most interesting characterizations was obtained by the brilliantly original and idiosyncratic R. L. Moore in his paper, “On the foundations of plane analysis situs”. Let me first remark that the line can be characterized topologically in terms of its natural order structure; one might argue that this characterization more properly determines the oriented line, and this is a fair comment, but at least the object has been determined up to a finite ambiguity. Let me second of all remark that the characterization of the line in terms of order structures is useful; a (countable) group $G$ is abstractly isomorphic to a group of (orientation-preserving) homeomorphisms of the line if and only if $G$ admits an (abstract) left-invariant order.
Given points and the line, Moore proceeds to list a collection of axioms which serve to characterize the plane amongst topological spaces. The axioms are expressed in terms of separation properties of primitive undefined terms called points and regions (which correspond more or less to ordinary points and open sets homeomorphic to the interiors of closed disks respectively) and non-primitive objects called “simple closed curves” which are (eventually) defined in terms of simpler objects. Moore’s axioms are “natural” in the sense that they do not introduce new, unnecessary, unnatural structure (such as coordinates, a metric, special families of “straight” lines, etc.). The basic principle on which Moore’s axioms rest is that of separation — which continua separate which points from which others? If there is a psychophysical correlate of this mathematical intuition, perhaps it might be the proliferation of certain neurons in the primary visual cortex which are edge detectors — they are sensitive, not to absolute intensity, but to a spatial discontinuity in the intensity (associated with the “edge” of an object). The visual world is full of objects, and our eyes evolved to detect them, and to distinguish them from their surroundings (to distinguish figure from ground as it were). If I have an objection to Cartesian coordinates on biological grounds (I don’t, but for the sake of argument let’s suppose I do) then perhaps Moore should also be disqualified for similar reasons. Or rather, perhaps it is worth being explicitly aware, when we make use of a particular mathematical model or intellectual apparatus, of which aspects of it are necessary or useful because of their (abstract) applications to mathematics, and which are necessary or useful because we are built in such a way as to need or to be able to use them.
Bill Thurston once observed that topology and measure theory are very immiscible (i.e. they don’t mix easily); this statement has always resonated with me, and I thought I would try to explain some of the (personal, psychological, and mathematical) reasons why. On the face of it, topology and measure theory are very closely related. Both are concerned with spaces equipped with certain algebras of sets (open sets, measurable sets) and classes of functions (continuous functions, measurable functions). Continuous functions (on reasonable spaces) are measurable, and (some) measures can be integrated to define continuous functions. However, in my mind at least, they are very different in a psychological sense, and one of the most important ways in which they differ concerns the role of examples.
At the risk of oversimplifying, one might say that one modern mathematical tradition, perhaps exemplified by the Bourbakists, insists that examples are either irrelevant or misleading. There is a famous story about Grothendieck, retold in this article by Allyn Jackson, which goes as follows:
One striking characteristic of Grothendieck’s mode of thinking is that it seemed to rely so little on examples. This can be seen in the legend of the so-called “Grothendieck prime”. In a mathematical conversation, someone suggested to Grothendieck that they should consider a particular prime number. “You mean an actual number?” Grothendieck asked. The other person replied, yes, an actual prime number. Grothendieck suggested, “All right, take 57”.
Leaving aside the “joke” of Grothendieck’s (supposed) inability to factor 57, this anecdote has an instructive point. No doubt Grothendieck’s associate was expecting a small prime number such as or . What would have been the reaction if Grothendieck had said “All right, take “? When one considers examples, one is prone to consider simple examples; of course this is natural, but one must be aware that such examples can be misleading. Morwen Thistlethwaite once made a similar observation about knot theory; from memory he said something like:
When someone asks you to think about a knot, you usually imagine a trefoil, or a figure 8, or maybe a torus knot. But the right image to have in your mind is a room entirely filled with a long, tangled piece of string.
Note that there is another crucial function of examples, namely their role as counterexamples, which certify the invalidity of a general claim — such counterexamples should, of course, be as simple as possible (and even Grothendieck was capable of coming up with some); but I am concerned here and in the sequel with the role of “confirming” examples, so to speak.
At the other extreme(?), and again at the risk of oversimplifying, one might take the “A=B” (or Petkovšek-Wilf-Zeilberger) point of view, that sufficiently good/many examples are proofs. They give a beautifully simple but psychologically interesting example (Theorem 1.4.2 in A=B): to show that the angle bisectors of a triangle are concurrent, it suffices to verify this for a sufficiently large but finite (explicit) number of examples. The reason such a proof is valid is that the co-ordinates of the pairwise intersections of the angle bisectors are rational functions of (certain trigonometric functions of) the angles, of an explicit (and easily determined) degree, and to prove an identity between rational functions, it suffices to prove that it holds for enough values. Another aspect of the A=B philosophy is that by the process of abstraction, a theorem in one context can become an example in another. For example, “even plus odd equals odd” might be a theorem over $\mathbb{Z}$, but an example over $\mathbb{Z}/2\mathbb{Z}$. One might say that the important thing about examples is that they should be sufficiently general that they exhibit all or enough of the complexity of the general case, and that if enough features of an example can be reimagined or abstracted as parameters, an example can become (or be translated into) a theorem.
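The principle that finitely many examples can prove an identity is easy to try out in the simplest case, that of polynomials in one variable: two polynomials of degree at most d that agree at d + 1 distinct points are equal. The sketch below is only an illustration of that special case, not the trigonometric argument used for the angle bisectors; the function name `agree_on_points` and the sample polynomials are made up for the purpose.

```python
from fractions import Fraction

def agree_on_points(lhs, rhs, degree_bound):
    """Compare lhs and rhs at the degree_bound + 1 points 0, 1, ..., degree_bound.

    If lhs and rhs really are polynomials of degree at most degree_bound,
    agreement at these points proves that they are equal everywhere.
    """
    return all(lhs(Fraction(x)) == rhs(Fraction(x))
               for x in range(degree_bound + 1))

# (x + 1)^3 versus its expansion: both sides have degree 3, so 4 points suffice.
cube = lambda x: (x + 1) ** 3
expanded = lambda x: x**3 + 3 * x**2 + 3 * x + 1
print(agree_on_points(cube, expanded, 3))

# x(x - 1)(x - 2) versus 0: three examples agree, the fourth does not.
vanishing = lambda x: x * (x - 1) * (x - 2)
zero = lambda x: Fraction(0)
print(agree_on_points(vanishing, zero, 2))  # too few points for degree 3
print(agree_on_points(vanishing, zero, 3))
```

The second pair is the cautionary half of the story: with an underestimated degree bound, every example checked "confirms" an identity that is false, so the examples only amount to a proof once the degree bound itself has been justified.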
In some fields of mathematics, one can make the idea of a “general example” rigorous. In algebraic geometry, one has the concept of a generic point on a scheme; in differential topology, one considers submanifolds in general position; in ergodic theory, one considers a normal number (or sequence in some fixed alphabet). In fact, it is not so clear whether a “formal” generic object in some domain should be thought of as the ultimate example, or as the ultimate rejection of the use of examples! In any case, in practice, when as mathematicians we select examples to test our ideas on, we rarely adhere to a rigorous procedure to ensure that our examples are good ones, and we are therefore susceptible to certain well-known psychological biases. The first is the availability heuristic, as defined by the psychologists Kahneman and Tversky, which says roughly that people tend to overemphasize the importance of examples that they can think of most easily. Why exactly is this bad? Well, because it interacts with another bias, that it is easier to think of an example which is more specific — e.g. it is easier to think of a fruit that starts with the letter “A” than just to think of a fruit. One might argue that this bias is unavoidable, given the nature of the task “think of an example of X” — e.g. it is much easier to find a unique solution (of a differential equation, of a system of linear equations, etc.) than to find a solution to an underdetermined problem. In fact, finding a unique solution is so much easier than solving an underdetermined problem, that one often tries to solve the underdetermined problem by adding constraints until the solution is unique and can be found (e.g. simplex method in linear programming). Conversely, this bias is also part of the explanation for why examples are so useful: the mind devotes more attention and mental resources to a more specific object. 
So even if one is interested in finding a rigorous and abstract proof, it is often easier to find a proof for a specific example, and then to “rewrite” the proof, replacing the specific example by the general case, and checking that no additional hypotheses are used. The second psychological bias is that of framing. A frame consists of a collection of schemata and stereotypes that provide a context in which an event is interpreted. Many mathematical concepts or objects can be formulated in many different ways which are all logically equivalent, but which frame the concept or object quite differently. The word “bird” suggests (to most people) a schema which involves flight, wings, beaks, etc. The mental image it conjures up will (almost) never resemble a flightless bird like a penguin, or a kiwi, unless extra cues are given, like “a bird indigenous to New Zealand”. A statement about covering spaces might be equivalent to a statement in group theory, but the first might bring to mind topological ideas like paths, continuous maps, compact subsets etc. while the second might suggest homomorphisms, exact sequences, extensions etc., and the examples suggested by the frames might be substantially (mathematically) different, sometimes in crucial ways.
Back to measure theory and topology. In topology, one is frequently (always?) interested in a topological space. Here context is very important — a “topological space” could be a finite set, a graph, a solenoid, a manifold, a Cantor set, a sheaf, a CW complex, a lamination, a profinite group, a Banach space, the space of all compact metric spaces (up to isometry) in the Gromov-Hausdorff metric, etc. By contrast, a “measure space” is an interval, plus some countable (possibly empty) collection of atoms. Of course, one thinks of a measure space much more concretely, by adding some incidental extra structure which is irrelevant to the measure structure, but crucial to the mathematical (and psychological) interest of the space; hence a “measure space” could be the space of infinite sequences in a fixed alphabet, the Sierpinski gasket in the plane, the attractor of an Axiom A diffeomorphism, and so on. In other words, one is typically interested in measure theory as a tool to explore some mathematical object with additional structure, whereas one is frequently interested in topological spaces as objects of intrinsic interest in their own right. Many interesting classes of topological objects can be visualized in great detail — sometimes so much detail that in practice one generates proofs by examining sufficiently complicated examples, and building up clear and detailed mental models of the kinds of phenomena that can occur in a given context. Visualizing a “typical” measurable set (even a subset of or the plane) or map is much more difficult, if it is even possible (or, for that matter, a non-measurable set). In fact, one tends routinely to bump up against important subtleties in mathematical logic (especially set theory) when trying even to define such elusive entities as a “typical” measurable subset of the interval. 
For instance, Solovay’s famous theorem says (amongst other things) that the statement “every set of real numbers is Lebesgue measurable” is consistent with Zermelo-Fraenkel set theory without the axiom of choice (in fact, Solovay’s result is relative to the existence of certain large cardinals, so-called inaccessible cardinals). Solovay explicitly addresses in his paper the issue of describing a non-Lebesgue measurable set of reals:
Of course the axiom of choice is true, and so there are non-measurable sets. It is natural to ask if one can explicitly describe a non-Lebesgue measurable set.
Say that a set $A$ of reals is definable from a set $x_0$ if there is a formula $\phi(x,y)$ having free only the variables $x$ and $y$, so that $A = \{ y : \phi(x_0,y) \}$. Solovay shows that (again, assuming the existence of an inaccessible cardinal), even if the axiom of choice is true, every set of reals definable from a countable sequence of ordinals is Lebesgue measurable (interestingly enough, one of the most important concepts introduced by Solovay in his paper is the notion of a random real, namely a real number that is not contained in any of a certain class of Borel sets of measure zero, namely those that are rational (i.e. those that can be encoded in a certain precise way); this resonates somewhat with the “generic points” and “normal numbers” mentioned earlier).
If imagining “good” examples in measure theory is hard, what is the alternative? Evidently, it is to imagine “bad” examples, or at least very non-generic ones. Under many circumstances, the “standard” mental image of a measurable map is a piecewise-constant map from the unit interval to a countable (even finite) set. This example rests on two approximations: the process of building up an arbitrary Borel set (in , say) from half-open intervals by complementing, intersections and unions; and the process of defining an arbitrary measurable function as a limit of a sequence of finite sums of multiples of indicator functions. Such a mental image certainly has its uses, but for my own part I think that if one is going to use such a mental model anyway, one should be aggressive about using one’s intuition about continuous functions and open sets to make the example as specific, as rich and as “generic” as possible, while understanding that the result is not the measurable function or set itself, but only an approximation to it, and one should try to keep in mind a sequence of such maps, with increasing complexity and richness (if possible).
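The second approximation process, writing a measurable function as a limit of finite sums of multiples of indicator functions, has a completely explicit standard form for a nonnegative function f: take the n-th approximation to be the minimum of n and f rounded down to a multiple of 2^-n. A minimal sketch, assuming f is given as an ordinary Python function (the name `simple_approximation` is hypothetical, and floating point stands in for real numbers):

```python
import math

def simple_approximation(f, n):
    """Return the n-th standard simple-function approximation of a nonnegative f.

    The result takes only the finitely many values k / 2**n with
    0 <= k <= n * 2**n, never exceeds f, and is within 2**-n of f
    wherever f(x) <= n.
    """
    def f_n(x):
        return min(n, math.floor(f(x) * 2**n) / 2**n)
    return f_n

square = lambda x: x * x
f_3 = simple_approximation(square, 3)
print(f_3(0.5))  # 0.25, since 0.25 is already a multiple of 1/8
```

Each approximation is piecewise constant on the level sets of f, and the sequence increases with n. Of course the level sets of a nice function like the one above are intervals; for a general measurable function they are only measurable sets, which is exactly where the mental picture starts to mislead.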
Of course, non-measurable sets do arise in practice. If one wants in the end to prove a theorem whose truth or falsity does not depend on the Axiom of choice, then by Solovay one could do without such sets if necessary. The fact that we do not must mean that the use of non-measurable sets (necessarily constructed using the Axiom of choice) leads to shorter/more findable proofs, or more understandable proofs, or both. Let me mention a few examples of situations in which the Axiom of choice is extremely useful:
- The Hahn-Banach theorem in functional analysis
- The existence of ultralimits of sequences of metric spaces (equivalently, the existence of Stone-Čech compactifications)
- A group $G$ is said to be left orderable if there is a total order $<$ on $G$ such that $g < h$ implies $fg < fh$ for all $f,g,h \in G$. If $S$ is a finite subset of nontrivial elements of $G$, the order partitions $S$ into $S^+ \cup S^-$, where the superscript denotes the elements that are greater than, respectively less than the identity element. Suppose for some finite set $S$, and every partition into $S^+ \cup S^-$, some product of elements of one of the subsets (with repeats allowed) is equal to the identity. Then necessarily $G$ is not left orderable. In fact, the converse is also true: if no such “poison” subset exists, then $G$ is left orderable. This follows from the compactness of the set of partitions of $G \setminus \{\mathrm{id}\}$ into two subsets (equivalently, the compactness of the set $\{+,-\}^{G \setminus \{\mathrm{id}\}}$) which follows from Tychonoff’s theorem.
- The existence of Galois automorphisms of $\mathbb{C}$ over $\mathbb{Q}$ (other than complex conjugation). Such automorphisms are necessarily non-measurable, and therefore (by Solovay) cannot be constructed without the axiom of choice. In fact, this follows from a theorem of Mackey, that any measurable homomorphism between (sufficiently nice, i.e. locally compact, second countable) topological groups is continuous. We give the sketch of a proof. Suppose $f: G \to H$ is given, and without loss of generality, assume it is surjective. Let $U$ be a neighborhood of the identity in $H$, and let $V$ be a symmetric open neighborhood of the identity with $V \cdot V \subset U$. The group $H$ is covered by countably many translates of $V$, and therefore the measure of $f^{-1}(V)$ is positive. Let $K \subset f^{-1}(V)$ and $K \subset W$, where $K$ is compact, $W$ is open, and such that the (Haar) measure of $W$ is less than twice the Haar measure of $K$ (the existence of such an open set depends on the fact that measure agrees with outer measure for measurable sets). Since $W$ is open, there is an open neighborhood $T$ of the identity in $G$ so that $tK \subset W$ for all $t \in T$. But $tK$ and $K$ both have measure more than half the measure of $W$, so they intersect. Since $V$ is symmetric, so is $f^{-1}(V)$, and therefore $T \subset f^{-1}(V) \cdot f^{-1}(V) \subset f^{-1}(U)$. This implies $f$ is continuous, as claimed. A continuous Galois automorphism of $\mathbb{C}$ is either the identity, or complex conjugation.
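The “poison” subset criterion for left orderability in the list above is finite in flavor, so one can at least play with it by brute force. The sketch below does this in a finite group, where the semigroup generated by a set is finite and can be enumerated completely; the names `semigroup_contains_identity` and `is_poison` are invented for the illustration, and the group is passed in as a multiplication function and an identity element. In an infinite group the same search would only be a bounded one, able to certify that a set is poison but never that it is not.

```python
from itertools import product

def semigroup_contains_identity(gens, mul, identity):
    """Return True if some nonempty product of elements of gens is the identity.

    Assumes the ambient group is finite, so the search terminates.
    """
    seen = set(gens)
    frontier = list(gens)
    while frontier:
        g = frontier.pop()
        if g == identity:
            return True
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return False

def is_poison(S, mul, identity):
    """Return True if every partition of S into two parts has a part
    in which some product of elements (repeats allowed) is the identity."""
    S = list(S)
    for signs in product([True, False], repeat=len(S)):
        plus = [s for s, positive in zip(S, signs) if positive]
        minus = [s for s, positive in zip(S, signs) if not positive]
        if not (semigroup_contains_identity(plus, mul, identity)
                or semigroup_contains_identity(minus, mul, identity)):
            return False  # this partition is consistent with an order
    return True

# Toy example: the cyclic group Z/6, written additively.
add_mod_6 = lambda a, b: (a + b) % 6
print(is_poison([2], add_mod_6, 0))     # 2 + 2 + 2 = 0 in either part
print(is_poison([1, 5], add_mod_6, 0))
```

In a finite group every nonempty set of nontrivial elements comes out as poison, which is just the familiar fact that a left orderable group is torsion-free. The interesting cases are torsion-free groups that are nevertheless not left orderable, and there the search is no longer guaranteed to terminate.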
Personally, I think that one of the most compelling reasons to accept the Axiom of choice is psychological, and is related to the phenomenon of closure. If we see a fragment of a scene or a pattern, our mind fills in the rest of the scene or pattern for us. We have no photoreceptor cells in our eyes where the optic nerve passes through the retina, but instead of noticing this gap, we have an unperceived blind spot in our field of vision. If we can choose an element of a finite set whenever we want to, we feel as though nothing would stop us from making such a choice infinitely often. We are inclined to accept a formula with a “for all” quantifier ranging over an infinite set, if the formula holds every time we check it. We are inclined to see patterns — even where they don’t exist. This is the seductive and dangerous (?) side of examples, and maybe a reason to exercise a little caution.
In fact, this discussion barely scratches the surface (and does not really probe into either topology or measure theory in any deep way). I would be very curious to hear contrasting opinions.
Update (6/20): There are many other things that I could/should have mentioned about the interaction between measure theory and topology, and the difficulty of finding good generic examples in measure theory. For example, I definitely should have mentioned:
- Lusin’s theorem, which says that a measurable function is continuous on almost all its domain; e.g. if $f$ is any measurable function on an interval $[a,b]$, then for any positive $\epsilon$ there is a compact subset $E \subset [a,b]$ so that the measure of $[a,b] \setminus E$ is at most $\epsilon$, and $f$ is continuous on $E$.
- von Neumann’s theorem, that a Borel probability measure on the unit cube in $\mathbb{R}^n$ is equivalent to Lebesgue measure (on the cube) by a self-homeomorphism of the cube (which can be taken to be the identity on the boundary) if and only if it is nonatomic, gives the boundary measure zero, and is positive on nonempty relatively open subsets.
- Pairs of mutually singular measures of full support on simple sets. For example, let be the Cantor set of infinite strings in the alphabet with its product topology, and define an infinite string in inductively as follows. For any string , define the complement to be the string whose digits are obtained from by interchanging and . Then define to be the string , and inductively define where there are copies of , and is chosen so that . Let be the set of accumulation points of under the shift map. Any finite string that appears in appears with definite density, so is invariant and minimal (i.e. every orbit is dense) under the shift map. However, the proportion of ‘s in is at least for odd, and at most for even. Let denote the Dirac measure on the infinite string , and let denote the average of over its (finite) orbit under the shift map. Define and . These probability measures are shift-invariant, and have shift-invariant weak limits as with support in . Moreover, if denotes the strings in that start with for , then . In particular, the space of shift invariant probability measures on is at least -dimensional, and we may therefore obtain distinct mutually singular ergodic shift-invariant probability measures on . Since is minimal, both measures have full support.
- Shelah’s theorem that in (ZF) plus the axiom of dependent choice, if there is an uncountable well-ordered set of reals, then there is a non-(Lebesgue) measurable set, which shows the necessity of Solovay’s use of inaccessible cardinals. (By Solovay, the axiom of dependent choice is consistent with the statement that every set of reals is Lebesgue-measurable).