## van Kampen soup and thermodynamics of DNA

The development and scope of modern biology are often held out as a fantastic opportunity for mathematicians. The accumulation of vast amounts of biological data, and the development of new tools for the manipulation of biological organisms at microscopic levels and with unprecedented accuracy, invite the development of new mathematical tools for their analysis and exploitation. I know of several examples of mathematicians who have dipped a toe, or sometimes some more substantial organ, into the water. But it has struck me that I know (personally) few mathematicians who believe they have something substantial to learn from the biologists, despite the existence of several famous historical examples. This strikes me as odd; my instinctive feeling has always been that intellectual ruts develop so easily, so deeply, and so invisibly, that continual cross-fertilization of ideas is essential to escape ossification (if I may mix biological metaphors . . .).

It is not necessarily easy to come up with profound examples of biological ideas or principles that can be easily translated into mathematical ones, but it is sometimes possible to come up with suggestive ones. Let me try to give a tentative example.

Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic blueprint for all known living things. This blueprint takes the form of a code: a molecule of DNA is a long polymer strand composed of simple units called nucleotides, and such a molecule is typically imagined as a string in a four character alphabet $\lbrace A,T,G,C \rbrace$, whose letters stand for the nucleotides Adenine, Thymine, Guanine, and Cytosine. These molecular strands like to arrange themselves in tightly bound oppositely aligned pairs, matching up nucleotides in one string with complementary nucleotides in the other, so that $A$ matches with $T$, and $C$ with $G$.

The geometry of a strand of DNA is very complicated: strands can be tangled, knotted, or linked in complicated ways, and the fundamental interactions between strands (e.g. transcription, recombination) are facilitated or obstructed by mechanical processes depending on this geometry. Topology, especially knot theory, has been used in the study of some of these processes; the virtues of topological methods in this context include their robustness (fault-tolerance) and the discreteness of their invariants (similar virtues motivate some efforts to build topological quantum computers). A complete mathematical description of the salient biochemistry, mechanics, and semantic content of a configuration of DNA in a single cell is an unrealistic goal for the foreseeable future, and therefore attempts to model such systems depend on ignoring, or treating statistically, certain features of the system. One such framework ignores the ambient geometry entirely, and treats the system using symbolic or combinatorial methods which have some of the flavor of geometric group theory.

One interesting approach is to consider a mapping from the alphabet of nucleotides to a standard generating set for $F_2$, the free group on two generators; for example, one can take the mapping $T \to a, A \to A, C \to b, G \to B$ where $a,b$ are free generators for $F_2$, and $A,B$ denote their inverses (so on the right-hand side of $A \to A$ the letter $A$ is a group element, not the nucleotide). Then a pair of oppositely aligned strands of DNA translates into an edge of a van Kampen diagram: the “words” obtained by reading the letters along an edge on either side are inverse in $F_2$.
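To make the dictionary concrete, here is a minimal Python sketch of the mapping above. The function names (`to_word`, `reverse_complement`, `inverse`, `reduce_word`) and the sample strand `TCAG` are my own choices for illustration, and I am assuming that "oppositely aligned" means the partner strand is the reverse complement, read in its own direction.

```python
# Mapping from the text: T -> a, A -> A, C -> b, G -> B,
# where uppercase A, B denote the inverses of the free generators a, b.
NUCLEOTIDE_TO_LETTER = {"T": "a", "A": "A", "C": "b", "G": "B"}
# Base pairing: A matches T, and C matches G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def to_word(strand):
    """Translate a DNA string into a word in the letters a, A, b, B."""
    return "".join(NUCLEOTIDE_TO_LETTER[n] for n in strand)

def reverse_complement(strand):
    """The oppositely aligned partner strand, read in its own direction."""
    return "".join(COMPLEMENT[n] for n in reversed(strand))

def inverse(word):
    """Formal inverse in F_2: reverse the word and invert each letter."""
    return word[::-1].swapcase()

def reduce_word(word):
    """Freely reduce a word by cancelling adjacent inverse pairs."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)

strand = "TCAG"
partner = reverse_complement(strand)
print(to_word(strand), to_word(partner))
print(repr(reduce_word(to_word(strand) + to_word(partner))))
```

Because complementary nucleotides are sent to inverse letters, the word read off the partner strand is the formal inverse of the original word, so the concatenation reduces to the empty word.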

Strands of DNA in a configuration are not always paired along their lengths; sometimes junctions of three or more strands can form; certain mobile four-strand junctions, so-called “Holliday junctions”, perform important functions in the process of genetic recombination, and are found in a wide variety of organisms. A configuration of several strands with junctions of varying valences corresponds in the language of van Kampen diagrams to a fatgraph — i.e. a graph together with a choice of cyclic ordering of edges at each vertex — with edges labeled by inverse pairs of words in $F_2$ (note that this is quite different from the fatgraph model of proteins developed by Penner-Knudsen-Wiuf-Andersen). The energy landscape for branch migration (i.e. the process by which DNA strands separate or join along some segment) is very complicated, and it is challenging to model it thermodynamically. It is therefore not easy to predict in advance what kinds of fatgraphs are more or less likely to arise spontaneously in a prepared “soup” of free DNA strands.
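The combinatorics of a fatgraph is easy to experiment with. The following Python sketch uses one standard encoding, which is my assumption and not something taken from the biology: each edge $k$ has two ends ("darts") numbered $2k$ and $2k+1$, and each vertex is the list of its darts in cyclic order. Boundary components of the thickened surface are then the cycles of the permutation "cross the edge, then turn to the next dart at the vertex". The function names are hypothetical, and the genus formula assumes the graph is connected.

```python
def boundary_components(vertices):
    """Trace the boundary cycles of the surface obtained by thickening a fatgraph.

    vertices: list of lists; each inner list gives the darts at one vertex
    in cyclic order. Darts 2k and 2k+1 are the two ends of edge k.
    """
    # sigma sends each dart to the next dart in the cyclic order at its vertex.
    sigma = {}
    for cyclic_order in vertices:
        for i, dart in enumerate(cyclic_order):
            sigma[dart] = cyclic_order[(i + 1) % len(cyclic_order)]
    seen, cycles = set(), []
    for start in sorted(sigma):
        if start in seen:
            continue
        cycle, dart = [], start
        while dart not in seen:
            seen.add(dart)
            cycle.append(dart)
            dart = sigma[dart ^ 1]  # cross the edge (d ^ 1), then turn at the vertex
        cycles.append(cycle)
    return cycles

def topology(vertices):
    """Return (chi, number of boundary components, genus) for a connected fatgraph."""
    n_darts = sum(len(v) for v in vertices)
    chi = len(vertices) - n_darts // 2      # chi = (#vertices) - (#edges)
    b = len(boundary_components(vertices))
    genus = (2 - chi - b) // 2              # from chi = 2 - 2g - b
    return chi, b, genus

# One valence-4 junction with two loops, in two different cyclic orders.
print(topology([[0, 2, 1, 3]]))  # interleaved loops
print(topology([[0, 1, 2, 3]]))  # nested loops
```

The two examples have the same underlying graph, hence the same $\chi$, but different cyclic orderings: the interleaved one thickens to a once-punctured torus and the other to a pair of pants.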

As a thought experiment, consider the following “toy” model, which I do not suggest is physically realistic. We make the assumption that the energy cost of forming a junction of valence $v$ is $c(v-2)$ for some fixed constant $c$. Consequently, the energy of a configuration is proportional to $-\chi$, i.e. the negative of the Euler characteristic of the underlying graph. Let $w$ be a reduced word, representing an element of $F_2$, and imagine a soup containing some large number of copies of the strand of DNA corresponding to the string $\dot{w}:=\cdots www \cdots$. In thermodynamic equilibrium, the partition function has the form $Z = \sum_i e^{-E_i/k_BT}$ where $k_B$ is Boltzmann’s constant, $T$ is temperature, and $E_i$ is the energy of a configuration (which by hypothesis is proportional to $-\chi$). At low temperature, minimal energy configurations tend to dominate, namely those that minimize $-\chi$ per unit “volume”. Topologically, a fatgraph corresponding to such a configuration can be thickened to a surface with boundary. The words along the edges determine a homotopy class of map from such a surface to a $K(F_2,1)$ (e.g. a once-punctured torus) whose boundary components wrap multiply around the free homotopy class corresponding to the conjugacy class of $w$. The infimum of $-\chi/(2d)$, where $d$ is the winding degree on the boundary, taken over all configurations, is precisely the stable commutator length of $w$; see e.g. here for a definition.
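The step from junction costs to Euler characteristic is just the handshake lemma: if a graph has $n$ vertices and $e$ edges, the valences sum to $2e$, so $\sum c(v-2) = c(2e - 2n) = -2c\chi$. Here is a small Python sketch that checks this on examples and computes normalized Boltzmann weights for a finite list of configurations. The function names, the choice $c=1$, and the sample energies are assumptions made purely for illustration; the sketch ignores the "per unit volume" normalization and makes no attempt to enumerate actual configurations of a word $w$.

```python
import math

def junction_energy(valences, c=1.0):
    """Toy energy: a junction of valence v costs c * (v - 2)."""
    return sum(c * (v - 2) for v in valences)

def euler_characteristic(valences):
    """chi = (#vertices) - (#edges), with #edges = (sum of valences) / 2."""
    total = sum(valences)
    if total % 2:
        raise ValueError("the valences of a graph must sum to an even number")
    return len(valences) - total // 2

def boltzmann_weights(energies, kT):
    """Probabilities e^{-E_i/kT} / Z for a finite list of energies."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    raw = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(raw)
    return [r / z for r in raw]

# A theta graph (two trivalent junctions) and a graph with four trivalent junctions.
for valences in ([3, 3], [3, 3, 3, 3]):
    print(valences, junction_energy(valences), -2 * euler_characteristic(valences))

# At low temperature the lower-energy configuration dominates.
print(boltzmann_weights([2.0, 4.0], kT=0.1))
print(boltzmann_weights([2.0, 4.0], kT=100.0))
```

At small `kT` essentially all of the weight sits on the minimal-energy configuration, while at large `kT` the two weights are nearly equal, which is the sense in which low temperature singles out the configurations minimizing $-\chi$.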

Anyway, this example is perhaps a bit strained (and maybe it owes more to thermodynamics than to biology), but already it suggests a new mathematical object of study, namely the partition function $Z$ as above, and one is already inclined to look for examples for which the partition function obeys a symmetry like that enjoyed by the Riemann zeta function, or to specialize temperature to other values, as in random matrix theory. The introduction of new methods into the study of a classical object (for example, the decision to use thermodynamic methods to organize the study of van Kampen diagrams) bends the focus of the investigation towards those examples and contexts where the methods and tools are most informative. Phenomena familiar in one context (power laws, frequency locking, phase transitions etc.) suggest new questions and modes of enquiry in another. Uninspired or predictable research programs can benefit tremendously from such infusions, whether the new methods are borrowed from other intellectual disciplines (biology, physics), or depend on new technology (computers), or new methods of indexing (Google) or collaboration (Polymath).

One of my intellectual heroes, Wolfgang Haken, worked for eight years in R&D for Siemens in Munich after completing his PhD. I have a conceit (unsubstantiated as far as I know by biographical facts) that his experience working for a big engineering firm colored his approach to mathematics, and made it possible for him to imagine using industrial-scale “engineering” tools (e.g. integer programming, exhaustive computer search of combinatorial possibilities) to solve two of the most significant “pure” mathematical open problems in topology at the time: the knot recognition problem, and the four-color theorem. It is an interesting exercise to try to imagine (fantastic) variations. If I sit down and decide to try to prove (for example) Cannon’s conjecture, I am liable to try minor variations on things I have tried before, appeal for my intuition to examples that I understand well, read papers by others working in similar ways on the problem, etc. If I imagine that I have been given a billion dollars to prove the conjecture, I am almost certain to prioritize the task in different ways, and to entertain (and perhaps create) much more ambitious or innovative research programs to tackle the task. This is the way in which I understand the following quote by John Dewey, which I used as the colophon of my first book:

> Every great advance in science has issued from a new audacity of the imagination.


### 4 Responses to van Kampen soup and thermodynamics of DNA

1. Dear D,

Inspiring.

Best wishes,

E

• Danny Calegari says:

Dear Cap’n E – coming from you, that’s a significant compliment! Welcome to the blog (Frank already complained that I forgot to tell anyone about it . . .)

2. Albion Lawrence says:

I don’t think your conceit is at all implausible. Peter Galison has a few examples where he claims that results in fairly pure theoretical physics (special relativity, Feynman diagrams) arose from the inventors’ contact with real-world technological problems (clock synchronization, safe storage of fissile material).

Talk to your colleague Rob Phillips much?

• Danny Calegari says:

Dear Albion – those are some pretty impressive examples. I think I might have browsed through “Einstein’s clocks” in a bookstore once; maybe I should take a closer look.

I think it’s amusing that I come to learn about the interesting work my colleagues (e.g. Phillips) are doing via Dipankar (I hope you don’t mind being characterized as “via Dipankar” . . .)