Channel: Languages – unsafePerformIO

PhD Thesis

I got an email recently from someone who said that they had read my PhD thesis multiple times. This produced a strange and somewhat sad feeling all at once. Sad because over the past few years I have not had much time to invest in that line of work, though I loved it dearly and thought that it was important work.

Strange because the typical paper one writes probably only gets read in passing, and most of the time the reading stops at the abstract. Reading an entire dissertation, in comparison, seemed like a uniquely rare event, and hearing that someone read one multiple times borders on the unbelievable. Anyway, the arrival of this email caused me to go back and look through my dissertation after many years, and I felt like reproducing its preface here.

Preface to ‘The Computational Content of Isomorphisms’

Is there a unified theory of computational effects? This question was the object of my curiosity for much of my time as a graduate student. The word effect is semi-formally used in much of programming language theory even though it has no definition. Does such a definition exist?

The folklore on computational effects in the functional-languages community describes effects as things that break referential transparency (or some similar substitutivity property), or that break the equational reasoning of the λ-calculus, or that violate the functional property (in that λ-calculus functions no longer behave like set-theoretic functions), with variations on these themes.
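To make the folklore claim concrete, here is a small Haskell sketch of my own (an illustration, not anything from the thesis) in which a pure-looking function hides an effect behind unsafePerformIO, so that substituting a definition for a shared binding changes observable behaviour; this is the textbook failure of referential transparency. The names counter and tick are invented for this sketch:

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A hidden, mutable counter: the "effect" behind a pure-looking function.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}

-- Looks like a pure function, but each call bumps the hidden counter.
tick :: () -> Int
tick () = unsafePerformIO $ do
  modifyIORef' counter (+ 1)
  readIORef counter
{-# NOINLINE tick #-}

main :: IO ()
main = do
  let x = tick ()
  print (x, x)              -- shared binding: the effect runs once
  print (tick (), tick ())  -- substituted definition: the effect runs twice
```

With sharing, the two components of the pair are equal; after substituting the definition of x for its occurrences, they differ, so the two programs are distinguishable (assuming no optimizations; the NOINLINE pragmas keep GHC from inlining the hack away).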

However, none of these ad hoc arguments holds up under scrutiny. For the first, see Søndergaard and Sestoft [1990], who formalize referential transparency. They start from the original remarks made by Quine [1953] in the context of natural language and trace the notion's import into computer science by Strachey [2000] and Stoy [1977]. Søndergaard and Sestoft [1990] compare various notions of referential transparency, unfoldability and related ideas, showing why these are largely orthogonal to effects.

Second, defining effects in terms of not respecting equational reasoning does not work very well, in two ways. If we add operational rules for, say, assignments to the λ-calculus, the result is by definition not the λ-calculus anymore. In fact, if the resulting system is inconsistent it is not a calculus at all, and it is then unclear what has been achieved. However, if we correctly add an equational notion of assignments, we do get an extended calculus: the λ-calculus with assignments and the λ-calculus with continuations have been developed [Felleisen et al., 1987, Felleisen and Hieb, 1992a], and equational reasoning holds in them. Third, there is nothing in the syntactic theory of the λ-calculus (neither in the call-by-name nor the call-by-value variants [Plotkin, 1975]) that requires its terms to behave like set-theoretic functions, and there is no a priori reason to expect set-theoretic notions to hold in the system of equalities that is the λ-calculus. Stated differently, expecting a set-theoretic notion of functions to be respected by certain λ-calculus terms is the same as expecting any other arbitrarily chosen property to hold in the λ-calculus. There is no reason why it should hold.

That said, all sorts of computational phenomena are called effects: assignments, jumps, continuations, mutable state of various sorts, recursion, non-termination and non-determinism all came under the umbrella of effects, including some fairly exotic devices such as the ‘parallel or’ operator of Plotkin [1977]. Various other things behave like effects even though they are not usually classified as such, for example the time taken for program execution or the heat dissipated by it. Even purely semantic concerns seemed “effectful” in nature, such as the evaluation of so-called higher-order contracts and the redefinition of symbols in a REPL.

Commonality between these varied things, if any, is not obvious. In some cases effects were things that had access to some hidden information. For example, mutable state has the store as the hidden information, non-determinism has access to a secret coin-toss, parallel-or has access to an oracle that predicts the termination of its arguments, heat dissipation and time have access to some heat-signature data or to the clock, and so on. However, at any point in program execution a vast number of changes occur, and a theory that identifies any arbitrary subset of these as effects seems useless. Sabry [2008] explains this with the example of consuming a pain-killer that also thins the blood: if the intention of taking the pain-killer was indeed to thin the blood, then the thinning is not considered a side effect. Can we build a theory of effects that classifies what things are observable in a computation and what experiments an observer (say, the program itself) is allowed to make? This may partly address the issue of “hidden information” effects: if the hidden information is observable, then anything that changes it should be considered effectful. That said, it is unclear how such a formal syntactic theory of observers and observables should be built. More importantly, this line of reasoning did not readily apply to many effects, such as grabbing/invoking continuations and jumps, where there is no simple notion of hidden information.

The term effect is commonly used in the sense of an action rather than a pervasive property. Consequently, for me, a theory of effects needed to have an operational flavor. Any explanation that was purely denotational would be unsatisfying. I was less concerned with abstracting effects and more concerned about explaining them. In the words of The Meditations, “This thing, what is it in itself, in its own constitution? What is its substance and material? And what is its causal nature? And what is it doing in the world? And how long does it subsist?” Was there a unified way to look at all effects? And if such a theory existed, what questions could it answer?

Compelling work had been done that hinted that a unified notion of effects existed. Moggi’s work on monads as an approach to structuring denotational semantics [Moggi, 1989, 1991] led to a pragmatic means of encapsulating effects in programming practice [Wadler, 1992]. Filinski [1996] showed that monadic effects can be captured in direct style using operators derived from delimited continuations. Several other intermediate structures had been discovered for encapsulating effects that did not naturally fit into the monadic framework, such as arrows [Hughes, 2000], idioms and applicative functors [McBride and Paterson, 2007]. At the time when I obsessed over these matters I was largely unaware of the work on the algebraic theory of effects by Plotkin, Power, Pretnar and others [Bauer and Pretnar, 2012, Plotkin and Power, 2002, 2004, Plotkin and Pretnar, 2009].
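For concreteness, here is a minimal hand-rolled state monad in Haskell, a sketch in the spirit of the monadic encapsulation Wadler [1992] popularized rather than anything taken from these papers: the "effect" of mutable state is modelled as pure state-passing.

```haskell
-- A minimal state monad: an effectful-looking computation is just a
-- pure function from an input state to a result and an output state.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= k = State $ \s ->
    let (a, s') = g s in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Return the current counter and increment it: "assignment" as pure code.
tick :: State Int Int
tick = do
  n <- get
  put (n + 1)
  pure n

main :: IO ()
main = print (runState (tick >> tick) 0)  -- prints (1,2)
```

The monadic interface sequences the hidden state threading, which is exactly the encapsulation that arrows and applicative functors later generalized in different directions.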

Probably the paper that influenced my ideas the most was a lesser-known document, Sabry [1998]. Here Amr Sabry tries to propose a definition of what a ‘purely functional language’ is, i.e. what it means to not have any effects. Through a series of thought experiments, the paper proposes that a pure language must be invariant under call-by-value, call-by-name and call-by-need evaluation strategies. The paper was essentially an analysis of purity and effects from the Plotkin–Felleisen point of view of seeing computation as syntactic theories and their equivalences [Felleisen and Hieb, 1992b, Plotkin, 1975]. In other words, Sabry chose to propose one notion of equivalence as canonical (somewhat generously choosing a join of the three most popular calculi for λ-terms) and defined effects as everything that fell outside this space.
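Sabry's invariance criterion can be illustrated with a small Haskell sketch of my own (not from the paper): a non-terminating argument distinguishes call-by-name/need application from a strict application that simulates call-by-value, so under this definition non-termination already counts as an effect.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Call-by-name/need application: the argument is passed unevaluated.
appN :: (a -> b) -> a -> b
appN f x = f x

-- Simulated call-by-value application: force the argument before the call.
appV :: (a -> b) -> a -> b
appV f x = x `seq` f x

main :: IO ()
main = do
  print (appN (const (1 :: Int)) undefined)  -- prints 1: argument unused
  r <- try (evaluate (appV (const (1 :: Int)) undefined))
  print (r :: Either SomeException Int)      -- Left <exception>: CBV forces it
```

In a language that is pure in Sabry's sense, the two applications would be indistinguishable on every program.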

While there is no compelling reason to choose the equalities of Sabry [1998] as the “right” ones, the underlying idea seemed sound: the paper was proposing to define purity by introducing a canonical notion of equivalence. Or, in other words, effects arise from ad hoc choices of equalities. Instead of accepting the β, β_v or η relations a priori as the basis of our computational equivalences, can we have a more principled approach to the equalities that we choose in our computational model? But what do we mean when we talk of a principled approach to equalities? The most primitive notion of equality that we have is the notion of isomorphism. Even before defining a theory that can be parametrized by observability, it seemed imperative to study what computational content remains if we admit only primitive equalities like isomorphisms of simple values. Thus attention was drawn to the computational content of the isomorphisms of finite types. It came as a surprise that even basic isomorphisms like identity, commutativity, associativity and distributivity already contained much of the structure required to express computation, and the resulting computational model had several interesting properties. What we found was a startlingly beautiful world of computation, computation in a very pure form, with connections to several areas in semantics, logic, type theory, quantum physics, information theory, thermodynamics and more. This was the origin of the ideas that led to this thesis.
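To give a flavour of the resulting model, here is a small Haskell sketch of type isomorphisms as invertible combinators; the names and encoding are my own illustration, not the syntax of Π from the thesis.

```haskell
-- An isomorphism packages a map together with its inverse.
data Iso a b = Iso { to :: a -> b, from :: b -> a }

-- Inversion and sequential composition: every program is reversible.
inv :: Iso a b -> Iso b a
inv (Iso f g) = Iso g f

(>>>) :: Iso a b -> Iso b c -> Iso a c
Iso f g >>> Iso h k = Iso (h . f) (g . k)

-- Commutativity of products: a * b ~ b * a
swapP :: Iso (a, b) (b, a)
swapP = Iso (\(x, y) -> (y, x)) (\(y, x) -> (x, y))

-- Distributivity: (a + b) * c ~ (a * c) + (b * c)
distrib :: Iso (Either a b, c) (Either (a, c) (b, c))
distrib = Iso f g
  where
    f (Left a, c)    = Left (a, c)
    f (Right b, c)   = Right (b, c)
    g (Left (a, c))  = (Left a, c)
    g (Right (b, c)) = (Right b, c)

-- A controlled-not on a pair of booleans; it is its own inverse.
-- (Π builds reversible gates like Toffoli from such primitives.)
cnot :: Iso (Bool, Bool) (Bool, Bool)
cnot = Iso f f
  where
    f (True, b)  = (True, not b)
    f (False, b) = (False, b)
```

Every term denotes a bijection between finite types, so composing programs never loses information: running backwards is always possible via `inv`.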

Let me end with a few words of thanks…

The entire dissertation is available here and has a bibliography for the references above. The pic above is the universal Toffoli gate in the computational model Π, in wiring-diagram style.
