Tuesday, February 14, 2023

The logic of functional programming

A trend toward functional programming has emerged in the last fifteen years. Functional programming languages have seen greater adoption, and several functional features have finally started to make their way into mainstream languages. Lambda expressions have been added to C++ (with C++11), Java (with Java 8), and C# (with version 3.0).

I rode the wave of this rise in functional programming: when it started becoming mainstream, I decided to join in. Along this personal journey, there have been various problems I have sought to solve and ideas I have sought to make sense of.

Functional modification:

I made my own journey from imperative languages, primarily JavaScript, to more functional ones. One of the first things we are all faced with when making the transition to functional programming is making sense of immutability and functional purity. How is it possible to program if we cannot change anything?

As it turns out, changing things in functional languages is possible. You are just encouraged to do so with pure functions. A case in point is the assoc function in Clojure. This pure function takes in a composite structure and returns a changed version without affecting the original structure.
(def coll [1 2 3 4])
(def new-coll (assoc coll 0 10))
(= coll new-coll) ;=> false
The assoc function is generalized to handle various composite data structures, all within a functional framework. So we can use assoc to modify the parts of a map:
(def point {:x 1, :y 2})
(assoc point :x 10) ;=> {:x 10, :y 2}
In addition, the assoc-in function can be used to make modifications deep within the nested parts of a composite data structure. For example, it can be used to modify the values within a matrix.
(def matrix 
  [[1 2 3]
   [4 5 6]
   [7 8 9]])
(assoc-in matrix [1 1] 50) ;=> [[1 2 3] [4 50 6] [7 8 9]]
So it would seem that there is a way of modifying things in functional languages. It is just that the modifications take on a different form. This leads us to functional references, which help us to make sense of functional modifications.

Functional references:
Traditional languages like C++, Java, C#, and JavaScript provide generalized imperative references in the form of lvalues (locator values). An lvalue is an expression that can appear on the left-hand side of an assignment; the right-hand side supplies the value that the lvalue is to take on. When you move over to the functional world, you get generalized functional references, also known as lenses.
; the basic idea is that a lens can be defined by combining a pure
; getter function and a pure setter function.
(deftype Lens [getter setter]
  clojure.lang.IFn
  (invoke [this arg] (getter arg))
  (applyTo [this args] (clojure.lang.AFn/applyToHelper this args)))
Lenses implement clojure.lang.IFn, so they can be treated a lot like functions. However, to make further use of their setters, we define helper functions that wrap them.
(defn modify 
  "Set the part of object at lens to val."
  [lens obj val]
  
  (let [setter (.-setter lens)]
    (setter obj val)))

(defn zap
  "Apply the function func to the part of the obj at lens."
  [lens obj func]
  
  (let [setter (.-setter lens)]
    (setter obj (func (lens obj)))))
The Clojure get and get-in functions are the read-only counterparts of assoc and assoc-in. We can combine get and assoc calls to form lenses.
; an example lens which describes the first part of a tuple
(def first-lens
  (Lens.
   ; get the first part of a tuple
   (fn [arg]
     (get arg 0)) 
    
    ; set the first part of a tuple
    (fn [arg v]
      (assoc arg 0 v))))

; an example lens which describes the second part of a tuple
(def second-lens 
  (Lens.
   ; get the second part of a tuple
   (fn [arg]
     (get arg 1)) 
    
    ; set the second part of a tuple
    (fn [arg v]
      (assoc arg 1 v))))
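As a quick sanity check, here is a minimal sketch of how these lenses interact with the modify and zap functions defined above:
; reading and writing through the lenses defined above
(first-lens [10 20])           ;=> 10
(modify first-lens [10 20] 99) ;=> [99 20]
(zap second-lens [10 20] inc)  ;=> [10 21]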
A fundamental property of references in imperative and functional languages is that they can be arbitrarily nested. In the functional world, this nesting is defined compositionally.
(defn ^Lens compose-lenses 
  "combine two lenses to get their composition"
  [^Lens f, ^Lens g]
  
  (Lens.
   (comp (.-getter f) (.-getter g))
    (let [set-f (.-setter f)
          set-g (.-setter g)]
      (fn [a v]
        (set-g a (set-f (g a) v))))))
By using lens composition, we can recapture the nesting provided by Clojure's get-in and assoc-in functions within a much more extensible framework. With lenses, we have now translated references from an imperative context to a functional one.
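As a minimal sketch, composing second-lens with itself yields a lens into the centre of the matrix defined earlier; note that compose-lenses treats its second argument as the outer lens, and the name centre-lens is purely illustrative:
; drill into the second row, then the second column
(def centre-lens (compose-lenses second-lens second-lens))
(centre-lens matrix)           ;=> 5
(modify centre-lens matrix 50) ;=> [[1 2 3] [4 50 6] [7 8 9]]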

Bijections are lenses:
Another key fact is that bijections are lenses. In particular, we can form a lens from a function and its inverse. This basic fact is implemented in the code below.
(defn bijection-lens 
  "get a lens from a function and its inverse."
  [func inv-func]
  
  (Lens.
   func 
    (fn [arg v]
      (inv-func v))))
The most obvious example is the bijection between the booleans true and false and the bits zero and one. It is customary to treat false as zero and true as one, so we use this convention to define a bijection.
(def boolean-to-bit 
  (bijection-lens
    {false 0, true 1} 
    {0 false, 1 true}))
We can now define an object of the boolean type and then modify it in terms of its bit representation.
(def condition true)
(modify boolean-to-bit condition 0) ;=> false
Recalling that lenses are compositional, any lens can now be composed with a bijection to get a different lens. A lens thus has two distinct aspects: the information it refers to and its representation of that information. There is a single notion of the first part of a pair of numbers, but there are potentially infinitely many representations of that same first part.
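For instance, here is a minimal sketch, using only the definitions above, of composing boolean-to-bit with first-lens to get a bit-valued view of the first part of a pair of booleans (the name first-as-bit is purely illustrative):
; view the first part of a pair of booleans as a bit
(def first-as-bit (compose-lenses boolean-to-bit first-lens))
(first-as-bit [true false])          ;=> 1
(modify first-as-bit [true false] 0) ;=> [false false]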

Algebraic theory of information:
The fact that lenses can be composed with bijections suggests a dichotomy between information and its accessors.
  • Information: a location within a larger structure
  • Accessor: the way that a given piece of information is viewed
The first part of a tuple $X^2$ is a unique piece of information, but it may have many different representations. Each composition with a bijection produces a different representation. With so many bijections available, it is important that we treat lenses and the information they refer to as distinct.

There is a further important property: we often find that one reference carries more information than another, or that one memory location holds more data than another. Consider, for example, the class of $2 \times 2$ matrices. Such a matrix has a first row and a first column, and each of them contains the $(0,0)$ entry. So the $(0,0)$ entry is an information substructure of both the first row and the first column.
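As a rough sketch in terms of the lenses defined earlier (the names entry-lens and first-column-lens are purely illustrative), the $(0,0)$ entry can be reached through the first row by composing first-lens with itself, and it is equally visible through a lens onto the first column:
; the (0,0) entry, reached through the first row
(def entry-lens (compose-lenses first-lens first-lens))

; a lens onto the first column of a matrix
(def first-column-lens
  (Lens.
   (fn [m] (mapv first m))
   (fn [m col] (mapv #(assoc %1 0 %2) m col))))

(def m [[1 2] [3 4]])
(entry-lens m)          ;=> 1
(first-column-lens m)   ;=> [1 3]
(modify entry-lens m 9) ;=> [[9 2] [3 4]]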

We can think of a lens as a way of drilling down into part of a structure, and some lenses drill down deeper into a structure. The depth that a lens goes down to is part of the theory of information orders, which is a kind of logic. This motivates our further study of the logic of functional programming.

The varied behaviour of functions:
Even with the groundwork laid by functional references, we don't really know how general functions behave. All our work with lenses can do is translate concepts from imperative programming (like modification and references) into functional programming; it still gives us no way of describing how functions themselves behave.

To explain why we haven't gotten that far yet, we should look at some examples. A quadruple in the set $X^4$ has a lens associated with each of its four indices, and each lens refers to a different part of the quadruple. We can define a permutation $f(a,b,c,d) = (b,a,d,c)$ which swaps the first value with the second and the third with the fourth.
(defn permute-quadruple 
  "Swap the first value with the second and the third with  
  the fourth"
  [[a b c d]]
  
  [b a d c])
  
(permute-quadruple [10, 20, 30, 40]) ;=> [20, 10, 40, 30]
This is not simply the modification of some part of a structure. This permutation moves data from one lens to another. There seem to be motions and subcomputations occurring inside our pure functions! What are these? How do we explain them? This will need to be the subject of further investigation.
(defn transform-quadruple
  "A transformation of a quadruple which is not a permutation."
  [[a b c d]]
  
  [c a a c])
  
(transform-quadruple [10, 20, 30, 40]) ;=> [30, 10, 10, 30]
Aside from moving values around from place to place, a function can modify them. For example, functions like map let you take a function and apply it to all parts of a structure.
(defn increment-triple
  "Increment all parts of a triple"
  [[a b c]]
  
  (vec (map inc [a b c])))
  
(increment-triple [1 2 3]) ;=> [2 3 4]
One further example uses combiners like + and * to take multiple different places and combine them into a single place in the result.
(defn sum-product-pair
  "Get the sum of the first two elements and the product of the 
   next two."
  [[a b c d]]
  
  [(+ a b) (* c d)])
  
(sum-product-pair [10 20 30 40]) ;=> [30 1200]
These are some of the many examples of things we can do with functions. A few more examples come from linear algebra:
  • When you transpose a matrix, each entry in the output matrix is determined by a single entry in the input matrix (see the sketch after this list).
  • When you multiply two matrices, each entry of the output matrix is determined by a row of the first matrix and a column of the second.
  • When you add two matrices, each entry in the output matrix is determined by an entry in the first matrix and an entry in the second.
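To make the first of these concrete, here is a minimal sketch of matrix transposition in Clojure:
; each entry (i,j) of the output is the entry (j,i) of the input
(defn transpose [m]
  (apply mapv vector m))

(transpose [[1 2 3]
            [4 5 6]]) ;=> [[1 4] [2 5] [3 6]]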
Examples abound throughout mathematics; this concept of information dependence appears everywhere. Once we have a concept of information and of functional references, it is desirable to have a logical theory of how functions behave around these references and how they move information around. It follows that there is still more to be done.

Describing the transport of data:
The varied behaviours of functions that we are seeking to explain all have a common description: they all transport data from one place to another along some intermediate function. Permutations move data from one place to another without changing it along the way; they are purely a rearrangement of data. The map function moves data from each location back to itself, but changes it along the way.

Commutative square diagrams can be used to explain all these varied behaviours of functions. To see this, it is helpful to describe commutative square diagrams using exceptionally descriptive names. While we could label the four components with single-letter variables such as $f$, $g$, $h$, and $i$, the approach proposed below is more suggestive.

We can say that a commutative square diagram for a given function has, in addition to that function, three components: an input function, an output function, and a transport function. Commutativity means that the movement of data from the input place to the output place occurs along the given transport function: writing $f$ for the function, $i$ for the input function, $o$ for the output function, and $t$ for the transport function, the square commutes when $o \circ f = t \circ i$. This is how we will describe the movement of data from one place to another. With this fundamental formalism, we have the means to describe the behaviour of functions. Looking back at the examples we explored before, like permutations, transformations, matrix transpositions, the incrementation of triples, and arithmetic functions, they all have an explanation for their behaviour in terms of this formalism.

Take the function permute-quadruple. It moves the first index to the second index, and vice versa, and the third index to the fourth index, and vice versa. In each case data is moved from one place to another along the identity function. In the case of increment-triple, each index is moved back to itself along the inc function. In these examples, the motion of data from one place to another by a function is described by a commutative square diagram.
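As a minimal sketch (the helper commutes? is purely illustrative), we can check such squares directly, since commutativity says that reading the output place of the result agrees with transporting the input place of the argument:
(defn commutes?
  "Check that output-fn after f agrees with transport after
   input-fn on the sample input x."
  [f input-fn output-fn transport x]
  (= (output-fn (f x))
     (transport (input-fn x))))

; permute-quadruple moves the first place to the second place
; along the identity function
(commutes? permute-quadruple first second identity [10 20 30 40]) ;=> true

; increment-triple moves each place back to itself along inc
(commutes? increment-triple first first inc [1 2 3]) ;=> true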

When we introduced lenses and functional references, we took a big step forward: we had a general formalism for describing functional modifications. Yet the story could not end there, because functional references left open the question of how various kinds of functions move around the data that these lenses point to. With commutative square diagrams, we now have the means to explain all the behaviour we have been seeing.

Information flows:
What we have described so far is how data can be moved from an input place to an output place along some transport function. We can also ask: what can be said of data migrations without a transport function? What happens if we just want to talk about input and output locations by themselves?

Then the formalism we get is a pair $(I,O)$ of an input location and an output location. These location pairs, or partition pairs, describe how ignorance spreads or information flows under the operation of a function. The idea of studying such pairs with a view toward how information flows was first put forward by Hartmanis and Stearns [1]:
Therefore, the S.P. partitions on a machine describe the cases in which the "ignorance" about the exact states of the machine does not increase as the machine operates. The concept of partition pairs is more general and is introduced to study how "ignorance spreads" or "information flows" through a sequential machine when it operates.
— J. Hartmanis and R. E. Stearns
By studying these partition pairs in the context of commutative square diagrams, we can make them part of a larger framework that includes a theory of the transport of information. At the same time, these information flows help us to describe how these commutative square diagrams function.
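As a rough sketch of the idea (the helper partition-pair? and the parity example are purely illustrative), we can represent a partition by a function labelling each element with its block, and then check that inputs in the same input block always land in the same output block:
(defn partition-pair?
  "Check on the sample inputs xs that whenever two inputs share an
   input block, their images under f share an output block."
  [f input-block output-block xs]
  (every?
   (fn [[x y]]
     (or (not= (input-block x) (input-block y))
         (= (output-block (f x)) (output-block (f y)))))
   (for [x xs, y xs] [x y])))

; adding two preserves parity, so the pair (parity, parity) is a
; partition pair for this function
(partition-pair? #(+ % 2) even? even? (range 10)) ;=> true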

In a recent paper of mine [2], I study these information flows in terms of the commutative diagrams they produce. There it is noted, for example, that for any function $f: A \to B$ the partition pairs of $f$ form a lattice, and the properties of these lattices are described. These results form the basis of a new kind of structure theory.

The topos of functions:
If we take two different commutative square diagrams, where the transport of one is the function of another, then these two diagrams can be composed. The composition of the two square diagrams occurs componentwise along their input and output functions. The result is a new square diagram. If we take this to be our composition law, the result is a category: the category $Sets^{\to}$. If we look a little deeper we find there is something very special about this category: it is a topos.
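As a minimal sketch of this composition law, representing a square over a function as a plain map (all of these names are illustrative, not part of any library):
(defn compose-squares
  "Compose a square from f to t with a square from t to u; the
   input and output functions compose componentwise."
  [{f :function, i1 :input, o1 :output}
   {i2 :input, o2 :output, u :transport}]
  {:function  f
   :input     (comp i2 i1)
   :output    (comp o2 o1)
   :transport u})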

The fact that it is a topos gives us a lot. This means that $Sets^{\to}$ has all small limits and colimits, products, coproducts, epi-mono factorisations, a subobject classifier (an object of truth values), logical connectives, an internal logic, exponentiation, power objects, subobject lattices, and so on. Everything you can do in a topos you can do in $Sets^{\to}$. This makes $Sets^{\to}$ a broadly interesting subject of study in its own right, with some interesting properties of its own, like the failure of the law of excluded middle.

The key thing for us is how $Sets^{\to}$ can help us better understand functional programming. Its central concept, the commutative square diagram, describes how data moves around in functions. A commutative square is described by a function, an input place, an output place, and a transport function that describes how inputs are mapped to outputs. As is true for any form of categorical logic, the dual logical theories of $Sets^{\to}$ are emergent properties of these commutative square diagrams.

Functional lenses let you focus in on some particular piece of a complex data structure. Commutative square diagrams, on the other hand, let you focus on some particular piece of a function. The result is the structure theory of functions. With this structure theory, we can now reason logically about the behaviour of functional programs.

Feedback:
How can we further improve the structure theory of functional programs? Write to me at jhuni.v2@gmail.com or leave a comment in the comment section below.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic Structure Theory of Sequential Machines. Prentice-Hall.

[2] Bernier, J. (2023). Toposic structure theory of computations.

[3] Goldblatt, R. (2014). Topoi: The Categorial Analysis of Logic. Elsevier Science.

[4] Johnstone, P. T. (2008). Sketches of an Elephant: A Topos Theory Compendium. Oxford University Press.

[5] Mac Lane, S., & Moerdijk, I. (2010). Sheaves in Geometry and Logic: A First Introduction to Topos Theory. Springer.

[6] Barr, M., & Wells, C. (1985). Toposes, Triples and Theories. Springer-Verlag.

[7] McLarty, C. (2005). Elementary Categories, Elementary Toposes. Clarendon Press.

[8] Bell, J. L. (2008). Toposes and Local Set Theories: An Introduction. Dover Publications.

[9] Moerdijk, I. (2008). Classifying Spaces and Classifying Topoi. Springer.

[10] Caramello, O. (2018). Theories, Sites, Toposes: Relating and Studying Mathematical Theories Through Topos-Theoretic 'Bridges'. Oxford University Press.
