Tuesday, January 31, 2023

Embarrassingly parallel decompositions of functions

A basic primitive of functional programming, the map function, is embarrassingly parallel. This means that map can be broken up into independent computations requiring no communication or synchronization between them except for a join operation at the end. It would be nice if we could describe embarrassing parallelism in the most general sense using category-theoretic universal properties.
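
As a minimal illustration in Clojure (the language used for code later on this blog), pmap evaluates its function on each element independently and joins the ordered results at the end, agreeing with map:

;; map is embarrassingly parallel: each application of inc below is
;; independent, so pmap may run them on separate threads and join at the end.
(= (map inc [1 2 3 4]) (pmap inc [1 2 3 4]))
;; => true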

To do this we will use the topos $Sets^{\to}$. It will be shown that products in this category describe embarrassingly parallel computations. If $X^n$ is the set of $n$-tuples of elements of $X$, then the map function with argument $f$ is equivalent to the product function $f^n$ of $f$ with itself $n$ times. We will describe how arbitrary functions in $Sets^{\to}$ can be given product decompositions, even when they are not represented by sequences, so that they can be treated as embarrassingly parallel functions.

Background on the algebraic theory of information:
The algebraic theory of information, introduced by Hartmanis and Stearns, models information using partitions. In modern categorical treatments, partitions are equivalence classes of epimorphisms in the topos $Sets$. Since every topos, including $Sets$, has epi-mono factorizations, every function in $Sets$ has an epi-component, which is its partition or kernel.

Definition. Let $f$ be a function. Then $f$ defines a partition called $ker(f)$ which describes equality with respect to $f$: \[ (a,b) \in ker(f) \Leftrightarrow f(a) = f(b) \] Example 1. Let $\Omega : \{0,\frac{1}{2},1\} \to \{0,1\}$ be the object of truth values in $Sets^{\to}$. Then its kernel is the equivalence relation defined by the partition $\{\{0\},\{\frac{1}{2},1\}\}$.

Example 2. Let $abs : \mathbb{R} \to \mathbb{R}$ be the absolute value function. Then $(-1,1) \in ker(abs)$ because $-1$ and $1$ have the same absolute value.
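
Kernels of finite functions are directly computable. The following is a minimal Clojure sketch (the helper name kernel is ours, not from any particular library): it groups the elements of a finite domain by their images under $f$.

;; The kernel of f on a finite domain: the partition of the domain
;; into classes of elements that f maps to the same value.
(defn kernel [f domain]
  (set (map set (vals (group-by f domain)))))

;; Example 1: the truth-value object on {0, 1/2, 1}.
(kernel {0 0, 1/2 1, 1 1} #{0 1/2 1})
;; => #{#{0} #{1/2 1}}

;; Example 2: the absolute value function identifies -1 and 1.
(kernel #(Math/abs %) [-1 0 1])
;; => #{#{0} #{-1 1}}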

Background on products in Sets:
Definition. Let $S_1, S_2, \dots, S_n$ be a family of sets in the topos $Sets$, and let $f_i$ be a family of epimorphisms from a common object $X$ to each $S_i$, forming a universal product cone. By the kernel construction, each $f_i$ has a corresponding partition $P_i = ker(f_i)$. These partitions have the following property: \[ \forall F \in \prod_{i \in I} P_i: \left|\bigcap_{i \in I} F_i\right| = 1 \] Equivalently, every selection of one equivalence class from each partition has a single common element. To demonstrate the use of this formalism we have provided a set of examples:

Example 1. $\{\{0,1\},\{2,3\}\}$ and $\{\{0,2\},\{1,3\}\}$ form a product decomposition because all pairs of equivalence classes between them have singular intersection:

          {0,2}  {1,3}
  {0,1}     0      1
  {2,3}     2      3

Example 2. $\{\{0,1,2\},\{3,4,5\},\{6,7,8\}\}$ and $\{\{0,3,6\},\{1,4,7\},\{2,5,8\}\}$ form a product decomposition because all pairs of equivalence classes between them have singular intersection:

            {0,3,6}  {1,4,7}  {2,5,8}
  {0,1,2}      0        1        2
  {3,4,5}      3        4        5
  {6,7,8}      6        7        8

Example 3. The following three partitions form a product decomposition because every triple of equivalence classes, one chosen from each partition, has singular intersection: \[ \{\{0,1,2,3\}, \{4,5,6,7\} \} \] \[ \{\{0,2,4,6\}, \{1,3,5,7\} \} \] \[ \{\{0,1,4,5\}, \{2,3,6,7\} \} \] This can be confirmed by manually examining all triples of equivalence classes of these three partitions: \[ \{0,1,2,3\} \cap \{0,2,4,6\} \cap \{0,1,4,5\} = \{0\} \] \[ \{0,1,2,3\} \cap \{0,2,4,6\} \cap \{2,3,6,7\} = \{2\} \] \[ \{0,1,2,3\} \cap \{1,3,5,7\} \cap \{0,1,4,5\} = \{1\} \] \[ \{0,1,2,3\} \cap \{1,3,5,7\} \cap \{2,3,6,7\} = \{3\} \] \[ \{4,5,6,7\} \cap \{0,2,4,6\} \cap \{0,1,4,5\} = \{4\} \] \[ \{4,5,6,7\} \cap \{0,2,4,6\} \cap \{2,3,6,7\} = \{6\} \] \[ \{4,5,6,7\} \cap \{1,3,5,7\} \cap \{0,1,4,5\} = \{5\} \] \[ \{4,5,6,7\} \cap \{1,3,5,7\} \cap \{2,3,6,7\} = \{7\} \] Each of these eight triples is a family of sets in the product $\prod_{i \in I} P_i$ with singular intersection.
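
The singleton-intersection condition can be checked mechanically. Here is a hedged Clojure sketch (the name product-decomposition? is ours): it enumerates every selection of one class per partition and verifies that the common intersection is a singleton.

(require '[clojure.set :as set])

;; Do the given partitions form a product decomposition? Enumerate
;; every choice of one class per partition and check that each
;; common intersection has exactly one element.
(defn product-decomposition? [& partitions]
  (let [selections (reduce (fn [choices partition]
                             (for [choice choices, class partition]
                               (conj choice class)))
                           [[]]
                           partitions)]
    (every? #(= 1 (count (apply set/intersection %))) selections)))

;; Example 3 above:
(product-decomposition? #{#{0 1 2 3} #{4 5 6 7}}
                        #{#{0 2 4 6} #{1 3 5 7}}
                        #{#{0 1 4 5} #{2 3 6 7}})
;; => true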

Product functions:
Definition. Let $f_i: A_i \to B_i$ be a family of functions. Then their product in $Sets^{\to}$ is the function: \[ \prod_{i \in I} f_i : \prod_{i \in I} A_i \to \prod_{i \in I} B_i \] defined by applying the function $f_i$ to the value at index $i$ of a sequence in $\prod_{i \in I} A_i$: \[ \left(\prod_{i \in I} f_i\right)(a_1,\dots,a_n) = (f_1(a_1),\dots,f_n(a_n)) \] Example 1. Let $f: X \to Y$ be a function. Then $f^n: X^n \to Y^n$ is the function that applies the higher-order map function with $f$ to sequences of size $n$.

Example 2. Let $inc: \mathbb{R} \to \mathbb{R}$ be the function $inc(x) = x+1$ and let $double : \mathbb{R} \to \mathbb{R}$ be the function $double(x) = 2x$. Then the function $inc \times double : \mathbb{R}^2 \to \mathbb{R}^2$ is given by $(inc \times double)(x,y) = (x+1, 2y)$.
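
Product functions are directly expressible in Clojure. A minimal sketch (the names product-function and power are ours):

;; The product of a family of functions applies the i-th function
;; to the i-th coordinate of a tuple.
(defn product-function [& fs]
  (fn [& args] (mapv (fn [f a] (f a)) fs args)))

;; f^n is the n-fold product of f with itself, i.e. map over n-tuples.
(defn power [f]
  (fn [tuple] (mapv f tuple)))

((product-function inc #(* 2 %)) 3 10)
;; => [4 20], i.e. (inc x double)(3, 10)
((power inc) [1 2 3])
;; => [2 3 4], i.e. inc^3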

The critical characteristic of product functions is that they are embarrassingly parallel. Given a product function $f \times g$, the results of the functions $f$ and $g$ can be computed entirely separately from one another. Then all we need to do is combine these separately computed results at the end to get the final result.
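
To make the parallelism concrete, here is a hedged sketch using Clojure futures (the name parallel-product is ours): each component runs on its own thread, and the final deref is the join.

;; Evaluate each component of a product function on its own thread;
;; dereferencing the futures at the end is the join operation.
(defn parallel-product [fs args]
  (let [tasks (mapv (fn [f a] (future (f a))) fs args)]
    (mapv deref tasks)))

(parallel-product [inc #(* 2 %)] [3 10])
;; => [4 20]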

Category theory teaches us that we can treat products using universal properties instead of by reference to their common representations. The product of sets is not intrinsically a set of ordered pairs but rather a universal limiting cone; a set of ordered pairs is only one of its representations. To move beyond issues of representation, we will also have to consider universal cones in $Sets^{\to}$.

Embarrassingly parallel decompositions:
Proposition. Let $f: A \to B$ be a function in $Sets^{\to}$. Then a decomposition of $f$ into separate information flows is a family $(P_1,Q_1), (P_2,Q_2), \dots, (P_n,Q_n)$ of information flows such that the source partitions $P_1, P_2, \dots, P_n$ form a product decomposition of $A$ and the target partitions $Q_1, Q_2, \dots, Q_n$ form a product decomposition of $B$.

Example 1. Let $f(a,b) = (a+1,b+1)$. Then $((0,0),(1,1))$ is a family of flow relations, where $0$ and $1$ denote the partitions by the first and second coordinates; these coordinate partitions form product decompositions of both the domain and the codomain. The resulting information flows produce a universal limiting cone in $Sets^{\to}$.

Example 2. Let $f(a,b) = (b,a)$ be the transposition function. Then $((0,1),(1,0))$ is a family of flow relations, and the coordinate partitions again form product decompositions of the domain and the codomain. These information flows likewise produce a universal limiting cone in $Sets^{\to}$.

We see now that to reconstruct a product function from a family of independent information flows, you only need to take the product of all of its quotients. This demonstrates how the theory of information flows in the topos of functions $Sets^{\to}$ can be used to create embarrassingly parallel decompositions.
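
Continuing the sketches above: the quotient of Example 1's $f$ on the flow $(0,0)$ is the increment function, and likewise on $(1,1)$, so $f$ is reconstructed as their product.

;; Example 1 reconstructed from its two independent flows:
;; the quotient on each coordinate flow is inc, so f = inc x inc.
((product-function inc inc) 3 10)
;; => [4 11]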

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines.

[2] Bernier, J. (2023). Toposic structure theory of computations.

[3] Goldblatt, R. (2014). Topoi: The Categorial Analysis of Logic. Elsevier Science.

Sunday, January 29, 2023

A comparative study of structure theories

The most groundbreaking work in the computer industry was essentially done from the late 1950s to the 1960s, because at that time hardly any ground had been broken yet. The Algebraic Structure Theory of Sequential Machines (1966) owes its origin to that era. Every computer science treatise now owes something to that time, and most bear a broad similarity to something from back then. But the similarities often end there, as examination of any modern theory always exposes key differences.

That is also the case with the topos theory of computations. While the basic intuitive framework is the same as that of Hartmanis and Stearns, all technical aspects of the theory are different. In 2023, a preliminary research paper, Toposic Structure Theory of Computations (2023), was released with this new and updated theory. If it is well received, a full textbook on the topos of computation, containing the advanced theory, will be produced.

With these two structure theories now widely available for consideration, it is high time that a comparative study be conducted to highlight their similarities and differences. In this study, we will describe why a new and updated theory is necessary. The value brought by the application of new techniques in categorical logic and topos theory will be described, and the limitations of the classical theory will be considered.

Background and motivation:
The theory of Hartmanis and Stearns was designed with engineering applications in mind. To quote their text: "The engineering motivations demands new mathematical techniques to be invented with no counterpart in the development of algebra. Thus, this theory has a characteristic flavour and mathematical identity of its own." As a consequence, they developed the idea of information flows with engineering applications in mind.

On the other hand, the Toposic Structure Theory of Computations (2023) is a purely mathematical treatise. It was created with the idea of exploring an aspect of topos theory. That the topos it studies happens to be the topos of computations is a convenient coincidence that makes the study worthy of publication and widespread dissemination. This text should be of interest even to pure mathematicians working in topos theory.

Scope:
The Algebraic Structure Theory of Sequential Machines (1966) studies information flows with respect to sequential machines. These are treated as transition functions $\delta: S \times I \to S$ on a set of states $S$ and a set of inputs $I$. In effect, however, they are families of functions $\delta_i : S \to S$, one for each $i \in I$. A partition pair $(P,Q)$ of $\delta$ then says that for each $i \in I$: $s =_P t \Rightarrow \delta(i,s) =_Q \delta(i,t)$.
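
For finite machines the partition pair condition can be checked directly. A minimal Clojure sketch (the names class-of and partition-pair? are ours):

;; The block of a partition containing x.
(defn class-of [partition x]
  (some #(when (contains? % x) %) partition))

;; The Hartmanis-Stearns condition: for every input i, states in the
;; same P-block must transition into a single Q-block.
(defn partition-pair? [delta inputs p q]
  (every? (fn [[i block]]
            (apply = (map #(class-of q (delta i %)) block)))
          (for [i inputs, block p] [i block])))

;; A two-state machine whose single input toggles the state:
(partition-pair? (fn [_ s] (- 1 s)) [:toggle]
                 #{#{0} #{1}} #{#{0} #{1}})
;; => true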

They never consider what a partition pair $(P,Q)$ defined on a single function $f: A \to B$ would mean. Already in section 0.1 of their text, on "sets and functions", you can see that they are hopelessly lost in the old ways of set theory. The great mental liberation and significant technical advantages of modern topos theory were not considered.

That is why they always define information flows for families of functions $\delta_i$ rather than considering them individually. If they had taken the latter perspective, their theory would have had better foundations. Their approach produces a theory with very limited scope and applicability. It is no wonder, then, that this theory was hardly ever used and history went the way it did.

By contrast, the Toposic Structure Theory of Computations (2023) is applicable to any kind of functional computation. This produces a theory with far greater and wider applicability than ever before, and it exposes all functions to the logic of information flows. This is really a paradigm shift and a transition from foundations in the topos $Sets$ to the topos $Sets^{\to}$.

Mathematical context:
In the Toposic Structure Theory of Computations (2023) we study the topos $Sets^{\to}$ and its morphisms. These are ordered pairs of functions $(i,o)$ between functions $f$ and $g$ that form commutative squares. Each such $(i,o)$ has an epi-mono factorisation, and there are your partition pairs $(P,Q)$, emerging right out of morphisms in the Sierpinski topos. They exist between individual functions. A morphism in the topos $Sets^{\to}$ is the embedding of the information flow of some function in another. The algebraic approach to information, it turns out, is nothing more than the logic of the topos $Sets^{\to}$.
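
The commutative square in question, reconstructed here since the diagram is not reproduced in this post: assuming $f : A \to B$ and $g : C \to D$, a morphism $(i,o) : f \to g$ satisfies \[ \begin{array}{ccc} A & \xrightarrow{i} & C \\ f \downarrow & & \downarrow g \\ B & \xrightarrow{o} & D \end{array} \qquad g \circ i = o \circ f \]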

By taking this categorical perspective we are able to create a far richer theory than the classical one of Hartmanis and Stearns. The effects of morphisms in $Sets^{\to}$ on information flows are considered categorically, leading to a number of interesting results. Functorial semantics for information flows are developed. None of this would be possible without new developments in category theory.

Conclusions:
The prior work of Hartmanis and Stearns on the algebraic theory of information is inspiring. Any further work in the algebraic theory of information should mention their fundamental work and give proper credit where it is due. Nonetheless, their approach is fraught with a number of limitations: limited scope, a lack of category theory, and underdeveloped mathematical foundations. These are solved by creating a new theory.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines. Prentice-Hall.

Saturday, January 28, 2023

Visualising information flows of finite functions

One way that I can demonstrate information flows is through visualisation. That information flows form a lattice can be seen intuitively by realising that if the information in $P_1$ determines the information in $Q_1$ and the information in $P_2$ determines the information in $Q_2$, then the combined information $P_1P_2$ determines the information in $Q_1Q_2$. That they form a lattice also means they can be displayed as Hasse diagrams, so you can see what they look like.

All visualisations will be done with Clojure and Graphviz. In order to create set theoretic functions, we need the data of an input set, an output set, and a Clojure function or map. This ensures that we can handle functions that are not surjective. But in the easier case where they are surjective, we can just get a function using the to-function method. The elements of the information flow lattices are partition pairs, which are vectors of disjoint families of sets.
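
For readers following along without the underlying library, here is a hedged sketch of what the SetFunction constructor and to-function used below might look like; the actual definitions in my codebase may differ.

;; A set-theoretic function: explicit source and target sets plus an
;; underlying Clojure map, so non-surjective functions are supported.
(defrecord SetFunction [source target func])

;; When the function is surjective the target can be inferred from
;; the map's values, which is what to-function does.
(defn to-function [m]
  (SetFunction. (set (keys m)) (set (vals m)) m))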

Small functions:
(SetFunction. #{0} #{0} {0 0})
(SetFunction. #{0 1} #{0} {0 0, 1 0})
(SetFunction. #{0 1} #{0 1} {0 0, 1 1})
(SetFunction. #{0 1} #{0 1} {0 0, 1 0})
(SetFunction. #{0 1 2} #{0} {0 0, 1 0, 2 0})
(SetFunction. #{0 1} #{0 1 2} {0 0, 1 1})
(SetFunction. #{0 1 2} #{0 2} {0 0, 1 0, 2 2})
(SetFunction. #{0 1 2} #{0 1 2} {0 0, 1 1, 2 2})

Intermediate sized functions:
(to-function {0 0, 1 0, 2 0, 3 0})
(to-function {0 0, 1 0, 2 2, 3 2})
(to-function {0 0, 1 0, 2 0, 3 3})
(to-function {0 0, 1 1, 2 2, 3 2})
(to-function {0 0, 1 1, 2 2, 3 3})
(to-function {0 0, 1 0, 2 0, 3 0, 4 0})
(to-function {0 0, 1 0, 2 2, 3 2, 4 2})
(to-function {0 0, 1 0, 2 0, 3 0, 4 4})
(to-function {0 0, 1 0, 2 0, 3 3, 4 4})
(to-function {0 0, 1 1, 2 1, 3 3, 4 3})
(to-function {0 0, 1 1, 2 2, 3 3, 4 3})
(to-function {0 0, 1 1, 2 2, 3 3, 4 4})

Large or infinite functions:
By generality, functions larger than these have information flow lattices even if we cannot visualize them. The algebraic theory of information ensures that these lattices exist. Using mathematical techniques we can describe their properties even in the infinite case.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines. Prentice-Hall.

[2] Lausmaa, T. (2005). Extropy-based quantitative evaluation of finite functions. Proceedings of the Estonian Academy of Sciences. Physics. Mathematics, 54(2), 67. https://doi.org/10.3176/phys.math.2005.2.01

Friday, January 27, 2023

Paying homage to the work of Hartmanis and Stearns

A commenter alerted me to the fundamental work of Hartmanis and Stearns: Algebraic Structure Theory of Sequential Machines (1966). This came closest to capturing the fundamental ideas that I was getting at in my study of information flows. To put my work in context, I want to do something to pay homage to this prior work.

When I first thought about what to title my paper, I couldn't think of anything particularly good that would stick. So I took the good old-fashioned shotgun approach and made a big title with a bit of everything: "The Topos Theory of Computing: Introduction to the Mathematics of Dataflow." Yet it's clear that calling this a theory of computing doesn't get to the point, because the whole point of the thesis is an examination of structures.

By changing the title of my work to Toposic Structure Theory of Computations (2023) I get an intrinsically better title while also more effectively paying homage to the excellent prior work of Hartmanis and Stearns. This hopefully reduces any controversy around my work and puts it into an appropriate historical context. We now have two different types of structure theory in computer science:
  • The algebraic structure theory
  • The toposic structure theory
The Algebraic Structure Theory of Sequential Machines (1966) is one of those marvelous gems of science and technology that, due to historical accident, seems to have fallen by the wayside. It is something that despite its age still has something to teach us, just like the Lisp machines. There is always something to be learned from studying the best technologies of the past.

But I am not here to revive old technologies. In my work, I developed novel foundations for the mathematics of information flows using topos theory. It is demonstrated that the information flows of Hartmanis and Stearns are nothing more than quotients in $Sets^{\to}$. By establishing functors from a wide variety of different categories to $Sets^{\to}$, information flows are applied to a wider variety of contexts than ever before.

It is shown that morphisms in $Sets^{\to}$ preserve and reflect partition pairs. The result is a monotone Galois connection between information flow lattices of functions associated to every morphism in $Sets^{\to}$. This opens up a wide variety of different computations that would not be possible without the topos theoretic framework. All of this makes this subject an exciting and fertile ground for further exploration.

References:
[1] Hartmanis, J., & Stearns, R. E. (1966). Algebraic structure theory of sequential machines. Prentice-Hall.

Thursday, January 26, 2023

A new way of programming with information flows

Informal discussion:
The physical world is highly parallel, with many things happening side by side. A realistic model of computation is dataflow, which can be conceptualized as a kind of motion. Much like how particles move around in physical systems, bits of information move around in computers. This motivates our development of a theory of information flows.

But what exactly are these bits of information? Consider even and odd; both are bit-valued functions that produce ones and zeros. But even and odd produce the same bit of information; they merely represent that information in two different ways. If $n$ is even then we know it is not odd, and if it is not even then we know it is odd. The same applies the other way around.

To represent the same bit of information as evenness or oddness, we can use a partition. Both the even and odd functions are representations of the abstract notion of parity, which can be defined as a partition with two elements. An abstract bit of information is then a partition of a set into two different classes. It does not matter whether you get the parity from even, odd, 2 * mod(n,2), or any other representation. An abstract bit of information may be realized in many different forms.

A piece of information, in general, is just a collection of bits of information. We can call these partitions, but the important point is that they refer to a place within the information of a larger structure. One piece of information is larger than another if it contains more bits. Partitions abstract functions so that every function can now be treated as a representation of a piece of information.

We now know how to think abstractly about places within an information system, but we still need to understand how information moves around. Let $f: A \to B$ be a function; then we can describe a flow relation on $f$ by a pair (P,Q) which proclaims that the function moves the information in the place $P$ to the place $Q$. We now have a fundamental notion of information flows.

As an example, return to our notion of parity. If we take a function like $x+1$, then we see that $x+1$ maps the parity information in its input to the parity information in its output. So (parity, parity) is a dataflow relation on $x+1$. If we take an even number and add one to it, then we get an odd number, and if we take an odd number and add one to it, we get an even number. So $x+1$ flips parity bits around.
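
Flow relations on finite restrictions are mechanically checkable. A minimal Clojure sketch (the name flow-relation? is ours), verified on a finite window of the parity example:

;; (P, Q) is a flow relation on f when f maps each block of P
;; into a single block of Q.
(defn flow-relation? [f p q]
  (letfn [(class-of [partition x]
            (some #(when (contains? % x) %) partition))]
    (every? (fn [block]
              (apply = (map #(class-of q (f %)) block)))
            p)))

;; (parity, parity) is a flow relation on x+1: evens go to odds and
;; odds go to evens, so each parity class maps into a parity class.
(flow-relation? inc
                #{#{0 2 4} #{1 3 5}}  ;; input parity partition
                #{#{1 3 5} #{2 4 6}}) ;; output parity partition
;; => true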

On the other hand, $x^2$ leaves the parity bit as it is: the square of an even number is even, and the square of an odd number is odd. Two other examples are $2x$, which always produces a number of even parity, and $2x+1$, which always produces one that is odd. Other functions like $\Omega$ might not map parities back to parities at all. These functions might take different numbers of the same parity and map them to numbers of different parities.

It would be nice if we could take a place in an input structure, like the parity of a number, and then find out exactly what place in the output it flows into. By doing this, we will depart from the traditional view of set theory, in which functions take values in sets, and instead make them take values in partitions.

In fact, we can do this if we take the notion of a partition image $f(P)$. This produces the information we know about the output of $f$ when it is given the information in $P$. So the partition image directly codifies how a function moves information from place to place, and it lets you compute how a function moves bits from one location to another. The dual concept is the partition inverse image $f^{-1}(Q)$, which is the collection of all bits of information that map into $Q$.
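
A hedged sketch of computing partition images for finite functions (the helper names merge-overlapping and partition-image are ours): push each block of $P$ forward through $f$, then merge overlapping image blocks so the result is a genuine partition of the image.

(require '[clojure.set :as set])

;; Merge any blocks that overlap, preserving pairwise disjointness.
(defn merge-overlapping [blocks]
  (reduce (fn [acc b]
            (let [{touching true, apart false}
                  (group-by #(boolean (seq (set/intersection % b))) acc)]
              (conj (vec apart) (apply set/union b touching))))
          []
          blocks))

;; The partition image f(P): what the blocks of P become under f.
(defn partition-image [f p]
  (set (merge-overlapping (map (fn [block] (set (map f block))) p))))

;; mod 2 carries the parity partition of {0..5} to a partition of {0, 1}:
(partition-image #(mod % 2) #{#{0 2 4} #{1 3 5}})
;; => #{#{0} #{1}}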

Partition images are the basis of our new way of programming with dataflow relations. A whole new parallel universe of functions that operate on partitions is created alongside those that take values in sets. This is a new paradigm of computing in information systems, which we call functional dataflow.

We provide a system of support for smart functions: special data types of functions that know things about their own data flow characteristics. These are combined with special partition data types, so that smart functions can have computable partition images. They can take a place in a system and return the place it maps to. This makes information flows accessible to the user.

As a basic example, consider a transposition permutation on pairs. This function swaps the order of an ordered pair: $f(x,y) = (y,x)$. We might make this into a smart function that records its own flow information, so that it knows that the information in the first index is moved to the second index, and the information in the second is moved to the first. This flow information might be made computable using partition images, so that if you ask $f$ what it maps the first index to, it will return the second index.
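
A hedged sketch of what such a smart function could look like in Clojure; the keys :fn and :flows are hypothetical names, not taken from any particular library:

;; A smart function: ordinary behaviour under :fn, plus a record of
;; which input index flows to which output index under :flows.
(def transpose
  {:fn    (fn [[x y]] [y x])
   :flows {0 1, 1 0}})  ;; index 0 flows to index 1, and vice versa

((:fn transpose) [1 2])
;; => [2 1]
(get-in transpose [:flows 0])
;; => 1, the place that the first index maps to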

If bits of information are the simplest units of storage, then this suggests that bit-flow relations $(P,Q)$ between bits of information $P$ and $Q$ are the simplest forms of dataflow. We already saw an example of a bit flow relation with the (parity, parity) relations in arithmetic functions. These bit-flow relations are the atomic building blocks for all other dataflow relations.

Information flows can now be seen to form an ordering by scale. There can be anything from large events with millions of bits moving around down to events involving only one or two. By the same token, an event in a physical system might involve anything from the motion of an astronomical body at the higher end to the motions of subatomic particles at the smallest scales. In both cases, we have an intuitive notion of scale. We now have a way of capturing information flows from the smallest scales to the largest.

For technical experts:
For aficionados of functors, categories, adjunctions, toposes, lattices, lenses, theorems and proofs: the topos structure theory of computing

Sunday, January 22, 2023

Toposic structure theory of computations FAQ

As I present this new thesis on the role of topos theory in computing to the world, I have anticipated a number of possible questions and answers ahead of time. I will do my best to answer any questions you may have.

What prior work is closest to yours?
With the help of a commenter on Hacker News, I have determined that the Algebraic Structure Theory of Sequential Machines (1966) comes closest to my work. Most impressively, they came close to developing the idea of a dataflow relation without using Sierpinski topos theory. Instead of an ordinary algebraic structure theory, I provide an updated theory based instead upon topoi.

What other prior works do you build upon the most?
The first and foremost is probably Robert Goldblatt's Topoi: The Categorial Analysis of Logic. It is with this text that I first discovered the Sierpinski topos, which opened the way for my own further studies in the subject. A second reference text is Sketches of an Elephant: A Topos Theory Compendium. Aside from these, a variety of lattice theory textbooks and papers provided extra tidbits of information.

What is the practical basis of applied topos theory?
I believe the answer to this is the principle of locality. The physical universe of which we are all a part is organized around this principle which states that an object is influenced directly only by its immediate surroundings. This creates an organizing principle of relevance to all fields of engineering. In classical physics this motivates the modeling of physical systems based upon their local behaviour. In computing, this motivates our theories of local computation and dataflow.

What use is a topos theory of computing?
The topos theory of computing can be used to model information loss and information flow. The first is relevant because the second law of thermodynamics states that the entropy of a logical system cannot decrease, which implies that information loss leads to heat dissipation. The second is relevant because the principle of locality necessitates that computer systems should be modeled in terms of information flows and local effects.

What is this topos theory of computing?
I would like to explain it by analogy to physics. In physics we study the movement of particles from place to place. In this theory of computing, we instead study the movement of packets of bits from place to place. In order to describe these information flows I have presented a new formalism based upon Sierpinski topos theory.

Toposic structure theory of computations

Abstract:
The theory of data-flow analysis of computer programs has been extensively studied. The increasing need for dataflow analysis in the automatic parallelization of computer programs motivates the development of mathematical foundations for this field. We present a new approach to program dataflow analysis that incorporates partition lattices and the Sierpinski topos.

Link:
The topos theory of computing: introduction to the mathematics of dataflow