Monday, February 18, 2019

Parse trees as set systems

The task of parsing a data structure into its constituent components usually involves an ordered sequence of characters or bytes. When classifying subsets of an ordered sequence like a byte stream or a string, the most important thing about the sequence is its betweenness relation, as we can limit our parent set to the set of intervals of the sequence. This is what instaparse does, as it automatically gives you the intervals of characters in the parse tree of some string subject to some context-free grammar. To demonstrate this I will consider the simplest possible program.
(defn hi [args]
    (prn "hello world"))
A set system derived from this program is shown below. The set system contains all the sets of characters that locate the different well formed parts of the tree structure as represented as string. The parse tree can be broken down into two components : minimal members and containers for minimal members. In Lisp the minimal members are atoms like the symbols and strings and the collections are lists. In this way, Lisp directly represents the parse tree structure in the language itself.



As a set system the parse tree is a laminar family. As laminar families emerge in the study of parsing, they are a special class of families that should be studied further. It is also nullfree because it doesn't contain the empty set as a member.

Definition. a family of sets $F$ is called a laminar family if for every two elements $S_1, S_2$ either $S_1 \subseteq S_2$, $S_2 \subseteq S_1$, or $S_1 \cup S_2 = \emptyset$. In other words, every pair of elements is either order comparable or disjoint.

The class of laminar families is a set of sets of sets of rank two, which makes it a clique complex whose relation is defined by the union of the order comparability and disjointness. Every nullfree laminar family forms an upper forest, and connected nullfree laminar families form upper trees. So laminar families provide a general means of describing tree decompositions of structures.

No comments:

Post a Comment