Tuesday, October 30, 2018

Prefix sublist ordering

Given a set of sequences or lists, we can partially order them by rather or not one list can be embedded in another from the start. This produces a partial ordering of the set of sequences or lists we are being given.

This partial ordering is necessarily going to be a lower forest for any given relation, but if it as a prefix sublist closed relation then it is going to be a lower tree like the one shown above. One example where this prefix sublist ordering may be used is to order the sequences of indices or keys of some data structure. If that data structure is a nested list, then its sequences of integer indices can also be ordered by the lexicographic ordering. The following routine demonstrates the lexicographic ordering. Its relatively easy to implement, the main thing is to order the check the order of the first indices of the sequences and then recursively check the rest of the indices.
(defn lexicographic-successor?
  [a b]

  (cond
    (empty? a) true
    (empty? b) false
    :else (and
           (<= (first a) (first b))
           (lexicographic-successor? (rest a) (rest b)))))

This lexicographic ordering of the sequences of indices which are themselves ordered as a tree by the prefix sublist ordering produces an ordered tree. Such ordered trees are counted by the Catalan numbers $C_n$. As they are ordered trees they correspond to pure lists. The build-pure-list routine can produce a pure nested list from a such a set of indices and the get-indices routine can be used to convert any list rather it is pure or not into a set of indices.
(defn get-indices
  [coll]

  (if (empty? coll)
    #{(list)}
    (set
     (cons
      (list)
      (mapcat
       (fn [i]
         (let [v (nth coll i)]
           (if (and (seq? v) (not (empty? v)))
             (map (partial cons i) (get-indices v))
             (list (list i)))))
       (range (count coll)))))))

(defn build-pure-list
  [indices]

  (let [current-coll (sort
                      <
                      (apply
                       union
                       (map
                        set
                        (filter
                         (fn [i] (= (count i) 1))
                         indices))))]
    (map
     (fn [i]
       (build-pure-list
        (for [index indices
              :when (= (first index) i)]
          (rest index))))
     current-coll)))

It was already stated that the prefix sublist ordering is a lower tree. This means that is a meet semilattice (it is only missing a maximal element). As a result, we can compute the meet of two different elements using the longest-common-prefix function.
(defn longest-common-prefix
  [a b]

  (if (or (empty? a)
          (empty? b)
          (not= (first a) (first b)))
    '()
    (cons (first a) (longest-common-prefix (rest a) (rest b)))))

Making use of this longest-common-prefix routine, we can compute the distance between two elements in the tree. The distance is the sum of the distances to their longest common prefix, which being a prefix is only equal to the difference of their sizes. Then making use of the lexicographic ordering provided by lexicographic-predecessor as well as the
(defn indices-distance
  [a b]

  (let [ancestor (longest-common-prefix a b)]
    (+ (Math/abs (- (count ancestor) (count a)))
       (Math/abs (- (count ancestor) (count b))))))

(defn indices-distances
  [coll]

  (let [sorted-coll (sort lexicographic-successor? coll)]
    (map
     (fn [i]
       (indices-distance
        (nth sorted-coll i)
        (nth sorted-coll (dec i))))
     (range 1 (count sorted-coll)))))

This produces a sequence of positive integers corresponding to the distances of the tree. These sequences of distances fully determine the structure of the ordered tree. Given a monotone sequence, we can compute a sequence of distances from that monotone sequence sequence by taking the differences between elements of the sequence and adding one to make them positive. Further, given a semiorder we can compute such a monotone sequence from the cardinality of the sets of predecessors of each of its elements which fully determines the semiorder. That monotone sequence can then be converted into a sequence of distances and then into an ordered tree. This demonstrates the correspondence between these two types of Catalan structures, the ordered trees and semiorders.

No comments:

Post a Comment