Citation Link: https://doi.org/10.25819/ubsi/10429
Combinatorial and information-theoretic aspects of tree compression
Alternate Title
Kombinatorische und informationstheoretische Aspekte der Baumkompression
Source Type
Doctoral Thesis
Author
Institute
Issue Date
2023
Abstract
We analyze lossless tree compression algorithms from a combinatorial and information-theoretic point of view.
One of the most important and widely used compression methods for rooted trees is to represent a tree by its minimal directed acyclic graph, referred to as the minimal DAG for short. The size of the minimal DAG of a tree is the number of distinct fringe subtrees occurring in the tree, where a fringe subtree of a rooted tree is a subtree induced by one of the nodes together with all its descendants.
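To make this concrete, the following is a minimal sketch (not taken from the thesis; the node layout and function names are illustrative assumptions) that counts the distinct fringe subtrees of a node-labeled binary tree by canonicalizing each fringe subtree bottom-up, which yields exactly the number of nodes of the minimal DAG:

```python
# Minimal sketch: count distinct fringe subtrees of a node-labeled binary tree,
# which equals the number of nodes of its minimal DAG.

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def minimal_dag_size(root: Optional[Node]) -> int:
    """Return the number of distinct fringe subtrees (= size of the minimal DAG)."""
    seen: dict[tuple, int] = {}  # canonical form of a fringe subtree -> DAG node id

    def visit(node: Optional[Node]):
        if node is None:
            return None
        # A fringe subtree is determined by its root label and the (already
        # canonicalized) left and right fringe subtrees.
        key = (node.label, visit(node.left), visit(node.right))
        if key not in seen:
            seen[key] = len(seen)   # first occurrence: create a fresh DAG node
        return seen[key]            # otherwise: share the existing DAG node
    visit(root)
    return len(seen)


if __name__ == "__main__":
    # The tree f(g(a), g(a)) has 3 distinct fringe subtrees: a, g(a), and the tree itself.
    t = Node("f", left=Node("g", left=Node("a")), right=Node("g", left=Node("a")))
    print(minimal_dag_size(t))  # 3
```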
In the first part of this work, we study the average number of distinct fringe subtrees (i.e., the average size of the minimal DAG) in random trees. Specifically, we consider the random tree models of leaf-centric binary tree sources, simply generated families of trees and very simple families of increasing trees.
In the second part of this work, we analyze grammar-based tree compression via tree straight-line programs (TSLPs) from an information-theoretic point of view. Specifically, we extend the notion of empirical entropy from strings to node-labeled binary trees and plane trees and show that a suitable binary encoding of TSLPs yields binary tree encodings of size bounded by the empirical entropy plus some lower-order terms. This generalizes recent results from grammar-based string compression to grammar-based tree compression.
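For reference, the classical (normalized) empirical entropy of a string, which the thesis generalizes to node-labeled binary trees and plane trees, can be stated as follows; the notation here is ours and may differ in details from the thesis:

```latex
% Empirical entropy of a string w of length n over an alphabet \Sigma:
% H_0 is determined by the symbol frequencies, H_k conditions on the k
% preceding symbols (the context).
\[
  H_0(w) \;=\; \sum_{a \in \Sigma,\; n_a > 0} \frac{n_a}{n} \log_2 \frac{n}{n_a},
  \qquad
  H_k(w) \;=\; \frac{1}{n} \sum_{z \in \Sigma^k} |w_z| \, H_0(w_z),
\]
% where n_a is the number of occurrences of the symbol a in w, and w_z is the
% string of symbols that immediately follow an occurrence of the context z in w.
% Encoding-size bounds of the kind mentioned in the abstract are then stated
% against the unnormalized quantity n H_k(w) (bits), plus lower-order terms.
```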
In the third part of this work, we present a new compressed encoding of unlabeled binary and plane trees. We analyze this encoding from an information-theoretic point of view by proving that it is universal and thus asymptotically optimal for a large variety of tree sources; in particular, this covers the vast majority of tree sources with respect to which previous tree codes were shown to be universal.
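Universality can be read in the standard sense for tree source coding; the following formalization is ours and may differ in details from the thesis. A binary encoding $E$ is universal for a class $\mathcal{C}$ of tree sources if its worst-case redundancy per node vanishes, i.e. for every source $(P_n)_{n \ge 1}$ in $\mathcal{C}$:

```latex
% |E(t)| is the length in bits of the encoding of the tree t, |t| its size,
% and -\log_2 P_n(t) the self-information of t under the source.
\[
  \lim_{n \to \infty} \;
  \max_{\substack{t \,:\, |t| = n \\ P_n(t) > 0}}
  \frac{|E(t)| + \log_2 P_n(t)}{n} \;=\; 0 .
\]
% A code satisfying this is asymptotically optimal for every source in the
% class, since no code can beat the self-information on average.
```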
File(s)
Name
Dissertation_Seelbach_Benkner_Louisa.pdf
Size
1.67 MB
Format
Adobe PDF
Checksum
(MD5):1bb1a69faaba2172ab3f2a037abe10b2
Owning collection