docs/Writerside/topics/tree.h.md

Wed, 31 Dec 2025 13:39:55 +0100

author
Mike Becker <universe@uap-core.de>
date
Wed, 31 Dec 2025 13:39:55 +0100
changeset 1689
a5b7cf49dea7
parent 1681
56e76fbac167
child 1690
7d41291b3095
permissions
-rw-r--r--

fix tree node destruction + reactivate all tests

# Tree

UCX provides several low-level functions for working with arbitrary tree structures,
as well as some high-level functions for trees that induce a certain order on the data they store. 

The following convenience macro allows you to declare and use simple tree structures.

```C
#define CX_TREE_NODE(Type)
```

It declares four pointers of type `Type*`:
`parent`, `children`, `last_child`, `prev`, and `next`.
It can be placed anywhere in your node structure, but it is recommended
to use them as first members to avoid padding issues.

You can use it, for example, like this:
```C
typedef struct my_tree_node_s {
    CX_TREE_NODE(struct my_tree_node_s);
    int value;
    char *desc;
} MyTreeNode;
```

The macro `cx_tree_node_layout(name)` expands to the offsets of the above-mentioned pointers.
It will become handy when calling the low-level tree functions which expect all offsets to be passed as arguments. 

> In all functions, the `last_child` and `prev` pointers are completely optional.
> If your tree structure does not contain the `last_child` pointer, the last child will be determined by
> traversing all children from the first child.
> The same happens when it does not have a `prev` pointer, and the left sibling of a child is needed.
> The `children` pointer (which points to the first child), and the `next` pointer are mandatory,
> and so is the `parent` pointer.

## Create

```C
#include <cx/tree.h>

CxTree *cxTreeCreate(const CxAllocator *allocator,
        size_t node_size, size_t elem_size,
        void *root, ptrdiff_t loc_data,
        ptrdiff_t loc_parent, ptrdiff_t loc_children,
        ptrdiff_t loc_last_child,
        ptrdiff_t loc_prev, ptrdiff_t loc_next);
```

The function `cxTreeCreate()` creates a `CxTree` structure with the specified layout.
You can use the `cx_tree_node_layout()` macro to fill in the locations starting with `loc_parent`.

> The `node_size` is used by `cxTreeCreateRootData()` and `cxTreeAddData()` to allocate memory for new tree nodes.
> The `elem_size` is used to copy the data into the new node.
> See [below](#add-nodes) for details.

The `loc_data` location is optional.
If specified, the functions `cxTreeCreateRootData()`, `cxTreeAddData()`, `cxTreeFind()`, and `cxTreeFindInSubtree()`
will resolve the location to copy or compare the data.
A location of zero means that a pointer to the entire node is used
(in that case the `elem_size` MUST be a positive number and SHOULD equal the `node_size`).

If your tree structure does not have a `last_child` or a `prev` pointer,
you may specify a negative location to indicate the missing pointer.
All other pointers are mandatory and a non-negative location is expected.

The `root` node is optional.
If not specified, the tree is initialized empty and `cxFree` is registered as advanced destructor.
If a `root` is specified, the initial tree size is calculated and stored in the collection's metadata.
Also, no destructors are specified and the nodes will not be deallocated when destroying the tree to avoid
use-after-free bugs when the `root` is still in use elsewhere.

### Example for wrapping a libxml2 tree

In this example we wrap the XML tree of the commonly used libxml2 library.

```C
#include <libxml/tree.h>
#include <cx/tree.h>

CxTree *wrap_xml_tree(xmlNode *root) {
    return cxTreeCreate(
        cxDefaultAllocator, // or you can just write NULL
        sizeof(xmlNode), sizeof(xmlNode), root, 0,
        offsetof(xmlNode, parent),
        offsetof(xmlNode, children),
        offsetof(xmlNode, last),
        offsetof(xmlNode, prev),
        offsetof(xmlNode, next)
    );
}
```

You can, for example, print the XML structure with the following code:

```C
// continued from above

void print_xml_structure(CxTree *tree) {
    // specify visit_on_exit argument = true
    // so that we are visiting each node twice: on enter and on exit
    CxTreeIterator iter = cxTreeIterate(tree, true);
    
    cx_foreach(xmlNode*, node, iter) {
        if (node->type == XML_ELEMENT_NODE) {
            // the exiting field from the iterator indicates
            // whether we are entring or leaving the subtree
            // of the current element 
            printf("%s - %s\n",
                node->name,
                iter.exiting ? "start" : "end"
            );
        }
    }
    cxTreeIteratorDispose(&iter);
}
```

## Add Nodes

```C
#include <cx/tree.h>

void cx_tree_add(void *parent, void *node, ptrdiff_t loc_parent,
        ptrdiff_t loc_children, ptrdiff_t loc_last_child,
        ptrdiff_t loc_prev, ptrdiff_t loc_next);

void *cxTreeAddData(CxTree *tree, void *parent, const void *data);

void cxTreeAddNode(CxTree *tree, void *parent, void *child);

void cxTreeSetParent(CxTree *tree, void *parent, void *child);
```

The low-level function `cx_tree_add()` adds a `node` to a new `parent`, unlinking it from its previous parent, if required.

The functions `cxTreeAddData()`, `cxTreeAddNode()`, and `cxTreeSetParent()` are similar to `cx_tree_add()`.
The difference between `cxTreeAddData()` and `cxTreeAddNode()` is,
that `cxTreeAddData()` uses the allocator of the `CxTree` to create the node first, before adding it.
The function returns `NULL` in case the creation of the node fails, or the created node when creation was successful.
The difference between `cxTreeSetParent()` and `cxTreeAddNode()` is,
that `cxTreeSetParent()` does not increase the tree size, when `child` already had a parent.

> Use `cxTreeAddNode()` to add _new_ nodes to the tree and `cxTreeSetParent()`
> to relink a subtree of the `tree` with a new parent.
> 
> Using `cxTreeSetParent()` on a `child` which already has a parent in a _different_ tree is a mistake.


## Size and Depth

```C
#include <cx/tree.h>

size_t cx_tree_size(void *root,
        ptrdiff_t loc_children, ptrdiff_t loc_next);

size_t cx_tree_depth(void *root,
        ptrdiff_t loc_children, ptrdiff_t loc_next);

size_t cxTreeSubtreeSize(CxTree *tree, void *subtree_root);

size_t cxTreeSubtreeDepth(CxTree *tree, void *subtree_root);

size_t cxTreeSize(CxTree *tree);

size_t cxTreeDepth(CxTree *tree);
```

The function `cxTreeSubtreeSize()` counts all nodes belonging to the subtree starting with `subtree_root`.

The function `cxTreeSubtreeDepth()` reports the maximum depth of the subtree starting with `subtree_root`.
If the `subtree_root` does not have any children, this results in a depth of one.

The functions `cxTreeSize()` and `cxTreeDepth()` are equivalent to `cxTreeSubtreeSize()` and `cxTreeSubtreeDepth()`, respectively, where `subtree_root` is the root node of the entire tree.

The functions `cx_tree_size()` and `cx_tree_depth()` are the low-level counterparts.

> Passing a `NULL` pointer as `root` or `subtree_root` makes those functions return zero.

> Since the size of a `CxTree` is stored in the [collection](collection.h.md) metadata, `cxTreeSize()` is actually a constant time operation.
> This is not the case for `cxTreeSubtreeSize()` or `cx_tree_size()` where the size has to be calculated by traversing the tree.

## Search

```C
#include <cx/tree.h>

#define CX_TREE_SEARCH_INFINITE_DEPTH 0

typedef int (*cx_tree_search_func)(
        const void *node, const void *data);

int cx_tree_search(const void *root, size_t max_depth,
        const void *data, cx_tree_search_func sfunc, void **result,
        bool use_dfs, ptrdiff_t loc_children, ptrdiff_t loc_next);

void *cxTreeFindFast(CxTree *tree, const void *data,
        cx_tree_search_func sfunc);
        
void *cxTreeFind(CxTree *tree, const void *data, bool use_dfs);

void *cxTreeFindInSubtree(CxTree *tree, const void *data,
        void *subtree_root, size_t max_depth, bool use_dfs);
```

The function `cx_tree_search()` efficiently finds a node in the tree.

The search proceeds as follows, starting with the `root` node:

1. The current node is compared with `data` using the `sfunc`.
2. If `sfunc` returns zero, a matching node has been found and the pointer to that node is stored at the location given by `result`.
3. If `sfunc` returns a negative number, the entire subtree is skipped.
4. If `sfunc` returns a positive number, the subtree is searched, unless the number is larger than the number of an already searched subtree. The best match will be stored at the location given by `result`. 
5. The return value will be the return value of the `sfunc` for node that was stored in the `result`, or a negative number when no match was found

> If the `sfunc` implements some kind of metric, the search functions will return the best match in the tree with respect to that metric.
> When a positive number is returned, it means that a new node with the searched data could be added as a child of the found node.

The function `cxTreeFindFast()` uses the `sfunc` to find the `data` in the tree, and returns a pointer to the node when the data was found (or `NULL`, otherwise).

The function `cxTreeFind()` instead traverses the entire tree (either with breadth-first search or depth-first search)
and compares the `data` with the collections [comparator function](collection.h.md#comparator-functions).
For this it is necessary that the `loc_data` was specified when creating the tree, or the function will immediately return `NULL`.

The functions `cxTreeFindFastInSubtree()` and `cxTreeFindInSubtree()` are similar, except that they restrict the search to nodes
in the subtree starting with (and including) `subtree_root`, and skipping all nodes below the `max_depth`.
Note, that the `max_depth` is specified in relation to the `subtree_root` and not in relation to the entire tree.

## Iterator and Visitor

```C
#include <cx/tree.h>

CxTreeIterator cx_tree_iterator(void *root, bool visit_on_exit,
        ptrdiff_t loc_children, ptrdiff_t loc_next);

CxTreeIterator cxTreeIterate(CxTree *tree, bool visit_on_exit);

CxTreeIterator cxTreeIterateSubtree(CxTree *tree,
        void *node, bool visit_on_exit);

CxTreeIterator cx_tree_visitor(void *root,
        ptrdiff_t loc_children, ptrdiff_t loc_next);

CxTreeIterator cxTreeVisit(CxTree *tree);

CxTreeIterator cxTreeVisitSubtree(CxTree *tree, void *node)

#define cxTreeIteratorContinue(iter)

void cxTreeIteratorDispose(CxTreeIterator *iter);
```

There are two different kinds of iterators for trees: one for depth-first iteration and one for breadth-first iteration.

An iterator created with the _iterate_ functions is performing a depth-first iteration with the capability of visiting a node twice:
first when the iterator enters the node coming from its parent, and secondly when the iterator tracks back from its last child.
This behavior is controlled via the `visit_on_exit` argument.
When set to `true`, the iterators `exiting` flag can be checked during iteration to see whether the iterator is currently entering or leaving the node. 
The above [example](#example-for-wrapping-a-libxml2-tree) for iterating through an XML tree illustrates this.

On the other hand, an iterator created with the _visit_ functions performs a breadth-first iteration and visits every node only once.

Both iterators do not support the removal of nodes.

Since tree iteration needs to keep track of a stack (depth-first) or queue (breadth-first), internal memory is allocated.
This memory is _automatically_ disposed of when the iteration _completes_.
If you break from the iteration early, you must call `cxTreeIteratorDispose()` to deallocate the memory.

> It is recommended to always invoke the dispose functions, even when it seems not necessary.
> In the best case it just does nothing, but calling them guarantees that no memory can be leaking, even when your code will change in the future.
>{style="note"}

The macro `cxTreeIteratorContinue()` instruct the iterator to skip the subtree below the currently inspected node.

> In contrast to iterators over [lists](list.h.md#iterators) or [maps](map.h.md#iterators),
> a tree iterator is _always_ iterating over the nodes.
> That means, the pointer in a `cx_foreach` loop is always expected to be a pointer to a node and not a pointer to
> the element / payload within the node.
> 
>{style="warning"}

## Remove

```C
#include <cx/tree.h>

typedef void (*cx_tree_relink_func)(
        void *node, const void *old_parent, const void *new_parent);

void cx_tree_remove(void *node, ptrdiff_t loc_parent,
        ptrdiff_t loc_children, ptrdiff_t loc_last_child,
        ptrdiff_t loc_prev, ptrdiff_t loc_next);

int cxTreeRemoveNode(CxTree *tree, void *node,
        cx_tree_relink_func relink_func);

void cxTreeRemoveSubtree(CxTree *tree, void *node);
```

The low-level function `cx_tree_remove()` removes the specified `node`,
as well as the entire subtree beneath that node, from its parent.
The high-level counterpart is `cxTreeRemoveSubtree()`.

The function `cxTreeRemoveNode()`, on the other hand, _only_ removes the `node` from the tree,
links all children of `node` to the former parent of `node`, 
and calls the optional `relink_func` for every former child of `node` which will be relinked to a new parent.
Therefore, calling `cxTreeRemoveNode()` on the `root` node is an error and the function returns non-zero.
In all other cases, the function returns zero.

> When your tree is storing a scene graph, for example, a possible use-case for the `relink_func` might be updating
> the world transform matrices in the subtree of the removed node. 

## Dispose

```C
#include <cx/tree.h>

typedef void (*cx_tree_relink_func)(
        void *node, const void *old_parent, const void *new_parent);

int cxTreeDestroyNode(CxTree *tree, void *node,
        cx_tree_relink_func relink_func);

void cxTreeDestroySubtree(CxTree *tree, void *node);

void cxTreeClear(CxTree *tree);

void cxTreeFree(CxTree *tree);
```

The function `cxTreeDestroyNode()` first [removes](#remove) the `node` from the tree, like `cxTreeRemoveNode()`,
and then invokes the [destructor functions](collection.h.md#destructor-functions).
It is guaranteed that the simple destructor is called before the advanced destructor.
If no destructor function is registered at all, the behavior is identical to `cxTreeRemoveNode()`.
That means this function does not deallocate the memory for the node on its own and leaves that entirely to the destructor functions.

The function `cxTreeDestroySubtree()` performs the above actions for the entire subtree starting with `node`.
The order in which the destructor functions for the nodes of the subtree are called is determined by a [tree iterator](#iterator-and-visitor).

The function `cxTreeClear()` is a shorthand for invoking `cxTreeDestroySubtree()` on the root node of the tree.

The function `cxTreeFree()` behaves like `cxTreeClear()` and then deallocates the memory for the `CxTree` structure.

> In contrast to other collections, the destructor functions of a `CxTree` are _always_ invoked with pointers to the _nodes_
> and not to pointers to the elements / the payload of the nodes.
> 
> Also, the nodes are _not_ deallocated by the tree and _MUST_ be deallocated by the destructor functions.
> When the tree was created by `cxTreeCreate()` without specifying an existing `root` node, `cxFree` is automatically used as the destructor function.
> {style="note"}

<seealso>
<category ref="apidoc">
<a href="https://ucx.sourceforge.io/api/tree_8h.html">tree.h</a>
</category>
</seealso>

mercurial