docs/Writerside/topics/tree.h.md

changeset 1681
56e76fbac167
parent 1605
55b13f583356
--- a/docs/Writerside/topics/tree.h.md	Tue Dec 30 13:50:55 2025 +0100
+++ b/docs/Writerside/topics/tree.h.md	Tue Dec 30 21:44:23 2025 +0100
@@ -3,28 +3,27 @@
 UCX provides several low-level functions for working with arbitrary tree structures,
 as well as some high-level functions for trees that induce a certain order on the data they store. 
 
-The following convenience macros allow you to declare and use simple tree structures.
+The following convenience macro allows you to declare and use simple tree structures.
 
 ```C
-#define CX_TREE_NODE_BASE(Type)
-
-#define cx_tree_node_base_layout
+#define CX_TREE_NODE(Type)
 ```
 
-The `CX_TREE_NODE_BASE` macro declares four pointers of type `Type*`:
-`parent`, `children`, `last_child`, `prev`, and `next`,
-which must be placed as first member into your struct.
+It declares four pointers of type `Type*`:
+`parent`, `children`, `last_child`, `prev`, and `next`.
+It can be placed anywhere in your node structure, but it is recommended
+to use them as first members to avoid padding issues.
 
 You can use it, for example, like this:
 ```C
 typedef struct my_tree_node_s {
-    CX_TREE_NODE_BASE(struct my_tree_node_s);
+    CX_TREE_NODE(struct my_tree_node_s);
     int value;
     char *desc;
 } MyTreeNode;
 ```
 
-The macro `cx_tree_node_base_layout` expands to the offsets of the above-mentioned pointers.
+The macro `cx_tree_node_layout(name)` expands to the offsets of the above-mentioned pointers.
 It will become handy when calling the low-level tree functions which expect all offsets to be passed as arguments. 
 
 > In all functions, the `last_child` and `prev` pointers are completely optional.
@@ -34,81 +33,41 @@
 > The `children` pointer (which points to the first child), and the `next` pointer are mandatory,
 > and so is the `parent` pointer.
 
-Before diving into the function definitions, there are four function pointer declarations you should know.
-
-```C
-#include <cx/tree.h>
-
-typedef void *(*cx_tree_node_create_func)(
-        const void *data, void *context);
-
-typedef int (*cx_tree_search_data_func)(
-        const void *node, const void *data);
-
-typedef int (*cx_tree_search_func)(
-        const void *node, const void *new_node);
-
-typedef void (*cx_tree_relink_func)(
-        void *node, const void *old_parent, const void *new_parent);
-```
-
-A `cx_tree_node_create_func` takes a pointer to the `data` the new node shall contain, and a `context` (which might be an allocator, for example).
-It shall allocate and return the created node, or `NULL` when node creation is not possible.
-
-A `cx_tree_search_data_func` shall check if the `node` contains the `data`, or if a child node might contain the data.
-It shall return zero, when `node` contains the `data`.
-When a child node _might_ contain the data, the returned positive number shall indicate how close the match is.
-It is _not_ relevant if the data can actually be found in the subtree - the positive number also indicates that the data could be inserted in that subtree.
-Only when the entire subtree starting with `node` cannot contain the `data`, a negative number is supposed to be returned.
-
-A `cx_tree_search_func` behaves exactly the same, except that it does expect a pointer to the `data` but a pointer to a node structure.
-In combination with the `cx_tree_node_create_func`, when inserting nodes with the high-level functions, a `new_node` is created first,
-and then an appropriate insertion point is searched with the `cx_tree_search_func`.
-
-A `cx_tree_relink_func` is called when an intermediate node was removed from the tree and it's children need to be detached from
-the removed `old_parent` and attached to a `new_parent`.
-The function is called for every child of the removed node and can be used, for example, to update the contents of the node when needed.
-
 ## Create
 
 ```C
 #include <cx/tree.h>
 
 CxTree *cxTreeCreate(const CxAllocator *allocator,
-        cx_tree_node_create_func create_func,
-        cx_tree_search_func search_func,
-        cx_tree_search_data_func search_data_func,
-        ptrdiff_t loc_parent,
-        ptrdiff_t loc_children, ptrdiff_t loc_last_child,
-        ptrdiff_t loc_prev, ptrdiff_t loc_next);
-
-CxTree *cxTreeCreateSimple(const CxAllocator *allocator,
-        cx_tree_node_create_func create_func,
-        cx_tree_search_func search_func,
-        cx_tree_search_data_func search_data_func);
-
-CxTree *cxTreeCreateWrapped(const CxAllocator *allocator,
-        void *root,
-        ptrdiff_t loc_parent,
-        ptrdiff_t loc_children, ptrdiff_t loc_last_child,
+        size_t node_size, size_t elem_size,
+        void *root, ptrdiff_t loc_data,
+        ptrdiff_t loc_parent, ptrdiff_t loc_children,
+        ptrdiff_t loc_last_child,
         ptrdiff_t loc_prev, ptrdiff_t loc_next);
 ```
 
-The function `cxTreeCreate()` creates a `CxTree` structure
-where each node created by the `create_func` has the layout specified by the offset arguments.
-The function `cxTreeCreateSimple()` is exactly the same, except that it assumes the `cx_tree_node_base_layout`.
+The function `cxTreeCreate()` creates a `CxTree` structure with the specified layout.
+You can use the `cx_tree_node_layout()` macro to fill in the locations starting with `loc_parent`.
+
+> The `node_size` is used by `cxTreeCreateRootData()` and `cxTreeAddData()` to allocate memory for new tree nodes.
+> The `elem_size` is used to copy the data into the new node.
+> See [below](#add-nodes) for details.
 
-In both cases the tree will be created empty, that is with no nodes, not even a root node.
-On the other hand, `cxTreeCreateWrapped()` creates a `CxTree` structure with the specified layout and `root` node
-where `root` may already be the root of a larger tree.
+The `loc_data` location is optional.
+If specified, the functions `cxTreeCreateRootData()`, `cxTreeAddData()`, `cxTreeFind()`, and `cxTreeFindInSubtree()`
+will resolve the location to copy or compare the data.
+A location of zero means that a pointer to the entire node is used
+(in that case the `elem_size` MUST be a positive number and SHOULD equal the `node_size`).
 
-If your wrapped tree structure does not have a `last_child` or a `prev` pointer,
+If your tree structure does not have a `last_child` or a `prev` pointer,
 you may specify a negative location to indicate the missing pointer.
 All other pointers are mandatory and a non-negative location is expected.
 
-Note that a wrapped tree by default has no create or search functions assigned.
-Therefore, if you wish to use one of the functions below that needs those function pointers set,
-you will have to set them manually by assigning to the respective fields in the `CxTree` structure.
+The `root` node is optional.
+If not specified, the tree is initialized empty and `cxFree` is registered as advanced destructor.
+If a `root` is specified, the initial tree size is calculated and stored in the collection's metadata.
+Also, no destructors are specified and the nodes will not be deallocated when destroying the tree to avoid
+use-after-free bugs when the `root` is still in use elsewhere.
 
 ### Example for wrapping a libxml2 tree
 
@@ -119,9 +78,9 @@
 #include <cx/tree.h>
 
 CxTree *wrap_xml_tree(xmlNode *root) {
-    return cxTreeCreateWrapped(
+    return cxTreeCreate(
         cxDefaultAllocator, // or you can just write NULL
-        root,
+        sizeof(xmlNode), sizeof(xmlNode), root, 0,
         offsetof(xmlNode, parent),
         offsetof(xmlNode, children),
         offsetof(xmlNode, last),
@@ -131,9 +90,6 @@
 }
 ```
 
-You do not need to specify any function pointers or destructors in the `CxTree` structure,
-if you just want to use the resulting `CxTree` for [iterating](#iterator-and-visitor) over the nodes.
-
 You can, for example, print the XML structure with the following code:
 
 ```C
@@ -164,95 +120,43 @@
 ```C
 #include <cx/tree.h>
 
-void cx_tree_link(void *parent, void *node, ptrdiff_t loc_parent,
+void cx_tree_add(void *parent, void *node, ptrdiff_t loc_parent,
         ptrdiff_t loc_children, ptrdiff_t loc_last_child,
         ptrdiff_t loc_prev, ptrdiff_t loc_next);
 
-extern unsigned int cx_tree_add_look_around_depth;
+void *cxTreeAddData(CxTree *tree, void *parent, const void *data);
 
-size_t cx_tree_add_iter(struct cx_iterator_base_s *iter, size_t n,
-        cx_tree_search_func sfunc, cx_tree_node_create_func cfunc,
-        void *cdata, void **failed, void *root,
-        ptrdiff_t loc_parent,
-        ptrdiff_t loc_children, ptrdiff_t loc_last_child,
-        ptrdiff_t loc_prev, ptrdiff_t loc_next);
-        
-size_t cx_tree_add_array(const void *src, size_t n, size_t elem_size,
-        cx_tree_search_func sfunc, cx_tree_node_create_func cfunc,
-        void *cdata, void **failed, void *root,
-        ptrdiff_t loc_parent,
-        ptrdiff_t loc_children, ptrdiff_t loc_last_child,
-        ptrdiff_t loc_prev, ptrdiff_t loc_next);
-        
-int cx_tree_add(const void *src,
-        cx_tree_search_func sfunc, cx_tree_node_create_func cfunc,
-        void *cdata, void **cnode, void *root,
-        ptrdiff_t loc_parent,
-        ptrdiff_t loc_children, ptrdiff_t loc_last_child,
-        ptrdiff_t loc_prev, ptrdiff_t loc_next);
+void cxTreeAddNode(CxTree *tree, void *parent, void *child);
+
+void cxTreeSetParent(CxTree *tree, void *parent, void *child);
 ```
 
-The low-level function `cx_tree_link()` adds a `node` to a new `parent`, unlinking it from its previous parent, if required.
-
-The functions `cx_tree_add_array()` and `cx_tree_add_iter()` are equivalent,
-except that the former adds `n` items of size `elem_size` from the `src` array,
-and the latter adds _up to_ `n` items from the iterator `iter`.
-For each element, the `cfun` is called with the element as first argument and the `cdata` context as second argument.
-The `cfun` function is supposed to return a node for which then the `sfunc` is used to find the location where to insert the node.
-If a node could not be added, because `sfunc` always returns a negative number, it is stored at the location where `failed` points to.
-
-> The functions `cx_tree_add_array()` and `cx_tree_add_iter()` are optimized for adding multiple nodes to
-> the same subtree without starting the search all over from the root node.
-> This behavior can be controlled with `cx_tree_add_look_around_depth`, which specifies for how many parent nodes
-> the algorithm backtracks in the current subtree before restarting the search from the root node.
-
-The function `cx_tree_add()` is similar to the other add-functions,
-except that it does only add one single item to the tree and always stores the created node at the location where `cnode` points to.
-
-> By design, the add-functions can only be used to add children to the tree.
-> If you want to add a new root node, this is much easier by simply calling `cx_tree_link(newroot, oldroot, layout...)` .
-
-The following high-level functions can be used to add nodes to a `CxTree`.
+The low-level function `cx_tree_add()` adds a `node` to a new `parent`, unlinking it from its previous parent, if required.
 
-```C
-#include <cx/tree.h>
-
-int cxTreeAddChild(CxTree *tree, void *parent, const void *data);
-
-void cxTreeAddChildNode(CxTree *tree, void *parent, void *child);
-
-void cxTreeSetParent(CxTree *tree, void *parent, void *child);
-
-int cxTreeInsert(CxTree *tree, const void *data);
-
-size_t cxTreeInsertIter(CxTree *tree, CxIteratorBase *iter, size_t n);
-
-size_t cxTreeInsertArray(CxTree *tree, const void *data,
-        size_t elem_size, size_t n);
-```
-
-The functions `cxTreeAddChild()`, `cxTreeAddChildNode()`, and `cxTreeSetParent()` are similar to `cx_tree_link()`.
-The difference between `cxTreeAddChild()` and `cxTreeAddChildNode()` is,
-that `cxTreeAddChild()` uses the `cx_tree_node_create_func` of the `CxTree` to create the node first, before adding it.
-The function returns non-zero in case the creation of the node fails.
-The difference between `cxTreeSetParent()` and `cxTreeAddChildNode()` is,
+The functions `cxTreeAddData()`, `cxTreeAddNode()`, and `cxTreeSetParent()` are similar to `cx_tree_add()`.
+The difference between `cxTreeAddData()` and `cxTreeAddNode()` is,
+that `cxTreeAddData()` uses the allocator of the `CxTree` to create the node first, before adding it.
+The function returns `NULL` in case the creation of the node fails, or the created node when creation was successful.
+The difference between `cxTreeSetParent()` and `cxTreeAddNode()` is,
 that `cxTreeSetParent()` does not increase the tree size, when `child` already had a parent.
 
-> Use `cxTreeAddChildNode()` to add _new_ nodes to the tree and `cxTreeSetParent()`
+> Use `cxTreeAddNode()` to add _new_ nodes to the tree and `cxTreeSetParent()`
 > to relink a subtree of the `tree` with a new parent.
 > 
 > Using `cxTreeSetParent()` on a `child` which already has a parent in a _different_ tree is a mistake.
 
-The functions `cxTreeInsert()`, `cxTreeInsertIter()`, and `cxTreeInsertArray()` behave
-like `cx_tree_add()`, `cx_tree_add_iter()`, and `cx_tree_add_array()`.
-As creation context `cdata` for the `cx_tree_node_create_func` a pointer to the `CxTree` is passed.
-Instead of returning a `failed` node, like the low-level functions, the high-level functions automatically free the memory for the failed node.
 
 ## Size and Depth
 
 ```C
 #include <cx/tree.h>
 
+size_t cx_tree_size(void *root,
+        ptrdiff_t loc_children, ptrdiff_t loc_next);
+
+size_t cx_tree_depth(void *root,
+        ptrdiff_t loc_children, ptrdiff_t loc_next);
+
 size_t cxTreeSubtreeSize(CxTree *tree, void *subtree_root);
 
 size_t cxTreeSubtreeDepth(CxTree *tree, void *subtree_root);
@@ -269,39 +173,41 @@
 
 The functions `cxTreeSize()` and `cxTreeDepth()` are equivalent to `cxTreeSubtreeSize()` and `cxTreeSubtreeDepth()`, respectively, where `subtree_root` is the root node of the entire tree.
 
-> Passing a `NULL` pointer as `subtree_root` makes those functions return zero.
+The functions `cx_tree_size()` and `cx_tree_depth()` are the low-level counterparts.
+
+> Passing a `NULL` pointer as `root` or `subtree_root` makes those functions return zero.
+
+> Since the size of a `CxTree` is stored in the [collection](collection.h.md) metadata, `cxTreeSize()` is actually a constant time operation.
+> This is not the case for `cxTreeSubtreeSize()` or `cx_tree_size()` where the size has to be calculated by traversing the tree.
 
 ## Search
 
 ```C
 #include <cx/tree.h>
 
-#define CX_TREE_SEARCH_INFINITE_DEPTH
+#define CX_TREE_SEARCH_INFINITE_DEPTH 0
 
-int cx_tree_search_data(const void *root, size_t max_depth,
-        const void *data, cx_tree_search_data_func sfunc,
-        void **result, ptrdiff_t loc_children, ptrdiff_t loc_next);
+typedef int (*cx_tree_search_func)(
+        const void *node, const void *data);
 
 int cx_tree_search(const void *root, size_t max_depth,
-        const void *node, cx_tree_search_func sfunc,
-        void **result, ptrdiff_t loc_children, ptrdiff_t loc_next);
+        const void *data, cx_tree_search_func sfunc, void **result,
+        bool use_dfs, ptrdiff_t loc_children, ptrdiff_t loc_next);
+
+void *cxTreeFindFast(CxTree *tree, const void *data,
+        cx_tree_search_func sfunc);
         
-void *cxTreeFind(CxTree *tree, const void *data);
+void *cxTreeFind(CxTree *tree, const void *data, bool use_dfs);
 
 void *cxTreeFindInSubtree(CxTree *tree, const void *data,
-        void *subtree_root, size_t max_depth);
+        void *subtree_root, size_t max_depth, bool use_dfs);
 ```
 
-The functions `cx_tree_search_data()` and `cx_tree_search()` are equivalent,
-except that `cx_tree_search_data()` compares a currently investigated node against the `data` with the specified `cx_tree_search_data_func`,
-and `cx_tree_search()` compares against a `node` with the specified `cx_tree_search_func`.
-
-Usually you the latter is rarely needed.
-It is used, for example, by `cx_tree_add()` to find the correct position where to add a freshly created node.
+The function `cx_tree_search()` efficiently finds a node in the tree.
 
 The search proceeds as follows, starting with the `root` node:
 
-1. The current node is compared with `data` / `node` using the `sfunc`.
+1. The current node is compared with `data` using the `sfunc`.
 2. If `sfunc` returns zero, a matching node has been found and the pointer to that node is stored at the location given by `result`.
 3. If `sfunc` returns a negative number, the entire subtree is skipped.
 4. If `sfunc` returns a positive number, the subtree is searched, unless the number is larger than the number of an already searched subtree. The best match will be stored at the location given by `result`. 
@@ -309,14 +215,16 @@
 
 > If the `sfunc` implements some kind of metric, the search functions will return the best match in the tree with respect to that metric.
 > When a positive number is returned, it means that a new node with the searched data could be added as a child of the found node.
-> This is exactly how `cx_tree_add()` finds the best position to add a node to the tree.
+
+The function `cxTreeFindFast()` uses the `sfunc` to find the `data` in the tree, and returns a pointer to the node when the data was found (or `NULL`, otherwise).
 
-The function `cxTreeFind()` uses the `search_data_func` of the `CxTree`
-to find the `data` in the tree, and returns a pointer to the node when the data was found (or `NULL`, otherwise).
+The function `cxTreeFind()` instead traverses the entire tree (either with breadth-first search or depth-first search)
+and compares the `data` with the collections [comparator function](collection.h.md#comparator-functions).
+For this it is necessary that the `loc_data` was specified when creating the tree, or the function will immediately return `NULL`.
 
-The function `cxTreeFindInSubtree()` is equivalent, except that it restricts the search to nodes
+The functions `cxTreeFindFastInSubtree()` and `cxTreeFindInSubtree()` are similar, except that they restrict the search to nodes
 in the subtree starting with (and including) `subtree_root`, and skipping all nodes below the `max_depth`.
-Note, that the `max_depth` is specified in relation to the `subtree_root` and not in relation to the entire tree. 
+Note, that the `max_depth` is specified in relation to the `subtree_root` and not in relation to the entire tree.
 
 ## Iterator and Visitor
 
@@ -331,51 +239,56 @@
 CxTreeIterator cxTreeIterateSubtree(CxTree *tree,
         void *node, bool visit_on_exit);
 
+CxTreeIterator cx_tree_visitor(void *root,
+        ptrdiff_t loc_children, ptrdiff_t loc_next);
+
+CxTreeIterator cxTreeVisit(CxTree *tree);
+
+CxTreeIterator cxTreeVisitSubtree(CxTree *tree, void *node)
+
 #define cxTreeIteratorContinue(iter)
 
 void cxTreeIteratorDispose(CxTreeIterator *iter);
-
-
-CxTreeVisitor cx_tree_visitor(void *root,
-        ptrdiff_t loc_children, ptrdiff_t loc_next);
-
-CxTreeVisitor cxTreeVisit(CxTree *tree);
-
-CxTreeVisitor cxTreeVisitSubtree(CxTree *tree, void *node)
-
-#define cxTreeVisitorContinue(visitor)
-
-void cxTreeVisitorDispose(CxTreeVisitor *visitor);
 ```
 
 There are two different kinds of iterators for trees: one for depth-first iteration and one for breadth-first iteration.
 
-The `CxTreeIterator` is performing a depth-first iteration with the capability of visiting a node twice:
+An iterator created with the _iterate_ functions is performing a depth-first iteration with the capability of visiting a node twice:
 first when the iterator enters the node coming from its parent, and secondly when the iterator tracks back from its last child.
 This behavior is controlled via the `visit_on_exit` argument.
 When set to `true`, the iterators `exiting` flag can be checked during iteration to see whether the iterator is currently entering or leaving the node. 
 The above [example](#example-for-wrapping-a-libxml2-tree) for iterating through an XML tree illustrates this.
 
-On the other hand, the `CxTreeVisitor` performs a breadth-first iteration and visits every node only once.
+On the other hand, an iterator created with the _visit_ functions performs a breadth-first iteration and visits every node only once.
 
 Both iterators do not support the removal of nodes.
 
 Since tree iteration needs to keep track of a stack (depth-first) or queue (breadth-first), internal memory is allocated.
 This memory is _automatically_ disposed of when the iteration _completes_.
-If you break from the iteration early, you must call `cxTreeIteratorDispose()` or `cxTreeVisitorDispose()`, respectively, to deallocate the memory.
+If you break from the iteration early, you must call `cxTreeIteratorDispose()` to deallocate the memory.
 
 > It is recommended to always invoke the dispose functions, even when it seems not necessary.
 > In the best case it just does nothing, but calling them guarantees that no memory can be leaking, even when your code will change in the future.
 >{style="note"}
 
-The macros `cxTreeIteratorContinue()` and `cxTreeVisitorContinue()` equivalently instruct the iterator/visitor to skip the subtree below the currently inspected node.
+The macro `cxTreeIteratorContinue()` instruct the iterator to skip the subtree below the currently inspected node.
+
+> In contrast to iterators over [lists](list.h.md#iterators) or [maps](map.h.md#iterators),
+> a tree iterator is _always_ iterating over the nodes.
+> That means, the pointer in a `cx_foreach` loop is always expected to be a pointer to a node and not a pointer to
+> the element / payload within the node.
+> 
+>{style="warning"}
 
 ## Remove
 
 ```C
 #include <cx/tree.h>
 
-void cx_tree_unlink(void *node, ptrdiff_t loc_parent,
+typedef void (*cx_tree_relink_func)(
+        void *node, const void *old_parent, const void *new_parent);
+
+void cx_tree_remove(void *node, ptrdiff_t loc_parent,
         ptrdiff_t loc_children, ptrdiff_t loc_last_child,
         ptrdiff_t loc_prev, ptrdiff_t loc_next);
 
@@ -385,7 +298,7 @@
 void cxTreeRemoveSubtree(CxTree *tree, void *node);
 ```
 
-The low-level function `cx_tree_unlink()` removes the specified `node`,
+The low-level function `cx_tree_remove()` removes the specified `node`,
 as well as the entire subtree beneath that node, from its parent.
 The high-level counterpart is `cxTreeRemoveSubtree()`.
 
@@ -403,6 +316,9 @@
 ```C
 #include <cx/tree.h>
 
+typedef void (*cx_tree_relink_func)(
+        void *node, const void *old_parent, const void *new_parent);
+
 int cxTreeDestroyNode(CxTree *tree, void *node,
         cx_tree_relink_func relink_func);
 
@@ -426,11 +342,12 @@
 
 The function `cxTreeFree()` behaves like `cxTreeClear()` and then deallocates the memory for the `CxTree` structure.
 
-> Although `CxTree` supports the general concept of [destructor functions](collection.h.md#destructor-functions),
-> it is not based on `CX_COLLECTION_BASE`.
-> Therefore, the `cxSetDestructor()` and `cxSetAdvancedDestructor()` macros cannot be used on a `CxTree` and
-> the fields must be set manually.
->{style="note"}
+> In contrast to other collections, the destructor functions of a `CxTree` are _always_ invoked with pointers to the _nodes_
+> and not to pointers to the elements / the payload of the nodes.
+> 
+> Also, the nodes are _not_ deallocated by the tree and _MUST_ be deallocated by the destructor functions.
+> When the tree was created by `cxTreeCreate()` without specifying an existing `root` node, `cxFree` is automatically used as the destructor function.
+> {style="note"}
 
 <seealso>
 <category ref="apidoc">

mercurial