docs/Writerside/topics/array_list.h.md

Sun, 26 Oct 2025 12:01:28 +0100

author
Mike Becker <universe@uap-core.de>
date
Sun, 26 Oct 2025 12:01:28 +0100
changeset 1447
aaf85b3e9601
parent 1429
6e0c3a8a914a
permissions
-rw-r--r--

add documentation for cxMapDifference() and cxMapListDifference()

relates to #746

# Array List

Next to an array list implementation of the list interface,
UCX offers several functions to work with plain C arrays equipped with a size and a capacity.

The high-level [list interface](list.h.md) is documented on a separate page and explains how lists are used
that are created by one of the following functions.

```C
#include <cx/array_list.h>

CxList *cxArrayListCreate(const CxAllocator *allocator,
        cx_compare_func comparator, size_t elem_size,
        size_t initial_capacity);
        
CxList *cxArrayListCreateSimple(size_t elem_size,
        size_t initial_capacity);
```

The remaining documentation on this page concentrates on dealing with plain C arrays.

## Declare Array with Size and Capacity

```C
#include <cx/array_list.h>

#define CX_ARRAY_DECLARE_SIZED(type, name, size_type)

#define CX_ARRAY_DECLARE(type, name)

#define cx_array_initialize(ARRAY, capacity)

#define cx_array_initialize_a(allocator, ARRAY, capacity)
```

The purpose of the UCX array functions is to keep track of the size and capacity of a plain C array.

The raw functions do not need this information packed into a structure, but working with independent variables is quite cumbersome.
Therefore, UCX defines several macros that call the raw functions assuming certain variable names.

This is what the `CX_ARRAY_DECLARE_SIZED()` and `CX_ARRAY_DECLARE()` macros are for.
They take a `type` for the array members, a `name` for the array variable, and a `size_type` for the type of the size and capacity variables (`size_t` by default when you use `CX_ARRAY_DECLARE()`),
and generate three variables named `name`, `name_size`, and `name_capacity`.

For example, you can abbreviate the following code
```C
struct my_data {
    int other_int;
    float other_float;
    int *my_array;
    size_t my_array_size;
    size_t my_array_capacity;
}
```
by substituting the three members for the array with `CX_ARRAY_DECLARE()`.
```C
struct my_data {
    int other_int;
    float other_float;
    CX_ARRAY_DECLARE(int, my_array);
}
```

> Using `CX_ARRAY_DECLARE_SIZED()` can save some space when you are using a size type that is _half_ as wide as `sizeof(void*)`.
> On 64-bit architectures, having the possibility to store more than four billion items is rarely necessary, hence using 32-bit integers for size and capacity
> saves eight bytes (assuming proper alignment in your struct).

Once the array is declared, you can use one of the macros `cx_array_initialize()` or `cx_array_initialize_a()` to allocate memory.
The former uses the [default allocator](allocator.h.md#default-allocator) and the latter allows you to use a specific allocator.
Important to note is, that the `ARRAY` argument expects the variable's _name_.
The macros set `ARRAY_size` to zero, `ARRAY_capacity` to the specified initial capacity, and invoke the allocator's `malloc()` function to allocate the memory.

Using the example from above, and the UCX [memory pool](mempool.h.md), this could look like this:

```C
CxMempool *mpool = cxMempoolCreateSimple(128);

struct my_data data;
cx_array_initialize_a(mpool->allocator, data.my_array, 16);
```

> The aforementioned macros simplify your life, but keep in mind that using them is not mandatory.
> All other macros continue to work perfectly if you declare and initialize your array manually, as long as you use
> the naming convention to suffix the size variable with `_size` and the capacity variable with `_capacity`.
> Also, you can freely decide in which order you want to declare the variables.
> 
> For example, when you have multiple arrays in your struct with 8-bit `size_type` (because in your use case you don't expect more than 255 elements),
> it is favorable to group all pointers and then declare the size and capacity variables to avoid padding between the array declarations. 

## Array Reallocator

```C
#include <cx/array_list.h>

typedef struct {
    void *(*realloc)(void *array,
            size_t old_capacity, size_t new_capacity,
            size_t elem_size, CxArrayReallocator *alloc);
    const CxAllocator *allocator;
    const void *stack_ptr;
} CxArrayReallocator;

CxArrayReallocator cx_array_reallocator(
        const struct cx_allocator_s *allocator,
        const void *stack_ptr
);

extern CxArrayReallocator* cx_array_default_reallocator;
```

The main purpose of the functions defined in the array list header 
is to make it easier to write to an array without caring too much about a possibly necessary reallocation.

This is realized by passing a reallocator to the various functions that specify how an array shall be reallocated when needed.

The default `cx_array_default_reallocator` simply defers to the [default allocator](allocator.h.md#default-allocator).

A reallocator created with the `cx_array_reallocator()` function uses a more sophisticated approach.
On the one hand, it can use an arbitrary UCX [allocator](allocator.h.md) for the reallocation,
and on the other hand, it can optionally keep track of a stack memory pointer.
If you pass a non-`NULL` pointer to `stack_ptr`, the reallocator will _always_ allocate _new_ memory for the array,
if the current location equals the `stack_ptr` address.

This allows you to allocate an array on the stack and instruct UCX to automatically move it to heap memory when the capacity is exceeded.
Combined with a UCX [memory pool](mempool.h.md) this can be a powerful tool.
For example, you can add small arrays to your structs plus a memory pool and then use that memory pool to reallocate the arrays on the heap when needed.
When you are done with the arrays, you can then use the memory pool to free the memory for those arrays that needed to be moved to the heap.

## Reserve

```C
#include <cx/array_list.h>

int cx_array_reserve(
        void **array, void *size, void *capacity, unsigned width,
        size_t elem_size, size_t count,
        CxArrayReallocator *reallocator);

#define cx_array_simple_reserve(ARRAY, count)
#define cx_array_simple_reserve_a(reallocator, ARRAY, count)
```

The function `cx_array_reserve()` guarantees that at least `count` _additional_ elements
can be stored in the array pointed to by `array`.

The `array` argument is a pointer to the location of the target array pointer.
The reason for this additional indirection is that this function writes
a new pointer back to that variable if the array needed to be reallocated.

If reallocation fails, this function returns non-zero, leaving the array as it was.
Otherwise, the function returns zero.

If `*size + elem_count` does not exceed the current `*capacity`, this function does nothing and simply returns zero.
Otherwise, the specified `reallocator` is used to reserve the necessary space.
If reallocation was necessary but failed, this function returns non-zero.

The actual data type of `size` and `capacity` can be a pointer to an arbitrary integer whose byte-width is determined by the `width` argument.
On 32-bit platforms the widths 1, 2, and 4 are supported, and on 64-bit platforms a width of 8 is also supported.
Passing zero as argument makes the function assume the native width for size arguments `sizeof(size_t)`.

The convenience macros take the _name_ of an array variable and invoke the function by deducing the other arguments
(including the width of the size and capacity).
The reallocator used by the `cx_array_simple_reserve()` macro is the `cx_array_default_reallocator`.

> While the actual integer type for `size` and `capacity` can be chosen freely, it must be _the same_ for both variables.
> For example, it is not allowed, and it does not make any sense either, to use a 16-bit integer for the size, but a 32-bit integer for the capacity.

## Copy

```C
#include <cx/array_list.h>

int cx_array_copy(
        void **target, void *size, void *capacity, unsigned width,
        size_t index, const void *src,
        size_t elem_size, size_t elem_count,
        CxArrayReallocator *reallocator);

#define cx_array_simple_copy(ARRAY, index, src, count)

#define cx_array_simple_copy_a(reallocator, ARRAY, index, src, count)
```

The function `cx_array_copy()` first [reserves](#reserve) enough memory to copy `elem_count` number of elements from the `src` array
to the target array at the specified `index`, and then copies the data with one call to `memmove()`.

A few things to note:
* `*target` and `src` can point to the same memory region, since the underlying copy operation is a `memmove()`
* `*target` does not need to point to the start of the array, but `size` and `capacity` always start counting from the
  position `*target` points to - in this scenario, the need for reallocation must be avoided for obvious reasons
* `index` does not need to be within size and not even within the capacity of the current array
* if `index` equals `*size`, this function effectively appends the data to the target array
* the data in the target array is overwritten - if you want to insert data, you first need to copy the existing data to the end of the array and then copy the new data in a separate call 

> If you are using `cx_array_reserve()` already in your program, there is no need to call `cx_array_copy()` or any of the convenience macros anymore.
> In other words: if you already guarantee, by any means that no reallocation is necessary when writing to your array,
> it is strongly recommended to use standard library functions or direct index-based access instead of the UCX functions,
> which only purpose is to make it easier for you to guarantee that your array's capacity is large enough to hold new elements.  

## Add Elements

```C
#include <cx/array_list.h>

#define cx_array_add(target, size, capacity, elem_size, elem,
        reallocator);

#define cx_array_simple_add(ARRAY, elem)

#define cx_array_simple_add_a(reallocator, ARRAY, elem)
```

The macros for adding an element are all convenience macros that invoke `cx_array_copy()`
interpreting the `elem` as an array of size one, copied to the past-by-one index of the target array.

## Insertion Sort

```C
#include <cx/array_list.h>

int cx_array_insert_sorted(
        void **target, size_t *size, size_t *capacity,
        cx_compare_func cmp_func,
        const void *src, size_t elem_size, size_t elem_count,
        CxArrayReallocator *reallocator);

#define cx_array_simple_insert_sorted(ARRAY,
        src, elem_count, cmp_func)

#define cx_array_simple_insert_sorted_a(reallocator, ARRAY,
        src, elem_count, cmp_func)

#define cx_array_add_sorted(target, size, capacity,
        elem_size, elem, cx_compare_func cmp_func, reallocator);

#define cx_array_simple_add_sorted(ARRAY,
        elem, cmp_func)

#define cx_array_simple_add_sorted_a(reallocator, ARRAY,
        elem, cmp_func)
```

The function `cx_array_insert_sorted()` inserts the `elem_count` number of already sorted elements from the `src` array
into the target array, comparing the elements with the given `cmp_func`.

The arguments of this function are used similar to [`cx_array_copy()`](#copy) with two notable exceptions:
first, this function actually inserts the items, moving existing items when necessary, and second,
no particular index is required because this function determines the insertion points automatically using [binary search](#binary-search).

If either the target or the source array is not already sorted with respect to the given compare function, the behavior is undefined.

The convenience macros are all calling `cx_array_insert_sorted()` by deducing the missing arguments.
The `cx_array_add_sorted()` family of macros are interpreting the `elem` as a `src` array with an `elem_count` of one.

## Sets of Unique Elements

```C
#include <cx/array_list.h>

int cx_array_insert_unique(
        void **target, size_t *size, size_t *capacity,
        cx_compare_func cmp_func,
        const void *src, size_t elem_size, size_t elem_count,
        CxArrayReallocator *reallocator);

#define cx_array_simple_insert_unique(ARRAY,
        src, elem_count, cmp_func)

#define cx_array_simple_insert_unique_a(reallocator, ARRAY,
        src, elem_count, cmp_func)

#define cx_array_add_unique(target, size, capacity,
        elem_size, elem, cx_compare_func cmp_func, reallocator);

#define cx_array_simple_add_unique(ARRAY,
        elem, cmp_func)

#define cx_array_simple_add_unique_a(reallocator, ARRAY,
        elem, cmp_func)
```

The above functions are similar to `cx_array_insert_sorted()` and its [relatives](#insertion-sort),
except that they skip insertion of elements that are already present in the target array.

## Binary Search

```C
#include <cx/array_list.h>

size_t cx_array_binary_search(
        const void *arr, size_t size, size_t elem_size,
        const void *elem, cx_compare_func cmp_func);

size_t cx_array_binary_search_inf(
        const void *arr, size_t size, size_t elem_size,
        const void *elem, cx_compare_func cmp_func);

size_t cx_array_binary_search_sup(
        const void *arr, size_t size, size_t elem_size,
        const void *elem, cx_compare_func cmp_func);
```

The function `cx_array_binary_search()` searches the array `arr` for the data pointed to by `elem` using the compare function `cmp_func`.
 
If the element is found (i.e. `cmp_func` returns zero), the function returns the index of the element.
Otherwise, the function returns `size`.

The functions `cx_array_binary_search_inf()` and `cx_array_binary_search_sup()` are equivalent,
except that they return the index of the closest element if the searched element is not found.
The former function returns the closest element that is less or equal (the greatest lower bound / infimum),
and the latter function returns the closest element that is larger or equal (the least upper bound / supremum)
than the searched element.

> Note that all the above functions are only well-defined on arrays which are sorted with respect to the given compare function.
> 
> This can, for example, easily be achieved by calling the standard library's `qsort()` function.
>{style="note"}

> Despite the name, neither C nor POSIX standards require the standard library's `bsearch()` function to be implemented using binary search.
> But observations show that it usually is.
> This makes `cx_array_binary_search()` likely to be equivalent to `bsearch()`, except that it returns an index rather than a pointer.

## Iterators

```C
#include <cx/iterator.h>

CxIterator cxIterator(const void *array,
        size_t elem_size, size_t elem_count,
        bool remove_keeps_order);
       
CxIterator cxIteratorPtr(const void *array,
        size_t elem_count,
        bool remove_keeps_order);
```

Iterators over plain C arrays are defined in [iterator.h](iterator.h.md#creating-an-iterator).

## Other

```C
#include <cx/array_list.h>

void cx_array_swap(
        void *arr,
        size_t elem_size,
        size_t idx1,
        size_t idx2
);
```

The function `cx_array_swap()` exchanges two items with a three-way swap.

No memory is allocated if the element size `elem_size` is not larger than `CX_ARRAY_SWAP_SBO_SIZE` (cf. [](install.md#small-buffer-optimizations)).

You have to make sure that both indices are within bounds of the array `arr`. 

<seealso>
<category ref="apidoc">
<a href="https://ucx.sourceforge.io/api/array__list_8h.html">array_list.h</a>
</category>
</seealso>

mercurial