docs/Writerside/topics/array_list.h.md

Sun, 16 Mar 2025 15:23:45 +0100

author
Mike Becker <universe@uap-core.de>
date
Sun, 16 Mar 2025 15:23:45 +0100
changeset 1248
fc5e63b04281
parent 1246
5f2c9750204c
permissions
-rw-r--r--

complete array_list.h documentation

relates to #451

# Array List

Next to an array list implementation of the list interface,
UCX offers several functions to work with plain C arrays equipped with a size and a capacity.

The high level [list interface](list.h.md) is documented on a separate page and explains how lists are used
that are created by one of the following functions.

```C
#include <cx/array_list.h>

CxList *cxArrayListCreate(const CxAllocator *allocator,
        cx_compare_func comparator, size_t elem_size,
        size_t initial_capacity);
        
CxList *cxArrayListCreateSimple(size_t elem_size,
        size_t initial_capacity);
```

The remaining documentation on this page concentrates on dealing with plain C arrays.

## Declare Array with Size and Capacity

```C
#include <cx/array_list.h>

#define CX_ARRAY_DECLARE_SIZED(type, name, size_type)

#define CX_ARRAY_DECLARE(type, name)

#define cx_array_initialize(ARRAY, capacity)

#define cx_array_initialize_a(allocator, ARRAY, capacity)
```

The purpose of the UCX array functions is to keep track of the size and capacity of a plain C array.

The raw functions do not need this information packed into a structure, but working with independent variables is quite cumbersome.
Therefore, UCX defines several macros that call the raw functions assuming certain variable names.

This is what the `CX_ARRAY_DECLARE_SIZED()` and `CX_ARRAY_DECLARE()` macros are for.
They take a `type` for the array members, a `name` for the array variable, and a `size_type` for the type of the size and capacity variables (`size_t` by default when you use `CX_ARRAY_DECLARE()`),
and generate three variables named `name`, `name_size`, and `name_capacity`.

For example, you can abbreviate the following code
```C
struct my_data {
    int other_int;
    float other_float;
    int *my_array;
    size_t my_array_size;
    size_t my_array_capacity;
}
```
by substituting the three members for the array with `CX_ARRAY_DECLARE()`.
```C
struct my_data {
    int other_int;
    float other_float;
    CX_ARRAY_DECLARE(int, my_array);
}
```

> Using `CX_ARRAY_DECLARE_SIZED()` can save some space when you are using a size type that is _half_ as wide as `sizeof(void*)`.
> On 64-bit architectures, having the possibility to store more than four billion items is rarely necessary, hence using 32-bit integers for size and capacity
> saves eight bytes (assuming proper alignment in your struct).

Once the array is declared, you can use one of the macros `cx_array_initialize()` or `cx_array_initialize_a()` to allocate memory.
The former uses a stdlib default allocator and the latter allows you to use a specific allocator.
Important to note is, that the `ARRAY` argument expects the variable's _name_.
The macros set `ARRAY_size` to zero, `ARRAY_capacity` to the specified initial capacity, and invoke the allocator's `malloc()` function to allocate the memory.

Using the example from above, and the UCX [memory pool](mempool.h.md), this could look like this:

```C
CxMempool *mpool = cxMempoolCreateSimple(128);

struct my_data data;
cx_array_initialize_a(mpool->allocator, data.my_array, 16);
```

> The aforementioned macros make your life simpler, but keep in mind that using them is not mandatory.
> All other macros continue to work perfectly, if you declare and initialize your array manually, as long as you use
> the naming convention to suffix the size variable with `_size` and the capacity variable with `_capacity`.
> Also you can freely decide in which order you want to declare the variables.
> 
> For example, when you have multiple arrays in your struct with 8-bit `size_type` (because in your use case you don't expect more than 255 elements),
> it is favorable to group all pointers and then declare the size and capacity variables to avoid padding between the array declarations. 

## Array Reallocator

```C
#include <cx/array_list.h>

typedef struct {
    void *(*realloc)(void *array, size_t capacity, size_t elem_size,
            CxArrayReallocator *alloc);
    void *ptr1;
    void *ptr2;
    size_t int1;
    size_t int2;
} CxArrayReallocator;

CxArrayReallocator cx_array_reallocator(
        const struct cx_allocator_s *allocator,
        const void *stackmem
);

extern CxArrayReallocator* cx_array_default_reallocator;
```

The main purpose of the functions defined in the array list header,
is to make it easier to write to an array without caring too much about a possibly needed reallocation.

This is realized by passing a reallocator to the various functions which specifies how an array shall be reallocated when needed.

The default `cx_array_default_reallocator` simply defers to the standard library `realloc()`.

A reallocator created with the `cx_array_reallocator()` function uses a more sophisticated approach.
On the one hand, it can use an arbitrary UCX [allocator](allocator.h.md) for the reallocation,
and on the other hand, it can optionally keep track of a stack memory pointer.
If you pass a non-`NULL` pointer to `stackmem`, the reallocator will _always_ allocate _new_ memory for the array,
if the current location equals the `stackmem` address.

This allows you to allocate an array on the stack and instruct UCX to automatically move it to heap memory when the capacity is exceeded.
Combined with a UCX [memory pool](mempool.h.md) this can be a powerful tool for local arrays
which are expected to stay within the bounds of the stack memory most of the time, but are also allowed to sometimes grow their capacity when needed.

## Reserve

```C
#include <cx/array_list.h>

int cx_array_reserve(
        void **array, void *size, void *capacity, unsigned width,
        size_t elem_size, size_t count,
        CxArrayReallocator *reallocator);

#define cx_array_simple_reserve(ARRAY, count)
#define cx_array_simple_reserve_a(reallocator, ARRAY, count)
```

The function `cx_array_reserve()` guarantees that at least `count` _additional_ elements
can be stored in the array pointed to by `array`.

The `array` argument is a pointer to the location of the target array pointer.
The reason for this additional indirection is that this function writes
a new pointer back to that variable, if the array needed to be reallocated.

If reallocation fails, this function returns non-zero leaving the array as it was.
Otherwise, the function returns zero.

If `*size + elem_count` does not exceed the current `*capacity`, this function does nothing and simply returns zero.
Otherwise, the specified `reallocator` is used to reserve the necessary space.
If reallocation was necessary but failed, this function returns non-zero.

The actual data type of `size` and `capacity` can be a pointer to an arbitrary integer whose byte-width is determined by the `width` argument.
On 32-bit platforms the widths 1, 2, and 4 are supported and on 64-bit platforms, additionally a width of 8 is supported.
Passing zero as argument makes the function assume the native width for size arguments `sizeof(size_t)`.

The convenience macros take the _name_ of an array variable and invoke the function by deducing the other arguments
(including the width of the size and capacity).
The reallocator used by the `cx_array_simple_reserve()` macro is the `cx_array_default_reallocator`.

> While the actual integer type for `size` and `capacity` can be chosen freely, it must be _the same_ for both variables.
> For example it is not allowed, and it does not make any sense either, to use a 16-bit integer for the size, but a 32-bit integer for the capacity.

## Copy

```C
#include <cx/array_list.h>

int cx_array_copy(
        void **target, void *size, void *capacity, unsigned width,
        size_t index, const void *src,
        size_t elem_size, size_t elem_count,
        CxArrayReallocator *reallocator);

#define cx_array_simple_copy(ARRAY, index, src, count)

#define cx_array_simple_copy_a(reallocator, ARRAY, index, src, count)
```

The function `cx_array_copy()` first [reserves](#reserve) enough memory to copy `elem_count` number of elements from the `src` array
to the target array at the specified `index`, and then copies the data with one call to `memmove()`.

A few things to note:
* `*target` and `src` can point to the same memory region, since the underlying copy operation is a `memmove()`
* `*target` does not need to point to the start of the array, but `size` and `capacity` always start counting from the
  position `*target` points to - in this scenario, the need for reallocation must be avoided for obvious reasons
* `index` does not need to be within size and not even within the capacity of the current array
* if `index` equals `*size`, this function effectively appends the data to the target array
* the data in the target array is overwritten - if you want to insert data, you first need to copy the existing data to the end of the array and then copy the new data in a separate call 

> If you are using `cx_array_reserve()` already in your program, there is no need to call `cx_array_copy()` or any of the convenience macros anymore.
> In other words: if you already guarantee, by any means, that no reallocation is necessary when writing to your array,
> it is strongly recommended to use standard library functions or direct index-based access instead of the UCX functions,
> which only purpose is to make it easier for you to guarantee that your array's capacity is large enough to hold new elements.  

## Add Elements

```C
#include <cx/array_list.h>

#define cx_array_add(target, size, capacity, elem_size, elem,
        reallocator);

#define cx_array_simple_add(ARRAY, elem)

#define cx_array_simple_add_a(reallocator, ARRAY, elem)
```

The macros for adding an element are all convenience macros that invoke `cx_array_copy()`
interpreting the `elem` as an array of size one, copied to the past-by-one index of the target array.

## Insertion Sort

```C
int cx_array_insert_sorted(
        void **target, size_t *size, size_t *capacity,
        cx_compare_func cmp_func,
        const void *src, size_t elem_size, size_t elem_count,
        CxArrayReallocator *reallocator);

#define cx_array_simple_insert_sorted(ARRAY,
        src, elem_count, cmp_func)

#define cx_array_simple_insert_sorted_a(reallocator, ARRAY,
        src, elem_count, cmp_func)

#define cx_array_add_sorted(target, size, capacity,
        elem_size, elem, cx_compare_func cmp_func, reallocator);

#define cx_array_simple_add_sorted(ARRAY,
        elem, cmp_func)

#define cx_array_simple_add_sorted_a(reallocator, ARRAY,
        elem, cmp_func)
```

The function `cx_array_insert_sorted()` inserts the `elem_count` number of already sorted elements from the `src` array
into the target array, comparing the elements with the given `cmp_func`.

The arguments of this function are used similar to [`cx_array_copy()`](#copy) with two notable exceptions:
first, this function actually inserts the items, moving existing items when necessary, and second,
no particular index is required because this function determines the insertion points automatically using [binary search](#binary-search).

If either the target or the source array is not already sorted with respect to the given compare function, the behavior is undefined.

The convenience macros are all calling `cx_array_insert_sorted()` by deducing the missing arguments.
The `cx_array_add_sorted()` family of macros are interpreting the `elem` as a `src` array with an `elem_count` of one.

## Binary Search

```C
#include <cx/array_list.h>

size_t cx_array_binary_search(
        const void *arr, size_t size, size_t elem_size,
        const void *elem, cx_compare_func cmp_func);

size_t cx_array_binary_search_inf(
        const void *arr, size_t size, size_t elem_size,
        const void *elem, cx_compare_func cmp_func);

size_t cx_array_binary_search_sup(
        const void *arr, size_t size, size_t elem_size,
        const void *elem, cx_compare_func cmp_func);
```

The function `cx_array_binary_search()` searches the array `arr` for the data pointed to by `elem` using the compare function `cmp_func`.
 
If the element is found (i.e. `cmp_func` returns zero), the function returns the index of the element.
Otherwise, the function returns `size`.

The functions `cx_array_binary_search_inf()` and `cx_array_binary_search_sup()` are equivalent,
except that they return the index of the closest element, if the searched element is not found.
The former function returns the closest element that is less or equal (greatest lower bound / infimum),
and the latter function returns the closest element that is larger or equal (least upper bound / supremum)
than the searched element.

> Note, that all of the above functions are only well-defined on arrays which are sorted with respect to the given compare function.
> 
> This can, for example, easily be achieved by calling the standard library's `qsort()` function.
>{style="note"}

> Despite the name, neither C nor POSIX standards require the standard library's `bsearch()` function to be implemented using binary search.
> But observations show that it usually is.
> This makes `cx_array_binary_search()` likely to be equivalent to `bsearch()`, except that it returns an index rather than a pointer.

## Iterators

```C
#include <cx/iterator.h>

CxIterator cxIterator(const void *array,
        size_t elem_size, size_t elem_count);
        
CxIterator cxMutIterator(void *array,
        size_t elem_size, size_t elem_count, bool remove_keeps_order);
        
CxIterator cxIteratorPtr(const void *array, size_t elem_count);
        
CxIterator cxMutIteratorPtr(void *array, size_t elem_count,
        bool remove_keeps_order);
```

Iterators over plain C arrays are defined in [iterator.h](iterator.h.md#creating-an-iterator).

## Other

```C
#include <cx/array_list.h>

void cx_array_swap(
        void *arr,
        size_t elem_size,
        size_t idx1,
        size_t idx2
);
```

The function `cx_array_swap()` exchanges two items with a three-way swap.

No memory is allocated if the element size `elem_size` is not larger than `CX_ARRAY_SWAP_SBO_SIZE` (cf. [](install.md#small-buffer-optimizations)).

You have to make sure that both indices are within bounds of the array `arr`. 

<seealso>
<category ref="apidoc">
<a href="https://ucx.sourceforge.io/api/array__list_8h.html">array_list.h</a>
</category>
</seealso>

mercurial