docs/src/features.md

Sun, 22 Dec 2024 22:10:04 +0100

author
Mike Becker <universe@uap-core.de>

---
title: UCX Features
---

<div id="modules">

------------------------ -------------------------  -------------------  ---------------------------------
[Allocator](#allocator)  [String](#string)          [Buffer](#buffer)    [Memory&nbsp;Pool](#memory-pool)
[Iterator](#iterator)    [Collection](#collection)  [List](#list)        [Map](#map)
[Utilities](#utilities)
------------------------ -------------------------  -------------------  ---------------------------------

</div>

## Allocator

*Header file:* [allocator.h](api/allocator_8h.html)  

The UCX allocator provides an interface for implementing your own memory allocation mechanism.
Various functions in UCX provide an alternative signature that takes an allocator as an
argument. A default allocator implementation using the stdlib memory management functions is
available via the global symbol `cxDefaultAllocator`.

If you want to define your own allocator, you need to initialize the `CxAllocator` structure
with a pointer to an allocator class (containing function pointers for the memory management
functions) and an optional pointer to an arbitrary memory region that can be used to store
state information for the allocator. An example is shown below:

```c
struct my_allocator_state {
    size_t total;
    size_t avail;
    char mem[];
};

static cx_allocator_class my_allocator_class = {
        my_malloc_impl,
        my_realloc_impl,   // all of these functions are defined elsewhere
        my_calloc_impl,
        my_free_impl
};

CxAllocator create_my_allocator(size_t n) {
    CxAllocator alloc;
    alloc.cl = &my_allocator_class;
    alloc.data = calloc(1, sizeof(struct my_allocator_state) + n);
    return alloc;
}
```
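
The functions referenced by `my_allocator_class` are not shown above.
The following is a minimal sketch of how the allocation and free functions might look for this state structure,
implemented as a simple bump allocator that hands out consecutive slices of `mem`.
The `(void *data, size_t n)` and `(void *data, void *mem)` signatures as well as the helper `my_state_create()`
are assumptions made for illustration, so please check `allocator.h` for the exact function pointer types.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct my_allocator_state {
    size_t total;   // capacity of mem in bytes
    size_t avail;   // bytes that have not been handed out yet
    char mem[];     // flexible array member holding the arena
};

// hypothetical helper: allocates the state together with an arena of n bytes
struct my_allocator_state *my_state_create(size_t n) {
    struct my_allocator_state *state = calloc(1, sizeof(*state) + n);
    if (state != NULL) {
        state->total = n;
        state->avail = n;
    }
    return state;
}

// assumed signature: the first argument is the data pointer of the allocator
void *my_malloc_impl(void *data, size_t n) {
    struct my_allocator_state *state = data;
    // round the request up so that offsets stay a multiple of sizeof(max_align_t)
    const size_t align = sizeof(max_align_t);
    if (n > SIZE_MAX - (align - 1)) return NULL;
    size_t rounded = (n + align - 1) / align * align;
    if (rounded > state->avail) return NULL;   // arena exhausted
    void *ptr = state->mem + (state->total - state->avail);
    state->avail -= rounded;
    return ptr;
}

// a bump allocator cannot release individual blocks, so this does nothing
void my_free_impl(void *data, void *mem) {
    (void) data;
    (void) mem;
}
```

With this design, all memory is released at once by freeing the state structure itself.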

## String

*Header file:* [string.h](api/string_8h.html)

UCX strings come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`).
The functions of UCX are designed to work with immutable strings by default, but where necessary,
the API also provides alternative functions that work directly with mutable strings.
Functions that change a string in place, of course, accept only mutable strings.

When you use UCX functions or define your own, you will sometimes face the "problem"
that a function only accepts arguments of type `cxstring` while you only have a `cxmutstr` at hand.
In this case you _should not_ introduce a wrapper function that accepts the `cxmutstr`.
Instead, use the `cx_strcast()` function to cast the argument to the correct type.

In general, UCX strings are **not** necessarily zero-terminated. If a function guarantees to return a zero-terminated
string, this is explicitly mentioned in the documentation of the respective function.
As a rule of thumb, you _should not_ pass the strings of a UCX string structure to another API without explicitly
ensuring that the string is zero-terminated.
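
One way to ensure zero-termination is to copy the characters into a buffer that is one byte larger.
The sketch below does not use the UCX types: `my_str` is a hypothetical stand-in that only mirrors the
`ptr` and `length` members, and `my_str_to_cstr()` is an illustrative helper, not a UCX function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// stand-in for a UCX string: pointer plus length, no terminator guaranteed
typedef struct {
    const char *ptr;
    size_t length;
} my_str;

// returns a newly allocated, zero-terminated copy or NULL on failure
// the caller is responsible for freeing the result
char *my_str_to_cstr(my_str s) {
    if (s.length == SIZE_MAX) return NULL;   // length + 1 would overflow
    char *copy = malloc(s.length + 1);
    if (copy == NULL) return NULL;
    if (s.length > 0) {
        memcpy(copy, s.ptr, s.length);
    }
    copy[s.length] = '\0';
    return copy;
}
```

The copy can then be passed safely to any API that expects a C string.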

## Buffer

*Header file:* [buffer.h](api/buffer_8h.html)

Instances of this buffer implementation can be used to read from or write to memory as you would with a stream.
This allows the use of `cx_stream_copy()` (see [Utilities](#utilities)) to copy contents from one buffer to another,
or from file or network streams to the buffer and vice versa.

Additional convenience features can be enabled, such as automatic memory management and automatic
resizing of the buffer space.

Since UCX 3.0, the buffer also supports automatic flushing of contents to another stream (or buffer) as an alternative
to automatically resizing the buffer space.
Please refer to the API doc for the fields prefixed with `flush_` to learn more. 

## Memory Pool

*Header file:* [mempool.h](api/mempool_8h.html)

A memory pool provides an allocator implementation that automatically deallocates the memory upon its destruction.
It also allows you to register destructor functions for the allocated memory, which are automatically called before
the memory is deallocated.
Additionally, you may also register _independent_ destructor functions within a pool in case some external library
allocated memory for you, which should be freed together with this pool.

Many UCX features support the use of an allocator.
The [strings](#string), for instance, provide several functions suffixed with `_a` that allow specifying an allocator.
You can use this to keep track of the memory occupied by dynamically allocated strings and clean up everything with
just a single call to `cxMempoolFree()`.

The following code illustrates this with the example of reading a CSV file into memory.
```c
#include <stdint.h>
#include <stdio.h>
#include <cx/mempool.h>
#include <cx/linked_list.h>
#include <cx/string.h>
#include <cx/buffer.h>
#include <cx/utils.h>

typedef struct {
    cxstring column_a;
    cxstring column_b;
    cxstring column_c;
} CSVData;

int main(void) {
    CxMempool* pool = cxBasicMempoolCreate(128);

    FILE *f = fopen("test.csv", "r");
    if (!f) {
        perror("Cannot open file");
        cxMempoolFree(pool);
        return 1;
    }
    // close the file automatically at pool destruction
    cxMempoolRegister(pool, f, (cx_destructor_func) fclose);

    // create a buffer using the memory pool for destruction
    CxBuffer *content = cxBufferCreate(NULL, 256, pool->allocator, CX_BUFFER_AUTO_EXTEND);

    // read the file into the buffer and turn it into a string
    // (no fclose() here: the pool closes the file on destruction)
    cx_stream_copy(f, content, (cx_read_func) fread, (cx_write_func) cxBufferWrite);
    cxstring contentstr = cx_strn(content->space, content->size);

    // split the string into lines - use the mempool for allocating the target array
    cxstring* lines;
    size_t lc = cx_strsplit_a(pool->allocator, contentstr,
                              CX_STR("\n"), SIZE_MAX, &lines);

    // skip the header and parse the remaining data into a linked list
    // the nodes of the linked list shall also be allocated by the mempool
    CxList* datalist = cxLinkedListCreate(pool->allocator, NULL, sizeof(CSVData));
    for (size_t i = 1 ; i < lc ; i++) {
        if (lines[i].length == 0) continue;
        cxstring fields[3];
        size_t fc = cx_strsplit(lines[i], CX_STR(";"), 3, fields);
        if (fc != 3) {
            fprintf(stderr, "Syntax error in line %zu.\n", i);
            cxMempoolFree(pool);
            return 1;
        }
        CSVData data;
        data.column_a = fields[0];
        data.column_b = fields[1];
        data.column_c = fields[2];
        cxListAdd(datalist, &data);
    }

    // iterate through the list and output the data
    CxIterator iter = cxListIterator(datalist);
    cx_foreach(CSVData*, data, iter) {
        printf("Column A: %.*s | "
               "Column B: %.*s | "
               "Column C: %.*s\n",
               (int)data->column_a.length, data->column_a.ptr,
               (int)data->column_b.length, data->column_b.ptr,
               (int)data->column_c.length, data->column_c.ptr
        );
    }

    // cleanup everything, no manual free() needed 
    cxMempoolFree(pool);

    return 0;
} 
```

## Iterator

*Header file:* [iterator.h](api/iterator_8h.html)

UCX 3 introduces the ability to write your own iterators that work with the `cx_foreach` macro.
In previous UCX releases there were different hard-coded foreach macros for lists and maps that were not customizable.
Now, creating an iterator is as simple as creating a `CxIterator` struct and setting the fields in a meaningful way.

You do not always need all fields in the iterator structure, depending on your use case.
Sometimes you only need the `index` (for example when iterating over simple lists), and other times you will need the
`slot` and `kv_data` fields (for example when iterating over maps).

If the predefined fields are insufficient for your use case, you can alternatively create your own iterator structure
and place the `CX_ITERATOR_BASE` macro as the first member of that structure.

Usually an iterator does not mutate the collection it is iterating over.
In some programming languages it is even disallowed to change the collection while iterating with foreach.
But sometimes it is desirable to remove an element from the collection while iterating over it.
For this purpose, most collections allow the creation of a _mutating_ iterator.
The only differences are that the `mutating` flag is `true` and the `src_handle` is not const.
On mutating iterators it is allowed to call the `cxFlagForRemoval()` function, which instructs the iterator to remove
the current element from the collection on the next call to `cxIteratorNext()` and clear the flag afterward.
If you are implementing your own iterator, it is up to you to implement this behavior.
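
If you implement your own iterator, the removal logic might look like the following sketch.
It does not use the UCX iterator structure: `my_int_iter` and its functions are hypothetical stand-ins
that only illustrate how a removal flag can be honored and cleared when the iterator advances.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// hypothetical mutating iterator over an int array (not the UCX structure)
typedef struct {
    int *array;     // the collection being iterated
    size_t *size;   // pointer to the element count, updated on removal
    size_t index;   // position of the current element
    bool remove;    // set to request removal of the current element
} my_int_iter;

bool my_iter_valid(const my_int_iter *it) {
    return it->index < *it->size;
}

int *my_iter_current(const my_int_iter *it) {
    return &it->array[it->index];
}

// advances the iterator; a flagged element is removed first
// and the flag is cleared afterward
void my_iter_next(my_int_iter *it) {
    if (it->remove) {
        it->remove = false;
        size_t tail = *it->size - it->index - 1;
        memmove(&it->array[it->index], &it->array[it->index + 1],
                tail * sizeof(int));
        (*it->size)--;
        // the successor moved into the current slot, so index stays as is
    } else {
        it->index++;
    }
}
```

The important detail is that the index must not be incremented after a removal, because the next element
has already moved into the current position.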

## Collection

*Header file:* [collection.h](api/collection_8h.html)

Collections in UCX 3 have several common features.
If you want to implement your own collection data type that uses the same features, you can use the
`CX_COLLECTION_BASE` macro at the beginning of your struct to roll out all members a usual UCX collection has.
```c
struct my_fancy_collection_s {
    CX_COLLECTION_BASE;
    struct my_collection_data_s *data;
};
```
Based on this structure, this header provides some convenience macros for invoking the destructor functions
that are part of the basic collection members.
The idea of having destructor functions within a collection is that you can destroy the collection _and_ the
contents with one single function call.
When you are implementing a collection, you are responsible for invoking the destructors at the right places, e.g.
when removing (and deleting) elements in the collection, clearing the collection, or - the most prominent case -
destroying the collection.

You can always look at the UCX list and map implementations if you need some inspiration.

## List

*Header file:* [list.h](api/list_8h.html)

This header defines a common interface for all list implementations.

UCX already comes with two common list implementations (linked list and array list) that should cover most use cases.
But if you feel the need to implement your own list, the only thing you need to do is to define a struct with a
`struct cx_list_s` as the first member, and set an appropriate list class that implements the functionality.
It is strongly recommended that this class is shared among all instances of the same list type, because otherwise
the `cxListCompare` function cannot use the optimized implementation of your class and will instead fall back to
using iterators to compare the contents element-wise.

### Linked List

*Header file:* [linked_list.h](api/linked__list_8h.html)

On top of implementing the list interface, this header also defines several low-level functions that
work with arbitrary structures. 
Low-level functions, in contrast to the high-level list interface, can easily be recognized by their snake-casing.
The function `cx_linked_list_at`, for example, implements functionality similar to `cxListAt`, but operates
on arbitrary structures.
The following snippet shows how it is used.
All other low-level functions work similarly.
```c
struct node {
    struct node *next;   // in C, the struct keyword is required here
    struct node *prev;
    int data;
};

const ptrdiff_t loc_prev = offsetof(struct node, prev);
const ptrdiff_t loc_next = offsetof(struct node, next);
const ptrdiff_t loc_data = offsetof(struct node, data);

struct node a = {0}, b = {0}, c = {0}, d = {0};
cx_linked_list_link(&a, &b, loc_prev, loc_next);
cx_linked_list_link(&b, &c, loc_prev, loc_next);
cx_linked_list_link(&c, &d, loc_prev, loc_next);

cx_linked_list_at(&a, 0, loc_next, 2); // returns pointer to c
```
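
To illustrate why passing member offsets is sufficient, the following sketch shows how a function can walk
a chain of arbitrary structures without knowing their type.
`my_list_advance()` is a hypothetical helper written for this explanation and not part of UCX.
It assumes that pointers to structures have the same representation as `void *`, which holds on mainstream platforms.

```c
#include <stddef.h>
#include <string.h>

struct node {
    struct node *next;
    struct node *prev;
    int data;
};

// follows the pointer stored at byte offset loc_advance inside each node
// count times and stops early when the end of the chain is reached
void *my_list_advance(void *start, ptrdiff_t loc_advance, size_t count) {
    void *cur = start;
    while (cur != NULL && count > 0) {
        void *next;
        // read the pointer stored loc_advance bytes into the current node
        memcpy(&next, (char *) cur + loc_advance, sizeof(next));
        cur = next;
        count--;
    }
    return cur;
}
```

Passing the offset of `prev` instead of `next` makes the same function walk the chain backwards.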

### Array List

*Header file:* [array_list.h](api/array__list_8h.html)

Since low-level array lists are just plain arrays, there is no need for as many low-level functions as for linked
lists.
However, there is one extremely powerful function that can be used for several complex tasks: `cx_array_copy`.
The full signature is shown below:
```c
int cx_array_copy(
        void **target,
        void *size,
        void *capacity,
        unsigned width,
        size_t index,
        const void *src,
        size_t elem_size,
        size_t elem_count,
        struct cx_array_reallocator_s *reallocator
);
```
The `target` argument is a pointer to the target array pointer.
The reason for this additional indirection is that this function writes
back the pointer to the possibly reallocated array.
The next two arguments are pointers to the `size` and `capacity` of the target array for which the width
(in bits) is specified in the `width` argument.

On a successful invocation, the function copies `elem_count` elements, each of size `elem_size`, from
`src` to `*target` and uses the `reallocator` to extend the array when necessary.
Finally, the size, capacity, and the pointer to the array are all updated and the function returns zero.

A few things to note:
* `*target` and `src` can point to the same memory region, effectively copying elements within the array with `memmove`
* `*target` does not need to point to the start of the array, but `size` and `capacity` always start counting from the
  position `*target` points to - in this scenario, the need for reallocation must be avoided, because `*target` is then not the pointer that was originally allocated
* `index` does not need to be within size of the current array
* `index` does not even need to be within the capacity of the array
* `width` must be one of 8, 16, 32, 64 (only on 64-bit systems), or zero (in which case the native word width is used) 
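
To make the semantics listed above more tangible, here is a simplified sketch in plain C.
It is not the UCX implementation: `my_array_copy()` is a hypothetical stand-in that uses `size_t` for
size and capacity (so there is no `width` argument), grows the array with `realloc` instead of a reallocator,
and assumes that `*target` points to the start of the allocation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// hypothetical, simplified stand-in for the behavior described above
// returns zero on success and non-zero on failure
int my_array_copy(void **target, size_t *size, size_t *capacity,
                  size_t index, const void *src,
                  size_t elem_size, size_t elem_count) {
    if (elem_size == 0 || elem_count > SIZE_MAX - index) return 1;
    size_t needed = index + elem_count;

    if (needed > *capacity) {
        if (needed > SIZE_MAX / elem_size) return 1;
        // src may point into the old array: remember its offset
        uintptr_t base = (uintptr_t) *target;
        uintptr_t from = (uintptr_t) src;
        int inside = *target != NULL
                && from >= base && from < base + *capacity * elem_size;
        size_t offset = inside ? (size_t) (from - base) : 0;

        void *mem = realloc(*target, needed * elem_size);
        if (mem == NULL) return 1;
        *target = mem;
        *capacity = needed;
        if (inside) src = (char *) mem + offset;
    }

    // memmove permits overlapping source and destination regions
    if (elem_count > 0) {
        memmove((char *) *target + index * elem_size, src,
                elem_count * elem_size);
    }
    if (needed > *size) *size = needed;
    return 0;
}
```

Note that the sketch leaves any gap between the old size and `index` uninitialized.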

If you just want to add one single element to an existing array, you can use the macro `cx_array_add()`.
You can use `CX_ARRAY_DECLARE()` to declare the necessary fields within a structure and then use the
`cx_array_simple_*()` convenience macros to reduce code overhead.
The convenience macros automatically determine the width of the size/capacity variables.

## Map

*Header file:* [map.h](api/map_8h.html)

Similar to the list interface, the map interface provides a common API for implementing maps.
There are some subtle differences, though.

First, the `remove` method is not always a destructive removal.
Instead, the last argument is a Boolean that indicates whether the element shall be destroyed or returned.
```c
void *(*remove)(CxMap *map, CxHashKey key, bool destroy);
```
When you implement this method, you are either supposed to invoke the destructors and return `NULL`,
or just remove the element from the map and return it.

Secondly, the iterator method is a bit more complete. The signature is as follows:
```c
CxIterator (*iterator)(const CxMap *map, enum cx_map_iterator_type type);
```
There are three map iterator types: for values, for keys, for pairs.
Depending on the iterator type requested, you need to create an iterator with the correct methods that
return the requested thing.
There are no automatic checks to enforce this - it's completely up to you.
If you need inspiration on how to do that, check the hash map implementation that comes with UCX.

### Hash Map

*Header file:* [hash_map.h](api/hash__map_8h.html)

UCX provides a basic hash map implementation with a configurable number of buckets.
If you do not specify the number of buckets, a default of 16 buckets will be used.
You can always rehash the map with `cxMapRehash()` to change the number of buckets to something more efficient.
Be careful, though: by using this function you effectively lock yourself into this
specific hash map implementation, and you would need to remove all calls to this function if you later want to
exchange the concrete map implementation for something different.

## Utilities

*Header file:* [utils.h](api/utils_8h.html)

UCX provides some utilities for routine tasks.

The most useful utilities are the *stream copy* functions, which provide a simple way to copy all - or a
bounded amount of - data from one stream to another. Since the read/write functions of a UCX buffer are
fully compatible with stream read/write functions, you can easily transfer data from file or network streams to
a UCX buffer or vice-versa.

The following example shows how easy it is to read the contents of a file into a buffer:
```c
FILE *inputfile = fopen(infilename, "r");
if (inputfile) {
    CxBuffer fbuf;
    cxBufferInit(&fbuf, NULL, 4096, NULL, CX_BUFFER_AUTO_EXTEND);
    cx_stream_copy(inputfile, &fbuf,
                   (cx_read_func) fread,
                   (cx_write_func) cxBufferWrite);
    fclose(inputfile);
    
    // ... do something meaningful with the contents ...
    
    cxBufferDestroy(&fbuf);
} else {
    perror("Error opening input file");
}
```
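
The idea behind a stream copy can be sketched in a few lines of plain C.
The code below is not the UCX implementation: `my_stream_copy()`, the function pointer types, and the
in-memory stream `my_mem_stream` are hypothetical names that only illustrate why any pair of read and write
functions shaped like `fread()` and `fwrite()` can be combined.

```c
#include <stddef.h>
#include <string.h>

// function pointer types with the same shape as fread() and fwrite()
typedef size_t (*my_read_func)(void *, size_t, size_t, void *);
typedef size_t (*my_write_func)(const void *, size_t, size_t, void *);

// copies everything from src to dest and returns the number of bytes copied
size_t my_stream_copy(void *src, void *dest,
                      my_read_func rfnc, my_write_func wfnc) {
    char buf[1024];
    size_t total = 0;
    size_t r;
    while ((r = rfnc(buf, 1, sizeof(buf), src)) > 0) {
        size_t written = 0;
        while (written < r) {
            size_t w = wfnc(buf + written, 1, r - written, dest);
            if (w == 0) return total;   // destination accepts no more data
            written += w;
            total += w;
        }
    }
    return total;
}

// hypothetical in-memory stream to show that no FILE is required
typedef struct {
    char *data;    // backing memory
    size_t size;   // size of the backing memory in bytes
    size_t pos;    // current read or write position
} my_mem_stream;

size_t my_mem_read(void *buf, size_t size, size_t n, void *stream) {
    my_mem_stream *s = stream;
    if (size == 0) return 0;
    size_t avail = (s->size - s->pos) / size;   // whole items left
    if (n > avail) n = avail;
    memcpy(buf, s->data + s->pos, n * size);
    s->pos += n * size;
    return n;
}

size_t my_mem_write(const void *buf, size_t size, size_t n, void *stream) {
    my_mem_stream *s = stream;
    if (size == 0) return 0;
    size_t avail = (s->size - s->pos) / size;   // whole items that still fit
    if (n > avail) n = avail;
    memcpy(s->data + s->pos, buf, n * size);
    s->pos += n * size;
    return n;
}
```

The copy loop stops either when the source is exhausted or when the destination accepts no more data.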

### Printf Functions

*Header file:* [printf.h](api/printf_8h.html)

In this utility header you can find `printf()`-like functions that can write the formatted output to an arbitrary
stream (or UCX buffer, resp.), or to memory allocated by an allocator within a single function call.
With the help of these convenience functions, you do not need to `snprintf` your string to a temporary buffer anymore,
plus you do not need to worry about too small buffer sizes, because the functions will automatically allocate enough
memory to contain the entire formatted string.
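
A common technique for formatting into memory of exactly the right size is to call `vsnprintf()` twice:
once to measure and once to write.
The sketch below shows this technique with plain `malloc()`; `my_asprintf()` is a hypothetical helper,
and the UCX functions may be implemented differently.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

// hypothetical helper: formats into freshly allocated memory
// returns NULL on failure, the caller frees the result
char *my_asprintf(const char *fmt, ...) {
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);   // the argument list is consumed by each pass
    int len = vsnprintf(NULL, 0, fmt, ap);   // first pass: measure only
    va_end(ap);

    char *str = NULL;
    if (len >= 0) {
        str = malloc((size_t) len + 1);
        if (str != NULL) {
            // second pass: write the formatted string and the terminator
            vsnprintf(str, (size_t) len + 1, fmt, ap2);
        }
    }
    va_end(ap2);
    return str;
}
```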

### Compare Functions

*Header file:* [compare.h](api/compare_8h.html)

This header file contains a collection of compare functions for various data types.
Their signatures are designed to be compatible with the `cx_compare_func` function pointer type.
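
If you need a compare function for your own data type, a sketch could look as follows.
It assumes that `cx_compare_func` has the shape `int (*)(const void *, const void *)`, which is also what
`qsort()` expects; `my_cmp_int()` and `my_sort_ints()` are hypothetical names used for illustration.

```c
#include <stdlib.h>

// returns a negative value, zero, or a positive value
// when left is less than, equal to, or greater than right
int my_cmp_int(const void *left, const void *right) {
    int a = *(const int *) left;
    int b = *(const int *) right;
    return (a > b) - (a < b);   // avoids the possible overflow of a - b
}

// the same function can be handed to any API that takes such a comparator
void my_sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof(int), my_cmp_int);
}
```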
