| 1 # String |
1 # String |
| 2 |
2 |
| 3 <warning> |
3 UCX strings store character arrays together with a length and come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`). |
| 4 Outdated Section - will be updated soon! |
4 |
| 5 </warning> |
5 In general, UCX strings are *not* necessarily zero-terminated. |
| 6 |
6 If a function guarantees to return a zero-terminated string, it is explicitly mentioned in the documentation. |
| 7 UCX strings come in two variants: immutable (`cxstring`) and mutable (`cxmutstr`). |
7 As a rule of thumb, you _should not_ pass a character array of a UCX string structure to another API without explicitly |
| 8 The functions of UCX are designed to work with immutable strings by default but in situations where it is necessary, |
|
| 9 the API also provides alternative functions that work directly with mutable strings. |
|
| 10 Functions that change a string in-place are, of course, only accepting mutable strings. |
|
| 11 |
|
| 12 When you are using UCX functions, or defining your own functions, you are sometimes facing the "problem", |
|
| 13 that the function only accepts arguments of type `cxstring` but you only have a `cxmutstr` at hand. |
|
| 14 In this case you _should not_ introduce a wrapper function that accepts the `cxmutstr`, |
|
| 15 but instead you should use the `cx_strcast()` function to cast the argument to the correct type. |
|
| 16 |
|
| 17 In general, UCX strings are **not** necessarily zero-terminated. If a function guarantees to return zero-terminated |
|
| 18 string, it is explicitly mentioned in the documentation of the respective function. |
|
| 19 As a rule of thumb, you _should not_ pass the strings of a UCX string structure to another API without explicitly |
|
| 20 ensuring that the string is zero-terminated. |
8 ensuring that the string is zero-terminated. |
| 21 |
9 |
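The rule of thumb above can be illustrated with a minimal sketch. It assumes only what the text states, namely that a string is a character array plus a length. The type `sketch_string` and the helper `sketch_to_cstr()` are hypothetical names made up for this example and are not part of UCX.

```C
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pointer-plus-length string;
 * NOT the UCX definition. */
typedef struct { const char *ptr; size_t length; } sketch_string;

/* Hypothetical helper: returns a freshly allocated, zero-terminated
 * copy of the character array, or NULL if the allocation fails.
 * The caller must free() the result. */
static char *sketch_to_cstr(sketch_string s) {
    char *copy = malloc(s.length + 1);
    if (copy == NULL) return NULL;
    if (s.length > 0) memcpy(copy, s.ptr, s.length);
    copy[s.length] = '\0';  /* the terminator the source array may lack */
    return copy;
}
```

A string that points into the middle of a larger buffer, such as `{ text + 6, 3 }`, has no terminator at `ptr[length]`. Handing `ptr` directly to a function like `strlen()` would read past the intended end, whereas the copy is safe to pass on.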
| 22 <!-- |
|
| 23 ## Basics |
10 ## Basics |
| 24 |
11 |
| 25 ### cx_mutstr |
12 > To make documentation simpler, we introduce the pseudo-type `AnyStr` with the meaning that |
| 26 ### cx_mutstrn |
13 > both `cxstring` and `cxmutstr` are accepted for that argument. |
| 27 ### cx_str |
14 > The implementation is actually hidden behind a macro which uses `cx_strcast()` to guarantee compatibility. |
| 28 ### cx_strn |
15 {style="note"} |
| 29 ### cx_strcast |
16 |
| 30 ### cx_strfree |
17 ```C |
| 31 ### cx_strfree_a |
18 #include <cx/string.h> |
| 32 ### cx_strdup |
19 |
| 33 ### cx_strdup_a |
20 struct cx_string_s {const char *ptr; size_t length;}; |
| 34 ### cx_strlen |
21 |
| 35 ### cx_strtrim |
22 struct cx_mutstr_s {char *ptr; size_t length;}; |
| 36 ### cx_strtrim_m |
23 |
| 37 ### cx_strlower |
24 typedef struct cx_string_s cxstring; |
| 38 ### cx_strupper |
25 |
| |
26 typedef struct cx_mutstr_s cxmutstr; |
| |
27 |
| |
28 cxstring cx_str(const char *cstring); |
| |
29 |
| |
30 cxstring cx_strn(const char *cstring, size_t length); |
| |
31 |
| |
32 cxmutstr cx_mutstr(char *cstring); |
| |
33 |
| |
34 cxmutstr cx_mutstrn(char *cstring, size_t length); |
| |
35 |
| |
36 cxstring cx_strcast(AnyStr str); |
| |
37 |
| |
38 cxmutstr cx_strdup(AnyStr string); |
| |
39 |
| |
40 cxmutstr cx_strdup_a(const CxAllocator *allocator, AnyStr string); |
| |
41 |
| |
42 void cx_strfree(cxmutstr *str); |
| |
43 |
| |
44 void cx_strfree_a(const CxAllocator *alloc, cxmutstr *str); |
| |
45 ``` |
| |
46 |
| |
47 > Documentation work in progress. |
| |
48 >{style="warning"} |
| |
49 |
| |
50 > When you want to convert a string _literal_ into a UCX string, you can also use the `CX_STR(lit)` macro. |
| |
51 > This macro uses the fact that `sizeof(lit)` for a string literal `lit` is always the string length plus one, |
| |
52 > effectively saving an invocation of `strlen()`. |
| |
53 > However, this only works for literals - in all other cases you must use `cx_str()` or `cx_strn()`. |
| 39 |
54 |
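The `sizeof` observation in the note above can be sketched as follows. `SKETCH_STR` and `sketch_string` are hypothetical names for illustration only; the actual definition of `CX_STR` may differ.

```C
#include <stddef.h>

/* Hypothetical stand-in for a pointer-plus-length string;
 * NOT the UCX definition. */
typedef struct { const char *ptr; size_t length; } sketch_string;

/* Hypothetical macro: for a string literal, sizeof(lit) counts the
 * characters plus the terminating zero, so the length is known at
 * compile time and no strlen() call is needed. */
#define SKETCH_STR(lit) ((sketch_string){ (lit), sizeof(lit) - 1 })

/* Pitfall: for a pointer, sizeof yields the size of the pointer,
 * not the length of the string it points to. */
static size_t pointer_pitfall(const char *p) {
    return sizeof(p) - 1;
}
```

With this sketch, `SKETCH_STR("hello").length` is 5, while `pointer_pitfall("hello")` returns `sizeof(char *) - 1` regardless of the text, which is why the shortcut is limited to literals.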
| 40 ## Comparison |
55 ## Comparison |
| 41 |
56 |
| 42 ### cx_strcmp |
57 ```C |
| 43 ### cx_strcmp_p |
58 #include <cx/string.h> |
| 44 ### cx_strcasecmp |
59 |
| 45 ### cx_strcasecmp_p |
60 int cx_strcmp(cxstring s1, cxstring s2); |
| 46 ### cx_strprefix |
61 |
| 47 ### cx_strsuffix |
62 int cx_strcmp_p(const void *s1, const void *s2); |
| 48 ### cx_strcaseprefix |
63 |
| 49 ### cx_strcasesuffix |
64 bool cx_strprefix(cxstring string, cxstring prefix); |
| |
65 |
| |
66 bool cx_strsuffix(cxstring string, cxstring suffix); |
| |
67 |
| |
68 int cx_strcasecmp(cxstring s1, cxstring s2); |
| |
69 |
| |
70 int cx_strcasecmp_p(const void *s1, const void *s2); |
| |
71 |
| |
72 bool cx_strcaseprefix(cxstring string, cxstring prefix); |
| |
73 |
| |
74 bool cx_strcasesuffix(cxstring string, cxstring suffix); |
| |
75 ``` |
| |
76 |
| |
77 > Documentation work in progress. |
| |
78 >{style="warning"} |
| 50 |
79 |
| 51 ## Concatenation |
80 ## Concatenation |
| 52 |
81 |
| 53 ### cx_strcat_ma |
82 ```C |
| |
83 #include <cx/string.h> |
| |
84 |
| |
85 cxmutstr cx_strcat(size_t count, ... ); |
| |
86 |
| |
87 cxmutstr cx_strcat_a(const CxAllocator *alloc, size_t count, ... ); |
| |
88 |
| |
89 cxmutstr cx_strcat_m(cxmutstr str, size_t count, ... ); |
| |
90 |
| |
91 cxmutstr cx_strcat_ma(const CxAllocator *alloc, |
| |
92 cxmutstr str, size_t count, ... ); |
| |
93 |
| |
94 size_t cx_strlen(size_t count, ...); |
| |
95 ``` |
| |
96 |
| |
97 > Documentation work in progress. |
| |
98 >{style="warning"} |
| 54 |
99 |
| 55 ## Find Characters and Substrings |
100 ## Find Characters and Substrings |
| 56 |
101 |
| 57 ### cx_strchr |
102 ```C |
| 58 ### cx_strchr_m |
103 #include <cx/string.h> |
| 59 ### cx_strrchr |
104 |
| 60 ### cx_strrchr_m |
105 cxstring cx_strchr(cxstring string, int chr); |
| 61 ### cx_strstr |
106 |
| 62 ### cx_strstr_m |
107 cxmutstr cx_strchr_m(cxmutstr string, int chr); |
| 63 ### cx_strsubs |
108 |
| 64 ### cx_strsubsl |
109 cxstring cx_strrchr(cxstring string, int chr); |
| 65 ### cx_strsubsl_m |
110 |
| 66 ### cx_strsubs_m |
111 cxmutstr cx_strrchr_m(cxmutstr string, int chr); |
| |
112 |
| |
113 cxstring cx_strstr(cxstring haystack, cxstring needle); |
| |
114 |
| |
115 cxmutstr cx_strstr_m(cxmutstr haystack, cxstring needle); |
| |
116 |
| |
117 cxstring cx_strsubs(cxstring string, size_t start); |
| |
118 |
| |
119 cxstring cx_strsubsl(cxstring string, size_t start, size_t length); |
| |
120 |
| |
121 cxmutstr cx_strsubs_m(cxmutstr string, size_t start); |
| |
122 |
| |
123 cxmutstr cx_strsubsl_m(cxmutstr string, size_t start, size_t length); |
| |
124 |
| |
125 cxstring cx_strtrim(cxstring string); |
| |
126 |
| |
127 cxmutstr cx_strtrim_m(cxmutstr string); |
| |
128 ``` |
| |
129 |
| |
130 > Documentation work in progress. |
| |
131 >{style="warning"} |
| 67 |
132 |
| 68 ## Replace Substrings |
133 ## Replace Substrings |
| 69 |
134 |
| 70 ### cx_strreplacen_a |
135 ```C |
| |
136 #include <cx/string.h> |
| |
137 |
| |
138 cxmutstr cx_strreplace(cxstring str, cxstring pattern, cxstring repl); |
| |
139 |
| |
140 cxmutstr cx_strreplace_a(const CxAllocator *allocator, cxstring str, |
| |
141 cxstring pattern, cxstring repl); |
| |
142 |
| |
143 cxmutstr cx_strreplacen(cxstring str, cxstring pattern, cxstring repl, |
| |
144 size_t replmax); |
| |
145 |
| |
146 cxmutstr cx_strreplacen_a(const CxAllocator *allocator, cxstring str, |
| |
147 cxstring pattern, cxstring repl, size_t replmax); |
| |
148 ``` |
| |
149 |
| |
150 > Documentation work in progress. |
| |
151 >{style="warning"} |
| 71 |
152 |
| 72 ## Basic Splitting |
153 ## Basic Splitting |
| 73 |
154 |
| 74 ### cx_strsplit |
155 ```C |
| 75 ### cx_strsplit_a |
156 #include <cx/string.h> |
| 76 ### cx_strsplit_m |
157 |
| 77 ### cx_strsplit_ma |
158 size_t cx_strsplit(cxstring string, cxstring delim, |
| |
159 size_t limit, cxstring *output); |
| |
160 |
| |
161 size_t cx_strsplit_a(const CxAllocator *allocator, |
| |
162 cxstring string, cxstring delim, |
| |
163 size_t limit, cxstring **output); |
| |
164 |
| |
165 size_t cx_strsplit_m(cxmutstr string, cxstring delim, |
| |
166 size_t limit, cxmutstr *output); |
| |
167 |
| |
168 size_t cx_strsplit_ma(const CxAllocator *allocator, |
| |
169 cxmutstr string, cxstring delim, |
| |
170 size_t limit, cxmutstr **output); |
| |
171 ``` |
| |
172 |
| |
173 > Documentation work in progress. |
| |
174 >{style="warning"} |
| 78 |
175 |
| 79 ## Complex Tokenization |
176 ## Complex Tokenization |
| 80 |
177 |
| 81 ### cx_strtok_ |
178 ```C |
| 82 ### cx_strtok_delim |
179 #include <cx/string.h> |
| 83 ### cx_strtok_next |
180 |
| 84 ### cx_strtok_next_m |
181 CxStrtokCtx cx_strtok(AnyStr str, AnyStr delim, size_t limit); |
| |
182 |
| |
183 void cx_strtok_delim(CxStrtokCtx *ctx, |
| |
184 const cxstring *delim, size_t count); |
| |
185 |
| |
186 bool cx_strtok_next(CxStrtokCtx *ctx, cxstring *token); |
| |
187 |
| |
188 bool cx_strtok_next_m(CxStrtokCtx *ctx, cxmutstr *token); |
| |
189 ``` |
| |
190 |
| |
191 > Documentation work in progress. |
| |
192 >{style="warning"} |
| 85 |
193 |
| 86 ## Conversion to Numbers |
194 ## Conversion to Numbers |
| 87 |
195 |
| 88 ### cx_strtod_lc_ |
196 For each integer type, as well as `float` and `double`, there are functions to convert a UCX string to a number of that type. |
| 89 ### cx_strtof_lc_ |
197 |
| 90 ### cx_strtoi16_lc_ |
198 Integer conversion comes in two flavours: |
| 91 ### cx_strtoi32_lc_ |
199 ```C |
| 92 ### cx_strtoi64_lc_ |
200 int cx_strtoi(AnyStr str, int *output, int base); |
| 93 ### cx_strtoi8_lc_ |
201 |
| 94 ### cx_strtoi_lc_ |
202 int cx_strtoi_lc(AnyStr str, int *output, int base, |
| 95 ### cx_strtol_lc |
203 const char *groupsep); |
| 96 ### cx_strtoll_lc |
204 ``` |
| 97 ### cx_strtos_lc |
205 |
| 98 ### cx_strtou16_lc |
206 The basic variant takes a string of any UCX string type, a pointer to the `output` integer, and the `base` (one of 2, 8, 10, or 16). |
| 99 ### cx_strtou32_lc |
207 Conversion is attempted with respect to the specified `base` and respects possible special notations for that base. |
| 100 ### cx_strtou64_lc |
208 Hexadecimal numbers may be prefixed with `0x`, `x`, or `#`, and binary numbers may be prefixed with `0b` or `b`. |
| 101 ### cx_strtou8_lc |
209 |
| 102 ### cx_strtou_lc |
210 The `_lc` versions of the integer conversion functions are equivalent, except that they allow the specification of an |
| 103 ### cx_strtoul_lc |
211 array of group separator chars, each of which is simply ignored during conversion. |
| 104 ### cx_strtoull_lc |
212 The default group separator for the basic version is a comma `,`. |
| 105 ### cx_strtous_lc |
213 |
| 106 ### cx_strtouz_lc |
214 The signature for the floating point conversions is quite similar: |
| 107 ### cx_strtoz_lc |
215 ```C |
| 108 --> |
216 int cx_strtof(AnyStr str, float *output); |
| |
217 |
| |
218 int cx_strtof_lc(AnyStr str, float *output, |
| |
219 char decsep, const char *groupsep); |
| |
220 ``` |
| |
221 |
| |
222 The two differences are that the floating point versions do not support different bases, |
| |
223 and the `_lc` variant allows specifying not only an array of group separators, |
| |
224 but also the character used for the decimal separator. |
| |
225 |
| |
226 In the basic variant, the group separator is again a comma `,`, and the decimal separator is a dot `.`. |
| |
227 |
| |
228 > The floating point conversions of UCX 3.1 do not achieve the same precision as standard library implementations |
| |
229 > which usually use more sophisticated algorithms. |
| |
230 > The precision might increase in future UCX releases, |
| |
231 > but until then be aware of slight inaccuracies, in particular when working with `double`. |
| |
232 {style="warning"} |
| |
233 |
| |
234 > The UCX string to number conversions intentionally ignore locale settings |
| |
235 > and are therefore independent of any global state. |
| |
236 {style="note"} |
| 109 |
237 |
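To make the described integer conversion rules concrete, here is a minimal sketch that is independent of UCX. The function `sketch_strtoi()` is hypothetical: it operates on a plain pointer and length, and it assumes a return value of zero on success and non-zero on failure, which the text above does not specify. It is not the UCX implementation.

```C
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the UCX implementation: converts a
 * length-delimited character array to int and skips every character
 * contained in groupsep. Returns 0 on success and -1 on failure. */
static int sketch_strtoi(const char *ptr, size_t length, int *output,
                         int base, const char *groupsep) {
    size_t i = 0;
    int negative = 0;
    if (base != 2 && base != 8 && base != 10 && base != 16) return -1;
    if (i < length && (ptr[i] == '+' || ptr[i] == '-')) {
        negative = ptr[i] == '-';
        i++;
    }
    /* optional prefixes: 0x, x, # for base 16 and 0b, b for base 2 */
    if (base == 16) {
        if (length - i >= 2 && ptr[i] == '0' && ptr[i + 1] == 'x') i += 2;
        else if (i < length && (ptr[i] == 'x' || ptr[i] == '#')) i++;
    } else if (base == 2) {
        if (length - i >= 2 && ptr[i] == '0' && ptr[i + 1] == 'b') i += 2;
        else if (i < length && ptr[i] == 'b') i++;
    }
    long long value = 0;
    size_t digits = 0;
    for (; i < length; i++) {
        char c = ptr[i];
        /* group separators are simply ignored */
        if (c != '\0' && strchr(groupsep, c) != NULL) continue;
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        if (d >= base) return -1;
        value = value * base + d;
        /* stop early so that value can never overflow long long */
        if (value > (long long)INT_MAX + 1) return -1;
        digits++;
    }
    if (digits == 0) return -1;
    if (negative) value = -value;
    if (value > INT_MAX || value < INT_MIN) return -1;
    *output = (int)value;
    return 0;
}
```

Because the length is passed explicitly, the input does not need to be zero-terminated, and because the separators are an argument instead of a locale lookup, the function does not depend on global state.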
| 110 <seealso> |
238 <seealso> |
| 111 <category ref="apidoc"> |
239 <category ref="apidoc"> |
| 112 <a href="https://ucx.sourceforge.io/api/string_8h.html">string.h</a> |
240 <a href="https://ucx.sourceforge.io/api/string_8h.html">string.h</a> |
| 113 </category> |
241 </category> |