tips for interacting with c

(originally posted here: https://zig.news/liyu1981/tips-for-interacting-with-c-1oo8)

(The following is a summary I wrote for myself as I got better at this day by day. Hopefully it will be useful to others :))

One thing that attracted me to zig is that it is a much better c, and that it is really serious about working with c, even legacy c code. There is a whole section dedicated to this in the zig language book. But as I followed along with it, more and more questions surfaced. Some of them have answers, while others may not yet. Anyway, let the list below be the catch-them-all section.

All the code shown below is uploaded to https://github.com/liyu1981/zig_c_tips

General usage points

1. convert c headers to zig

TLDR; with: zig translate-c hello.h, or use @cImport in zig source.

But it will output a lot, so sometimes I give my functions a common name prefix and use grep to cut the output down.

for example

#include <stdio.h>
int my_create_a_hello_string(char** buf);

Running zig translate-c directly produces far too many lines (most of them unnecessary), so I use zig translate-c hello.h | grep "my_" to get the following result

pub extern fn my_create_a_hello_string(buf: [*c][*c]u8) c_int;

For complex .h files, I have written a small tool called translate_c_extract, which accepts something like the following

#include <stddef.h>
#include <stdint.h>

// translate-c provide-begin: /#define\s(?<tk>\S+)\s.+/
#define CONST_A 'a'
#define CONST_ONE 1
// translate-c provide-end: /#define\s(?<tk>\S+)\s.+/

// translate-c provide: RegexMatchResult
typedef struct {
    size_t start;
    size_t len;
} Loc;

// translate-c provide: get_last_error_message
void get_last_error_message(Loc* loc);
// translate-c provide: my_create_a_hello_string
int my_create_a_hello_string(char** buf);


and turns it into the following

pub extern fn get_last_error_message(loc: [*c]Loc) void;
pub extern fn my_create_a_hello_string(buf: [*c][*c]u8) c_int;
pub const CONST_A = 'a';
pub const CONST_ONE = @as(c_int, 1);

2. convert .zig to .h

TLDR; zig provides this feature, but currently it is not as smooth as I hoped.

// hello.zig
pub export fn hello(buf: [*c]u8, buf_len: usize) u8 {
    const h = "hello";
    const to_copy_len = @min(buf_len, h.len);
    for (0..to_copy_len) |i| buf[i] = h[i];
    // usize -> u8 needs an explicit cast; it is safe here as to_copy_len <= h.len
    return @intCast(to_copy_len);
}

To get a hello.h, we need to run zig build-lib -femit-h hello.zig. A file hello.h will be emitted in the same dir as hello.zig. But it is a bit messy. In particular, it will be

#include "zig.h"
zig_extern uint8_t hello(uint8_t *const a0, uintptr_t const a1);
...lots of other fns...

Only the hello line is what we need. And if we take these 2 lines to a c/cpp compiler (zig cc/clang/gcc), there will be errors around zig.h. As discussed here, there is so far no good way of getting this done. So my solution is to take out the hello line and write the following .h

// hello.h
#include <stdint.h>
#define zig_extern
zig_extern uint8_t hello(uint8_t *const a0, uintptr_t const a1);

This file will then work with no problem in other c/cpp compilers.

With the headers and zig files generated, we may find that not everything can be mapped from zig to c or vice versa. Pay attention: what I am talking about is not whether zig can operate on the c ABI or vice versa (that is guaranteed to work by zig's design), but the syntax sugar and other good parts of zig.

In the rest of this note, I will try to list them one by one.

Use case and example

pointers

Most scalar data types have c counterparts, so just look them up in the language spec. They are simple to deal with. In the reverse direction, zig also provides common c types like c_int etc., because their size (or alignment) is platform dependent. Again, check the language spec here.

One tricky thing worth discussing more is pointers. zig has a special [*c]T type for c pointers. So
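To make the "platform dependent" point concrete, here is a minimal c sketch (the helper names are made up for illustration). It assumes a C11 compiler and 8-bit bytes. The fixed-width types are checked at compile time, while the size of int and long can only be asked of the target compiler, which is why zig needs c_int and c_long instead of picking i32 or i64 for them.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width types have one size on every target (given 8-bit bytes),
 * so they map directly onto zig's u8 / i32 / u64. */
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(int32_t) == 4, "int32_t is always 4 bytes");
static_assert(sizeof(uint64_t) == 8, "uint64_t is always 8 bytes");

/* The classic c types only promise minimum ranges, so their real size
 * depends on the target. These hypothetical helpers just report it. */
size_t size_of_c_int(void) { return sizeof(int); }
size_t size_of_c_long(void) { return sizeof(long); }

/* Returns 1 when long is wider than int on this target, 0 otherwise.
 * The answer differs between common platforms. */
int long_is_wider_than_int(void) { return sizeof(long) > sizeof(int); }
```

On the zig side, @sizeOf(c_int) and @sizeOf(c_long) should report the same numbers for the same target.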

  1. c int* will be zig [*c]c_int, or c uint8_t* will be zig [*c]u8
  2. c char** will be zig [*c][*c]u8 (as in the translate-c output above), and c char*** will be zig [*c][*c][*c]u8
  3. const applies, like
// c                         zig
// pointer to u8, pointer & value mutable
uint8_t * p1;             => var p1: *u8 = undefined;
// pointer to const u8, only pointer mutable
const uint8_t * p2;       => var p2: *const u8 = undefined;
// const pointer to u8, only value mutable
uint8_t * const p3;       => const p3: *u8 = undefined;
// const pointer to const u8, pointer & value immutable
const uint8_t * const p4; => const p4: *const u8 = undefined;

(wonderful example from Pointers and constness in Zig (and why it is confusing to a C programmer))
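To check the four rows from the c side, here is a minimal sketch (the function name is made up for illustration). Each commented-out line is one that a c compiler rejects, which mirrors what zig rejects for the matching declaration.

```c
#include <assert.h>
#include <stdint.h>

/* Exercises the four pointer/const combinations and returns the final
 * value of `a` as seen through p4. */
uint8_t const_pointer_demo(void) {
    uint8_t a = 1, b = 2;

    uint8_t *p1 = &a;             /* zig: var p1: *u8 */
    *p1 = 10;                     /* ok: value is mutable (a is now 10) */
    p1 = &b;                      /* ok: pointer is mutable */
    assert(*p1 == 2);

    const uint8_t *p2 = &a;       /* zig: var p2: *const u8 */
    p2 = &b;                      /* ok: pointer is mutable */
    /* *p2 = 20; */               /* rejected: value is read-only via p2 */
    assert(*p2 == 2);

    uint8_t *const p3 = &a;       /* zig: const p3: *u8 */
    *p3 = 30;                     /* ok: value is mutable (a is now 30) */
    /* p3 = &b; */                /* rejected: pointer is const */

    const uint8_t *const p4 = &a; /* zig: const p4: *const u8 */
    /* *p4 = 40; p4 = &b; */      /* both rejected */

    return *p4;
}
```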

from zig, call c

simple char pointers

// ptr.c
#include <stdio.h>

void hello_c(const char* str) {
    printf("%s\n", str);
}

in zig, we can use the ptr inside the slice

// ptr.zig
pub extern fn hello_c(str: [*c]const u8) void;
pub fn main() void {
    const msg = "world";
    hello_c(msg.ptr);
}
zig cc -c ptr.c
zig run ptr.zig ptr.o

then how about char** or char* msgs[]?

// ptr.c
#include <stdio.h>
void hello_all_c(const char* msgs[], int howmany) {
    for (int i = 0; i < howmany; i++) {
        printf("%s\n", msgs[i]);
    }
}

This time it takes a few more steps, as a normal zig slice usually has no [*c]T ready. So we need to convert the elements, and again use ptr from the slice.

pub extern fn hello_all_c(msgs: [*c][*c]const u8, howmany: c_int) void;
pub fn main() void {
    var msgs = [_][]const u8{ "hello", "world" };
    _ = &msgs;
    var msgs_for_c: [2][*c]const u8 = undefined;
    msgs_for_c[0] = msgs[0].ptr;
    msgs_for_c[1] = msgs[1].ptr;
    hello_all_c(msgs_for_c[0..].ptr, 2);
}
zig cc -c ptr.c
zig run ptr.zig ptr.o

from c, call zig

// ptr.zig
const std = @import("std");
pub export fn hello(str: [*c]const u8, len: usize) void {
    std.debug.print("{s}\n", .{str[0..len]});
}

generate ptr.h and clean it up as described above.

// ptr.h
#include <stdint.h>
#define zig_extern
zig_extern void hello(uint8_t const *const a0, uintptr_t const a1);

(notice that zig str is with uint8_t const *const type, not char*)

then in ptr.c

#include "ptr.h"
int main() {
    char* str = "world";
    hello((uint8_t*)str, 5);
    return 0;
}

and run as zig cc ptr.c libptr.a && ./a.out

Notice that we cast char* to (uint8_t*) in c; without the cast there will be a warning, but it will still work.

Now let us try char* msgs[]

// ptr.zig
const std = @import("std");
pub export fn hello_all(msgs: [*c][*c]const u8, len: usize) void {
    for (0..len) |i| {
        const msg_ptr = msgs[i];
        var j: usize = 0;
        while (true) : (j += 1) {
            if (msg_ptr[j] == 0) {
                break;
            }
        }
        std.debug.print("{s}\n", .{msg_ptr[0..j]});
    }
}

Notice that this time the zig implementation is more complex, as a c pointer does not carry the len information (and we can not use a zig slice in an export fn), so we need to manually find each msg's len by searching for the 0 sentinel. After that, we create a slice from the c pointer and feed it to print.
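For comparison, the same sentinel scan written in c is sketched below (scan_len and total_len are made-up names). It does what strlen does: walk the pointer until the 0 byte shows up. This is the work the zig loop above has to redo, because a [*c] pointer knows nothing about its length.

```c
#include <assert.h>
#include <stddef.h>

/* Count bytes up to (not including) the 0 terminator. */
size_t scan_len(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Hypothetical helper: total length of every string in a char* msgs[]. */
size_t total_len(const char *const msgs[], size_t howmany) {
    size_t total = 0;
    for (size_t i = 0; i < howmany; i++) {
        total += scan_len(msgs[i]);
    }
    return total;
}
```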

The generated and cleaned ptr.h is as follows

// ptr.h
#include <stdint.h>
#define zig_extern
zig_extern void hello(uint8_t const *const a0, uintptr_t const a1);
zig_extern void hello_all(uint8_t const **const a0, uintptr_t const a1);

and finally ptr.c

// ptr.c
#include "ptr.h"

int main() {
    char* msgs[] = {"hello", "world"};
    hello_all((const uint8_t**)msgs, 2);
    return 0;
}

we will still need casting in c as char is not uint8_t.

allocator

Allocators can not be used in an exported zig fn, as the compiler reports

hello.zig:10:21: error: parameter of type 'mem.Allocator' not allowed in function with calling convention 'C'
pub export fn test1(allocator: std.mem.Allocator) void {
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
hello.zig:10:21: note: only extern structs and ABI sized packed structs are extern compatible

because an allocator is a lot more than just a function. I personally find that reading the std.heap.ArenaAllocator source is an extra good way of understanding what an allocator is. Its source is concise and short, so it is easy to digest at a high level what an allocator is doing.
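As far as I can tell from the std source, std.mem.Allocator is a context pointer plus a table of function pointers. The c sketch below imitates that shape with a tiny bump allocator, which frees everything at once in the spirit of an arena. All the names here (c_allocator, bump_state, BUMP_ALIGN and so on) are made up for illustration; this is not zig's API. It only shows what kind of interface could be passed across a c boundary instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical c-side allocator interface: context pointer + functions. */
typedef struct {
    void *ctx;
    void *(*alloc)(void *ctx, size_t len);
    void (*reset)(void *ctx);
} c_allocator;

/* State of a bump allocator working over a caller-provided buffer. */
typedef struct {
    uint8_t *buf;
    size_t cap;
    size_t used;
} bump_state;

/* Offsets are rounded up to this many bytes (an assumption of the sketch;
 * the buffer itself must be suitably aligned by the caller). */
enum { BUMP_ALIGN = 16 };

static void *bump_alloc(void *ctx, size_t len) {
    bump_state *s = ctx;
    size_t start = (s->used + BUMP_ALIGN - 1) / BUMP_ALIGN * BUMP_ALIGN;
    if (start > s->cap || len > s->cap - start) {
        return NULL; /* out of space */
    }
    s->used = start + len;
    return s->buf + start;
}

static void bump_reset(void *ctx) {
    bump_state *s = ctx;
    s->used = 0; /* release every allocation at once */
}

/* Bundle the state and the functions into one value. */
c_allocator bump_allocator(bump_state *s) {
    assert(s != NULL);
    c_allocator a = { s, bump_alloc, bump_reset };
    return a;
}
```

A caller would create a bump_state over some buffer, wrap it with bump_allocator, and then call a.alloc(a.ctx, n) wherever memory is needed.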

opaque, void* and *opaque

opaque structs and void* are very common in important and mature c libs. It is a widely used technique in c to hide the internal implementation. For example, if you ever look into using SQLite through its c API, you will find something like the following

// from https://sqlite.org/c3ref/prepare.html
int sqlite3_prepare(
  sqlite3 *db,            /* Database handle */
  const char *zSql,       /* SQL statement, UTF-8 encoded */
  int nByte,              /* Maximum length of zSql in bytes. */
  sqlite3_stmt **ppStmt,  /* OUT: Statement handle */
  const char **pzTail     /* OUT: Pointer to unused portion of zSql */
);

but if we try to locate the sqlite3 type in sqlite3.h, this is what we will find

// https://github.com/GaloisInc/sqlite/blob/master/sqlite3.5/sqlite3.h#L169
typedef struct sqlite3 sqlite3;

and nowhere in sqlite3.h will we find how struct sqlite3 is defined. c allows this kind of declaration, and the resulting type is called opaque. (The real struct sqlite3 is defined here, which is only available in the full source code.)

void* is usually used in c libs as a handle to some resource that could later turn out to be one of several types. So the user can provide a simple pointer, a void*, and let the lib deal with it. For example, in the PCRE2 lib's pcre2.h, we can find something like below

// https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2.h.in#L576
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION pcre2_config(uint32_t, void *);

and according to its doc, the 2nd param points to where the requested config information is placed, which can then be one of several different data types.
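The same pattern in miniature is sketched below. my_config and its two option codes are hypothetical and are not part of PCRE2. The point is only that the first argument decides which type the void* really points to, so the caller and the lib must agree on it by documentation alone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option codes, made up for this sketch. */
enum { MY_CONFIG_STACK_LIMIT = 0, MY_CONFIG_VERSION = 1 };

/* Writes the requested setting to *where.
 *   MY_CONFIG_STACK_LIMIT: where must point to a uint32_t
 *   MY_CONFIG_VERSION:     where must point to a char buffer of >= 8 bytes
 * Returns 0 on success and -1 for an unknown option. */
int my_config(uint32_t what, void *where) {
    assert(where != NULL);
    switch (what) {
    case MY_CONFIG_STACK_LIMIT:
        *(uint32_t *)where = 4096;
        return 0;
    case MY_CONFIG_VERSION:
        strcpy((char *)where, "1.0.0");
        return 0;
    default:
        return -1;
    }
}
```

Nothing stops a caller from passing a pointer of the wrong type here, and that is exactly why zig can only translate such a parameter as ?*anyopaque.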

call opaque and *opaque from zig

With the knowledge gained above, this part should not be hard

// opaque.h
typedef struct Op* Op_t;
Op_t new_op(const char* name);
Op_t new_op_all(const char** names, int len);
void free_op(Op_t op);
void hello(Op_t op);
void hello_all(Op_t op);
// opaque.c
#include "opaque.h"

#include <stdio.h>
#include <stdlib.h>

typedef struct Op {
    const char *name;   /* const so that assigning the const char* params does not warn */
    const char **names;
    int howmany;
} *Op_t;

Op_t new_op(const char *name) {
    Op_t op = malloc(sizeof(struct Op));
    if (op != NULL) {
        op->name = name;
    }
    return op;
}

Op_t new_op_all(const char **names, int howmany) {
    Op_t op = malloc(sizeof(struct Op));
    if (op != NULL) {
        op->names = names;
        op->howmany = howmany;
    }
    return op;
}

void free_op(Op_t op) {
    free(op);
}

void hello(Op_t op) {
    printf("%s\n", op->name);
}

void hello_all(Op_t op) {
    for (int i = 0; i < op->howmany; i++) {
        printf("%s\n", op->names[i]);
    }
}

Notice that our c code provides the new_op* functions, which is what a c lib will usually provide for creating opaque structs.

Translating with zig translate-c and cleaning up, we will have

pub const struct_Op = opaque {};
pub const Op_t = ?*struct_Op;
pub extern fn new_op(name: [*c]const u8) Op_t;
pub extern fn new_op_all(names: [*c][*c]const u8, howmany: c_int) Op_t;
pub extern fn free_op(op: Op_t) void;
pub extern fn hello(op: Op_t) void;
pub extern fn hello_all(op: Op_t) void;

and then we can easily write some code to call our c functions

const std = @import("std");

pub const struct_Op = opaque {};
pub const Op_t = ?*struct_Op;
pub extern fn new_op(name: [*c]const u8) Op_t;
pub extern fn new_op_all(names: [*c][*c]const u8, howmany: c_int) Op_t;
pub extern fn free_op(op: Op_t) void;
pub extern fn hello(op: Op_t) void;
pub extern fn hello_all(op: Op_t) void;

pub fn main() !void {
    {
        const maybe_op: Op_t = new_op("world");
        if (maybe_op) |op| {
            hello(op);
            free_op(op);
        }
    }
    {
        const names = [_][]const u8{ "hello", "world" };
        var names_for_c: [2][*c]const u8 = undefined;
        names_for_c[0] = names[0].ptr;
        names_for_c[1] = names[1].ptr;
        const maybe_op: Op_t = new_op_all(names_for_c[0..].ptr, 2);
        if (maybe_op) |op| {
            hello_all(op);
            free_op(op);
        }
    }
}

zig cc -c opaque.c then zig run opaque.zig opaque.o, should work.

But maybe we want to do something hacky, like modifying or creating the opaque struct from the zig side? Can we just manually redefine the struct in zig so that we can access its fields? It sounds possible, but on the other hand, zig and the c compiler have different opinions on how to arrange the memory layout of a struct's fields, so this may fail. The zig doc describes the extern keyword for this, but in my attempts so far it has not worked yet.

const std = @import("std");

//pub const struct_Op = opaque {};
pub const struct_Op = extern struct {
    name: [*c]u8,
    names: [*c][*c]u8,
    howmany: c_int,
};

pub const Op_t = ?*struct_Op;
pub extern fn new_op(name: [*c]const u8) Op_t;
pub extern fn new_op_all(names: [*c][*c]const u8, howmany: c_int) Op_t;
pub extern fn free_op(op: Op_t) void;
pub extern fn hello(op: Op_t) void;
pub extern fn hello_all(op: Op_t) void;

pub fn main() !void {
    {
        const names = [_][]const u8{ "hello", "world" };
        var zig: [3:0]u8 = undefined;
        zig[0] = 'z';
        zig[1] = 'i';
        zig[2] = 'g';
        zig[3] = 0;
        var names_for_c: [2][*c]const u8 = undefined;
        names_for_c[0] = names[0].ptr;
        names_for_c[1] = names[1].ptr;
        var maybe_op = new_op_all(names_for_c[0..].ptr, 2);
        _ = &maybe_op;
        if (maybe_op != null) {
            std.debug.print("{any}\n", .{maybe_op.?.names[2]});
            var zig_s = zig[0..3];
            _ = &zig_s;
            std.debug.print("{any}\n", .{zig_s});
            maybe_op.?.names[2] = zig_s.ptr;
            hello_all(maybe_op.?);
            //free_op(maybe_op.?);
        }
    }
}

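The above code is what I have tried, but it triggers a SIG_TRAP every time. Further investigation shows that the write to names[2] is the culprit. Note that names_for_c has only two entries, so index 2 is past the end of the array, and the write most likely corrupts neighbouring memory.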
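Since the array handed to new_op_all has a fixed number of slots, adding one more name needs both a spare slot and an updated howmany. Below is a minimal c sketch of a bounds-checked append. name_list is a made-up stand-in for the names/howmany pair of struct Op with an extra capacity field; it is not part of opaque.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct Op's names/howmany pair, extended
 * with the capacity of the backing array. */
typedef struct {
    const char **names;
    int howmany;
    int capacity;
} name_list;

/* Appends only when the backing array still has room.
 * Returns 0 on success and -1 when the list is full. */
int name_list_append(name_list *l, const char *name) {
    assert(l != NULL);
    if (l->howmany >= l->capacity) {
        return -1;
    }
    l->names[l->howmany] = name;
    l->howmany++;
    return 0;
}
```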

call opaque and *opaque from c

This does not make much sense, as opaque exists in zig specifically for c libs that use this technique. For zig itself, there seems to be no need for it, as zig has the pub keyword to control what is visible to the outside.

call void* from zig

Quite similar to opaque. Just watch the output of zig translate-c, which uses ?*anyopaque for void*. An example is as follows

// voidstar.h
void* set(const char* name);
void* set_all(const char** names, int howmany);
void hello(void* h);
void hello_all(void* h);
// voidstar.c
#include <stdio.h>
#include <stdlib.h>

const char* name_info;

void* set(const char* name) {
    name_info = name;
    return (void*)name_info;
}

struct names_info_t {
    const char** names;
    int howmany;
} names_info;

void* set_all(const char** names, int howmany) {
    names_info.names = names;
    names_info.howmany = howmany;
    return (void*)&names_info;
}

void hello(void* h) {
    printf("%s\n", (char*)h);
}

void hello_all(void* h) {
    struct names_info_t* ni = (struct names_info_t*)h;
    for (int i = 0; i < ni->howmany; i++) {
        printf("%s\n", ni->names[i]);
    }
}
// voidstar_z.zig
const std = @import("std");

pub extern fn set(name: [*c]const u8) ?*anyopaque;
pub extern fn set_all(names: [*c][*c]const u8, howmany: c_int) ?*anyopaque;
pub extern fn hello(h: ?*anyopaque) void;
pub extern fn hello_all(h: ?*anyopaque) void;

pub fn main() !void {
    {
        var h = set("hello");
        _ = &h;
        hello(h);
    }
    {
        const names = [_][]const u8{ "hello", "world" };
        var names_for_c: [2][*c]const u8 = undefined;
        names_for_c[0] = names[0].ptr;
        names_for_c[1] = names[1].ptr;
        var h = set_all(names_for_c[0..].ptr, 2);
        _ = &h;
        hello_all(h);
    }
}