Do not rely on a pointer to the content of a returned struct

(originally posted here: https://zig.news/liyu1981/do-not-rely-on-the-pointer-to-content-of-returned-struct-1ijc)

Recently, the following problem in my project kept me wondering "why?" for quite some time, so I am noting it down here.

Let us look at the following simple program:

// main.zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8 = undefined,
    buf_ptr: [*]u8 = undefined,

    pub fn init() MyBuf {
        var b = MyBuf{}; // mark 1
        for (0..8) |i| b.buf[i] = 0;
        b.buf_ptr = &b.buf; // mark 2
        return b;
    }
};

pub fn main() !u8 {
    var b = MyBuf.init(); // mark 3
    b.buf_ptr[0] = 'h';
    b.buf_ptr[1] = 'i';
    std.debug.print("1: {s}\n", .{b.buf_ptr[0..2]});
    std.debug.print("2: {s}\n", .{b.buf});
    return 0;
}

At first glance, this program simply writes a few characters into a buffer through a pointer and prints them. That is not very useful on its own, but in practice this pattern could be part of a more complex buffer/cache design (and, of course, I would call that a bad design).

Here is what this program prints:

(Screenshot: running results)

I ran it 5 times, and not once did it print hi as expected, neither through the slice created from buf_ptr nor through buf. Worse, in a larger program containing this code, one of the two prints would sometimes show hi, which only made me more confused.

The reason is actually quite simple: buf_ptr points to memory that should never be touched, so writing through it just corrupts memory (and yes, Zig actually allows us to do that :)). This may still sound confusing, so let me explain in detail:

  1. At mark 1 we create our struct with buf, and at mark 2 we point buf_ptr at buf. So far, so good.

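  2. After returning from init, the b at mark 3 is a copy of the b at mark 1, because Zig (like C) passes parameters and return values by value. The copy lives at a new memory location, but its buf_ptr still points to the old one: the b inside init's stack frame, which no longer exists once init returns. That stack memory is soon reused by other calls (std.debug.print, for example), so the bytes written through buf_ptr either get overwritten or overwrite someone else's data. Sometimes this corrupts something important and the program crashes; if the memory happens to stay unused for a short while, the code may even appear to work. The latter case does more damage, because it produces weird runtime bugs that show up far from their cause.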
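To see the copy happen without writing through the stale pointer, one can simply compare the two addresses. The sketch below assumes Zig 0.11 or later (for `@intFromPtr`); `pointsIntoSelf` is a hypothetical helper added only for this demonstration. In a Debug build the addresses should differ, because init's local b and the caller's b are separate objects.

```zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8 = [_]u8{0} ** 8,
    buf_ptr: [*]u8 = undefined,

    // Same idea as the original init: the pointer is taken to a local variable.
    pub fn init() MyBuf {
        var b = MyBuf{};
        b.buf_ptr = &b.buf; // points into init's own stack frame
        return b; // the struct is copied out; buf_ptr still points at the old copy
    }

    // Hypothetical helper: does buf_ptr point at this instance's own buf?
    pub fn pointsIntoSelf(self: *const MyBuf) bool {
        return @intFromPtr(self.buf_ptr) == @intFromPtr(&self.buf);
    }
};

pub fn main() void {
    const b = MyBuf.init();
    // Comparing addresses is harmless; only dereferencing buf_ptr is the bug.
    std.debug.print("&b.buf    = 0x{x}\n", .{@intFromPtr(&b.buf)});
    std.debug.print("b.buf_ptr = 0x{x}\n", .{@intFromPtr(b.buf_ptr)});
    std.debug.print("points into self: {}\n", .{b.pointsIntoSelf()});
}
```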

OK, lesson learned. So how do we fix the code?

One way is to always assign the pointer right before you need it, after the struct has reached its final location:

// main.zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8 = undefined,
    buf_ptr: [*]u8 = undefined,

    pub fn init() MyBuf {
        var b = MyBuf{};
        for (0..8) |i| b.buf[i] = 0;
        return b;
    }
};

pub fn main() !u8 {
    var b = MyBuf.init();
    b.buf_ptr = &b.buf; // <-- change!
    b.buf_ptr[0] = 'h';
    b.buf_ptr[1] = 'i';
    std.debug.print("1: {s}\n", .{b.buf_ptr[0..2]});
    std.debug.print("2: {s}\n", .{b.buf});
    return 0;
}
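Taken one step further, "assign the pointer when you need it" can mean not storing the pointer at all and deriving it from the struct's current address on demand. A minimal sketch, assuming the same MyBuf without the buf_ptr field; `ptr` is a hypothetical method name:

```zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8 = [_]u8{0} ** 8,

    // Hypothetical helper: derive the pointer from wherever the struct lives now,
    // instead of keeping it in a field that goes stale when the struct is copied.
    pub fn ptr(self: *MyBuf) [*]u8 {
        return &self.buf;
    }
};

pub fn main() void {
    var b = MyBuf{};
    const p = b.ptr(); // valid only while b stays where it is
    p[0] = 'h';
    p[1] = 'i';
    std.debug.print("1: {s}\n", .{p[0..2]});
    std.debug.print("2: {s}\n", .{b.buf[0..2]});
}
```

The pointer returned by ptr is still tied to this particular b: if b is copied afterwards, p keeps pointing at the original, so the rule of thumb is to call ptr again rather than keep p around.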

A second way is to have init allocate the instance on the heap and return a pointer to it, like this:

// main.zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8 = undefined,
    buf_ptr: [*]u8 = undefined,

    pub fn init(allocator: std.mem.Allocator) !*MyBuf {
        var b = try allocator.create(MyBuf);  // <-- change!
        for (0..8) |i| b.buf[i] = 0;
        return b;
    }
};

pub fn main() !u8 {
    var b = try MyBuf.init(std.heap.page_allocator);
    defer std.heap.page_allocator.destroy(b);  // <-- change!
    b.buf_ptr = &b.buf; // <-- change!
    b.buf_ptr[0] = 'h';
    b.buf_ptr[1] = 'i';
    std.debug.print("1: {s}\n", .{b.buf_ptr[0..2]});
    std.debug.print("2: {s}\n", .{b.buf});
    return 0;
}

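This adds more code and requires an allocator. But because init now returns a pointer to a heap-allocated struct, the struct never moves, so a pointer into it stays valid for as long as the struct is alive. It also becomes easier to let Zig help us catch lifetime mistakes (for example, std.testing.allocator reports memory that was never freed).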
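Since a heap-allocated MyBuf never moves, the heap version could also set buf_ptr inside init instead of in main. A sketch under that assumption, with a hypothetical deinit added so the allocation has a matching free:

```zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8,
    buf_ptr: [*]u8,

    // The struct lives on the heap and is handed out by pointer, so its address
    // never changes and init itself can safely point buf_ptr at buf.
    pub fn init(allocator: std.mem.Allocator) !*MyBuf {
        const b = try allocator.create(MyBuf); // memory starts out undefined
        b.buf = [_]u8{0} ** 8;
        b.buf_ptr = &b.buf;
        return b;
    }

    // Hypothetical counterpart to init: frees the heap allocation.
    pub fn deinit(self: *MyBuf, allocator: std.mem.Allocator) void {
        allocator.destroy(self);
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const b = try MyBuf.init(allocator);
    defer b.deinit(allocator);

    b.buf_ptr[0] = 'h';
    b.buf_ptr[1] = 'i';
    std.debug.print("1: {s}\n", .{b.buf_ptr[0..2]});
    std.debug.print("2: {s}\n", .{b.buf[0..2]});
}
```

In a test, passing std.testing.allocator instead of page_allocator makes a forgotten deinit show up as a reported leak. Note that copying the struct out of the heap (const copy = b.*;) would reintroduce a variant of the original problem: copy.buf_ptr would point at the heap object's buf, not at copy.buf.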

Finally, we should prefer to avoid this kind of design altogether: use slices and remember indices instead of keeping pointers into our own struct. The best way to avoid pointer problems is not to use pointers. 🙂
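A sketch of the index-based alternative suggested above: MyBuf remembers how many bytes are in use and builds the slice on demand, so nothing in the struct refers to its own address and returning it by value is safe. The field name len and the methods push and slice are hypothetical:

```zig
const std = @import("std");

// Stores an index (how many bytes are used) instead of a pointer into buf.
const MyBuf = struct {
    buf: [8]u8 = [_]u8{0} ** 8,
    len: usize = 0,

    pub fn init() MyBuf {
        return .{}; // returning by value is fine: no self-pointer to go stale
    }

    // Append one byte at the remembered index.
    pub fn push(self: *MyBuf, c: u8) error{NoSpaceLeft}!void {
        if (self.len >= self.buf.len) return error.NoSpaceLeft;
        self.buf[self.len] = c;
        self.len += 1;
    }

    // Build the slice from the struct's current location every time.
    pub fn slice(self: *const MyBuf) []const u8 {
        return self.buf[0..self.len];
    }
};

pub fn main() !void {
    var b = MyBuf.init();
    try b.push('h');
    try b.push('i');
    std.debug.print("1: {s}\n", .{b.slice()});
}
```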