(originally posted here: https://zig.news/liyu1981/do-not-rely-on-the-pointer-to-content-of-returned-struct-1ijc)
Recently, the following problem in my project left me wondering "why?" for quite some time, so I am noting it here.
Let us look at the following simple program:
// main.zig
const std = @import("std");
const MyBuf = struct {
    buf: [8]u8 = undefined,
    buf_ptr: [*]u8 = undefined,
    pub fn init() MyBuf {
        var b = MyBuf{}; // mark 1
        for (0..8) |i| b.buf[i] = 0;
        b.buf_ptr = &b.buf; // mark 2
        return b;
    }
};
pub fn main() !u8 {
    var b = MyBuf.init(); // mark 3
    b.buf_ptr[0] = 'h';
    b.buf_ptr[1] = 'i';
    std.debug.print("1: {s}\n", .{b.buf_ptr[0..2]});
    std.debug.print("2: {s}\n", .{b.buf});
    return 0;
}
This program appears to assign characters to a buffer from outside the struct and then print them. That is not very useful in this example, but in practice it could be part of a more complex buffer/cache design (which, of course, I would call a bad design).
Look at what this program produces.
I ran it 5 times, and not once did it print hi as expected, neither through the slice created from buf_ptr nor through buf. And sometimes, in a larger program containing this code, one of the prints does show hi, which only made me more confused.
The reason? It is actually quite simple: buf_ptr points to memory that should never be touched, so operating through it just messes up memory (and yes, zig actually allows us to do that :)). This may still sound confusing, so let me explain in detail.
- In mark 1 we create our struct with buf, and in mark 2 our buf_ptr points to buf. So far, so correct.
- After returning from the init function, b in mark 3 is actually a copy of b from mark 1, because zig/c copy values for both parameters and return values. Though b in mark 3 is now a new copy at a new memory location, buf_ptr still points to the old memory location inside the stack frame of init, which is no longer valid once init has returned. That old memory will either be reused by other code (which can corrupt data or crash the program, for example with a segmentation fault) or stay unused by other code (unlikely for long). The latter case causes more damage, as it produces weird bugs at runtime.
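To make the copy visible, here is a minimal sketch that only compares the two addresses and never dereferences the stale pointer. It assumes Zig 0.11 or newer (for @intFromPtr), and the names stored and actual are made up for illustration. In a debug build I would expect the two addresses to differ, but the exact values depend on the compiler and platform.

```zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8 = undefined,
    buf_ptr: [*]u8 = undefined,

    pub fn init() MyBuf {
        var b = MyBuf{};
        b.buf_ptr = &b.buf; // points into the stack frame of init
        return b; // the struct is copied out; buf_ptr is copied unchanged
    }
};

pub fn main() void {
    const b = MyBuf.init();
    // Only compare addresses; the stale pointer is never read or written.
    const stored = @intFromPtr(b.buf_ptr); // where buf_ptr points
    const actual = @intFromPtr(&b.buf); // where the buffer of this copy lives
    std.debug.print("buf_ptr points at 0x{x}\n", .{stored});
    std.debug.print("b.buf lives at    0x{x}\n", .{actual});
    std.debug.print("same address: {}\n", .{stored == actual});
}
```

If the last line reports that the addresses are not the same, every write through buf_ptr lands somewhere other than b.buf.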
OK, lesson learned. So how do we fix the code?
One way is to always assign the pointer right where it is needed:
// main.zig
const std = @import("std");
const MyBuf = struct {
    buf: [8]u8 = undefined,
    buf_ptr: [*]u8 = undefined,
    pub fn init() MyBuf {
        var b = MyBuf{};
        for (0..8) |i| b.buf[i] = 0;
        return b;
    }
};
pub fn main() !u8 {
    var b = MyBuf.init();
    b.buf_ptr = &b.buf; // <-- change!
    b.buf_ptr[0] = 'h';
    b.buf_ptr[1] = 'i';
    std.debug.print("1: {s}\n", .{b.buf_ptr[0..2]});
    std.debug.print("2: {s}\n", .{b.buf});
    return 0;
}
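A variation on the same idea, as a sketch rather than the one right way: do not store the pointer at all, and derive it from the instance each time it is needed. The ptr method below is a hypothetical helper that is not part of the original code. Because it takes self by pointer, the result always refers to the buffer of the instance it was called on.

```zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8 = [_]u8{0} ** 8,

    // Hypothetical helper: compute the pointer on demand instead of storing it.
    // `self` is a pointer, so the result refers to the caller's instance.
    pub fn ptr(self: *MyBuf) [*]u8 {
        return &self.buf;
    }
};

pub fn main() void {
    var b = MyBuf{};
    const p = b.ptr(); // valid for as long as `b` itself is alive and not moved
    p[0] = 'h';
    p[1] = 'i';
    std.debug.print("1: {s}\n", .{p[0..2]});
    std.debug.print("2: {s}\n", .{b.buf[0..2]});
}
```

Here both lines should print hi, since p and b.buf refer to the same memory. The pointer still must not outlive b, so this only removes the stale field, not the need to think about lifetimes.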
A second way is to have the init fn return a pointer to the instance, like this:
// main.zig
const std = @import("std");
const MyBuf = struct {
    buf: [8]u8 = undefined,
    buf_ptr: [*]u8 = undefined,
    pub fn init(allocator: std.mem.Allocator) !*MyBuf {
        const b = try allocator.create(MyBuf); // <-- change!
        for (0..8) |i| b.buf[i] = 0;
        return b;
    }
};
pub fn main() !u8 {
    const b = try MyBuf.init(std.heap.page_allocator);
    defer std.heap.page_allocator.destroy(b); // <-- change!
    b.buf_ptr = &b.buf; // <-- change!
    b.buf_ptr[0] = 'h';
    b.buf_ptr[1] = 'i';
    std.debug.print("1: {s}\n", .{b.buf_ptr[0..2]});
    std.debug.print("2: {s}\n", .{b.buf});
    return 0;
}
This adds more code and requires an allocator. But when we return a pointer to a heap-allocated struct, we stay away from the dangling pointer, and it becomes easier to let zig help us.
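Since a heap-allocated instance keeps its address until it is destroyed, the self-pointer can also be set inside the constructor. The following is a sketch under that assumption. The names create and destroy are my own choice for illustration, not something the original code defines.

```zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8,
    buf_ptr: [*]u8,

    // Hypothetical constructor: allocate on the heap and wire up the self-pointer.
    pub fn create(allocator: std.mem.Allocator) !*MyBuf {
        const self = try allocator.create(MyBuf);
        self.buf = [_]u8{0} ** 8;
        self.buf_ptr = &self.buf; // the heap address stays the same until destroy
        return self; // only the pointer is copied, not the struct
    }

    pub fn destroy(self: *MyBuf, allocator: std.mem.Allocator) void {
        allocator.destroy(self);
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const b = try MyBuf.create(allocator);
    defer b.destroy(allocator);
    b.buf_ptr[0] = 'h';
    b.buf_ptr[1] = 'i';
    std.debug.print("1: {s}\n", .{b.buf_ptr[0..2]});
    std.debug.print("2: {s}\n", .{b.buf[0..2]});
}
```

One caveat: copying the struct itself (for example with b.* assigned to another variable) brings the original problem back, because the copy's buf_ptr would still point into the heap instance.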
Finally, we should always prefer not to use this kind of design, and instead use a slice and remember an index. The best way to avoid pointer problems is to not use pointers. 🙂
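As a rough sketch of that last suggestion: keep only the buffer plus an index, and build the slice on demand. The field len and the methods append and written are hypothetical names chosen for this example. An index is just a number, so it stays correct no matter how often the struct is copied or returned by value.

```zig
const std = @import("std");

const MyBuf = struct {
    buf: [8]u8 = [_]u8{0} ** 8,
    len: usize = 0, // hypothetical field: number of bytes written so far

    pub fn init() MyBuf {
        return MyBuf{}; // safe to return by value: no pointer into itself
    }

    pub fn append(self: *MyBuf, c: u8) error{NoSpaceLeft}!void {
        if (self.len == self.buf.len) return error.NoSpaceLeft;
        self.buf[self.len] = c;
        self.len += 1;
    }

    // Takes `self` by pointer so the slice refers to the caller's buffer,
    // not to a temporary copy of the struct.
    pub fn written(self: *const MyBuf) []const u8 {
        return self.buf[0..self.len];
    }
};

pub fn main() !void {
    var b = MyBuf.init();
    try b.append('h');
    try b.append('i');
    std.debug.print("1: {s}\n", .{b.written()});
}
```

The slice returned by written is still only valid while b is alive, but nothing stored inside the struct can go stale.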