(originally posted here: https://zig.news/liyu1981/jstringzig-my-javascript-inspired-string-lib-with-excellent-regex-support-3a3p)
I'd like to share a handy string lib I created: jstring.zig
{% embed https://github.com/liyu1981/jstring.zig %}
As the name suggests, it is a string lib inspired by JavaScript (ECMAScript, to be precise). The goal is to implement all the methods specified at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String, except those marked as deprecated (such as anchor, big, blink, etc.).
Highlight: excellent regex support with help from PCRE2
Here are some examples of how it is supported:
var str1 = try JStringUnmanaged.newFromSlice(arena.allocator(), "hello,hello,world");
var results = try str1.splitByRegex(arena.allocator(), "l+", 0, 0);
try testing.expectEqual(results.len, 1);
try testing.expect(results[0].eqlSlice("hello,hello,world"));
results = try str1.splitByRegex(arena.allocator(), "l+", 0, -1);
try testing.expectEqual(results.len, 4);
try testing.expect(results[0].eqlSlice("he"));
try testing.expect(results[1].eqlSlice("o,he"));
try testing.expect(results[2].eqlSlice("o,wor"));
try testing.expect(results[3].eqlSlice("d"));
or
var re = try RegexUnmanaged.init(arena.allocator(), "(hi,)(?<h>hel+o?)", 0);
try re.match(arena.allocator(), "hi,hello", 0, true, 0);
try re.reset(arena.allocator());
try re.match(arena.allocator(), "hi,hello", 0, true, 0);
const match_results = re.getResults();
const group_results = re.getGroupResults();
_ = group_results;
if (match_results) |mrs| {
    try testing.expectEqual(mrs[0].start, 0);
    try testing.expectEqual(mrs[0].len, 8);
}
var it = re.getGroupResultsIterator("hi,hello");
var maybe_r = it.nextResult();
try testing.expect(maybe_r != null);
if (maybe_r) |r| {
    try testing.expect(std.mem.eql(u8, r.name, ""));
    try testing.expectEqual(r.start, 0);
}
maybe_r = it.nextResult();
try testing.expect(maybe_r != null);
if (maybe_r) |r| {
    try testing.expect(std.mem.eql(u8, r.name, "h"));
    try testing.expectEqual(r.start, 3);
}
Use it in your project
jstring.zig can be used with the Zig package manager, like below:
zig fetch --save https://github.com/liyu1981/jstring.zig/archive/refs/tags/0.1.0.tar.gz
Because it integrates PCRE2, you will need to enable PCRE2 linkage when building with it. jstring.zig also provides a build-time module that makes this part really easy, like below:
// in your build.zig
const jstring_build = @import("jstring");
...
const jstring_dep = b.dependency("jstring", .{});
exe.addModule("jstring", jstring_dep.module("jstring"));
jstring_build.linkPCRE(exe, jstring_dep);
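For context, here is a rough sketch of where those lines might sit in a complete build.zig. It assumes a Zig version from the same period as this post (one where exe.addModule is still available); the executable name "myapp" and the path "src/main.zig" are placeholders, not anything jstring.zig requires.

```zig
// build.zig (a sketch, not an official template)
const std = @import("std");
// Build-time helpers exposed by the jstring dependency's own build.zig.
const jstring_build = @import("jstring");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "myapp", // placeholder name
        .root_source_file = .{ .path = "src/main.zig" }, // placeholder path
        .target = target,
        .optimize = optimize,
    });

    // "jstring" must match the dependency name that zig fetch --save
    // recorded in build.zig.zon.
    const jstring_dep = b.dependency("jstring", .{});
    exe.addModule("jstring", jstring_dep.module("jstring"));
    // Enable PCRE2 linkage for the executable.
    jstring_build.linkPCRE(exe, jstring_dep);

    b.installArtifact(exe);
}
```

Newer Zig releases replaced exe.addModule with exe.root_module.addImport, so this sketch would need adjusting there.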
How about the performance?
jstring.zig is designed with performance in mind, and it should get as close to bare []const u8 as it can. The benchmark part is still a work in progress, but my early tests show that jstring.zig outperforms C++'s std::string, taking ~70% less time on average (roughly 3.4x faster).
benchmark % ./zig-out/bin/benchmark
|zig create/release: | [ooooo] | avg= 16464000ns | min= 14400000ns | max= 20975000ns |
|cpp create/release: | [ooooo] | avg= 56735400ns | min= 56137000ns | max= 57090000ns |
(The test randomly allocates and releases 1M short/long strings.)
But I want to give the credit to Zig, not jstring.zig, because Zig is a really cache-friendly language that makes it very easy to get your program fast!
Coverage
One of my goals when building this lib (as practice) was to push the coverage to the max. In jstring.zig, the coverage is 100% on both the Zig and C code. Take a look at the report here.
It can be used as a single-file lib too
Just copying jstring.zig into your project will do the job. Regex support can be turned off by modifying the comptime var enable_pcre at the beginning of the file. Even without PCRE support, it still has excellent performance, and it even has the KMP fast search algorithm built in. KMP is very useful if you need to use jstring to search highly repetitive strings (like scientific data): it turns the worst-case O(n^2) time of naive search into O(n) time.
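To illustrate why KMP helps on highly repetitive input, here is a minimal sketch of a KMP substring search in Zig. It is not jstring.zig's actual implementation; the names buildFailureTable and kmpIndexOf are hypothetical and chosen for illustration only.

```zig
const std = @import("std");

/// Builds the KMP failure table: table[i] is the length of the longest
/// proper prefix of pattern[0 .. i + 1] that is also a suffix of it.
/// (Hypothetical helper, not part of jstring.zig.)
fn buildFailureTable(allocator: std.mem.Allocator, pattern: []const u8) ![]usize {
    const table = try allocator.alloc(usize, pattern.len);
    if (pattern.len == 0) return table;
    table[0] = 0;
    var k: usize = 0; // length of the currently matched prefix
    var i: usize = 1;
    while (i < pattern.len) : (i += 1) {
        // On mismatch, fall back to the next shorter candidate prefix.
        while (k > 0 and pattern[i] != pattern[k]) k = table[k - 1];
        if (pattern[i] == pattern[k]) k += 1;
        table[i] = k;
    }
    return table;
}

/// Returns the index of the first occurrence of pattern in text, or null
/// if there is none. (Hypothetical function, not part of jstring.zig.)
pub fn kmpIndexOf(allocator: std.mem.Allocator, text: []const u8, pattern: []const u8) !?usize {
    if (pattern.len == 0) return @as(usize, 0);
    if (pattern.len > text.len) return null;

    const table = try buildFailureTable(allocator, pattern);
    defer allocator.free(table);

    var k: usize = 0; // number of pattern bytes matched so far
    for (text, 0..) |c, i| {
        // Reuse what has already been matched instead of rewinding the text.
        while (k > 0 and c != pattern[k]) k = table[k - 1];
        if (c == pattern[k]) k += 1;
        if (k == pattern.len) return i + 1 - pattern.len;
    }
    return null;
}
```

The key design point is that the search never moves backwards in the text: on a mismatch it consults the failure table and only shifts its position inside the pattern. That is why inputs with long repeated runs, which force a naive search to re-scan the same bytes over and over, stay linear here.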
Zig doc
Browse the Zig doc of jstring here.