`jstring.zig`, my JavaScript-inspired string lib with excellent Regex support

(originally posted here: https://zig.news/liyu1981/jstringzig-my-javascript-inspired-string-lib-with-excellent-regex-support-3a3p)

I'd like to share a handy string lib I created: jstring.zig.

{% embed https://github.com/liyu1981/jstring.zig %}

As the name suggests, it is a string lib inspired by JavaScript (ECMAScript, to be precise). The goal is to implement all the methods specified at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String, except those marked as deprecated (such as anchor, big, blink, etc.).

Highlight: excellent Regex support with help from PCRE2

Here are some examples of how it is supported:

var str1 = try JStringUnmanaged.newFromSlice(arena.allocator(), "hello,hello,world");
var results = try str1.splitByRegex(arena.allocator(), "l+", 0, 0);
try testing.expectEqual(results.len, 1);
try testing.expect(results[0].eqlSlice("hello,hello,world"));
results = try str1.splitByRegex(arena.allocator(), "l+", 0, -1);
try testing.expectEqual(results.len, 4);
try testing.expect(results[0].eqlSlice("he"));
try testing.expect(results[1].eqlSlice("o,he"));
try testing.expect(results[2].eqlSlice("o,wor"));
try testing.expect(results[3].eqlSlice("d"));

or

var re = try RegexUnmanaged.init(arena.allocator(), "(hi,)(?<h>hel+o?)", 0);
try re.match(arena.allocator(), "hi,hello", 0, true, 0);
try re.reset(arena.allocator());
try re.match(arena.allocator(), "hi,hello", 0, true, 0);
const match_results = re.getResults();
const group_results = re.getGroupResults();
_ = group_results;
if (match_results) |mrs| {
    try testing.expectEqual(mrs[0].start, 0);
    try testing.expectEqual(mrs[0].len, 8);
}
var it = re.getGroupResultsIterator("hi,hello");
var maybe_r = it.nextResult();
try testing.expect(maybe_r != null);
if (maybe_r) |r| {
    try testing.expect(std.mem.eql(u8, r.name, ""));
    try testing.expectEqual(r.start, 0);
}
maybe_r = it.nextResult();
try testing.expect(maybe_r != null);
if (maybe_r) |r| {
    try testing.expect(std.mem.eql(u8, r.name, "h"));
    try testing.expectEqual(r.start, 3);
}

Use it in your project

jstring.zig can be fetched with the Zig package manager, like below:

zig fetch --save https://github.com/liyu1981/jstring.zig/archive/refs/tags/0.1.0.tar.gz

Because it integrates PCRE2, you will need to enable PCRE2 linkage when building with it. jstring.zig also provides a build-time module that makes this part really easy, like below:

// in your build.zig
const jstring_build = @import("jstring");
...
const jstring_dep = b.dependency("jstring", .{});
exe.addModule("jstring", jstring_dep.module("jstring"));
jstring_build.linkPCRE(exe, jstring_dep);

How about the performance?

jstring.zig is designed with performance in mind, and it should stay as close to a bare []const u8 as it can. The benchmark part is still a work in progress, but my early test shows that jstring.zig outperforms C++’s std::string, taking about 70% less time on average (roughly 3.4x faster).

benchmark % ./zig-out/bin/benchmark
|zig create/release: | [ooooo] | avg=    16464000ns | min=    14400000ns | max=    20975000ns |
|cpp create/release: | [ooooo] | avg=    56735400ns | min=    56137000ns | max=    57090000ns |

(The test randomly allocates and releases 1M short and long strings.)

But I want to give the credit to Zig, not jstring.zig, because Zig is really a cache-friendly language, and it makes it very easy to get your program fast!

Coverage

One of my goals when building this lib (as a practice) was to push the coverage to the max. In jstring.zig, the coverage is 100% on both the Zig and C code. Take a look at the report here.

It can be used as a single file lib too

Simply copying jstring.zig into your project will do the job too. Regex support can be turned off by modifying the comptime var enable_pcre at the beginning of the file. Even without PCRE support, it still has excellent performance, and it even has a built-in implementation of the KMP fast search algorithm. KMP is very useful if you need to use jstring to search highly repetitive strings (like scientific data): it turns worst-case O(n^2) search time into O(n).
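To give a feel for why KMP helps on highly repetitive input, here is a minimal standalone sketch of the textbook algorithm. It is not jstring.zig's actual implementation: the names `buildFailureTable` and `kmpIndexOf` are made up for illustration, and the library's own search may be structured quite differently. The idea is that a precomputed failure table lets the search continue after a mismatch without ever moving backwards in the haystack.

```zig
const std = @import("std");

/// table[i] = length of the longest proper prefix of pattern[0..i+1]
/// that is also a suffix of it. (Hypothetical helper, not jstring.zig API.)
fn buildFailureTable(allocator: std.mem.Allocator, pattern: []const u8) ![]usize {
    const table = try allocator.alloc(usize, pattern.len);
    if (pattern.len == 0) return table;
    table[0] = 0;
    var k: usize = 0; // length of the currently matched prefix
    var i: usize = 1;
    while (i < pattern.len) : (i += 1) {
        // On mismatch, fall back to the next shorter prefix that is also a suffix.
        while (k > 0 and pattern[i] != pattern[k]) k = table[k - 1];
        if (pattern[i] == pattern[k]) k += 1;
        table[i] = k;
    }
    return table;
}

/// Index of the first occurrence of `pattern` in `haystack`, or null.
/// (Hypothetical helper, not jstring.zig API.)
fn kmpIndexOf(allocator: std.mem.Allocator, haystack: []const u8, pattern: []const u8) !?usize {
    if (pattern.len == 0) return 0;
    const table = try buildFailureTable(allocator, pattern);
    defer allocator.free(table);

    var k: usize = 0; // number of pattern bytes matched so far
    var i: usize = 0;
    while (i < haystack.len) : (i += 1) {
        // The haystack index i never moves backwards; only k falls back.
        while (k > 0 and haystack[i] != pattern[k]) k = table[k - 1];
        if (haystack[i] == pattern[k]) k += 1;
        if (k == pattern.len) return i + 1 - pattern.len;
    }
    return null;
}

test "kmpIndexOf on repetitive input" {
    const a = std.testing.allocator;
    try std.testing.expectEqual(@as(?usize, 6), try kmpIndexOf(a, "aaaaaaab", "ab"));
    try std.testing.expectEqual(@as(?usize, 3), try kmpIndexOf(a, "abcabcabd", "abcabd"));
    try std.testing.expectEqual(@as(?usize, null), try kmpIndexOf(a, "hello", "xyz"));
}
```

Saved as a file, this should run with `zig test`. A naive search restarts the comparison at every haystack position, which on inputs like long runs of the same byte costs on the order of haystack length times pattern length. Here each haystack byte is visited once, plus the one-time cost of building the table.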

Zig doc

Browse the Zig doc of jstring here.