
feat(v2): init code + basic shred receiver #1204

Open
Sobeston wants to merge 38 commits into main from sobe/v2

@Sobeston (Contributor) commented Feb 3, 2026

Working

  • Child process initialisation
  • Child process exit handling
    • fmt printing error return traces
    • fmt printing panic traces
  • memory sharing
  • high level of security
    • regions are shared only as-needed, with write perms as-needed
    • mseal to stop our shared regions from being later modified by anyone
    • closing out all FDs, except for an optional stderr
    • seccomp to ban almost all syscalls
      • write syscalls only allowed on single provided stderr FD
      • intending that these will be per-service once we have some
  • segfault/signal handling
  • a basic net service
  • a basic shred receiver service

Example output:

$ zig build run -- config/testnet.zig.zon 
config: .{ .cluster = .testnet, .leader_schedule_file = { 115, 99, 104, 101, 100, 117, 108, 101, 46, 116, 120, 116 }, .gossip = .{ .port = 8001 }, .shred_network = .{ .recv_port = 8002 } }
Initialising: .net_pair
Initialised: Region `net_pair` shared with [ shred_receiver_0 (rw), net_0 (rw), ]
Initialising: .leader_schedule
Initialised: Region `leader_schedule` shared with [ shred_receiver_0 (ro), ]
Starting Service `shred_receiver_0`, pid: 975397
Starting Service `net_0`, pid: 975398
(net)binding 0.0.0.0:8002
Waiting for shreds on port 8002
slot: 442890464
erasure_set_index: 160
index: 162
shred_type: .code

slot: 442890464
erasure_set_index: 160
index: 169
shred_type: .data

slot: 442890464
erasure_set_index: 160
index: 173
shred_type: .data

What a minimal service looks like:

const std = @import("std");
const start = @import("start");

comptime {
    _ = start;
}

pub const name = "prng";
pub const panic = start.panic;

pub const ReadWrite = struct {
    prng_state: *std.Random.Xoroshiro128,
};

pub fn main(writer: *std.io.Writer, rw: ReadWrite) !noreturn {
    _ = writer;

    rw.prng_state.seed(123);
    while (true) rw.prng_state.seed(rw.prng_state.next());
}

github-project-automation bot moved this to 🏗 In progress in Sig Feb 3, 2026
Sobeston self-assigned this Feb 3, 2026
codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
see 17 files with indirect coverage changes

Sobeston marked this pull request as ready for review February 5, 2026 06:45
Sobeston requested review from dnut, ultd and yewman as code owners February 5, 2026 06:45
github-project-automation bot moved this from 🏗 In progress to 👀 In review in Sig Feb 6, 2026
Sobeston changed the title from "feat(v2): init code" to "feat(v2): init code + basic shred receiver" Feb 18, 2026
Sobeston force-pushed the sobe/v2 branch 3 times, most recently from f2abeda to eba600b Compare February 18, 2026 04:58
Sobeston requested a review from dnut February 18, 2026 17:04
@Sobeston (Contributor, Author) commented

Couple things

  1. This is failing CI because I added a README.md, which causes the doc step to fail. Not sure what's up with that.
  2. The v2-check step is failing because CI uses zig 0.14.1. We can either update that in this PR or wait for "chore: upgrade zig to 0.15" (#1225).

Sobeston and others added 4 commits February 27, 2026 01:07
Sobeston requested a review from yewman February 26, 2026 21:52
Comment on lines +104 to +145
    inline for (@import("src/services.zon")) |service_name| {
        const service_mod = b.createModule(.{
            .target = target,
            .optimize = optimize,
            .root_source_file = b.path("src/services").path(b, service_name ++ ".zig"),
            .single_threaded = true,
            .omit_frame_pointer = false,
        });
        service_mod.addImport("common", common);
        service_mod.addImport("start", start_service);
        service_mod.addImport("tracy", tracy);

        const lib_svc = b.addLibrary(.{
            .name = service_name,
            .root_module = service_mod,
            .use_llvm = true,
        });
        sig_init.linkLibrary(lib_svc);

        const service_tests = b.addTest(.{ .root_module = service_mod, .name = service_name });
        const service_tests_run = b.addRunArtifact(service_tests);
        test_step.dependOn(&service_tests_run.step);
    }

    const validate_services_list_exe = b.addExecutable(.{
        .name = "validate_services_list",
        .root_module = b.createModule(.{
            .root_source_file = b.path("scripts/validate_services_list.zig"),
            .target = b.graph.host,
            .optimize = .Debug,
            .imports = &.{
                .{
                    .name = "services",
                    .module = b.createModule(.{ .root_source_file = b.path("src/services.zon") }),
                },
            },
        }),
    });
    const validate_services_list_run = b.addRunArtifact(validate_services_list_exe);
    validate_services_list_run.addDirectoryArg(b.path("src/services"));
    b.getInstallStep().dependOn(&validate_services_list_run.step);
}
Contributor

Could inline the service list here instead of the zon file. Would remove the need for the zon file + the validate_service_list script.

const SIZE_OF_MERKLE_ROOT: usize = Hash.SIZE;

/// Analogous to [Shred](https://github.com/anza-xyz/agave/blob/8c5a33a81a0504fd25d0465bed35d153ff84819f/ledger/src/shred.rs#L245)
pub const Shred = union(ShredType) {
Contributor

Does this need to be in common? There's a good chance this won't be used outside the shred service. I'd rather keep things scoped in services and only move them into common when it's clearly needed by multiple services.

Contributor

Keeping services lean and focused on the data structures and algorithms which facilitate their higher level logic is preferable. A specific data structure being used by only a single service is not a good justification for its implementation being defined within that service. For example we would not implement the zksdk inside the zk_elgamal program simply because that is the only place it is used at the moment. Instead the zk_elgamal program defines only its instruction type and execution function which together define its 'higher level logic' (i.e. what is it doing).

Contributor

Let's consider a hypothetical data structure that defines a domain-specific concept which is only meaningful in the one very narrow context that is fully implemented by a single service. The concept has no relevance in any other context outside that service, and it never will. It's not general-purpose code that is likely to be useful in any other context. In that case, it makes no sense for this data structure to exist in a common library. Can we agree on this much? My claim is that Shred follows this pattern.

Comment on lines +3 to +5
test {
    _ = std.testing.refAllDecls(@This());
}
Contributor

Is this going to be needed for every namespace that contains tests? I get that refAllDeclsRecursive isn't perfect, but I also don't like needing to add this to every file.

Also, I don't think you need to put this in a test {} block. This makes the test counts confusing because it adds another unit test for every instance of test {}. It's not a big problem but you could just put it in a comptime block instead to achieve the same thing. refAllDecls is a noop if it's not a test build.

Suggested change
- test {
-     _ = std.testing.refAllDecls(@This());
- }
+ comptime {
+     std.testing.refAllDecls(@This());
+ }

Comment on lines +3 to +8
const common = @import("../../common.zig");

const std = @import("std");

const Slot = common.solana.Slot;
const Pubkey = common.solana.Pubkey;
Contributor

Suggested change
- const common = @import("../../common.zig");
- const std = @import("std");
- const Slot = common.solana.Slot;
- const Pubkey = common.solana.Pubkey;
+ const std = @import("std");
+ const solana = @import("../solana.zig");
+ const Slot = solana.Slot;
+ const Pubkey = solana.Pubkey;

The code is more modular if it only reaches out into the nearest parent scope that's necessary to get its dependencies. We shouldn't make every file depend on the overall structure of common if it's not necessary. It'll be easier to refactor the code if things are more tightly scoped, and I don't see a downside to it.

defer diag.deinit(allocator);

break :cfg std.zon.parse.fromSlice(Config, allocator, cfg_str, &diag, .{}) catch |err| {
    std.debug.print("{f}\n", .{diag});
Contributor

should this be std.log?

Comment on lines +71 to +85
const shared_regions: []const services.SharedRegion = &.{
    .{
        .region = .{ .net_pair = .{ .port = config.shred_network.recv_port } },
        .shares = &.{
            .{ .instance = .{ .service = .shred_receiver }, .rw = true },
            .{ .instance = .{ .service = .net }, .rw = true },
        },
    },
    .{
        .region = .{ .leader_schedule = .{ .schedule_string = &reader.interface } },
        .shares = &.{
            .{ .instance = .{ .service = .shred_receiver } },
        },
    },
};
Contributor

It's not in scope right now but I'm curious on your thoughts. It seems like the types and numbers of expected shares per service could actually be partly inferred from each serviceMain's parameters and validated at comptime against these shares.
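
A rough sketch of what such a comptime check might look like, under loud assumptions: services expose `ReadWrite` and `ReadOnly` parameter structs (only `ReadWrite` appears in the PR's minimal example, `ReadOnly` is a guess), and the simplified `Share` type and the `expected`/`validate` helpers are hypothetical stand-ins rather than anything in `services.zig`.

```zig
const std = @import("std");

// Hypothetical service shaped like the PR's minimal `prng` example.
const prng_service = struct {
    pub const ReadWrite = struct { prng_state: *u64 };
    pub const ReadOnly = struct {}; // assumed counterpart, not shown in the PR
};

/// Hypothetical, simplified share declaration (the real one also names an instance).
const Share = struct { rw: bool = false };

/// Number of regions a service expects, taken from the field count of its parameter struct.
fn expected(comptime Service: type, comptime rw: bool) usize {
    const T = if (rw) Service.ReadWrite else Service.ReadOnly;
    return @typeInfo(T).@"struct".fields.len;
}

/// Fails compilation when the declared shares do not match the service's parameters.
fn validate(comptime Service: type, comptime shares: []const Share) void {
    comptime {
        var rw: usize = 0;
        var ro: usize = 0;
        for (shares) |s| {
            if (s.rw) rw += 1 else ro += 1;
        }
        if (rw != expected(Service, true)) @compileError("rw share count mismatch");
        if (ro != expected(Service, false)) @compileError("ro share count mismatch");
    }
}

comptime {
    // One rw share for one ReadWrite field: compiles. Adding a second share would not.
    validate(prng_service, &.{.{ .rw = true }});
}
```

This only compares counts; matching region types against field types would need the region-to-type mapping that lives in the real code.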

var status: u32 = 0;
const exited_pid: i32 = pid: {
    const ret: usize = linux.waitpid(-1, &status, 0);
    std.debug.assert(ret != -1);
Contributor

this is a tautology for usize.

is this more meaningful?

Suggested change
- std.debug.assert(ret != -1);
+ std.debug.assert(ret != std.math.maxInt(usize));

or should we be checking e(ret) != .SUCCESS?
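
Some context on the two options: the raw `std.os.linux` wrappers hand back the kernel's return value as-is, so a failure arrives as a negated errno in the range -4095..-1 reinterpreted as `usize`. Comparing against `maxInt(usize)` would therefore only catch errno 1 (EPERM). A minimal sketch of the errno-based check, assuming `std.os.linux.E.init` is what the `e(ret)` shorthand stands for; `checkRet` is a hypothetical helper, not something in this PR:

```zig
const std = @import("std");
const linux = std.os.linux;

/// Hypothetical helper: turn a raw syscall return value into an error union.
fn checkRet(ret: usize) error{SyscallFailed}!usize {
    return switch (linux.E.init(ret)) {
        .SUCCESS => ret,
        else => error.SyscallFailed,
    };
}

test "a raw ECHILD return is not maxInt(usize)" {
    // -ECHILD as the kernel would return it, reinterpreted as usize.
    const echild: usize = @bitCast(-@as(isize, @intFromEnum(linux.E.CHILD)));

    // This value would slip past `assert(ret != maxInt(usize))`...
    try std.testing.expect(echild != std.math.maxInt(usize));
    // ...but the errno-based check rejects it.
    try std.testing.expectError(error.SyscallFailed, checkRet(echild));

    // Ordinary return values (e.g. a pid) pass through unchanged.
    try std.testing.expectEqual(@as(usize, 1234), try checkRet(1234));
}
```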

Comment on lines +54 to +55
pub var stderr: std.os.linux.fd_t = undefined;
pub var exit: *common.Exit = undefined;
Contributor

is it safe to set these to undefined? what if something tries to use one of these before they are set?
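
One possible way to make use-before-init observable, sketched under the assumption that an optional is acceptable here: reading an `undefined` value is not reliably caught at runtime, whereas `null` can be checked. The `stderrFd` accessor is hypothetical and not part of the PR.

```zig
const std = @import("std");

/// Hypothetical variant of the flagged global: `null` until start-up assigns a real FD.
pub var stderr: ?std.os.linux.fd_t = null;

/// Returns the FD, panicking with a clear message on use-before-init.
pub fn stderrFd() std.os.linux.fd_t {
    return stderr orelse @panic("stderr used before initialisation");
}

test "optional global is observable before and after initialisation" {
    try std.testing.expect(stderr == null);
    stderr = 2;
    try std.testing.expectEqual(@as(std.os.linux.fd_t, 2), stderrFd());
}
```

The trade-off is an extra branch on each access, which may or may not matter for these two globals.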

}

pub fn get(self: *const MerkleProofEntryList, index: usize) ?MerkleProofEntry {
    if (index > self.len) return null;
Contributor

pre-existing bug

Suggested change
- if (index > self.len) return null;
+ if (index >= self.len) return null;
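
To illustrate the off-by-one, here is a small test against a hypothetical stand-in list (the real `MerkleProofEntryList` layout is not shown in this thread): with `>`, `get(len)` would return the slot one past the last valid entry instead of `null`.

```zig
const std = @import("std");

/// Hypothetical stand-in: fixed backing storage with `len` valid entries.
const List = struct {
    items: [4]u8 = .{ 10, 20, 30, 40 },
    len: usize = 2,

    fn get(self: *const List, index: usize) ?u8 {
        // Valid indices are 0..len-1, so index == len must be rejected too.
        if (index >= self.len) return null;
        return self.items[index];
    }
};

test "get rejects index == len" {
    const list: List = .{};
    try std.testing.expectEqual(@as(?u8, 20), list.get(1)); // last valid entry
    try std.testing.expectEqual(@as(?u8, null), list.get(2)); // index == len
    try std.testing.expectEqual(@as(?u8, null), list.get(3)); // index > len
}
```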

}

const SECCOMP = std.os.linux.SECCOMP;
const syscalls = std.os.linux.syscalls.X64;
Contributor

Suggested change
- const syscalls = std.os.linux.syscalls.X64;
+ const syscalls = std.os.linux.SYS;

Comment on lines +468 to +471
if (std.os.linux.syscall3(.close_range, 0, @intCast(stderr - 1), 0) != 0)
    std.debug.panic("close_range failed\n", .{});
if (std.os.linux.syscall3(.close_range, @intCast(stderr + 1), max_fd, 0) != 0)
    std.debug.panic("close_range failed\n", .{});
Contributor

not that i would expect stderr to actually reach numbers this high or low, but saturating is equally correct and theoretically safer.

Suggested change
- if (std.os.linux.syscall3(.close_range, 0, @intCast(stderr - 1), 0) != 0)
-     std.debug.panic("close_range failed\n", .{});
- if (std.os.linux.syscall3(.close_range, @intCast(stderr + 1), max_fd, 0) != 0)
-     std.debug.panic("close_range failed\n", .{});
+ if (std.os.linux.syscall3(.close_range, 0, @intCast(stderr -| 1), 0) != 0)
+     std.debug.panic("close_range failed\n", .{});
+ if (std.os.linux.syscall3(.close_range, @intCast(stderr +| 1), max_fd, 0) != 0)
+     std.debug.panic("close_range failed\n", .{});
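
For reference, a short sketch of what the saturating operators do, assuming `fd_t` is a signed 32-bit integer as it is on Linux. Note that saturation only applies at the type's bounds: `0 -| 1` on an `i32` is still -1, so a `stderr` of 0 would still fail the `@intCast` to an unsigned argument.

```zig
const std = @import("std");

test "saturating arithmetic on an i32 file descriptor" {
    const max_fd: i32 = std.math.maxInt(i32);
    const min_fd: i32 = std.math.minInt(i32);

    // At the bounds, the result clamps instead of overflowing.
    try std.testing.expectEqual(max_fd, max_fd +| 1);
    try std.testing.expectEqual(min_fd, min_fd -| 1);

    // Inside the range, -| and +| behave like plain - and +.
    try std.testing.expectEqual(@as(i32, 1), @as(i32, 2) -| 1);
    try std.testing.expectEqual(@as(i32, 3), @as(i32, 2) +| 1);

    // Signed saturation happens at minInt, not at zero.
    try std.testing.expectEqual(@as(i32, -1), @as(i32, 0) -| 1);
}
```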

Labels

None yet

Projects

Status: 👀 In review

Development

Successfully merging this pull request may close these issues.

  • feat: v2 sig init
  • feat(shred): v2 service
  • feat: v2 IPC
  • feat: networking service

7 participants