Add bitSplit codec for splitting numeric values by bit ranges #355

Cyan4973 · 2026-01-29T05:18:00Z

Summary:
Introduces a new core bitSplit transform that splits a numeric stream into multiple output streams according to a configurable set of bit widths.

As a core transform, bitSplit can be instantiated to produce multiple nodes that all share a common, compatible decoder.

On the encode path, the transform extracts consecutive bit ranges from each element (from LSB to MSB) into separate streams. On the decode path, it reconstructs the original values by reassembling these ranges. Partial coverage is supported (sum(widths) < element_width); in that case, the remaining high bits must be zero.
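The encode and decode paths described above can be illustrated with a minimal sketch for 32-bit elements. This is not the kernel shipped in this patch: the names `sketch_split32`, `sketch_join32`, and `low_mask` are hypothetical, and for simplicity every output stream is stored as `uint32_t` rather than in a narrower typed stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `w` bits set, for 1 <= w <= 32. */
static uint32_t low_mask(unsigned w)
{
    return (w >= 32) ? UINT32_MAX : ((UINT32_C(1) << w) - 1);
}

/* Encode sketch: out[s][i] receives bit range s of in[i], LSB first.
 * Returns 0 on success, or -1 if an element has non-zero bits above
 * the covered ranges (the partial-coverage precondition). */
static int sketch_split32(const uint32_t* in, size_t n,
                          const unsigned* widths, size_t nbStreams,
                          uint32_t* const* out)
{
    unsigned total = 0;
    for (size_t s = 0; s < nbStreams; s++) {
        assert(widths[s] >= 1 && widths[s] <= 32);
        total += widths[s];
    }
    assert(total <= 32);

    for (size_t i = 0; i < n; i++) {
        uint32_t const v = in[i];
        unsigned shift   = 0; /* always < 32 when used as a shift count */
        for (size_t s = 0; s < nbStreams; s++) {
            out[s][i] = (v >> shift) & low_mask(widths[s]);
            shift += widths[s];
        }
        /* Partial coverage: the remaining high bits must be zero. */
        if (shift < 32 && (v >> shift) != 0)
            return -1;
    }
    return 0;
}

/* Decode sketch: reassemble each element from its bit ranges. */
static void sketch_join32(const uint32_t* const* in, size_t n,
                          const unsigned* widths, size_t nbStreams,
                          uint32_t* out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v     = 0;
        unsigned shift = 0;
        for (size_t s = 0; s < nbStreams; s++) {
            v |= (in[s][i] & low_mask(widths[s])) << shift;
            shift += widths[s];
        }
        out[i] = v;
    }
}
```

With widths {8, 8, 16}, the element 0x12345678 would be split into 0x78, 0x56, and 0x1234, and joining those three values restores the original element.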

Performance depends on the number of ranges and the total covered bit width, but is roughly 1 GB/s on my devserver for both encoding and decoding.

bitSplit also enables specialized nodes, e.g. floating-point component separation (sign, exponent, mantissa) across formats and sizes (FP32, FP16, BF16, etc.). Internally, it supports specialized instances to improve performance for common scenarios.

This patch ships a BF16-specialized instance. On my devserver it reaches ~30 GB/s on decode and ~40 GB/s on encode using plain C; such performance requires vectorization, and the compiler auto-vectorizes the hot loop with AVX2.
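To illustrate why a fixed-layout specialization is friendly to auto-vectorization, here is a minimal sketch of a BF16 split in plain C. It is not the code from this patch: the function names are hypothetical, and the 7/8/1 layout (mantissa, exponent, sign, from LSB to MSB) follows the standard BF16 format rather than anything confirmed about the shipped instance. Whether a compiler actually vectorizes these loops depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BF16 split: constant shifts and masks, no per-element
 * branching, one byte-wide output stream per component. */
static void sketch_bf16_split(const uint16_t* in, size_t n,
                              uint8_t* mantissa, uint8_t* exponent,
                              uint8_t* sign)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t const v = in[i];
        mantissa[i] = (uint8_t)(v & 0x7F);        /* bits 0..6  */
        exponent[i] = (uint8_t)((v >> 7) & 0xFF); /* bits 7..14 */
        sign[i]     = (uint8_t)(v >> 15);         /* bit 15     */
    }
}

/* Hypothetical inverse: reassemble BF16 values from the three streams. */
static void sketch_bf16_join(const uint8_t* mantissa,
                             const uint8_t* exponent,
                             const uint8_t* sign, size_t n,
                             uint16_t* out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint16_t)((mantissa[i] & 0x7F)
                            | (exponent[i] << 7)
                            | ((sign[i] & 1) << 15));
    }
}
```

Because the widths are compile-time constants, the loop body reduces to a few shifts and masks per element, which is the kind of loop compilers can often map onto SIMD instructions.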

The initial motivation is to replace several corner-case graphs where zigzag is effectively used as a “poor man’s bitshift” in combination with transpose. Recent sample analysis shows this pattern is common in compressor graphs produced by ACE. These multi-stage constructs can be replaced with a single bitSplit node, improving throughput, slightly reducing output size, and providing a clearer decision signal than the current indirect emulation.

Test Plan:

make gtests && ./gtests --gtest_filter="*BitSplit*"

All 28 tests pass, covering:

- Full coverage splits (8/16/32/64-bit inputs)
- Partial coverage cases
- Single stream passthrough
- Edge cases (empty input, single element, all zeros/ones)
- Asymmetric and many-small-width splits

meta-codesync · 2026-01-29T16:19:08Z

@Cyan4973 has imported this pull request. If you are a Meta employee, you can view this in D91787385.

Summary:

Introduces a new core bitSplit transform that splits numeric input elements into multiple output streams based on configurable bit widths. This is a core transform that will then be used to generate multiple nodes, all compatible with the same decoder transform.

The encoder extracts consecutive bit ranges (LSB to MSB order) from each input element into separate typed streams. The decoder reconstructs the original values by combining the bit ranges. Partial coverage is supported, where sum(widths) < element_width, with the top bits required to be zero.

The specific nodes that can be created from bitSplit include, for example, floating-point component separation (sign, exponent, mantissa), for all sizes and formats (FP32, FP16, BF16, etc.).

But the main use case I want to use this transform for is to replace a bunch of weird corner cases where the `zigzag` transform is employed as a sort of "poor man's bitshift" to produce a certain effect _in combination with_ `transpose`. According to analysis I did recently on a bunch of samples, this is very common in compressor graphs created by ACE. All of this could be replaced by a single invocation of a `bitSplit` derivative node, which is not only faster and a little more compact but, more importantly, provides a clear decision, as opposed to a muddy one emulated through multiple abused stages.

Test Plan:

make gtests && ./gtests --gtest_filter="*BitSplit*"

All 28 tests pass, covering:

- Full coverage splits (8/16/32/64-bit inputs)
- Partial coverage cases
- Single stream passthrough
- Edge cases (empty input, single element, all zeros/ones)
- Asymmetric and many-small-width splits

Cyan4973 self-assigned this Jan 29, 2026

meta-cla bot added the cla signed label Jan 29, 2026

Cyan4973 force-pushed the bitsplit branch from a10d5c0 to 3b10125 Compare January 30, 2026 06:03

Cyan4973 added 19 commits January 29, 2026 22:05

clang-format

0591487

bitsplit: create shared unit

c97b3fc

for capabilities shared between compression and decompression

bitsplit: Simplified kernel implementation

03be1eb

to be pure C11 with no dependencies

Reorganize kernels so that they own the hot loop

620958d

fix kernel decoder implementation

c637d56

bitsplit: small decoder speed optimization

4bcc0d6

minor optimization to bitsplit decode kernel

0f18f5d

use clang-format

841a2d6

added bitsplit_decode benchmarks

5e07fd7

added a bf16 specialized decoder implementation

70292c9

minor: fixed parameter order

782afb2

update validation functions signatures for assert

7b5e4f2

add preconditions tests and documentation

6369cee

minor encoding speed optimizations

b876b93

added bitsplit encode benchmark scenarios

8d350c9

added bf16 specialized encoding instance

d336101

employ clang-format

5947ec9

fix minor warning

ee73a10

unused variable when debug is disabled

Cyan4973 force-pushed the bitsplit branch from 3b10125 to ee73a10 Compare January 30, 2026 06:05

fix .gitignore

0f75293

and removed local config file

Cyan4973 force-pushed the bitsplit branch from e46530e to 0f75293 Compare January 30, 2026 06:29

Cyan4973 added 4 commits January 30, 2026 09:45

fix naming convention

5603a03

minor: improved naming and checks

fe00dc1

Check input bit width

473ee06

fix naming scheme consistency

06304fe

Cyan4973 added 2 commits January 30, 2026 17:07

minor adjustments

548ea48

use ZL_memcpy, not `memcpy` directly

more minor naming scheme adjustments

5d8d4a5
