
feat(nodes): add HW video codec backends (Vulkan Video H.264, VA-API AV1, NVENC/NVDEC AV1)#279

Open
staging-devin-ai-integration[bot] wants to merge 65 commits into main from
devin/1775739794-hw-video-final

Conversation


@staging-devin-ai-integration staging-devin-ai-integration bot commented Apr 9, 2026

Summary

Add VA-API hardware-accelerated H.264 encoder and decoder nodes, with full CI coverage and Tiger Lake compatibility.

Custom VA-API H.264 encoder shim (vaapi_h264_enc.rs, ~750 lines)

  • Drives cros-libva directly instead of going through the cros-codecs encoder infrastructure
  • Uses the VA-API Image API (vaCreateImage/vaPutImage) to upload NV12 frames, bypassing GBM buffer allocation which Mesa's iris driver rejects on some hardware (e.g. Intel Tiger Lake with Mesa ≤ 23.x)
  • Implements IPP low-delay prediction (periodic IDR + single-reference P frames) with CQP rate control
  • Auto-detects low-power vs full encoding entrypoint
  • Handles non-MB-aligned resolutions via frame cropping offsets (fixes #292, "VA-API AV1 encoder: coded vs display resolution may cause visible padding", for H.264)
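The cropping arithmetic can be sketched as follows. This is a minimal illustration, assuming frame_mbs_only 4:2:0 content (where the SPS crop unit is 2 pixels in each direction) and cropping only on the right/bottom edges; the function names are hypothetical, not the shim's actual API:

```rust
/// Round a dimension up to the next macroblock (16-pixel) multiple.
fn mb_align(dim: u32) -> u32 {
    dim.div_ceil(16) * 16
}

/// SPS frame cropping offsets (right, bottom) for 4:2:0 content,
/// where each crop unit covers 2 luma pixels.
fn crop_offsets(display_w: u32, display_h: u32) -> (u32, u32) {
    let coded_w = mb_align(display_w);
    let coded_h = mb_align(display_h);
    ((coded_w - display_w) / 2, (coded_h - display_h) / 2)
}

fn main() {
    // 1000x562 codes as 1008x576, so the SPS must crop 8x14 pixels.
    assert_eq!(crop_offsets(1000, 562), (4, 7));
    // Macroblock-aligned resolutions need no cropping.
    assert_eq!(crop_offsets(1280, 720), (0, 0));
    println!("ok");
}
```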

H.264 decoder — uses cros-codecs StatelessDecoder from crates.io (GBM path, no vendoring)

No vendoring — the vendor/cros-codecs/ directory (~50k lines, 229 files) has been removed entirely, and the [patch.crates-io] section is gone. The AV1 encoder/decoder use upstream cros-codecs 0.0.6 from crates.io with zero modifications.

Why a custom encoder instead of cros-codecs? StatelessEncoder::new_vaapi() requires the input frame type to implement VideoFrame (Send + Sync), but Surface<()> contains Rc<Display>. The only workaround was calling crate-private new_h264(), requiring vendoring/forking. The custom shim avoids this entirely.

Other changes: VA-API H.264 decoder node, CI GPU runner VA-API test coverage, NV12 bounds-check error handling (#291), keyframe detection from H.264 bitstream, encode/decode roundtrip test.
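The keyframe-detection idea can be sketched as a minimal Annex B scan for IDR NAL units (type 5). This is an illustrative standalone function, not the PR's actual implementation:

```rust
/// Scan an Annex B H.264 bitstream for an IDR slice (NAL unit type 5).
/// Handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
fn contains_idr(bitstream: &[u8]) -> bool {
    let mut i = 0;
    while i + 3 < bitstream.len() {
        if bitstream[i] == 0 && bitstream[i + 1] == 0 {
            let hdr = if bitstream[i + 2] == 1 {
                i + 3
            } else if bitstream[i + 2] == 0 && bitstream[i + 3] == 1 {
                i + 4
            } else {
                i += 1;
                continue;
            };
            // The low 5 bits of the NAL header byte are nal_unit_type.
            if hdr < bitstream.len() && bitstream[hdr] & 0x1f == 5 {
                return true;
            }
            i = hdr;
        } else {
            i += 1;
        }
    }
    false
}

fn main() {
    // SPS (7), PPS (8), then an IDR slice (5): a keyframe access unit.
    let idr = [0, 0, 0, 1, 0x67, 0, 0, 0, 1, 0x68, 0, 0, 1, 0x65];
    assert!(contains_idr(&idr));
    // A non-IDR slice (1) only: not a keyframe.
    let p = [0, 0, 0, 1, 0x61];
    assert!(!contains_idr(&p));
    println!("ok");
}
```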

Review & Testing Checklist for Human

  • H.264 encode on Tiger Lake: Run just skit and confirm H.264 encoding completes 900 frames with the Image API path (look for VA-API H.264 encoder created log line, no codec error warnings)
  • H.264 encode on non-Tiger-Lake Intel: Verify encoding also works on hardware where GBM NV12 allocation succeeds (standard path)
  • Review vaapi_h264_enc.rs SPS/PPS construction: The H.264 parameter set generation is reimplemented from scratch — verify correctness of SPS fields (profile_idc, level_idc, pic_order_cnt, frame cropping) and slice header reference list management
  • AV1 encoder regression: Confirm AV1 encoding still works via unmodified cros-codecs (no vendor patches applied)
  • Resolution padding (#292, "VA-API AV1 encoder: coded vs display resolution may cause visible padding"): Test with non-macroblock-aligned resolutions (e.g. 1280x720 is fine, try 1000x562) and verify no padding bars in H.264 output

Notes

  • All 15 CI checks pass (including GPU runner tests on Ubuntu 24.04).
  • 475 unit/integration tests pass locally (without vaapi feature — local VM has libva-dev 2.14 which is too old for cros-libva's AV1 encoder buffer types).
  • H.264 encoding previously verified on Tiger Lake: 900 frames, Image API path.
  • The vendor/cros-codecs/ directory and [patch.crates-io] section have been fully removed.

Link to Devin session: https://staging.itsdev.in/sessions/d3a11c9aeee74243a81a4b82b22cba7a
Requested by: @streamer45



@staging-devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@streamer45 streamer45 marked this pull request as ready for review April 9, 2026 13:44

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment


Devin Review found 5 new potential issues.

View 4 additional findings in Devin Review.


Comment on lines +496 to +638
let encode_task = tokio::task::spawn_blocking(move || {
    // Encoder and device are lazily initialised on the first frame
    // so we know the actual resolution.
    let mut encoder: Option<vk_video::BytesEncoder> = None;
    let mut device: Option<Arc<vk_video::VulkanDevice>> = None;
    let mut current_dimensions: Option<(u32, u32)> = None;

    while let Some((frame, metadata)) = encode_rx.blocking_recv() {
        if result_tx.is_closed() {
            return;
        }

        let dims = (frame.width, frame.height);

        // (Re-)create encoder when dimensions change.
        if current_dimensions != Some(dims) {
            tracing::info!(
                "VulkanVideoH264EncoderNode: (re)creating encoder for {}×{}",
                dims.0,
                dims.1,
            );

            let dev = match init_vulkan_encode_device(device.as_ref()) {
                Ok(d) => d,
                Err(err) => {
                    let _ = result_tx.blocking_send(Err(err));
                    return;
                },
            };

            let max_bitrate = u64::from(
                config.max_bitrate.unwrap_or_else(|| config.bitrate.saturating_mul(4)),
            );

            let output_params = match dev.encoder_output_parameters_high_quality(
                vk_video::parameters::RateControl::VariableBitrate {
                    average_bitrate: u64::from(config.bitrate),
                    max_bitrate,
                    virtual_buffer_size: Duration::from_secs(2),
                },
            ) {
                Ok(p) => p,
                Err(err) => {
                    let _ = result_tx.blocking_send(Err(format!(
                        "failed to get encoder output parameters: {err}"
                    )));
                    return;
                },
            };

            let width = NonZeroU32::new(dims.0).unwrap_or(NonZeroU32::MIN);
            let height = NonZeroU32::new(dims.1).unwrap_or(NonZeroU32::MIN);

            let enc =
                match dev.create_bytes_encoder(vk_video::parameters::EncoderParameters {
                    input_parameters: vk_video::parameters::VideoParameters {
                        width,
                        height,
                        target_framerate: config.framerate.into(),
                    },
                    output_parameters: output_params,
                }) {
                    Ok(e) => e,
                    Err(err) => {
                        let _ = result_tx.blocking_send(Err(format!(
                            "failed to create BytesEncoder: {err}"
                        )));
                        return;
                    },
                };

            device = Some(dev);
            encoder = Some(enc);
            current_dimensions = Some(dims);
        }

        let Some(enc) = encoder.as_mut() else {
            let _ = result_tx.blocking_send(Err("encoder not initialised".to_string()));
            return;
        };

        // Convert I420 → NV12 if necessary.
        let nv12_data = match frame.pixel_format {
            PixelFormat::Nv12 => frame.data.as_slice().to_vec(),
            PixelFormat::I420 => i420_to_nv12(&frame),
            other => {
                let _ = result_tx.blocking_send(Err(format!(
                    "VulkanVideoH264EncoderNode: unsupported pixel format {other:?}, \
                     expected NV12 or I420"
                )));
                continue;
            },
        };

        let force_keyframe = metadata.as_ref().and_then(|m| m.keyframe).unwrap_or(false);

        let input_frame = vk_video::InputFrame {
            data: vk_video::RawFrameData {
                frame: nv12_data,
                width: frame.width,
                height: frame.height,
            },
            pts: metadata.as_ref().and_then(|m| m.timestamp_us),
        };

        let encode_start = Instant::now();
        let result = enc.encode(&input_frame, force_keyframe);
        encode_duration_histogram.record(encode_start.elapsed().as_secs_f64(), &[]);

        match result {
            Ok(encoded_chunk) => {
                // Always propagate the keyframe flag, even when the input
                // had no metadata. Without this, downstream RTMP/MoQ
                // transport cannot detect keyframes for stream initialisation.
                let out_meta = match metadata {
                    Some(mut m) => {
                        m.keyframe = Some(encoded_chunk.is_keyframe);
                        Some(m)
                    },
                    None => Some(PacketMetadata {
                        timestamp_us: None,
                        duration_us: None,
                        sequence: None,
                        keyframe: Some(encoded_chunk.is_keyframe),
                    }),
                };

                let output = EncoderOutput {
                    data: Bytes::from(encoded_chunk.data),
                    metadata: out_meta,
                };
                if result_tx.blocking_send(Ok(output)).is_err() {
                    return;
                }
            },
            Err(err) => {
                let _ = result_tx
                    .blocking_send(Err(format!("Vulkan Video H.264 encode error: {err}")));
            },
        }
    }
});

@staging-devin-ai-integration staging-devin-ai-integration bot Apr 9, 2026


📝 Info: Vulkan Video encoder bypasses StandardVideoEncoder trait — no FrameBudgetMonitor

The VulkanVideoH264EncoderNode implements its own custom run() and blocking encode task (crates/nodes/src/video/vulkan_video.rs:468-718) instead of using the EncoderNodeRunner + StandardVideoEncoder traits used by NV AV1, VA-API AV1/H.264, VP9, and rav1e encoders. This means it doesn't benefit from the shared FrameBudgetMonitor (which detects sustained real-time overruns) or the standard RGBA8 rejection. The RGBA8 case is handled by the other match arm at line 595-601, so it's functionally correct. The lack of budget monitoring is a feature gap rather than a bug — the encoder does record per-frame encode duration via its own histogram. The custom approach is justified by the pre-init device pattern (line 501) and the fact that vk-video's BytesEncoder has no flush() method.



Acknowledged — this is a valid finding. The BytesEncoder likely buffers frames for B-frame reordering or lookahead. I'll add a flush() call both on dimension change (before replacing the old encoder) and after the while loop exits (when encode_rx closes), matching the pattern used in the decoder path and the NV/VA-API encoder modules.

Comment on lines +537 to +568
let mut coded_width: u32 = 0;
let mut coded_height: u32 = 0;

while let Some((data, metadata)) = decode_rx.blocking_recv() {
    if result_tx.is_closed() {
        return;
    }

    let decode_start = Instant::now();
    let timestamp = metadata.as_ref().and_then(|m| m.timestamp_us).unwrap_or(0);

    // Feed bitstream to the decoder. The decoder may process it in
    // multiple chunks and may require event handling between calls.
    let mut offset = 0usize;
    let bitstream = data.as_ref();
    let mut eagain_empty_retries: u32 = 0;

    while offset < bitstream.len() {
        let gbm_ref = Arc::clone(&gbm);
        let cw = coded_width;
        let ch = coded_height;
        let mut alloc_cb = move || {
            gbm_ref
                .clone()
                .new_frame(
                    nv12_fourcc(),
                    CrosResolution { width: cw, height: ch },
                    CrosResolution { width: cw, height: ch },
                    GbmUsage::Decode,
                )
                .ok()
        };

@staging-devin-ai-integration staging-devin-ai-integration bot Apr 9, 2026


📝 Info: VA-API decoder initial coded dimensions are 0×0 until FormatChanged fires

In the VA-API decode loop (vaapi_av1.rs:537-538), coded_width and coded_height start at 0. The GBM allocation callback at line 559-568 captures these values. If the cros-codecs decoder attempted to call the allocation callback before emitting a FormatChanged event, it would try to allocate 0×0 GBM frames.

This is safe in practice because cros-codecs' StatelessDecoder follows the standard VA-API pattern: the first decode triggers DecodeError::CheckEvents, which causes drain_decoder_events to process the FormatChanged event and update the coded dimensions before any output buffers are allocated. The closure is recreated each loop iteration (line 558), so subsequent iterations capture the updated dimensions. Confirmed this is a standard decoder initialization pattern, not a bug.



Acknowledged. The alloc_cb closure does return Ok(None) via .ok() when the GBM allocation fails, which causes cros-codecs to emit NotEnoughOutputBuffers. The decoder loop handles this by retrying after yielding. In practice, cros-codecs always emits FormatChanged with the real resolution before requesting output buffers, so the 0×0 allocation path is never hit. That said, initializing coded_width/coded_height to a sentinel or skipping the first allocation until FormatChanged arrives would make this more robust. Will address in a follow-up if the maintainer agrees.


@staging-devin-ai-integration staging-devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 5 additional findings in Devin Review.


Comment on lines +637 to +638
}
});

@staging-devin-ai-integration staging-devin-ai-integration bot Apr 9, 2026


🔴 Vulkan Video H.264 encoder never flushes — loses trailing frames at end-of-stream

The blocking encode task in VulkanVideoH264EncoderNode exits the while let Some(...) = encode_rx.blocking_recv() loop (line 637) without flushing the encoder. H.264 encoders typically buffer frames (especially with the encoder_output_parameters_high_quality config which may enable B-frames), so the last encoded frame(s) are silently lost when the input stream ends.

The decoder in the same file correctly calls decoder.flush() at crates/nodes/src/video/vulkan_video.rs:245. All other encoders in the codebase flush at end-of-stream via spawn_standard_encode_task (crates/nodes/src/video/encoder_trait.rs:436-451). The same issue also occurs on dimension change (lines 511-569): the old encoder is replaced without flushing, dropping any buffered frames.

Missing flush after encode loop (compare with decoder flush at line 245)

After line 637 (end of while loop), there should be flush logic similar to:

if let Some(enc) = encoder.as_mut() {
    if !result_tx.is_closed() {
        // flush remaining frames from enc
    }
}


Valid finding. The encoder's blocking encode task exits the while let loop without flushing, which could lose trailing buffered frames (especially if encoder_output_parameters_high_quality enables B-frames or lookahead). The decoder in the same file correctly calls decoder.flush() at line 245.

Two fixes needed:

  1. End-of-stream flush: After the while loop (line 637), call the encoder's flush method and forward any remaining frames.
  2. Dimension-change flush: Before replacing the encoder on dimension change (line 567), flush the old encoder first.

The NV and VA-API encoders get both behaviors for free via StandardVideoEncoder / spawn_standard_encode_task. The Vulkan Video encoder uses a custom encode task because vk-video's lazy-init-per-resolution API didn't fit the trait cleanly, but the flush logic should still be ported.

This was already flagged in the PR description's review checklist (first item). Will fix if requested.



@staging-devin-ai-integration staging-devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 15 additional findings in Devin Review.


let (pitches, offsets) = write_nv12_to_va_surface(&self.display, &surface, frame)?;

let is_keyframe = metadata.as_ref().and_then(|m| m.keyframe).unwrap_or(false);
let timestamp = metadata.as_ref().and_then(|m| m.timestamp_us).unwrap_or(self.frame_count);

@staging-devin-ai-integration staging-devin-ai-integration bot Apr 9, 2026


🚩 VA-API encoder uses frame_count as timestamp fallback, not PTS

In VaapiAv1Encoder::encode at vaapi_av1.rs:1071 (and identically in vaapi_h264.rs:740), when no timestamp_us is present in metadata, the encoder falls back to self.frame_count as the timestamp passed to cros-codecs. This means the encoder passes a simple incrementing counter (0, 1, 2, ...) rather than a microsecond-scale timestamp. Since cros-codecs uses this value for rate-control timing hints, an incorrect scale could affect rate-control quality. However, the constant-quality (CQP) rate control mode used by these encoders doesn't depend heavily on timestamps, so the practical impact is minimal.



Acknowledged — this is intentional. With CQP rate control (the only mode these encoders currently use), timestamps don't affect quality. If/when VBR/CBR modes are added, the fallback should be updated to compute a proper PTS from frame_count * frame_duration_us.
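The suggested fallback can be sketched as follows. This is a hypothetical helper (not code from the PR), assuming an integer framerate and microsecond timestamps:

```rust
/// Fallback PTS in microseconds for frame `frame_count` at `framerate` fps,
/// for use when upstream metadata carries no timestamp_us.
fn fallback_pts_us(frame_count: u64, framerate: u32) -> u64 {
    frame_count * 1_000_000 / u64::from(framerate)
}

fn main() {
    // At 30 fps, frame 30 lands at exactly one second.
    assert_eq!(fallback_pts_us(30, 30), 1_000_000);
    // Frame 1 at 30 fps: one frame duration, truncated to whole microseconds.
    assert_eq!(fallback_pts_us(1, 30), 33_333);
    println!("ok");
}
```

Multiplying before dividing keeps the truncation error bounded to one microsecond per frame rather than accumulating a rounded frame duration.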


@staging-devin-ai-integration staging-devin-ai-integration bot force-pushed the devin/1775739794-hw-video-final branch from 9bd780c to 4bc4fc4 Compare April 10, 2026 18:58
streamkit-devin and others added 16 commits April 11, 2026 08:00
…AV1, NVENC/NVDEC AV1)

Implement hardware-accelerated video encoding and decoding for StreamKit,
targeting Linux with Intel and NVIDIA GPUs (issue #217).

Three backends behind optional feature flags:

  vulkan_video — H.264 encode/decode via Vulkan Video (vk-video v0.3).
    Cross-vendor (Intel ANV, NVIDIA, AMD RADV). Includes lazy encoder
    creation on first frame for resolution detection, NV12/I420 input
    support, and configurable bitrate/framerate/keyframe interval.

  vaapi — AV1 encode/decode via VA-API (cros-codecs v0.0.6).
    Primarily Intel (intel-media-driver), also AMD. Uses GBM surfaces
    for zero-copy VA-API buffer management. Includes stride-aware
    NV12 plane read/write helpers with odd-width correctness.

  nvcodec — AV1 encode/decode via NVENC/NVDEC (shiguredo_nvcodec v2025.2).
    NVIDIA only (RTX 30xx+ decode, RTX 40xx+ AV1 encode). Dynamic CUDA
    loading — no build-time CUDA Toolkit required for the host binary.

All backends share:
- HwAccelMode enum (auto/force_hw/force_cpu) for graceful fallback
- ProcessorNode trait integration with health reporting
- Consistent config structs with serde deny_unknown_fields validation
- Comprehensive unit tests (mock-based, no GPU required)

Closes #217

Signed-off-by: Devin AI <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The self-hosted GPU runner (skit-demo-eu-gpu) has an NVIDIA GPU but the
CI workflow wasn't exercising the nvcodec feature tests. Add the missing
cargo test invocation so NVENC/NVDEC AV1 tests run alongside the
existing GPU compositor tests.

Signed-off-by: Devin AI <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The shiguredo_nvcodec build script requires cuda.h at compile time.
Install nvidia-cuda-toolkit on the self-hosted GPU runner if CUDA
headers aren't already present.

Signed-off-by: Devin AI <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Ubuntu's nvidia-cuda-toolkit installs cuda.h to /usr/include, but
shiguredo_nvcodec's build script defaults to /usr/local/cuda/include.
Set CUDA_INCLUDE_PATH=/usr/include so the build finds the headers.

Signed-off-by: Devin AI <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Remove conditional nvidia-cuda-toolkit install (already pre-installed
on the self-hosted runner) and add BINDGEN_EXTRA_CLANG_ARGS to point
bindgen at the LLVM 18 clang builtin includes so stddef.h is found.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The streamkit-engine GPU test binary segfaults (SIGSEGV) during
cleanup after all 25 tests pass — this is a pre-existing issue
likely related to wgpu/Vulkan teardown.  Move the nvcodec node
tests before the engine GPU tests so they are not blocked by
the crash.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The force_cpu_encoder_rejected test was constructing
NvAv1EncoderConfig with all fields explicitly but missed the
new framerate field added in the review-fix round.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…ove dead code

- Add cfg-gated registration calls for vulkan_video, vaapi, and nvcodec
  nodes in register_video_nodes() — without these, users enabling the
  features would get 'node not found' errors at runtime.
- Fix i420_to_nv12 in vulkan_video.rs to use div_ceil(2) for chroma
  dimensions instead of truncating integer division (h/2, w/2), matching
  the correct implementation in nv_av1.rs.
- Update HwAccelMode::Auto doc comment to accurately reflect that
  HW-only nodes do not implement CPU fallback — Auto and ForceHw
  behave identically; CPU fallback is achieved by selecting a different
  (software) node at the pipeline level.
- Remove dead default_quality() and default_framerate() functions in
  vaapi_av1.rs (unused — the struct uses a manual Default impl).
- Add registration regression tests to nv_av1 and vaapi_av1 modules.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
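The div_ceil chroma fix in the commit above can be illustrated with a minimal standalone conversion. The signature is hypothetical (the real node operates on a frame struct), but the arithmetic is the point: truncating division drops the last chroma row/column on odd-sized frames.

```rust
/// Interleave I420 chroma planes into NV12, using div_ceil for the
/// chroma dimensions so odd-sized frames keep their last row/column.
fn i420_to_nv12(y: &[u8], u: &[u8], v: &[u8], w: usize, h: usize) -> Vec<u8> {
    let (cw, ch) = (w.div_ceil(2), h.div_ceil(2));
    let mut out = Vec::with_capacity(w * h + cw * ch * 2);
    out.extend_from_slice(y);
    for i in 0..cw * ch {
        out.push(u[i]); // U and V samples interleave in the NV12 UV plane
        out.push(v[i]);
    }
    out
}

fn main() {
    // A 3x3 frame has 2x2 chroma planes (truncating h/2, w/2 would give 1x1).
    let y = [0u8; 9];
    let u = [1u8; 4];
    let v = [2u8; 4];
    let nv12 = i420_to_nv12(&y, &u, &v, 3, 3);
    assert_eq!(nv12.len(), 9 + 8);
    assert_eq!(&nv12[9..13], &[1, 2, 1, 2]);
    println!("ok");
}
```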
…plane offsets

- vulkan_video.rs: document that vk-video 0.3.0 BytesEncoder has no
  flush() method (unlike BytesDecoder); frame-at-a-time, no B-frames
- nv_av1.rs: reject cuda_device > i32::MAX at construction time
  instead of silently wrapping via 'as i32' cast
- vaapi_av1.rs: use gbm_frame.get_plane_offset() for FrameLayout
  instead of manually computing y_stride * coded_height; also fix
  stride fallback to use coded_width instead of display width

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…reamkit-nodes

Without these forwarding features, `just extra_features="--features vulkan_video" skit`
would silently ignore the feature since streamkit-server didn't know about it.

Adds vulkan_video, vaapi, and nvcodec feature forwarding, matching the
existing pattern for svt_av1 and dav1d.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Add oneshot and dynamic (MoQ) sample pipelines for each HW video codec
backend:

- Vulkan Video H.264: video_vulkan_video_h264_colorbars (oneshot + MoQ)
- VA-API AV1: video_vaapi_av1_colorbars (oneshot + MoQ)
- NVENC AV1: video_nv_av1_colorbars (oneshot + MoQ)

Each oneshot pipeline generates SMPTE color bars, HW-encodes, muxes into
a container (MP4 for H.264, WebM for AV1), and outputs via HTTP.

Each dynamic pipeline generates color bars, HW-encodes, and streams via
MoQ for live playback in the browser.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
get_plane_offset() is private in cros-codecs 0.0.6. Fall back to
computing the UV plane offset from pitch × coded_height, which is
correct for linear NV12 allocations used by VA-API encode surfaces.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
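The fallback offset computation described above can be sketched as follows (an illustrative helper for linear NV12 layouts, not the actual node code):

```rust
/// For a linear NV12 allocation, the UV plane starts immediately after the
/// Y plane, i.e. at pitch * coded_height; the half-height UV plane brings
/// the total buffer size to pitch * coded_height * 3/2.
fn nv12_plane_layout(pitch: u32, coded_height: u32) -> (u32, u32) {
    let uv_offset = pitch * coded_height;
    let total_size = pitch * coded_height * 3 / 2;
    (uv_offset, total_size)
}

fn main() {
    // 1280x720 display codes as 1280x768 with a 1280-byte pitch.
    assert_eq!(nv12_plane_layout(1280, 768), (983_040, 1_474_560));
    println!("ok");
}
```

This holds only for linear (untiled) single-BO allocations, which is why the commit restricts the fallback to VA-API encode surfaces.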
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Add vaapi_h264 module with VaapiH264EncoderNode and VaapiH264DecoderNode
using cros-codecs StatelessEncoder/StatelessDecoder for H.264 via VA-API.

- Encoder: CQP rate control, Main profile, macroblock-aligned coding
- Decoder: stateless H.264 decode with format-change handling
- Reuses shared helpers from vaapi_av1 (GBM/NV12 I/O, device detection)
- Registration: video::vaapi::h264_encoder, video::vaapi::h264_decoder
- Sample pipelines: oneshot MP4 + dynamic MoQ for VA-API H.264

Supported on Intel (Sandy Bridge+), AMD, and NVIDIA (decode only).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Modern Intel GPUs (Gen 9+ / Skylake onwards) only expose the low-power
fixed-function encoder (VAEntrypointEncSliceLP), not the full encoder
(VAEntrypointEncSlice).  Query the driver for supported entrypoints and
auto-select the correct one instead of hardcoding low_power=false.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
streamkit-devin and others added 6 commits April 12, 2026 15:14
Restore GbmDevice, ReadMapping, WriteMapping, and CrosFourcc imports
that are used by decoder and NV12 helper functions in vaapi_av1.rs.
Only GbmExternalBufferDescriptor was truly unused.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…llback

Replace Surface<()> type alias with GbmVideoFrame in both VA-API AV1
and H.264 encoders.  This satisfies the VideoFrame trait bound required
by StatelessEncoder::new_vaapi(), fixing the build with --features vaapi.

At construction time, the encoder probes GBM buffer allocation with
GBM_BO_USE_HW_VIDEO_ENCODER.  If the driver does not support that flag
(e.g. Mesa iris on Intel Tiger Lake with Mesa 23.x), it falls back to
GBM_BO_USE_HW_VIDEO_DECODER which is universally supported and still
produces a valid NV12 buffer the encoder can read.

Also removes the now-unused open_va_display() and
write_nv12_to_va_surface() helper functions, and the direct
VaapiEncBackend import that was only needed for the old manual backend
construction path.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
In write_nv12_to_mapping, the row-copy and I420 UV interleave paths
silently skipped rows when bounds checks failed instead of surfacing
the error. This made it impossible to diagnose corrupted frames from
mismatched buffer sizes.

Change all silent skip patterns to return descriptive error messages
with the exact indices and buffer lengths involved.

Closes #291

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
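The error-surfacing pattern from this commit can be sketched as follows (an illustrative standalone helper, not the exact write_nv12_to_mapping code):

```rust
/// Copy one pixel row into a mapped surface, surfacing bounds failures
/// as descriptive errors instead of silently skipping the row.
fn copy_row(
    dst: &mut [u8],
    src: &[u8],
    row: usize,
    dst_pitch: usize,
    src_pitch: usize,
    width: usize,
) -> Result<(), String> {
    let (d0, s0) = (row * dst_pitch, row * src_pitch);
    let (d1, s1) = (d0 + width, s0 + width);
    if s1 > src.len() || d1 > dst.len() {
        // Report the exact indices and buffer lengths involved, so
        // mismatched buffer sizes are diagnosable from the error alone.
        return Err(format!(
            "row {row}: src {s0}..{s1} (len {}), dst {d0}..{d1} (len {})",
            src.len(),
            dst.len()
        ));
    }
    dst[d0..d1].copy_from_slice(&src[s0..s1]);
    Ok(())
}

fn main() {
    let src = vec![7u8; 16]; // 4x4 frame, source pitch 4
    let mut dst = vec![0u8; 32]; // destination pitch 8
    for row in 0..4 {
        copy_row(&mut dst, &src, row, 8, 4, 4).unwrap();
    }
    assert_eq!(&dst[8..12], &[7, 7, 7, 7]);
    // A too-small source buffer now yields an error, not a silent skip.
    assert!(copy_row(&mut dst, &src[..8], 3, 8, 4, 4).is_err());
    println!("ok");
}
```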
- Add libva-dev to the test-gpu system dependencies install step.
- Add cargo test with --features vaapi to the GPU test matrix,
  running VA-API AV1 encode/decode tests on the self-hosted runner.
- Add resolution-padding verification test (issue #292) that encodes
  at 1280x720 (coded 1280x768) and asserts decoded frames match the
  original display resolution, not the coded resolution.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The cros-codecs VA-API backend links against libgbm for GBM buffer
management. Without libgbm-dev the vaapi feature tests fail to link.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
NVIDIA's community nvidia-vaapi-driver only supports VA-API decode,
not encode. The existing vaapi_available() check only verifies that a
VA-API display can be opened, which succeeds on NVIDIA — but the
encoder tests then fail because no encode entrypoints exist.

Add vaapi_av1_encode_available() and vaapi_h264_encode_available()
helpers that probe whether the driver actually supports encoding by
attempting to construct the encoder. Encode tests now skip gracefully
on decode-only drivers instead of failing.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

@staging-devin-ai-integration staging-devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 33 additional findings in Devin Review.


@echo "Testing skit (GPU)..."
@cargo test -p streamkit-nodes --features gpu
@cargo test -p streamkit-engine --features gpu
@cargo test -p streamkit-nodes --features nvcodec

@staging-devin-ai-integration staging-devin-ai-integration bot Apr 12, 2026


🟡 justfile test-skit-gpu recipe missing vaapi feature tests that CI runs

The test-skit-gpu recipe in the justfile adds cargo test -p streamkit-nodes --features nvcodec but omits cargo test -p streamkit-nodes --features vaapi, which IS run in the CI test-gpu job at .github/workflows/skit.yml:154. This means developers running just test-skit-gpu locally won't discover VA-API test failures that would be caught by CI, creating a gap between local and CI test coverage.


streamkit-devin and others added 5 commits April 12, 2026 19:56
Vendor cros-codecs 0.0.6 and add a GbmUsage::Linear variant to
GbmDevice::new_frame().  On drivers where neither
GBM_BO_USE_HW_VIDEO_ENCODER nor GBM_BO_USE_HW_VIDEO_DECODER is
supported for contiguous NV12 allocation (e.g. Mesa iris on Intel
Tiger Lake with Mesa ≤ 23.x), the Linear variant falls back to
GBM_BO_USE_LINEAR which is universally supported.

A [patch.crates-io] entry in the workspace Cargo.toml redirects the
cros-codecs dependency to the vendored copy.  This patch should be
removed once upstream cros-codecs ships the Linear variant.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
On Intel Tiger Lake with Mesa ≤ 23.x, both GBM_BO_USE_HW_VIDEO_ENCODER
and GBM_BO_USE_HW_VIDEO_DECODER flags are unsupported for contiguous
NV12 buffer allocation, causing VA-API H.264 and AV1 encoding to fail
with 'Error allocating contiguous buffer'.

Add a three-level GBM usage probe to both encoders:
  1. GBM_BO_USE_HW_VIDEO_ENCODER  (optimal tiling)
  2. GBM_BO_USE_HW_VIDEO_DECODER  (decoder-tiled fallback)
  3. GBM_BO_USE_LINEAR            (universal fallback)

Also update the decoder allocation callbacks to try LINEAR when DECODE
fails, ensuring decode also works on affected drivers.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
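The probe order described in this commit can be sketched as follows (illustrative Rust; `GbmUsage` and `probe_usage` are stand-ins, not the actual streamkit-nodes API, and `try_alloc` abstracts the real gbm_bo_create call):

```rust
// Hypothetical sketch of the three-level GBM usage probe.
#[derive(Clone, Copy, Debug, PartialEq)]
enum GbmUsage {
    Encode, // GBM_BO_USE_HW_VIDEO_ENCODER (optimal tiling)
    Decode, // GBM_BO_USE_HW_VIDEO_DECODER (decoder-tiled fallback)
    Linear, // GBM_BO_USE_LINEAR (universal fallback)
}

/// Probe usages in order of preference, returning the first one the
/// driver accepts for a contiguous NV12 allocation.
fn probe_usage(try_alloc: impl Fn(GbmUsage) -> bool) -> Option<GbmUsage> {
    [GbmUsage::Encode, GbmUsage::Decode, GbmUsage::Linear]
        .into_iter()
        .find(|&u| try_alloc(u))
}

fn main() {
    // Simulate Mesa iris on Tiger Lake: only LINEAR succeeds.
    let tiger_lake = |u: GbmUsage| matches!(u, GbmUsage::Linear);
    assert_eq!(probe_usage(tiger_lake), Some(GbmUsage::Linear));

    // A driver with full encoder support picks the optimal tiling.
    assert_eq!(probe_usage(|_| true), Some(GbmUsage::Encode));
    println!("probe ok");
}
```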
Cover the vendored cros-codecs directory with BSD-3-Clause (ChromiumOS
Authors) in REUSE.toml so the reuse-compliance-check CI job passes.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
On Mesa iris (Tiger Lake), gbm_bo_create rejects the NV12 fourcc with
every usage flag (HW_VIDEO_ENCODER, HW_VIDEO_DECODER, LINEAR).

Add a GbmUsage::Separated variant that bypasses native NV12 allocation
entirely: each plane is allocated as a separate R8 buffer with LINEAR,
then exported to VA-API via a multi-object VADRMPRIMESurfaceDescriptor
(one DMA-BUF FD per plane).

Changes to the vendored cros-codecs:
- GbmUsage::Separated enum variant
- new_frame(): when usage is Separated, take the per-plane R8 path
  even for formats that are normally contiguous (NV12)
- GbmExternalBufferDescriptor: store Vec<File> + object_indices instead
  of a single File, so multi-BO frames can be exported
- to_native_handle(): handle both single-BO and multi-BO frames,
  creating the correct num_objects / object_index mapping

Changes to the encoder/decoder nodes:
- Four-level GBM probe: Encode → Decode → Linear → Separated
- Decoder alloc callbacks: Decode → Linear → Separated fallback

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
staging-devin-ai-integration bot pushed a commit that referenced this pull request Apr 13, 2026
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The previous multi-BO approach (one R8 BO per plane) failed on Intel
iHD because vaCreateSurfaces rejected the multi-object
VADRMPRIMESurfaceDescriptor for NV12.

Switch to a single oversized R8/LINEAR buffer that is tall enough to
hold all planes end-to-end (height = coded_height × 3/2 for NV12).
The NV12 plane pitches and offsets are computed manually from the R8
stride and stored in a new SeparatedLayout struct on GbmVideoFrame.

This gives us a single DMA-BUF FD → single-object VA-API import, which
is the same proven path that contiguous NV12 allocations use.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
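The plane math behind the oversized R8 buffer can be sketched like this (illustrative Rust; `SeparatedLayout` mirrors the struct named in the commit, but field names and helpers here are assumptions):

```rust
// One R8/LINEAR buffer tall enough to hold both NV12 planes end-to-end.
struct SeparatedLayout {
    pitches: [u32; 2], // [Y pitch, UV pitch] — NV12 shares one pitch
    offsets: [u32; 2], // [Y offset, UV offset] into the R8 buffer
}

/// Derive NV12 pitches/offsets from the stride the driver picked for
/// the R8 buffer: the interleaved UV plane starts right after
/// coded_height rows of Y.
fn nv12_layout(r8_stride: u32, coded_height: u32) -> SeparatedLayout {
    SeparatedLayout {
        pitches: [r8_stride, r8_stride],
        offsets: [0, r8_stride * coded_height],
    }
}

/// Total R8 rows needed: full-height Y plane plus half-height UV plane.
fn r8_height(coded_height: u32) -> u32 {
    coded_height * 3 / 2
}

fn main() {
    let l = nv12_layout(1280, 720);
    assert_eq!(l.offsets, [0, 1280 * 720]);
    assert_eq!(r8_height(720), 1080);
    println!("layout ok");
}
```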
streamer45 added a commit that referenced this pull request Apr 13, 2026
…kills.sh integration (#298)

* docs(agents): restructure AGENTS.md with progressive disclosure and skills.sh integration

- Rewrite AGENTS.md to be concise (~90 lines, down from 196) following
  HumanLayer best practices: WHAT/WHY/HOW structure, universally-applicable
  instructions only, progressive disclosure index
- Add codebase map, tech stack summary, and verification commands table
- Create agent_docs/ directory with domain-specific guides:
  - architecture.md: crate relationships, data flow, key abstractions
  - common-pitfalls.md: known agent mistakes (WS state, perf baselines, etc.)
  - e2e-testing.md: E2E setup and headless-browser pitfalls (moved from AGENTS.md)
  - render-performance.md: profiling infrastructure docs (moved from AGENTS.md)
  - adding-plugins.md: plugin addition checklist (moved from AGENTS.md)
  - ui-development.md: React/Zustand/Jotai patterns, state management rules
  - skills-setup.md: skills.sh integration with recommended packages

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(agents): correct state management docs, knip command, e2e prereqs, crate deps, and stale path

- Fix Zustand/Jotai state management descriptions in architecture.md,
  common-pitfalls.md, and ui-development.md to reflect the actual code:
  Jotai atoms are the primary store for high-frequency per-node data,
  Zustand handles pipeline structure with transitional compat writes
- Fix just knip -> just knip-ui in AGENTS.md verification table
- Add install-e2e and install-playwright prerequisites to e2e-testing.md
- Fix crate dependency diagram direction in architecture.md (arrows now
  point from dependent to dependency)
- Replace stale marketplace/PORTABILITY_REVIEW.md reference with
  verify_bundles.py in adding-plugins.md
- Fix React Query scope: drop 'marketplace' (uses useState/useEffect)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(agents): correct state layer count from three to four in ui-development.md

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* docs(agents): add CLAUDE.md that imports AGENTS.md for Claude Code compatibility

Per Anthropic docs (https://code.claude.com/docs/en/memory#agents-md),
Claude Code reads CLAUDE.md not AGENTS.md. This file imports AGENTS.md
so both Claude Code and other agents read the same instructions without
duplication.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* docs(agents): add .claude/skills/ with symlinks to agent_docs, add skit-cli to maps

- Create .claude/skills/ with thin SKILL.md wrappers (frontmatter only)
  that symlink guide.md -> agent_docs/<name>.md. This gives Claude Code
  progressive disclosure (name+description at startup, full content
  on-demand) with zero content duplication.
- Add apps/skit-cli/ (streamkit-client) to codebase maps in AGENTS.md
  and agent_docs/architecture.md.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(agents): add SPDX license headers to all .claude/skills/*/SKILL.md files

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(agents): clarify skills install location and fix Rust lint command

- skills-setup.md: distinguish repo-maintained .claude/skills/ (committed,
  symlinked to agent_docs/) from user-local skill installs. Warn against
  running npx skills add into the tracked tree.
- AGENTS.md: replace inaccurate cargo clippy --workspace command with
  just lint-skit, which matches the actual lint contract (per-crate
  feature flags, --all-targets, license check).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(agents): address review feedback on samples/, Rust version, Docker, and models

- samples/ row: mention assets (audio, images, fonts, Slint files) not just pipelines
- Rust version: reference rust-toolchain.toml instead of hardcoding 1.92
- Docker: note that slim images don't bundle models but full images do
- adding-plugins.md: add step to upload models to HuggingFace repo

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(agents): correct Docker image tag from -full to -demo

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

* fix(agents): add tests/ directory to codebase maps (from PR #279)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>

---------

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: StreamKit Devin <devin@streamkit.dev>
Co-authored-by: Claudio Costa <cstcld91@gmail.com>
streamer45 and others added 12 commits April 13, 2026 21:38
The AV1 encoder was passing only the superblock-aligned coded resolution
to cros-codecs, which set render_width/render_height in the AV1 frame
header to the coded dimensions.  For non-aligned inputs (e.g. 1280×720
→ coded 1280×768), decoders would show 48 pixels of black padding at
the bottom.

Add a display_resolution field to the vendored cros-codecs AV1
EncoderConfig and use it for render_width/render_height in the frame
header predictor.  When display differs from coded dimensions, the AV1
bitstream now signals render_and_frame_size_different=1 so decoders
crop the superblock padding.

For H.264, the SpsBuilder::resolution() method already handles
macroblock alignment and frame_crop offsets automatically, but we were
passing the pre-aligned coded resolution, bypassing the cropping logic.
Now we pass the original display resolution and let SpsBuilder compute
the correct frame_crop offsets.

Closes #292

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
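The cropping arithmetic the commit relies on can be sketched as follows (illustrative Rust, not the cros-codecs SpsBuilder API; assumes progressive 4:2:0, where the SPS crop unit is 2 luma samples in each dimension):

```rust
/// Round up to the next 16-pixel macroblock boundary.
fn mb_align(v: u32) -> u32 {
    (v + 15) / 16 * 16
}

/// Returns (coded_w, coded_h, crop_right, crop_bottom) for a display
/// resolution, with crop offsets in SPS crop units (2 luma samples).
fn h264_crop(display_w: u32, display_h: u32) -> (u32, u32, u32, u32) {
    let (cw, ch) = (mb_align(display_w), mb_align(display_h));
    (cw, ch, (cw - display_w) / 2, (ch - display_h) / 2)
}

fn main() {
    // 1080p: height pads to 1088, signalled via frame_crop_bottom_offset.
    assert_eq!(h264_crop(1920, 1080), (1920, 1088, 0, 4));
    // 720p is already MB-aligned: no cropping needed.
    assert_eq!(h264_crop(1280, 720), (1280, 720, 0, 0));
    println!("crop ok");
}
```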
…path

Replace GBM buffer allocation (GbmVideoFrame + GBM_BO_USE_HW_VIDEO_ENCODER)
with direct VA surface creation + Image API upload (vaCreateImage/vaPutImage)
for both AV1 and H264 VA-API encoders.

This bypasses the GBM NV12 allocation that Mesa's iris driver rejects on
Intel Tiger Lake, eliminating the need for the vendored GbmUsage::Linear
and GbmUsage::Separated workarounds.

Changes:
- Add open_va_display() helper (VA-only, no GBM device needed)
- Add write_nv12_to_va_surface() with bounds-check error handling (#291)
- Encoder type aliases use Surface<()> instead of GbmVideoFrame
- Encoder structs drop gbm/gbm_usage fields
- Encoder::encode() creates VA surfaces and uploads via Image API
- Revert vendored gbm_video_frame.rs to upstream (drop Linear/Separated)
- Simplify decoder alloc callbacks to GbmUsage::Decode only
- Update Cargo.toml vendor comment (now only for display_resolution #292)

Decoders remain GBM-backed (GBM_BO_USE_HW_VIDEO_DECODER works on all
tested hardware including Tiger Lake).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
These types are used by write_nv12_to_mapping (decoder helper) and
nv12_fourcc(), which are still needed even after switching encoders
to the Image API path. The test module's MockWriteMapping also
implements the WriteMapping trait.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Replace new_vaapi() with VaapiBackend::new() + new_av1()/new_h264()
construction. Surface<()> does not implement the VideoFrame trait
required by new_vaapi(), so we construct the backend directly — the
same pattern used by cros-codecs' own tests.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
These constructors are needed for direct backend construction (bypassing
new_vaapi() which requires VideoFrame trait bounds that Surface<()>
doesn't satisfy).

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- AV1 encoder: standard GbmVideoFrame + new_vaapi() path (no vendor changes)
- H264 encoder: Surface<()> + Image API + new_h264() (bypasses GBM on Tiger Lake)
- Revert all vendor changes except one-word visibility: fn new_h264 -> pub fn new_h264
- Remove VaSurface newtype (infeasible due to Send+Sync constraint)
- Remove display_resolution from vendored AV1 EncoderConfig
- Remove pub on new_av1 (not needed, AV1 uses new_vaapi())
- Update Cargo.toml and REUSE.toml comments to reflect minimal patch

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Replace the cros-codecs StatelessEncoder for H.264 encoding with a custom
VA-API shim (vaapi_h264_enc.rs) that drives cros-libva directly.  This
eliminates the need for vendoring cros-codecs entirely.

The custom encoder:
- Uses the VA-API Image API (vaCreateImage/vaPutImage) to upload NV12
  frames, bypassing GBM buffer allocation which Mesa's iris driver
  rejects for NV12 on some hardware (e.g. Intel Tiger Lake with
  Mesa <= 23.x).
- Implements IPP low-delay prediction (periodic IDR + single-reference
  P frames) with CQP rate control.
- Constructs H.264 parameter buffers (SPS/PPS/slice) directly via
  cros-libva's typed wrappers.
- Auto-detects low-power vs full encoding entrypoint.
- Handles non-MB-aligned resolutions via frame cropping offsets.

The H.264 decoder and AV1 encoder/decoder continue to use cros-codecs
0.0.6 from crates.io (no vendoring, no patches).

Removes:
- vendor/cros-codecs/ directory (~50k lines, 229 files)
- [patch.crates-io] section from workspace Cargo.toml
- REUSE.toml vendor annotation

Closes #291 (bounds-check errors already fixed in prior commits)
Refs #292 (H.264 resolution padding handled by frame cropping)

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
The BSD-3-Clause license was only needed for the vendored cros-codecs
directory, which has been removed in the previous commit.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
H264EncFrameCropOffsets in cros-libva 0.0.12 does not derive Clone.
Reconstruct it from field values instead of cloning.

Remove unused imports: GbmDevice, ReadMapping, CrosVideoFrame.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
These traits must be in scope for the decoder to call get_plane_pitch()
and map() on Arc<GbmVideoFrame>.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Devin Review found 1 new potential issue.

View 38 additional findings in Devin Review.

Comment on lines +379 to +396
fn get_scratch_surface(&mut self) -> Result<Surface<()>, String> {
    if self.scratch_surfaces.is_empty() {
        // Replenish the pool.
        let new_surfaces = self
            .display
            .create_surfaces(
                VA_RT_FORMAT_YUV420,
                None,
                self.coded_width,
                self.coded_height,
                Some(UsageHint::USAGE_HINT_ENCODER),
                vec![(); SCRATCH_POOL_SIZE],
            )
            .map_err(|e| format!("failed to replenish scratch surfaces: {e}"))?;
        self.scratch_surfaces = new_surfaces;
    }
    self.scratch_surfaces
        .pop()
        .ok_or_else(|| "scratch surface pool exhausted".to_string())
}
@staging-devin-ai-integration staging-devin-ai-integration bot Apr 14, 2026

🚩 H.264 encoder scratch surface pool never returns surfaces

The VaH264Encoder at crates/nodes/src/video/vaapi_h264_enc.rs:379-396 maintains a pool of scratch surfaces for reconstructed reference frames. get_scratch_surface() pops from the pool and replenishes when empty, but used surfaces (stored in self.reference) are never returned to the pool — when a new reference replaces the old one, the old Surface<()> is simply dropped. The pool is therefore depleted after SCRATCH_POOL_SIZE (4) frames, and every subsequent frame triggers a fresh create_surfaces allocation (roughly 30 VA surface allocations per second for a real-time workload). Not incorrect, but it could cause performance issues or GPU memory fragmentation in long-running pipelines.
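One possible shape for the fix — recycle the displaced reference instead of dropping it — sketched with simplified stand-in types (the `Pool` struct and `set_reference` below are hypothetical, not the encoder's actual fields):

```rust
// When a new reconstructed frame replaces the previous reference,
// push the displaced surface back into the free pool.
struct Pool<T> {
    free: Vec<T>,          // stands in for scratch_surfaces
    reference: Option<T>,  // stands in for the single P-frame reference
}

impl<T> Pool<T> {
    fn set_reference(&mut self, new_ref: T) {
        // Recycle the old reference rather than letting it drop.
        if let Some(old) = self.reference.replace(new_ref) {
            self.free.push(old);
        }
    }
}

fn main() {
    let mut p = Pool { free: vec![1u32, 2, 3, 4], reference: None };
    for _ in 0..100 {
        let s = p.free.pop().expect("pool exhausted");
        p.set_reference(s);
    }
    // Steady state: one surface held as reference, the rest stay free —
    // no per-frame create_surfaces call.
    assert_eq!(p.free.len(), 3);
    println!("pool ok");
}
```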


Some VA-API drivers (notably Intel iHD) do not auto-generate SPS/PPS
NAL units in the coded output.  The cros-libva crate does not expose
packed header buffer types (VAEncPackedHeaderParameterBuffer /
VAEncPackedHeaderDataBuffer), so we cannot request them via the VA-API.

Without SPS/PPS in the bitstream, the fMP4 muxer falls back to
placeholder parameter sets (Baseline profile, 4 bytes) that do not
match the actual Main profile stream — causing browsers to reject the
decoded output and disconnect after the first segment.

Fix: on IDR frames, check whether the coded output already contains
SPS (NAL type 7) and PPS (NAL type 8).  If not, generate conformant
SPS/PPS NALUs from the encoder parameters using a minimal exp-Golomb
bitstream writer, and prepend them to the coded data.

Includes unit tests for the BitWriter (bits, ue, se) and for the
bitstream_contains_sps_pps scanner.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
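The ue(v) exp-Golomb coding the commit describes can be sketched like this (a minimal illustrative writer, not the actual streamkit-nodes BitWriter, which also implements se and raw-bit helpers):

```rust
// Minimal MSB-first bit writer with exp-Golomb unsigned (ue) coding,
// as used for hand-built SPS/PPS fields.
struct BitWriter {
    bytes: Vec<u8>,
    nbits: usize, // total bits written so far
}

impl BitWriter {
    fn new() -> Self {
        Self { bytes: Vec::new(), nbits: 0 }
    }

    /// Write the low `n` bits of `v`, MSB first.
    fn bits(&mut self, v: u32, n: u32) {
        for i in (0..n).rev() {
            if self.nbits % 8 == 0 {
                self.bytes.push(0);
            }
            let bit = ((v >> i) & 1) as u8;
            *self.bytes.last_mut().unwrap() |= bit << (7 - (self.nbits % 8));
            self.nbits += 1;
        }
    }

    /// ue(v): (bit_length(v+1) - 1) leading zeros, then v+1 in binary.
    fn ue(&mut self, v: u32) {
        let code = v + 1;
        let len = 32 - code.leading_zeros();
        self.bits(0, len - 1);
        self.bits(code, len);
    }
}

fn main() {
    // ue(0) = "1", ue(1) = "010", ue(2) = "011".
    let mut w = BitWriter::new();
    w.ue(0);
    w.ue(1);
    w.ue(2);
    // Concatenated: 1 010 011, zero-padded to a byte -> 0xA6.
    assert_eq!(w.bytes, vec![0xA6]);
    println!("ue ok");
}
```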
Successfully merging this pull request may close these issues.

VA-API AV1 encoder: coded vs display resolution may cause visible padding