EGL BadNativeWindow Error: Surface API Architecture Mismatch #288

@Vanuan


Motivation

Someone asked in Zed's Discord (#GPUI) whether Zed (and other GPUI applications) can run on OpenGL ES instead of Vulkan on Linux. blade-graphics does support GLES, but a build with the `gles` cfg flag crashes immediately on startup:

```text
thread 'main' (881883) panicked at blade-graphics-0.7.0/src/gles/egl.rs:431:26:
called `Result::unwrap()` on an `Err` value: BadNativeWindow
```

This issue affects any windowed application using blade's GLES backend on Linux with Wayland or X11.

Root Cause Analysis

The crash stems from an architectural mismatch between blade's Surface API design and EGL's initialization requirements:

The Problem

Blade's Surface API (works fine with Vulkan, Metal, headless, and ANGLE EGL):

Context::init() → create platform-agnostic context
create_surface(window) → create surface for specific window

EGL's Requirements:

Context::init(display) → requires a platform-specific native display (X11, Wayland, etc.), e.g. a connection to a Wayland compositor such as Mutter or Sway
create_surface(window) → create surface on that display

What's Happening

1. During context initialization ([egl.rs:154](https://github.com/kvark/blade/blob/main/blade-graphics/src/gles/egl.rs#L154)), EGL selects a display platform based solely on the available client extensions, without any window system information:

   ```rust
   pub unsafe fn init(desc: crate::ContextDesc) -> Result<Self, crate::NotSupportedError> {
       // No window handle available here!
       let display = if let Some(egl1_5) = egl.upcast::<egl::EGL1_5>() {
           if /* ... other platform checks ... */ {
               // ...
           } else if client_extensions.contains("EGL_MESA_platform_surfaceless") {
               // ❌ Selects surfaceless even for windowed apps!
               egl1_5.get_platform_display(EGL_PLATFORM_SURFACELESS_MESA, ...)
           }
           // ...
       } else {
           // ...
       };
       // ...
   }
   ```

2. The surfaceless platform is designed for headless/offscreen rendering and cannot create window surfaces.

3. During surface creation ([egl.rs:322](https://github.com/kvark/blade/blob/main/blade-graphics/src/gles/egl.rs#L322)), the code then tries to create a window surface on this incompatible surfaceless display, and the `unwrap()` panics:

   ```rust
   egl.create_platform_window_surface(
       inner.egl.display,  // ❌ Surfaceless display!
       inner.egl.config,
       native_window_ptr,  // Wayland/X11 window
       &attributes_usize,
   )
   .unwrap()  // 💥 Panics with BadNativeWindow
   ```
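To make the failure mode concrete, here is a simplified, std-only model of platform selection. It is not blade's actual `init` (which checks more extensions and goes through khronos-egl); `WindowSystem` and `select_platform` are hypothetical names. Passing `None` mirrors the current situation, where `init` has no window information, so a system that advertises both Wayland and surfaceless support lands on surfaceless:

```rust
// EGL platform enums from the EGL_KHR_platform_x11 / EGL_KHR_platform_wayland
// and EGL_MESA_platform_surfaceless extensions (the _EXT names share these values).
const EGL_PLATFORM_X11_KHR: u32 = 0x31D5;
const EGL_PLATFORM_WAYLAND_KHR: u32 = 0x31D8;
const EGL_PLATFORM_SURFACELESS_MESA: u32 = 0x31DD;

/// Hypothetical summary of the window system behind a display handle.
#[derive(Clone, Copy, Debug)]
enum WindowSystem {
    Wayland,
    Xlib,
}

/// Picks an EGL platform enum from the client extension string.
/// `window_system == None` models today's `init`, which has no window info.
fn select_platform(client_extensions: &str, window_system: Option<WindowSystem>) -> Option<u32> {
    let has = |name: &str| client_extensions.split_whitespace().any(|ext| ext == name);
    match window_system {
        Some(WindowSystem::Wayland)
            if has("EGL_KHR_platform_wayland") || has("EGL_EXT_platform_wayland") =>
        {
            Some(EGL_PLATFORM_WAYLAND_KHR)
        }
        Some(WindowSystem::Xlib) if has("EGL_KHR_platform_x11") || has("EGL_EXT_platform_x11") => {
            Some(EGL_PLATFORM_X11_KHR)
        }
        // A window was requested but its platform extension is missing:
        // fail here instead of falling back to a display that cannot present.
        Some(_) => None,
        // Headless use: surfaceless is the right choice.
        None if has("EGL_MESA_platform_surfaceless") => Some(EGL_PLATFORM_SURFACELESS_MESA),
        None => None,
    }
}
```

With a window system hint, the Wayland platform wins over surfaceless; that hint is exactly what is only available at surface-creation time.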

Why This Regression Occurred

The platform-specific display initialization code was removed during the Surface API refactoring (see [PR #203](https://github.com/kvark/blade/pull/203/files#diff-fd9eb1747f951b0069bfdd20edb28c3c90a0366a3ebd6898f2f24c7383fbb899)). The old init_windowed function properly detected X11/Wayland displays, but the unified init function lost this logic.

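Evidence of the regression: the X11, Wayland, and XCB platform constants are still defined, but with a leading underscore (Rust's convention for silencing unused-item warnings), because nothing references them anymore:

```rust
const _EGL_PLATFORM_WAYLAND_KHR: u32 = 0x31D8;
const _EGL_PLATFORM_X11_KHR: u32 = 0x31D5;
const _EGL_PLATFORM_XCB_EXT: u32 = 0x31DC;
```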

Proposed Solutions

I've evaluated three approaches to fix this architectural mismatch:

Option 1: Lazy EGL Initialization (Recommended) ⭐

Defer EGL context creation until the first surface is created, when proper display information is available.

Advantages:

  • ✅ Maintains API compatibility (no breaking changes)
  • ✅ Fits EGL's natural model (context tied to actual display)
  • ✅ Handles multi-window correctly (all surfaces from same display)
  • ✅ Follows the pattern already working in the Vulkan backend


Implementation sketch:

```rust
use raw_window_handle::{HasDisplayHandle, HasWindowHandle, RawDisplayHandle};

struct ContextInner {
    egl: Option<EglContext>,  // Uninitialized until first surface
    glow: Option<glow::Context>,
    // ...
}

impl super::Context {
    fn ensure_egl_initialized(
        &self,
        display_handle: RawDisplayHandle,
    ) -> Result<(), NotSupportedError> {
        let mut inner = self.platform.inner.lock().unwrap();
        if inner.egl.is_none() {
            // Initialize EGL with the proper platform display
            let egl_context = match display_handle {
                RawDisplayHandle::Xlib(h) => {
                    EglContext::init_with_platform(EGL_PLATFORM_X11_KHR, h.display)?
                }
                RawDisplayHandle::Wayland(h) => {
                    EglContext::init_with_platform(EGL_PLATFORM_WAYLAND_KHR, h.display)?
                }
                // ... other platforms (unsupported ones return an error)
            };
            inner.egl = Some(egl_context);
            inner.glow = Some(/* load GL functions */);
        }
        Ok(())
    }

    pub fn create_surface<I: HasWindowHandle + HasDisplayHandle>(
        &self,
        window: I,
    ) -> Result<super::Surface, NotSupportedError> {
        // Initialize EGL with actual display information
        self.ensure_egl_initialized(window.display_handle()?.as_raw())?;
        // ... existing surface creation code
    }
}
```
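The lazy-initialization idea can be modelled with only `std`: the EGL state sits in a `Mutex<Option<_>>` and is created by the first `create_surface` call, using that window's display. This is a sketch under assumptions. `NativeDisplay`, `EglState`, `init_egl` and `SurfaceError` are hypothetical placeholders for the real EGL objects and blade's error types, and rejecting a window on a different display is one possible way to enforce the "all surfaces from the same display" rule, not existing blade behavior:

```rust
use std::sync::Mutex;

/// Hypothetical: a native display reduced to its kind plus an opaque address
/// (a `wl_display*` or X11 `Display*` in real code).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NativeDisplay {
    Wayland(usize),
    Xlib(usize),
}

/// Placeholder for the real EGL state (EGLDisplay, EGLConfig, EGLContext, glow).
struct EglState {
    display: NativeDisplay,
}

#[derive(Debug, PartialEq)]
enum SurfaceError {
    /// EGL could not be initialized for this display (never produced in this sketch).
    InitFailed,
    /// A later window lives on a different native display than the first one.
    DisplayMismatch,
}

struct Surface {
    display: NativeDisplay,
}

struct Context {
    /// `None` until the first `create_surface` call supplies a display.
    egl: Mutex<Option<EglState>>,
}

/// Hypothetical placeholder for eglGetPlatformDisplay + eglInitialize + context creation.
fn init_egl(display: NativeDisplay) -> Result<EglState, SurfaceError> {
    Ok(EglState { display })
}

impl Context {
    /// Like `Context::init` today: succeeds without touching EGL.
    fn init() -> Self {
        Context { egl: Mutex::new(None) }
    }

    fn create_surface(&self, display: NativeDisplay) -> Result<Surface, SurfaceError> {
        let mut egl = self.egl.lock().unwrap();
        // First window: initialize EGL on that window's display.
        if egl.is_none() {
            *egl = Some(init_egl(display)?);
        }
        let state = egl.as_ref().expect("initialized above");
        // Later windows: every surface must share the display EGL was created on.
        if state.display != display {
            return Err(SurfaceError::DisplayMismatch);
        }
        Ok(Surface { display })
    }
}
```

One consequence of this design is that EGL initialization errors move from `Context::init` to the first `create_surface` call. A real implementation would presumably also need a path for headless users who never create a surface, for example falling back to the surfaceless platform on first use.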

Option 2: Platform Display in ContextDesc

Add platform-specific display information to ContextDesc:

```rust
pub struct ContextDesc {
    pub presentation: bool,
    // ...
    #[cfg(gles)]
    pub platform_display: Option<PlatformDisplay>,
}
```

Disadvantages:

  • Requires breaking API changes
  • Forces applications to handle platform detection
  • Less ergonomic for common cases
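To illustrate the burden these disadvantages describe, here is a std-only sketch of what Option 2 might ask of callers. `PlatformDisplay`, its variants, and `platform_for` are hypothetical; the `#[cfg(gles)]` gate is omitted for brevity. The application has to obtain the raw native display pointer itself and pass it in, and `init` then maps it to the EGL platform enum plus the native display for `eglGetPlatformDisplay`. For the surfaceless platform, the native display must be `EGL_DEFAULT_DISPLAY`, which is a null pointer:

```rust
use std::ffi::c_void;
use std::ptr::{self, NonNull};

// EGL platform enums (EGL_KHR_platform_x11 / _wayland, EGL_MESA_platform_surfaceless).
const EGL_PLATFORM_X11_KHR: u32 = 0x31D5;
const EGL_PLATFORM_WAYLAND_KHR: u32 = 0x31D8;
const EGL_PLATFORM_SURFACELESS_MESA: u32 = 0x31DD;

/// Hypothetical: the native display the application would have to provide.
#[derive(Clone, Copy, Debug)]
pub enum PlatformDisplay {
    /// A `wl_display*` obtained by the application.
    Wayland(NonNull<c_void>),
    /// An X11 `Display*` obtained by the application.
    Xlib(NonNull<c_void>),
    /// Explicitly headless.
    Surfaceless,
}

/// Trimmed-down `ContextDesc` with the proposed field.
pub struct ContextDesc {
    pub presentation: bool,
    pub platform_display: Option<PlatformDisplay>,
}

/// What `init` could derive from the new field: the EGL platform enum and
/// the native display pointer to pass to `eglGetPlatformDisplay`.
fn platform_for(desc: &ContextDesc) -> (u32, *mut c_void) {
    match desc.platform_display {
        Some(PlatformDisplay::Wayland(display)) => (EGL_PLATFORM_WAYLAND_KHR, display.as_ptr()),
        Some(PlatformDisplay::Xlib(display)) => (EGL_PLATFORM_X11_KHR, display.as_ptr()),
        // Surfaceless requires EGL_DEFAULT_DISPLAY (null). An application that
        // forgets to fill the field silently gets today's behavior.
        Some(PlatformDisplay::Surfaceless) | None => (EGL_PLATFORM_SURFACELESS_MESA, ptr::null_mut()),
    }
}
```

In practice the pointer would come from `raw-window-handle`, which is exactly the platform detection the application would now have to do itself.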

Option 3: Extension Trait for GLES

Create platform-specific initialization API:

```rust
#[cfg(gles)]
pub trait GlesContextExt: Sized {
    // `Sized` is required because the method returns `Result<Self, _>`.
    fn init_with_display<I: HasDisplayHandle>(
        display: I,
        desc: ContextDesc,
    ) -> Result<Self, NotSupportedError>;
}
```

Disadvantages:

  • Requires different initialization path for GLES vs Vulkan
  • More complex API surface
  • Harder to maintain backend parity
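The cost of a separate initialization path shows up at every call site. The std-only sketch below uses hypothetical stand-ins (`NotSupportedError`, `Context`, `HasDisplay`, `create_context`) rather than blade's real types, and a `use_gles` parameter in place of blade's `gles` cfg flag. Note the `Sized` supertrait, which a trait method returning `Result<Self, _>` needs in order to compile:

```rust
/// Hypothetical stand-ins for blade's types, trimmed to what this sketch needs.
#[derive(Debug)]
pub struct NotSupportedError;

#[derive(Clone, Copy, Default)]
pub struct ContextDesc {
    pub presentation: bool,
}

#[derive(Debug)]
pub struct Context {
    /// Which native display the context was created for, if any.
    pub display: Option<&'static str>,
}

/// Hypothetical stand-in for `raw_window_handle::HasDisplayHandle`.
pub trait HasDisplay {
    fn display_name(&self) -> &'static str;
}

impl Context {
    /// Backend-agnostic entry point, as used by the other backends.
    pub fn init(_desc: ContextDesc) -> Result<Self, NotSupportedError> {
        Ok(Context { display: None })
    }
}

/// `Sized` is needed because the method returns `Result<Self, _>`.
pub trait GlesContextExt: Sized {
    fn init_with_display<I: HasDisplay>(
        display: I,
        desc: ContextDesc,
    ) -> Result<Self, NotSupportedError>;
}

impl GlesContextExt for Context {
    fn init_with_display<I: HasDisplay>(
        display: I,
        _desc: ContextDesc,
    ) -> Result<Self, NotSupportedError> {
        // Real code would pick the EGL platform from the display handle here.
        Ok(Context { display: Some(display.display_name()) })
    }
}

/// Every application now needs a backend-specific branch like this one.
/// `use_gles` stands in for blade's `gles` cfg flag.
pub fn create_context<W: HasDisplay>(
    window: W,
    desc: ContextDesc,
    use_gles: bool,
) -> Result<Context, NotSupportedError> {
    if use_gles {
        Context::init_with_display(window, desc)
    } else {
        Context::init(desc)
    }
}
```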

Recommendation

Option 1 (Lazy Initialization) best addresses the architectural mismatch while maintaining API compatibility. It acknowledges that EGL and Vulkan have fundamentally different initialization models and adapts the EGL backend accordingly.

The Vulkan backend already demonstrates the correct pattern of extracting display handles during surface creation ([surface.rs:69-72](https://github.com/kvark/blade/blob/main/blade-render/src/surface.rs#L69-L72)). EGL should follow this same timing, just with deferred context initialization.

Additional Context

  • Running on: Linux with GNOME (Mutter/Wayland compositor)
  • The ContextDesc.presentation flag already exists but is currently unused during EGL platform selection

WDYT?
