sisglib is designed specifically for compositional (asset-based) 3D indoor scene generation - constructing scenes by selecting, placing, and arranging discrete 3D assets rather than generating scenes as unified representations.
This document explains what compositional scene generation is, how it differs from other approaches, and why it's the preferred paradigm for many real-world applications.
Compositional scene generation creates 3D scenes by:
- Selecting individual 3D assets (furniture, objects, architectural elements)
- Arranging assets to satisfy spatial constraints and functional relationships
- Placing these assets in 3D space with appropriate positions and orientations
- Composing the final scene from discrete, reusable components
Each asset remains a distinct, manipulable entity with its own geometry, materials, and metadata.
The core advantage: When you use pre-existing assets, they come with everything already built-in - physics, logic, interactivity, and functionality.
Real-world examples:
- Game assets: A Harry Potter wand asset comes with spell-casting logic, physics interactions, particle effects, and animations already configured
- Architectural assets: A door from a BIM library includes opening mechanisms, accessibility compliance data, fire ratings, and construction specifications
- Interactive objects: Kitchen appliances with working animations, collision meshes, sound effects, and user interaction scripts
Scene generated with compositional approach, imported into Unity - doors open, objects have physics and interactivity built-in.
Why this matters: Neural or unified mesh approaches regenerate geometry from scratch, losing all this functionality. You'd need to manually re-add physics, re-script interactions, and re-create optimizations. Compositional generation preserves the work already invested in assets.
Architects work with pre-existing BIM/CAD models where furniture and fixtures come from manufacturer catalogs with real-world dimensions, materials data, and compliance information. Game studios build entire ecosystems around licensed asset libraries - Unity Asset Store, Quixel Megascans, proprietary studios assets - all with pre-configured LOD levels, collision meshes, and platform-specific optimizations.
Industry-standard libraries (Sketchfab, TurboSquid, Quixel, Unity Asset Store) provide thousands of pre-cleared assets with usage rights, consistent art styles, and production-ready quality. Neural approaches force you to regenerate assets from scratch, immediately invalidating licensing agreements and losing years of optimization work.
Each object is a distinct entity (e.g. chair, table, lamp) and can be moved, replaced, edited, or removed individually. Enables designer-in-the-loop workflows, automated constraint checking (clearances, accessibility), and iterative refinement without full regeneration.
Scenes represented as JSON-like scene states (e.g., sissf) and standard formats (GLTF/GLB, USD, FBX). Same scene works in web viewers, game engines, CAD tools, and VR/AR platforms.
Apply object-level constraints ("sofa against wall"), semantic rules ("bedside tables flank bed"), and enforce physical plausibility through discrete collision detection. Neural methods struggle with precise spatial/semantic constraints.
Standard GPU pipelines with LOD (level of detail) swapping, instancing for duplicated objects, and on-demand asset streaming. Neural representations require expensive per-frame rendering with limited optimization.
Seamless integration with game engines (Unity, Unreal, Godot), rendering engines (V-Ray, Arnold, Cycles), physics engines (PhysX, Bullet), and VR/AR SDKs - all expect asset-based scene graphs.
Neural Scene Representations (NeRFs, Gaussian Splatting): Continuous neural fields or point clouds that excel at photorealistic capture and view synthesis from images, but lack object-level editability, cannot reuse licensed assets, and struggle with standard 3D export formats.
Unified Mesh-Based Generation: Entire scenes as single meshes - no object-level manipulation, incompatible with asset-based workflows, difficult to apply per-object materials or physics.
Pixel-Based Generators: 2D image synthesis without true 3D geometry - cannot navigate from arbitrary viewpoints or export for VR/architecture/games.
- Neural methods excel at: Reconstructing real-world scenes from photos, view synthesis, relighting captured environments
- Compositional methods excel at: Creating new scenes with editable and interactive objects, architectural visualization, game development (with functionality), simulation (with physics), training data synthesis
These approaches are complementary - neural methods capture reality, compositional methods create new realities.
sisglib focuses exclusively on compositional 3D indoor scene generation because:
- It aligns with real-world production workflows in architecture and gaming
- It enables interoperability through standardized formats like sissf
- It provides interpretability and control essential for design applications
- It supports reusability of existing asset ecosystems and licenses
- It facilitates research reproducibility through explicit, inspectable scene representations
By standardizing compositional scene generation, sisglib aims to accelerate research while maintaining compatibility with industry practices.
- sissf - Spatial Intelligence Scene State Format - Standard for compositional scene representation
- Project Vision - sisglib's goals and philosophy
- Custom Strategies Guide - Implementing compositional generation methods

