A modern web application that automatically generates DBT (Data Build Tool) rules from CSV and Excel files using Large Language Models (LLMs). Upload your data files and get comprehensive schema analysis, column descriptions, and production-ready DBT configurations.
- Multi-format Support: Upload CSV or Excel files with multiple sheet support
- Intelligent Schema Generation: Automatic schema inference from file headers and sample data
- DBT Rules Generation: Complete DBT models, tests, and configurations
- Interactive UI: Clean, responsive interface with tabbed results view
- Export Capabilities: Download generated schemas and rules as structured JSON files
- Entity Relationship Diagrams: Interactive visualization of database schemas and relationships
- Chat Interface: Interactive Q&A about your data and schema
- Real-time Processing: Live status updates during file processing
- Privacy-aware Analysis: Column descriptions include privacy indicators
- Multiple LLM Providers: Support for OpenAI, OpenRouter, Ollama, and custom APIs
- Streaming Responses: Real-time streaming of LLM outputs as they're generated
- Interactive ER Diagrams: Drag-and-drop entity relationship diagrams with GoJS
- DBT Local Development: Complete DBT project generation for local development environments
- Sample Dataset Viewer: Built-in office viewer for previewing sample datasets directly in browser
The application is built with:
- Frontend: Modern ES6 modules with Bootstrap 5 UI
- File Processing: Client-side CSV/Excel parsing with XLSX library
- LLM Integration: `bootstrap-llm-provider` for flexible API configuration
- Streaming: `asyncLLM` for real-time streaming of LLM responses
- JSON Handling: `partial-json` for parsing incomplete JSON during streaming
- Modular Design: Separated concerns across focused JavaScript modules
- Visualization: GoJS library for interactive entity relationship diagrams
schemaforge/
├── index.html                  # Main application interface
├── config.json                 # Sample dataset configurations
├── js/
│   ├── main.js                 # Application entry point and orchestration
│   ├── file-parser.js          # CSV/Excel file parsing logic
│   ├── llm-service.js          # LLM API integration and prompts
│   ├── ui.js                   # DOM manipulation and rendering
│   ├── diagram.js              # Entity relationship diagram functionality
│   ├── dbt-generation.js       # DBT rules generation and chat functionality
│   ├── dbt-local-service.js    # DBT local development project creation
│   ├── data-ingestion.js       # Data ingestion utilities and configurations
│   └── utils.js                # Shared utility functions
├── prompts/                    # LLM prompt templates
│   ├── schema-generation.md
│   ├── dbt-rules-generation.md
│   └── dbt-chat-system.md
├── data/                       # Sample data files
└── README.md                   # This file
- Modern web browser with ES6 module support
- LLM API access (OpenAI, OpenRouter, or compatible provider)
- Clone the repository: run `git clone <repository-url>`, then `cd schemaforge`
- Open the application: serve it locally (recommended) with `python -m http.server 8000` and open http://localhost:8000, or open `index.html` directly in the browser (`open index.html`)
- Configure LLM Provider
  - Click "Configure LLM Provider" in the interface
  - Enter your API key and select a provider
  - Supported providers:
    - OpenAI (`https://api.openai.com/v1`)
    - OpenRouter (`https://openrouter.ai/api/v1`)
    - Ollama (`http://localhost:11434/v1`)
    - Any OpenAI-compatible API
- Select a CSV or Excel file using the file upload section
- Files with multiple sheets are automatically detected and processed
- Supported formats: `.csv`, `.xlsx`
- The application automatically extracts headers and sample data
- LLM analyzes the structure and generates comprehensive schema information
- View results in the "Schema Overview" tab
- Real-time streaming: Watch as schema information appears incrementally
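The header-and-sample extraction step can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: the helper name `buildSamplePayload`, the default `sampleSize` of 5, and the shape of the returned object are all assumptions made for illustration.

```javascript
// Hypothetical helper: turn parsed rows (array of arrays, first row = headers)
// into a compact payload that could be embedded in an LLM prompt.
function buildSamplePayload(sheetName, rows, sampleSize = 5) {
  if (!Array.isArray(rows) || rows.length === 0) {
    return { sheet: sheetName, headers: [], sampleRows: [] };
  }
  const [headerRow, ...dataRows] = rows;
  const headers = headerRow.map((h) => String(h ?? "").trim());
  // Keep only the first few rows so the prompt stays small.
  const sampleRows = dataRows
    .slice(0, sampleSize)
    .map((row) => Object.fromEntries(headers.map((h, i) => [h, row[i] ?? null])));
  return { sheet: sheetName, headers, sampleRows };
}

const payload = buildSamplePayload("orders", [
  ["order_id", "customer_email", "amount"],
  [1, "a@example.com", 19.99],
  [2, "b@example.com", 5],
]);
console.log(JSON.stringify(payload, null, 2));
```

Sending only headers and a handful of rows keeps the request small and limits how much raw data leaves the browser.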
- Review detailed column descriptions in the "Column Descriptions" tab
- Privacy indicators help identify sensitive data fields
- Inferred data types and metadata are displayed
- Navigate to the "ER Diagram" tab to see an interactive visualization
- Tables are shown as nodes with their columns listed
- Relationships between tables are displayed as connecting links
- Drag nodes to rearrange the diagram for better visualization
- Use zoom controls to focus on specific parts of the schema
- Primary keys (PK) and foreign keys (FK) are clearly marked
- Click "Generate DBT Rules" to create DBT configurations
- Watch as rules stream in real-time to the "DBT Rules" tab
- Includes models, tests, and data quality configurations
- Production-ready YAML and SQL code
- Click "Export DBT Local" to generate a complete DBT project for local development
- Downloads a ZIP file containing:
  - Complete DBT project structure (`dbt_project.yml`, `profiles.yml`, `packages.yml`)
  - SQL model files with proper seed references
  - Schema configuration with data quality tests (filtered for existing columns)
  - Automated setup script (`setup_dbt.sh`) for one-command deployment
  - Documentation files and README with setup instructions
- Production-ready: Includes DuckDB configuration and automated dependency installation
- Data validation: Only generates tests for columns that actually exist in your data
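The column filtering described above might look something like the sketch below. The function name `filterTestsToExistingColumns` and the shapes of `models` and `actualColumnsByTable` are hypothetical; the real export code may structure this differently.

```javascript
// Hypothetical shape: each model lists columns, and each column lists its tests.
function filterTestsToExistingColumns(models, actualColumnsByTable) {
  return models.map((model) => {
    const existing = new Set(
      (actualColumnsByTable[model.name] ?? []).map((c) => c.toLowerCase())
    );
    return {
      ...model,
      // Drop any column (and its tests) that the uploaded data does not contain.
      columns: model.columns.filter((col) => existing.has(col.name.toLowerCase())),
    };
  });
}

const models = [
  {
    name: "orders",
    columns: [
      { name: "order_id", tests: ["unique", "not_null"] },
      { name: "shipped_at", tests: ["not_null"] }, // suggested by the LLM, absent from the file
    ],
  },
];
const filtered = filterTestsToExistingColumns(models, { orders: ["order_id", "amount"] });
console.log(filtered[0].columns.map((c) => c.name)); // [ 'order_id' ]
```

Comparing names case-insensitively is a design choice in this sketch, on the assumption that the LLM may change the casing of a header.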
- Use the chat interface to ask questions about your data
- Request modifications to the generated DBT rules
- Perform exploratory data analysis through natural language
- Streaming responses: See the assistant's responses appear in real-time
- Built-in Office Viewer: Preview sample datasets directly in browser without downloading
- Multiple Format Support:
  - Excel files (.xlsx) → Microsoft Office Web Viewer
  - CSV files → Raw text view in browser
- One-Click Preview: Click the "👁️ View" button on any sample dataset card
- New Tab Opening: All previews open in new tabs for seamless workflow
- Download the complete analysis as a structured JSON file
- Includes schema, column descriptions, and DBT configurations
- Or export the full DBT local development project for immediate use
The application uses a multi-stage LLM process:
- Schema Generation: Analyzes file structure and sample data to create comprehensive schema
- DBT Rules Generation: Transforms schema into production-ready DBT configurations
- DBT Local Project Creation: Generates complete, deployable DBT projects with automated setup
SchemaForge now includes comprehensive DBT local development capabilities:
- Automated Project Setup: One-command deployment with `setup_dbt.sh` script
- Column Validation: Smart filtering ensures tests are only created for columns that exist in your actual data
- DuckDB Integration: Pre-configured for local development with embedded database
- Package Management: Automatic installation of `dbt-utils` and other dependencies
- Data Conversion: Automatic Excel/CSV to seed conversion with proper sanitization
- Documentation: Complete project documentation with setup instructions
- Error Prevention: Eliminates "column not found" errors through validation
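The source does not spell out what "proper sanitization" involves. One plausible reading is that file headers are normalized into safe column identifiers before they become seed columns. The sketch below illustrates that idea only; `sanitizeIdentifier`, `sanitizeHeaders`, and the specific rules (lowercasing, underscores, numeric suffixes for duplicates) are assumptions, not the project's documented behavior.

```javascript
// Hypothetical rule set: lowercase, replace non-alphanumerics with "_",
// and avoid identifiers that are empty or start with a digit.
function sanitizeIdentifier(name) {
  const cleaned = String(name)
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse runs of other characters into one underscore
    .replace(/^_+|_+$/g, ""); // trim leading and trailing underscores
  if (cleaned === "") return "column";
  return /^[0-9]/.test(cleaned) ? `_${cleaned}` : cleaned;
}

// De-duplicate sanitized names by appending a numeric suffix to repeats.
function sanitizeHeaders(headers) {
  const seen = new Map();
  return headers.map((h) => {
    const base = sanitizeIdentifier(h);
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    return count === 0 ? base : `${base}_${count + 1}`;
  });
}

console.log(sanitizeHeaders(["Order ID", "Unit Price ($)", "2024 Total", "order id"]));
```

This simple suffix scheme does not guard against a file that already contains a header such as `order_id_2`; a fuller implementation would check the suffixed name as well.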
The application uses GoJS to create interactive entity relationship diagrams that dynamically visualize:
- Tables as nodes with expandable column lists
- Relationships between tables as connecting links
- Primary and foreign keys with clear visual indicators
- Automatic layout with force-directed positioning
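As an illustration of how a generated schema could feed such a diagram, the sketch below converts a schema object into node and link arrays. GoJS graph models are built from node data (each with a `key`) and link data (each with `from` and `to`); everything else here, including the `schemaToGraphData` name, the `items` and `badge` properties, and the input schema shape, is an assumption for this example.

```javascript
// Assumed schema shape:
// { tables: [{ name, columns: [{ name, isPrimaryKey, isForeignKey }] }],
//   relationships: [{ fromTable, fromColumn, toTable, toColumn }] }
function schemaToGraphData(schema) {
  const nodeDataArray = schema.tables.map((table) => ({
    key: table.name,
    items: table.columns.map((col) => ({
      name: col.name,
      badge: col.isPrimaryKey ? "PK" : col.isForeignKey ? "FK" : "",
    })),
  }));
  const known = new Set(nodeDataArray.map((n) => n.key));
  const linkDataArray = schema.relationships
    // Skip relationships that point at tables missing from the schema.
    .filter((r) => known.has(r.fromTable) && known.has(r.toTable))
    .map((r) => ({ from: r.fromTable, to: r.toTable, text: `${r.fromColumn} -> ${r.toColumn}` }));
  return { nodeDataArray, linkDataArray };
}

const graph = schemaToGraphData({
  tables: [
    { name: "customers", columns: [{ name: "id", isPrimaryKey: true }] },
    { name: "orders", columns: [{ name: "id", isPrimaryKey: true }, { name: "customer_id", isForeignKey: true }] },
  ],
  relationships: [
    { fromTable: "orders", fromColumn: "customer_id", toTable: "customers", toColumn: "id" },
    { fromTable: "orders", fromColumn: "sku", toTable: "products", toColumn: "sku" }, // dropped: unknown table
  ],
});
console.log(JSON.stringify(graph, null, 2));
```

In a browser with GoJS loaded, the two arrays could then be handed to a `go.GraphLinksModel`.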
The application provides seamless dataset preview capabilities:
- Microsoft Office Web Viewer Integration: Uses `view.officeapps.live.com` for Excel file viewing
- CSV Direct Preview: Opens CSV files directly in browser for immediate viewing
- Smart Format Detection: Automatically determines appropriate viewer based on file extension
- Fallback Handling: Graceful degradation for unsupported formats
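The extension-based viewer selection can be sketched as below. The `getPreviewUrl` name and the `null` fallback are assumptions. The `/op/view.aspx?src=` path is the commonly used Office Web Viewer embed format rather than something stated in this README, and the viewer generally needs the file to be reachable at a public URL.

```javascript
// Office Web Viewer prefix (host from this README; path assumed from common usage).
const OFFICE_VIEWER = "https://view.officeapps.live.com/op/view.aspx?src=";

// Hypothetical helper: pick a preview URL from the file extension.
// Expects an absolute URL; returns null for unsupported formats.
function getPreviewUrl(fileUrl) {
  const path = new URL(fileUrl).pathname.toLowerCase();
  if (path.endsWith(".xlsx")) return OFFICE_VIEWER + encodeURIComponent(fileUrl);
  if (path.endsWith(".csv")) return fileUrl; // the browser shows the raw text
  return null; // caller decides how to degrade, e.g. offer a download link
}

console.log(getPreviewUrl("https://example.com/data/sales.xlsx"));
console.log(getPreviewUrl("https://example.com/data/sales.csv"));
console.log(getPreviewUrl("https://example.com/report.pdf"));
```

In the browser, a non-null result could be opened with `window.open(url, "_blank")` to match the new-tab behavior described earlier.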
The application implements real-time streaming of LLM responses to provide immediate feedback during processing. This enables:
- Progressive rendering of schema information as it's generated
- Live updates to the UI during lengthy operations
- Improved user experience with visual feedback
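To show why incomplete JSON has to be handled during streaming, here is a small self-contained sketch. It uses a hand-rolled `tryParsePartial` helper as a simplified stand-in for a partial-JSON parser (the project itself lists `partial-json` for this), and the chunk contents are invented for the example.

```javascript
// Simplified stand-in for a partial-JSON parser: close any open string, object,
// or array, then try JSON.parse. Returns undefined if the prefix still cannot be parsed.
function tryParsePartial(text) {
  const closers = [];
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  const completed = text + (inString ? '"' : "") + closers.reverse().join("");
  try {
    return JSON.parse(completed);
  } catch {
    return undefined; // e.g. the prefix ends after a comma or a colon
  }
}

// Simulated stream: re-parse the growing buffer, keeping the last good snapshot.
const chunks = ['{"tables": [{"name": "ord', 'ers", "columns": ["id", ', '"amount"]}]}'];
let buffer = "";
let latest;
for (const chunk of chunks) {
  buffer += chunk;
  latest = tryParsePartial(buffer) ?? latest;
  console.log(JSON.stringify(latest));
}
```

Keeping the last successfully parsed snapshot means the UI always has something valid to render, even when a chunk boundary falls at an awkward point.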
The application is designed to work with multiple LLM providers through a flexible configuration system:
- OpenAI API
- OpenRouter
- Ollama (local deployment)
- Any OpenAI-compatible API endpoint
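Because these providers expose OpenAI-compatible endpoints, switching between them can come down to changing the base URL and key. The sketch below builds a chat completions request for a given base URL. `buildChatRequest`, the `PROVIDERS` map, and the model name are placeholders for illustration; the project's own configuration goes through `bootstrap-llm-provider`, whose API is not shown here.

```javascript
// Base URLs as listed in this README for two of the providers.
const PROVIDERS = {
  openai: "https://api.openai.com/v1",
  ollama: "http://localhost:11434/v1",
};

// Hypothetical helper: build the URL and fetch options for an
// OpenAI-compatible chat completions request.
function buildChatRequest({ baseUrl, apiKey, model, messages, stream = true }) {
  const headers = { "Content-Type": "application/json" };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`; // local servers may not need a key
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    options: { method: "POST", headers, body: JSON.stringify({ model, messages, stream }) },
  };
}

const { url, options } = buildChatRequest({
  baseUrl: PROVIDERS.ollama,
  model: "example-model", // placeholder, not a real model name
  messages: [{ role: "user", content: "Describe the orders table." }],
});
console.log(url); // http://localhost:11434/v1/chat/completions
console.log(options.headers);
```

A real call would then be `fetch(url, options)`, with the response body read as a stream when `stream` is true.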
- CSV: Native JavaScript parsing with automatic delimiter detection
- Excel: XLSX library for multi-sheet support
- Error Handling: Graceful fallbacks for malformed files
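Automatic delimiter detection can be done in several ways. A common heuristic, sketched below, picks the candidate that appears a consistent number of times on each of the first few lines. The function names, the candidate list, and the 10-line window are assumptions for illustration and may not match the project's parser.

```javascript
// Count occurrences of a delimiter outside double-quoted fields.
function countOutsideQuotes(line, delimiter) {
  let count = 0;
  let inQuotes = false;
  for (const ch of line) {
    if (ch === '"') inQuotes = !inQuotes;
    else if (ch === delimiter && !inQuotes) count++;
  }
  return count;
}

// Hypothetical heuristic: choose the delimiter with the same non-zero count on
// every sampled line, preferring the one that yields the most columns.
function detectDelimiter(text, candidates = [",", ";", "\t", "|"]) {
  const lines = text
    .split(/\r?\n/)
    .filter((l) => l.trim() !== "")
    .slice(0, 10);
  let best = { delimiter: ",", score: 0 }; // fall back to comma
  for (const delimiter of candidates) {
    const counts = lines.map((l) => countOutsideQuotes(l, delimiter));
    const consistent = counts.length > 0 && counts[0] > 0 && counts.every((c) => c === counts[0]);
    if (consistent && counts[0] > best.score) best = { delimiter, score: counts[0] };
  }
  return best.delimiter;
}

console.log(JSON.stringify(detectDelimiter("id;name;city\n1;Ada;London\n2;Alan;Wilmslow\n")));
console.log(JSON.stringify(detectDelimiter('name,note\n"Smith, J",ok\n')));
```

The sketch splits on line breaks first, so quoted fields that contain newlines would confuse it; a full parser would need to handle that case.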
- Data Engineers: Generate DBT boilerplate for new data sources with complete local development setup
- Analytics Teams: Quick schema documentation, data quality rules, and immediate DBT project deployment
- Data Scientists: Understand data structure before analysis with automated DBT environment setup
- Consultants: Rapid data assessment, documentation, and client-ready DBT projects
- Database Designers: Visualize and refine database schemas with production-ready implementation
- DevOps Teams: Automated DBT project scaffolding with infrastructure-as-code approach
- Business Analysts: Preview and explore sample datasets instantly without software installations
- Auditors: Automated detection and explanation of data mismatches for compliance reporting
- ES6 modules with modern JavaScript features
- Functional programming approach (no classes)
- Bootstrap 5 for styling (no custom CSS)
- Modular architecture with single responsibility principle
# Format JavaScript and Markdown
npx prettier@3.5 --print-width=120 '**/*.js' '**/*.md'
# Format HTML
npx js-beautify@1 '**/*.html' --type html --replace --indent-size 2

MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Submit a pull request
For issues and questions:
- Check the chat interface for data-related queries
- Review the LLM provider configuration for API issues
- Ensure file formats are supported (CSV, XLSX only)
- For DBT local development issues, verify Python dependencies and file permissions on the setup script
- 👁️ Built-in Office Viewer: Preview datasets directly in browser without downloads
- Smart Format Handling: Automatic viewer selection (Office Web Viewer for Excel, direct view for CSV)
- ⚡ One-Click Access: Instant preview with "View" buttons on sample dataset cards
- Cross-Platform: Works on all modern browsers with no software requirements
- Complete DBT Project Generation: Export ready-to-use DBT projects with all necessary configuration files
- Automated Setup: One-command deployment with intelligent dependency management
- Column Validation: Smart test generation that prevents "column not found" errors
- DuckDB Integration: Pre-configured local development environment
- Production Ready: Includes proper SQL model generation, schema validation, and documentation