Split Wonderware pipeline into reusable connector + pipeline #194

Draft
514Ben wants to merge 8 commits into main from wonderware-pipeline

514Ben commented Feb 6, 2026

Summary

This PR splits the Wonderware pipeline into two components:

  1. Reusable Connector (connector-registry/wonderware/) - Handles data access from Wonderware Historian
  2. Focused Pipeline (pipeline-registry/wonderware_to_clickhouse/) - Handles ClickHouse storage

This follows the established SAP HANA CDC pattern and enables the Wonderware connector to be reused by other pipelines.


🎯 Changes

✨ New: Wonderware Connector

Created a complete connector in connector-registry/wonderware/ with a 4-level hierarchy:

  • Root → Author (514-labs) → Language (python) → Implementation (default)

Core modules:

  • config.py - Connection configuration (host, port, database, credentials)
  • connection_manager.py - SQLAlchemy connection pool with circuit breaker pattern
  • reader.py - Data extraction (discover_tags, fetch_history_data, get_tag_count, test_connection)
  • connector.py - High-level facade providing simple API
  • models.py - Domain models (TagInfo, HistoryRow, ConnectorStatus)
  • Complete test suite with SQLite mock fixtures
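
For reviewers unfamiliar with the circuit breaker pattern mentioned for connection_manager.py, here is a minimal sketch of the general idea. It is not the connector's actual code: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `reset_timeout`, and `call` are all assumptions made for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Still inside the cool-down window: fail fast.
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure from here re-opens the circuit.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success resets the failure counter.
        self._failures = 0
        return result
```

A connection manager could wrap each database call in `breaker.call(...)` so that a failing Historian is not hammered with retries while it is down.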

Metadata:

  • Category: historian
  • Capabilities: Extract ✅, Transform ❌, Load ❌
  • Tags: historian, scada, aveva, wonderware, sql-server, industrial

🔄 Updated: Pipeline

Configuration changes:

  • wonderware_config.py: Renamed WonderwareConfig → PipelineConfig
  • Removed all connection fields (now in connector)
  • Changed env prefix: WONDERWARE_ → WONDERWARE_PIPELINE_
  • Kept only pipeline-specific fields: tag_chunk_size, backfill_chunk_days, sync_schedule, etc.
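
To illustrate what tag_chunk_size and backfill_chunk_days are for, the sketch below shows one way such settings could drive batching. The helper names `chunk_tags` and `chunk_date_range` are hypothetical and are not taken from this PR; the defaults mirror the documented ones (10 tags, 1 day).

```python
from datetime import datetime, timedelta


def chunk_tags(tags, tag_chunk_size=10):
    """Yield successive lists of at most tag_chunk_size tag names."""
    if tag_chunk_size < 1:
        raise ValueError("tag_chunk_size must be >= 1")
    for i in range(0, len(tags), tag_chunk_size):
        yield tags[i:i + tag_chunk_size]


def chunk_date_range(start, end, backfill_chunk_days=1):
    """Yield (chunk_start, chunk_end) windows covering [start, end)."""
    if backfill_chunk_days < 1:
        raise ValueError("backfill_chunk_days must be >= 1")
    step = timedelta(days=backfill_chunk_days)
    cursor = start
    while cursor < end:
        # The final window is clipped so it never runs past `end`.
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end
```

A backfill would then loop over every (date window, tag chunk) pair and issue one bounded query per pair, which keeps individual Historian queries small.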

Workflow updates:

  • wonderware_sync.py: Uses WonderwareConnector with clean imports
  • wonderware_backfill.py: Uses WonderwareConnector with clean imports
  • Added _get_cached_tags() helper for Redis caching in sync workflow

Infrastructure:

  • Added symlink: app/wonderware → connector source (enables clean imports without path manipulation)
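
For anyone who needs to recreate or check the symlink locally, a small sketch follows. The function name `link_connector` and its arguments are hypothetical; it uses a relative target on the assumption that a link committed to the repository should not depend on absolute paths.

```python
import os
from pathlib import Path


def link_connector(app_dir, connector_src, name="wonderware"):
    """Create (or recreate) app_dir/name as a relative symlink to connector_src."""
    app_dir = Path(app_dir).resolve()
    link = app_dir / name
    if link.is_symlink():
        link.unlink()  # replace a stale link rather than failing
    # A relative target is resolved from the directory containing the link.
    target = os.path.relpath(Path(connector_src).resolve(), app_dir)
    link.symlink_to(target, target_is_directory=True)
    return link
```

Calling `link.resolve()` on the result should return the connector source directory; if it does not, the import `from wonderware import WonderwareConnector` would be expected to fail. On Windows, creating symlinks may require elevated privileges or developer mode.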

Tests:

  • Updated test_wonderware_config.py to test PipelineConfig
  • Updated conftest.py with new env var prefix
  • Added test to verify connection fields are NOT in pipeline config

🗑️ Deleted

  • app/workflows/lib/wonderware_client.py - Logic moved to connector

📋 Environment Variables

Connector (unchanged from original)

WONDERWARE_HOST           # Required
WONDERWARE_PORT           # Default: 1433
WONDERWARE_DATABASE       # Default: Runtime
WONDERWARE_USERNAME       
WONDERWARE_PASSWORD       
WONDERWARE_DRIVER         # Default: mssql+pytds
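
As an illustration of how these variables could map onto a config object, here is a standard-library sketch. `WonderwareConfig` is the class name used by the connector, but the field names, the `from_env` and `url` methods, and the use of a dataclass are assumptions; the real config.py may be built differently (for example with a settings library).

```python
import os
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote_plus


@dataclass(frozen=True)
class WonderwareConfig:
    """Illustrative stand-in; defaults mirror the list above."""

    host: str
    port: int = 1433
    database: str = "Runtime"
    username: Optional[str] = None
    password: Optional[str] = None
    driver: str = "mssql+pytds"

    @classmethod
    def from_env(cls, environ=None):
        environ = os.environ if environ is None else environ
        host = environ.get("WONDERWARE_HOST")
        if not host:
            raise ValueError("WONDERWARE_HOST is required")
        return cls(
            host=host,
            port=int(environ.get("WONDERWARE_PORT", 1433)),
            database=environ.get("WONDERWARE_DATABASE", "Runtime"),
            username=environ.get("WONDERWARE_USERNAME"),
            password=environ.get("WONDERWARE_PASSWORD"),
            driver=environ.get("WONDERWARE_DRIVER", "mssql+pytds"),
        )

    def url(self):
        """Build a SQLAlchemy-style URL, percent-encoding the credentials."""
        auth = ""
        if self.username:
            auth = quote_plus(self.username)
            if self.password:
                auth += ":" + quote_plus(self.password)
            auth += "@"
        return f"{self.driver}://{auth}{self.host}:{self.port}/{self.database}"
```

Percent-encoding the credentials matters because characters such as `@` or `/` in a password would otherwise corrupt the URL.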

Pipeline (new prefix)

WONDERWARE_PIPELINE_TAG_CHUNK_SIZE           # Default: 10
WONDERWARE_PIPELINE_BACKFILL_CHUNK_DAYS      # Default: 1
WONDERWARE_PIPELINE_SYNC_SCHEDULE            # Default: */1 * * * *
WONDERWARE_PIPELINE_BACKFILL_OLDEST_TIME     # Default: 2025-01-01 00:00:00
WONDERWARE_PIPELINE_TAG_CACHE_TTL            # Default: 3600
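
The sketch below shows how the new prefix keeps pipeline settings separate from connector settings. `PipelineConfig` is the class name from this PR, but the dataclass layout, the `from_env` method, and the `ENV_PREFIX` constant are assumptions for illustration only.

```python
import os
from dataclasses import dataclass, fields

ENV_PREFIX = "WONDERWARE_PIPELINE_"


@dataclass(frozen=True)
class PipelineConfig:
    """Illustrative stand-in; defaults mirror the list above."""

    tag_chunk_size: int = 10
    backfill_chunk_days: int = 1
    sync_schedule: str = "*/1 * * * *"
    backfill_oldest_time: str = "2025-01-01 00:00:00"
    tag_cache_ttl: int = 3600

    @classmethod
    def from_env(cls, environ=None):
        environ = os.environ if environ is None else environ
        kwargs = {}
        for f in fields(cls):
            # Only variables carrying the pipeline prefix are considered.
            raw = environ.get(ENV_PREFIX + f.name.upper())
            if raw is not None:
                # Coerce to the type of the field's default (int or str).
                kwargs[f.name] = type(f.default)(raw)
        return cls(**kwargs)
```

Because only prefixed names are read, a connector variable such as WONDERWARE_HOST in the same environment is ignored, and the resulting object has no connection attributes at all.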

✅ Benefits

  1. Reusability - Wonderware connector can be used by other pipelines
  2. Separation of Concerns - Connector handles data access, pipeline handles storage
  3. Maintainability - Updates to connection logic only need to happen in the connector
  4. Consistency - Follows established SAP HANA CDC pattern
  5. Testing - Each component has independent test suite

🧪 Verification

  • ✅ All connector Python files compile without syntax errors
  • ✅ All pipeline Python files compile without syntax errors
  • ✅ Symlink resolves correctly to connector source
  • ✅ Import chain works: from wonderware import WonderwareConnector
  • ✅ No path manipulation required (clean imports)
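
The syntax checks above can be reproduced with a few lines of Python. This is one possible approach rather than the script actually used for this PR; the function name `find_syntax_errors` is hypothetical.

```python
from pathlib import Path


def find_syntax_errors(root):
    """Return (path, message) pairs for .py files under root that fail to compile."""
    errors = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            # compile() parses the source without executing or importing it.
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except SyntaxError as exc:
            errors.append((path, exc.msg))
    return errors
```

Note that a clean result only shows the files parse; it says nothing about whether imports resolve or the code behaves correctly at runtime.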

📊 Files Changed

  • 28 files changed, 2,183 insertions(+)
  • 18 new files in connector-registry
  • 10 files in pipeline (new/modified)

🔍 Review Notes

Please verify:

  • Connector metadata is correct
  • Symlink works in your environment
  • Environment variable naming is acceptable
  • Tests are comprehensive
  • Documentation is clear

## Changes

### New: Wonderware Connector (connector-registry/wonderware/)
- Created 4-level hierarchy following SAP HANA CDC pattern
- **config.py**: Connection configuration (host, port, database, credentials)
- **connection_manager.py**: SQLAlchemy connection pool with circuit breaker
- **reader.py**: Data extraction (discover_tags, fetch_history_data)
- **connector.py**: High-level facade providing simple API
- **models.py**: Domain models (TagInfo, HistoryRow, ConnectorStatus)
- Complete test suite with mock fixtures

### Updated: Pipeline (pipeline-registry/wonderware_to_clickhouse/)
- **wonderware_config.py**: Renamed to PipelineConfig, removed connection fields
  - Changed env prefix to WONDERWARE_PIPELINE_
  - Kept only: tag_chunk_size, backfill_chunk_days, sync_schedule, etc.
- **wonderware_sync.py**: Updated to use WonderwareConnector
- **wonderware_backfill.py**: Updated to use WonderwareConnector
- **app/wonderware**: Added symlink to connector (clean imports, no path manipulation)
- **tests**: Updated to test PipelineConfig without connection fields

### Deleted
- **wonderware_client.py**: Logic moved to connector

## Benefits
- Connector can be reused by other pipelines
- Clear separation: connector handles data access, pipeline handles ClickHouse
- Follows established patterns (SAP HANA CDC)
- Each component has independent tests

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

vercel bot commented Feb 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| registry | Ready | Preview, Comment | Feb 6, 2026 11:24pm |

## Documentation Added

### Main Documentation
- **README.md**: Overview, features, installation, quick start, API summary, examples, troubleshooting

### Detailed Guides (docs/)
- **configuration.md**: Complete configuration reference
  - Environment variables
  - Connection settings
  - Security best practices
  - Advanced configuration (circuit breaker, retry logic)
  - Troubleshooting connection issues

- **getting-started.md**: Step-by-step tutorial
  - Installation and setup
  - Connection testing
  - Tag discovery
  - Historical data fetching
  - Batch processing patterns
  - Incremental sync patterns
  - Error handling examples
  - Common usage patterns

- **api-reference.md**: Complete API documentation
  - WonderwareConnector class
  - WonderwareConfig class
  - WonderwareReader class
  - ConnectionPool class
  - Data models (TagInfo, HistoryRow, ConnectorStatus)
  - Exceptions
  - Type hints and advanced usage

## Coverage

- ✅ Installation instructions (standalone + bundled)
- ✅ Configuration guide with all options
- ✅ Quick start examples
- ✅ Complete API reference
- ✅ Usage patterns and best practices
- ✅ Error handling examples
- ✅ Security best practices
- ✅ Troubleshooting guide

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

## Changes

Updated pipeline README.md to reflect the new architecture where the
pipeline uses the reusable Wonderware connector:

### Architecture Updates
- Added "What's New" section explaining connector split
- Updated component diagram showing connector as external dependency
- Updated data flow diagram with connector layer
- Clarified separation: connector handles data access, pipeline handles storage

### Configuration Updates
- Split configuration section into:
  - Connector config (WONDERWARE_* prefix)
  - Pipeline config (WONDERWARE_PIPELINE_* prefix)
- Added links to connector configuration documentation
- Clarified which settings belong where

### Code References Updates
- Replaced references to `wonderware_client.py` with connector API
- Updated workflow descriptions to show connector usage
- Added import examples: `from wonderware import WonderwareConnector`
- Removed outdated `WonderwareClient` references

### Troubleshooting Updates
- Added section for connector-specific issues
- Added links to connector troubleshooting guide
- Updated connection testing examples to use connector

### Documentation Links
- Added "Related Documentation" section with links to:
  - Connector README
  - Connector configuration guide
  - Connector API reference

## Impact

- Users now understand the two-component architecture
- Clear separation between connector and pipeline configuration
- Updated examples use the new connector API
- All internal references are now accurate

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

## Documentation Added

### getting-started.md (Updated)
Complete step-by-step tutorial covering:
- Prerequisites and installation
- Configuration (split connector vs pipeline)
- Starting the pipeline and testing connection
- Running historical backfill with Temporal UI
- Monitoring with Temporal UI and APIs
- Querying data via REST API and ClickHouse
- Next steps and troubleshooting

### configuration.md (New)
Detailed configuration reference with:
- Configuration overview (two-namespace model)
- Connector configuration (WONDERWARE_*)
- Pipeline configuration (WONDERWARE_PIPELINE_*)
- ClickHouse and Redis configuration
- Performance tuning guidelines
- Security configuration best practices
- Environment-specific configurations
- Configuration validation scripts

### workflows.md (New)
Complete workflow documentation:
- Backfill workflow (4-task DAG)
  - Task-by-task breakdown with code
  - Performance optimization tips
  - Best practices
- Sync workflow (single task)
  - Watermark logic explanation
  - Caching strategy
  - Sync frequency tuning
- Workflow management (pause/cancel/retry)
- Error handling and debugging
- Monitoring and alerting
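
The watermark logic listed under the sync workflow can be summarized in a few lines. This is a generic sketch of watermark-based incremental sync, not the code in wonderware_sync.py; `sync_once`, `fetch_rows`, `insert_rows`, and the `"timestamp"` row key are hypothetical names.

```python
from datetime import datetime


def sync_once(fetch_rows, insert_rows, watermark: datetime, now: datetime) -> datetime:
    """Load rows in the window (watermark, now] and return the new watermark."""
    rows = fetch_rows(watermark, now)
    if not rows:
        # Nothing new: keep the old watermark so no interval is skipped.
        return watermark
    insert_rows(rows)
    # Advance to the newest timestamp actually loaded, not to `now`,
    # so rows that arrive late are still picked up by the next run.
    return max(row["timestamp"] for row in rows)
```

Each scheduled run would read the stored watermark, call `sync_once`, and persist the returned value for the next run.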

### apis.md (New)
Complete API reference:
- All REST endpoints documented
- Request/response formats
- Query parameters
- Examples in curl, Python, and JavaScript
- Error handling
- Rate limiting guidance
- Real-world usage examples (dashboard, export, monitoring)
- Grafana integration guide

## Coverage

✅ Installation and setup
✅ Configuration (connector + pipeline)
✅ Workflows (backfill + sync)
✅ APIs (all endpoints)
✅ Monitoring and debugging
✅ Performance tuning
✅ Security best practices
✅ Production deployment guidance
✅ Troubleshooting guides
✅ Code examples in multiple languages

## Total Documentation

- **4 comprehensive guides** (~600 lines each)
- **~2,400 lines** of detailed documentation
- **Numerous code examples** (Python, Bash, SQL, JavaScript)
- **Diagrams and architecture explanations**
- **Links to connector documentation**

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Replace incorrect npm installation command with the correct bash script
installation method for Moose CLI.

Changes:
- docs/getting-started.md: Updated Moose installation from npm to bash script

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

514Ben and others added 2 commits February 6, 2026 17:42

Changes:
- Updated all .python-version files to 3.13
- Updated all README.md and getting-started.md files to reference Python 3.13
- Updated setup.py python_requires and classifiers
- Affects: wonderware_to_clickhouse, qvd_to_clickhouse, sap_hana_cdc_to_clickhouse, and sap_hana_cdc connector

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Removed "Option B: Install Dependencies Only" section from the connector
getting-started guide. The connector should be installed from the registry,
not as standalone dependencies.

Changes:
- docs/getting-started.md: Removed Option B section
- Simplified to single installation method

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Removed pip-related instructions since we're not targeting new Python users.
Users are expected to already have pip installed with Python.

Changes:
- wonderware_to_clickhouse/docs/getting-started.md: Removed pip prerequisite section
- qvd_to_clickhouse/docs/getting-started.md: Removed pip/uv package manager line

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>