
Fix QVD pipeline installation instructions to use registry install sc…#190

Merged
514Ben merged 7 commits into main from fix-qvd-pipeline-instructions on Feb 3, 2026

514Ben commented Feb 3, 2026

This PR contains two sets of improvements to the QVD to ClickHouse pipeline:

1. Fix Installation Instructions (Original)

Updated the installation instructions in all QVD to ClickHouse pipeline READMEs to use the correct registry install command with the --type pipeline flag instead of manual directory navigation.

2. Remove JSON State File Tracking (New)

Removes the JSON-based FileTracker and consolidates on ClickHouseFileTracker as the sole tracking mechanism for the QVD pipeline.

Changes:

  • Enhanced ClickHouseFileTracker: Implemented get_status_summary(), list_errors(), reset_state() methods with actual ClickHouse queries
  • Updated qvd_sync.py: Switched from FileTracker to ClickHouseFileTracker with proper method calls and timing tracking
  • Removed config fields: Deleted state_file and use_clickhouse_tracking from QvdConfig
  • Updated CLI tool: Modified init_qvd.py to use ClickHouse tracking and removed --state-file argument
  • Cleanup: Deleted app/utils/file_tracker.py and removed .qvd_state.json from .gitignore
  • Documentation: Updated all docs to reference ClickHouse tracking instead of JSON files

Benefits:

  • Single source of truth for tracking state
  • Better querying capabilities via SQL
  • Complete audit trail in database
  • No local file dependencies
  • Real-time status via API endpoint

Testing Checklist

  • Unit tests for all public methods
  • Integration tests with mock servers
  • Test reset command: python init_qvd.py --reset-state --force
  • Test workflow runs successfully with ClickHouse tracking
  • Test API endpoint: curl http://localhost:4000/consumption/qvd_status
  • Test change detection (modify file mtime and verify reprocessing)
  • Verify tracking records appear in ClickHouse table
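The change-detection item in the checklist can be exercised with a small script. The following is a minimal sketch, assuming the tracker decides on reprocessing by comparing a stored modification time and size against the file on disk. `needs_processing` and the `TrackedFile` record shape are hypothetical names used for illustration, not the pipeline's actual API.

```python
import os
import tempfile
from typing import Optional, Tuple, TypedDict


class TrackedFile(TypedDict):
    """Hypothetical shape of a stored tracking record."""
    mtime: float
    size: int


def needs_processing(path: str, tracked: Optional[TrackedFile]) -> bool:
    """Return True when the file is new or its mtime/size differ from the record."""
    if tracked is None:
        return True
    stat = os.stat(path)
    return stat.st_mtime != tracked["mtime"] or stat.st_size != tracked["size"]


def demo() -> Tuple[bool, bool, bool]:
    """Create a temp file, record it, bump its mtime, and report each decision."""
    with tempfile.NamedTemporaryFile(suffix=".qvd", delete=False) as fh:
        fh.write(b"sample")
        path = fh.name
    try:
        first_seen = needs_processing(path, None)
        stat = os.stat(path)
        record: TrackedFile = {"mtime": stat.st_mtime, "size": stat.st_size}
        unchanged = needs_processing(path, record)
        # Simulate a modification by moving the mtime forward one minute.
        os.utime(path, (stat.st_atime, stat.st_mtime + 60))
        modified = needs_processing(path, record)
        return first_seen, unchanged, modified
    finally:
        os.remove(path)
```

In the real pipeline the stored record would come from the ClickHouse tracking table rather than from a local dictionary.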

Note

Medium Risk
Changes the pipeline’s incremental tracking and reset behavior from a local JSON file to ClickHouse queries/table truncation, which can affect reprocessing and operational safety if misused.

Overview
Switches incremental file tracking to ClickHouse-only. Removes the JSON-based FileTracker/.qvd_state.json flow (including related config/env flags) and updates qvd_sync to record processing/completed/failed states with row counts and durations.

Makes ClickHouse tracking more operational. ClickHouseFileTracker now queries ClickHouse for status summaries and failures, and init_qvd.py --reset-state truncates local.QvdFileTracking (with confirmation) instead of deleting a local state file.
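The reset flow described above (confirmation first, then truncation) might look roughly like this. It is a sketch under stated assumptions: the `--reset-state` and `--force` flags and the `local.QvdFileTracking` table name come from this PR, while `build_parser`, `plan_reset`, and the prompt wording are hypothetical. The function only returns the SQL it would run, so it does not need a ClickHouse connection.

```python
import argparse
from typing import Callable, List, Optional

TRACKING_TABLE = "local.QvdFileTracking"  # table name as given in the PR overview


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the two reset-related flags."""
    parser = argparse.ArgumentParser(prog="init_qvd.py")
    parser.add_argument("--reset-state", action="store_true",
                        help="truncate the tracking table")
    parser.add_argument("--force", action="store_true",
                        help="skip the confirmation prompt")
    return parser


def plan_reset(argv: List[str],
               confirm: Callable[[str], str] = input) -> Optional[str]:
    """Return the SQL to run, or None if no reset is requested or the user declines."""
    args = build_parser().parse_args(argv)
    if not args.reset_state:
        return None
    if not args.force:
        answer = confirm(f"Truncate {TRACKING_TABLE}? [y/N] ")
        if answer.strip().lower() != "y":
            return None
    return f"TRUNCATE TABLE {TRACKING_TABLE}"
```

Passing `confirm` as a parameter keeps the prompt testable. Defaulting to "no" on anything other than `y` reflects the operational-safety concern raised in the risk note.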

Documentation/install updates. READMEs and getting-started docs are updated to use the registry install script and to direct users to the /consumption/qvd_status API and ClickHouse tracking table for monitoring and troubleshooting.

Written by Cursor Bugbot for commit e60783b.

…ript

Updated installation instructions across all QVD to ClickHouse pipeline READMEs to use the correct registry install command with --type pipeline flag instead of manual directory navigation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

vercel bot commented Feb 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
registry | Ready | Preview, Comment | Feb 3, 2026 6:09pm


boreal-cloud bot commented Feb 3, 2026

The latest updates on your projects. Learn more about Boreal.

Project | Deployment | Logs | Base URL | Updated (UTC)
registry | ❌ Error | Logs | https://514-demos-registry-fix-qvd-p-87eec.boreal.cloud | 2026-02-03 18:09:11

Changed from showing shell commands to showing actual .env file contents with proper newlines for better readability.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

linear bot commented Feb 3, 2026

514Ben and others added 3 commits February 3, 2026 12:43
Separated model generation commands into distinct sections with headers for better readability and proper line breaks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Separated all troubleshooting commands into distinct sections with headers for better readability and proper line breaks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Separated each environment variable into its own code block under filtering and processing categories for better readability.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Separated each command into its own code block with descriptive header for better readability and proper line breaks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit removes the JSON-based FileTracker and consolidates on
ClickHouseFileTracker as the sole tracking mechanism for the QVD pipeline.

Changes:
- Implement get_status_summary(), list_errors(), reset_state() in ClickHouseFileTracker
- Update qvd_sync.py to use ClickHouseFileTracker with proper method calls
- Remove state_file and use_clickhouse_tracking fields from QvdConfig
- Update init_qvd.py CLI tool to use ClickHouse tracking
- Delete app/utils/file_tracker.py
- Remove .qvd_state.json from .gitignore
- Update all documentation to reference ClickHouse tracking instead of JSON files

Benefits:
- Single source of truth for tracking state
- Better querying capabilities via SQL
- Complete audit trail in database
- No local file dependencies
- Real-time status via API endpoint

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
514Ben merged commit 4e68786 into main on Feb 3, 2026
7 checks passed

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

  print(f" ✗ Error processing {file_name}: {e}")
- tracker.mark_error(file_path, str(e))
+ processing_duration = time.time() - start_time
+ tracker.mark_failed(file_path, file_info, table_name, str(e), processing_duration)

Undefined table_name in exception handler causes crash

High Severity

The exception handler at line 123 references table_name, but this variable is first assigned inside the try block at line 100. If derive_table_name() or any earlier code in the try block raises an exception, table_name won't be defined, causing a NameError that crashes the error handling instead of properly recording the failure. The old code used mark_error() which didn't require table_name, but the new mark_failed() method does.
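One way to address this is to bind `table_name` before entering the try block so the handler can always read it. The following is a minimal sketch, not the repository's code: `StubTracker` is a hypothetical stand-in that mirrors the `mark_failed(file_path, file_info, table_name, error, duration)` call shown in the diff, `process_file` and its `derive` parameter are illustrative, and the `"unknown"` placeholder is an assumption.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class StubTracker:
    """Hypothetical stand-in for ClickHouseFileTracker; records failures in memory."""

    def __init__(self) -> None:
        self.failed: List[Dict[str, Any]] = []

    def mark_failed(self, file_path: str, file_info: Dict[str, Any],
                    table_name: str, error: str, duration: float) -> None:
        self.failed.append({"file_path": file_path, "table_name": table_name,
                            "error": error, "duration": duration})


def process_file(file_path: str, file_info: Dict[str, Any], tracker: StubTracker,
                 derive: Callable[[str], str]) -> bool:
    """Process one file; always record a failure instead of raising NameError."""
    table_name: Optional[str] = None  # bound before the try so the handler can read it
    start_time = time.time()
    try:
        table_name = derive(file_path)
        # ... load the file into table_name here ...
        return True
    except Exception as e:
        duration = time.time() - start_time
        # Fall back to a placeholder when the name was never derived (assumption).
        tracker.mark_failed(file_path, file_info, table_name or "unknown",
                            str(e), duration)
        return False
```

With this shape, a failure inside `derive` is still recorded against the file, just with the placeholder table name.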


print(f"\n[{idx}/{len(files_to_process)}] Processing: {file_name}")

start_time = time.time()
file_info = reader.get_file_info(file_path)

Uncaught exception from get_file_info outside try block

Medium Severity

The call to reader.get_file_info(file_path) at line 96 is now outside the try block, whereas previously it was inside. If this call fails (network timeout, permission denied, file deleted between listing and processing), the exception propagates uncaught. This bypasses error tracking entirely and may terminate the sync loop, leaving remaining files unprocessed. The change was made to provide file_info to the error handler, but it now lacks exception handling of its own.
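A possible remedy is to give `file_info` a default before the try block and move the `get_file_info` call back inside it, so a failure is tracked and the loop continues. The sketch below uses hypothetical stand-ins (`FlakyReader`, `StubTracker`, `sync_files`) and an assumed `"unknown"` placeholder for the table name. It illustrates the control flow only and is not the actual `qvd_sync.py`.

```python
import time
from typing import Any, Dict, List


class FlakyReader:
    """Hypothetical reader whose get_file_info fails for one path."""

    def get_file_info(self, file_path: str) -> Dict[str, Any]:
        if file_path == "missing.qvd":
            raise FileNotFoundError(f"{file_path} disappeared after listing")
        return {"path": file_path}


class StubTracker:
    """Hypothetical tracker that records failed paths in memory."""

    def __init__(self) -> None:
        self.failed: List[str] = []

    def mark_failed(self, file_path: str, file_info: Dict[str, Any],
                    table_name: str, error: str, duration: float) -> None:
        self.failed.append(file_path)


def sync_files(files: List[str], reader: FlakyReader,
               tracker: StubTracker) -> List[str]:
    """Process every file; a failure on one file never stops the loop."""
    completed: List[str] = []
    for file_path in files:
        start_time = time.time()
        file_info: Dict[str, Any] = {}  # default so the handler can always pass it
        table_name = "unknown"          # placeholder until derived (assumption)
        try:
            file_info = reader.get_file_info(file_path)
            # ... derive the table name and load the file here ...
            completed.append(file_path)
        except Exception as e:
            tracker.mark_failed(file_path, file_info, table_name, str(e),
                                time.time() - start_time)
    return completed
```

Here a file that vanishes between listing and processing is marked failed, and the remaining files are still processed.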
