Fix QVD pipeline installation instructions to use registry install sc…#190
…ript Updated installation instructions across all QVD to ClickHouse pipeline READMEs to use the correct registry install command with --type pipeline flag instead of manual directory navigation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed from showing shell commands to showing actual .env file contents with proper newlines for better readability. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separated model generation commands into distinct sections with headers for better readability and proper line breaks. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separated all troubleshooting commands into distinct sections with headers for better readability and proper line breaks. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separated each environment variable into its own code block under filtering and processing categories for better readability. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separated each command into its own code block with descriptive header for better readability and proper line breaks. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit removes the JSON-based FileTracker and consolidates on ClickHouseFileTracker as the sole tracking mechanism for the QVD pipeline.

Changes:
- Implement get_status_summary(), list_errors(), reset_state() in ClickHouseFileTracker
- Update qvd_sync.py to use ClickHouseFileTracker with proper method calls
- Remove state_file and use_clickhouse_tracking fields from QvdConfig
- Update init_qvd.py CLI tool to use ClickHouse tracking
- Delete app/utils/file_tracker.py
- Remove .qvd_state.json from .gitignore
- Update all documentation to reference ClickHouse tracking instead of JSON files

Benefits:
- Single source of truth for tracking state
- Better querying capabilities via SQL
- Complete audit trail in database
- No local file dependencies
- Real-time status via API endpoint

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
print(f" ✗ Error processing {file_name}: {e}")
tracker.mark_error(file_path, str(e))
processing_duration = time.time() - start_time
tracker.mark_failed(file_path, file_info, table_name, str(e), processing_duration)
Undefined table_name in exception handler causes crash
High Severity
The exception handler at line 123 references table_name, but this variable is first assigned inside the try block at line 100. If derive_table_name() or any earlier code in the try block raises an exception, table_name won't be defined, causing a NameError that crashes the error handling instead of properly recording the failure. The old code used mark_error() which didn't require table_name, but the new mark_failed() method does.
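One possible way to address this, sketched under assumptions: bind table_name to a safe default before entering the try block, so the except path can always record the failure. StubTracker and derive_table_name below are hypothetical stand-ins, not the PR's real implementations; only the mark_failed argument order is taken from the diff. Whether the real mark_failed accepts None for table_name is also an assumption.

```python
import time


class StubTracker:
    """Hypothetical stand-in for ClickHouseFileTracker; records calls in memory."""

    def __init__(self):
        self.failed = []

    def mark_failed(self, file_path, file_info, table_name, error, duration):
        self.failed.append((file_path, table_name, error))


def derive_table_name(file_path):
    """Hypothetical helper that rejects paths without a .qvd suffix."""
    if not file_path.endswith(".qvd"):
        raise ValueError(f"not a QVD file: {file_path}")
    return file_path.rsplit("/", 1)[-1][:-4]


def process_file(tracker, file_path, file_info):
    start_time = time.time()
    table_name = None  # bound before the try, so the handler can always read it
    try:
        table_name = derive_table_name(file_path)
        # ... read the QVD file and insert rows here ...
        return True
    except Exception as e:
        processing_duration = time.time() - start_time
        tracker.mark_failed(file_path, file_info, table_name, str(e), processing_duration)
        return False


tracker = StubTracker()
ok = process_file(tracker, "data/orders.qvd", {})
bad = process_file(tracker, "data/notes.txt", {})
```

With this shape, a failure inside derive_table_name is recorded with table_name set to None instead of raising a NameError from the handler.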
print(f"\n[{idx}/{len(files_to_process)}] Processing: {file_name}")

start_time = time.time()
file_info = reader.get_file_info(file_path)
Uncaught exception from get_file_info outside try block
Medium Severity
The call to reader.get_file_info(file_path) at line 96 is now outside the try block, whereas previously it was inside. If this call fails (network timeout, permission denied, file deleted between listing and processing), the exception propagates uncaught. This bypasses error tracking entirely and may terminate the sync loop, leaving remaining files unprocessed. The change was made to provide file_info to the error handler, but it now lacks exception handling of its own.
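A minimal sketch of one way to keep file_info available to the error handler without leaving the call unguarded: default file_info to None, then make the call inside the try block. StubReader, StubTracker, and the sync function are hypothetical stand-ins for illustration; the real reader and tracker classes may behave differently, and passing None as file_info to mark_failed is an assumption.

```python
import os
import time


class StubReader:
    """Hypothetical reader whose get_file_info fails for missing files."""

    def get_file_info(self, file_path):
        if "missing" in file_path:
            raise FileNotFoundError(file_path)
        return {"size": 123}


class StubTracker:
    """Hypothetical in-memory tracker, not the real ClickHouseFileTracker."""

    def __init__(self):
        self.completed = []
        self.failed = []

    def mark_failed(self, file_path, file_info, table_name, error, duration):
        self.failed.append((file_path, file_info, error))


def sync(reader, tracker, files_to_process):
    for file_path in files_to_process:
        start_time = time.time()
        file_info = None   # defaults let the handler run even if
        table_name = None  # the very first call in the try fails
        try:
            file_info = reader.get_file_info(file_path)
            table_name = os.path.splitext(os.path.basename(file_path))[0]
            # ... load the file into ClickHouse here ...
            tracker.completed.append(file_path)
        except Exception as e:
            duration = time.time() - start_time
            tracker.mark_failed(file_path, file_info, table_name, str(e), duration)
            continue  # keep going so later files are still processed


reader = StubReader()
tracker = StubTracker()
sync(reader, tracker, ["a/missing.qvd", "a/orders.qvd"])
```

Here the failing file is recorded and the loop moves on to the next file rather than terminating the sync.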


This PR contains two sets of improvements to the QVD to ClickHouse pipeline:
1. Fix Installation Instructions (Original)
Updated installation instructions across all QVD to ClickHouse pipeline READMEs to use the correct registry install command with the --type pipeline flag instead of manual directory navigation.
2. Remove JSON State File Tracking (New)
Removes the JSON-based FileTracker and consolidates on ClickHouseFileTracker as the sole tracking mechanism for the QVD pipeline.
Changes:
- Implemented get_status_summary(), list_errors(), reset_state() methods with actual ClickHouse queries
- Updated qvd_sync.py from FileTracker to ClickHouseFileTracker with proper method calls and timing tracking
- Removed state_file and use_clickhouse_tracking from QvdConfig
- Updated init_qvd.py to use ClickHouse tracking and removed the --state-file argument
- Deleted app/utils/file_tracker.py and removed .qvd_state.json from .gitignore
Benefits:
- Single source of truth for tracking state
- Better querying capabilities via SQL
- Complete audit trail in database
- No local file dependencies
- Real-time status via API endpoint
Testing Checklist
- python init_qvd.py --reset-state --force
- curl http://localhost:4000/consumption/qvd_status
Note
Medium Risk
Changes the pipeline’s incremental tracking and reset behavior from a local JSON file to ClickHouse queries/table truncation, which can affect reprocessing and operational safety if misused.
Overview
Switches incremental file tracking to ClickHouse-only. Removes the JSON-based FileTracker/.qvd_state.json flow (including related config/env flags) and updates qvd_sync to record processing/completed/failed states with row counts and durations.
Makes ClickHouse tracking more operational. ClickHouseFileTracker now queries ClickHouse for status summaries and failures, and init_qvd.py --reset-state truncates local.QvdFileTracking (with confirmation) instead of deleting a local state file.
Documentation/install updates. READMEs and getting-started docs are updated to use the registry install script and to direct users to the /consumption/qvd_status API and ClickHouse tracking table for monitoring and troubleshooting.
Written by Cursor Bugbot for commit e60783b. This will update automatically on new commits.