Fix QVD pipeline installation instructions to use registry install sc… by 514Ben · Pull Request #190 · 514-labs/registry

514Ben · 2026-02-03T17:40:17Z

This PR contains two sets of improvements to the QVD to ClickHouse pipeline:

1. Fix Installation Instructions (Original)

Updated the installation instructions in every QVD to ClickHouse pipeline README to use the correct registry install command with the --type pipeline flag instead of manual directory navigation.

2. Remove JSON State File Tracking (New)

Removes the JSON-based FileTracker and consolidates on ClickHouseFileTracker as the sole tracking mechanism for the QVD pipeline.

Changes:

Enhanced ClickHouseFileTracker: Implemented the get_status_summary(), list_errors(), and reset_state() methods with actual ClickHouse queries
Updated qvd_sync.py: Switched from FileTracker to ClickHouseFileTracker with proper method calls and timing tracking
Removed config fields: Deleted state_file and use_clickhouse_tracking from QvdConfig
Updated CLI tool: Modified init_qvd.py to use ClickHouse tracking and removed --state-file argument
Cleanup: Deleted app/utils/file_tracker.py and removed .qvd_state.json from .gitignore
Documentation: Updated all docs to reference ClickHouse tracking instead of JSON files

Benefits:

Single source of truth for tracking state
Better querying capabilities via SQL
Complete audit trail in database
No local file dependencies
Real-time status via API endpoint

Testing Checklist

Unit tests for all public methods
Integration tests with mock servers
Test reset command: python init_qvd.py --reset-state --force
Test workflow runs successfully with ClickHouse tracking
Test API endpoint: curl http://localhost:4000/consumption/qvd_status
Test change detection (modify file mtime and verify reprocessing)
Verify tracking records appear in ClickHouse table
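
The change-detection item in the checklist above can be exercised with a small script. This is a minimal sketch, assuming the tracker decides on reprocessing by comparing a stored mtime with the file's current mtime; needs_reprocessing, bump_mtime, and demo_change_detection are hypothetical helpers written for illustration, not functions from the pipeline.

```python
import os
import tempfile


def needs_reprocessing(path, tracked_mtime):
    """Return True when the file's current mtime differs from the tracked one.

    tracked_mtime is None for a file that has never been processed.
    (Assumed comparison rule; the real tracker may also compare size or hash.)
    """
    if tracked_mtime is None:
        return True
    return os.stat(path).st_mtime != tracked_mtime


def bump_mtime(path, seconds=60):
    """Move the file's mtime forward to simulate a modification."""
    stat = os.stat(path)
    os.utime(path, (stat.st_atime, stat.st_mtime + seconds))


def demo_change_detection():
    """Create a temp file, track it, touch it, and report both decisions."""
    with tempfile.NamedTemporaryFile(suffix=".qvd", delete=False) as handle:
        path = handle.name
    try:
        tracked = os.stat(path).st_mtime
        before = needs_reprocessing(path, tracked)  # unchanged file
        bump_mtime(path)
        after = needs_reprocessing(path, tracked)   # modified file
        return before, after
    finally:
        os.remove(path)
```

Against a real deployment, the same idea applies: change the mtime of an already processed file, rerun the workflow, and check that a new tracking record appears.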

Note

Medium Risk
Changes the pipeline’s incremental tracking and reset behavior from a local JSON file to ClickHouse queries/table truncation, which can affect reprocessing and operational safety if misused.

Overview
Switches incremental file tracking to ClickHouse-only. Removes the JSON-based FileTracker/.qvd_state.json flow (including related config/env flags) and updates qvd_sync to record processing/completed/failed states with row counts and durations.

Makes ClickHouse tracking more operational. ClickHouseFileTracker now queries ClickHouse for status summaries and failures, and init_qvd.py --reset-state truncates local.QvdFileTracking (with confirmation) instead of deleting a local state file.
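
The confirm-then-truncate reset described above might look roughly like the following. It is a sketch under stated assumptions: the execute callable stands in for whatever ClickHouse client the pipeline uses, and treating force as "skip the prompt" is a guess based on the --reset-state --force command in the testing checklist. Only the table name local.QvdFileTracking comes from the PR summary.

```python
TRACKING_TABLE = "local.QvdFileTracking"  # table name taken from the PR summary


def reset_state(execute, force=False, confirm=input):
    """Truncate the tracking table, asking for confirmation unless forced.

    execute: callable that runs one SQL statement (hypothetical client stand-in).
    confirm: callable that prompts the user and returns the answer.
    Returns True if the table was truncated, False if the user declined.
    """
    if not force:
        answer = confirm(
            f"Truncate {TRACKING_TABLE}? All files will be reprocessed [y/N]: "
        )
        if answer.strip().lower() not in ("y", "yes"):
            return False
    execute(f"TRUNCATE TABLE {TRACKING_TABLE}")
    return True
```

Defaulting to "no" keeps an accidental keypress from wiping the tracking history, which matters because a truncated table causes every file to be reprocessed.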

Documentation/install updates. READMEs and getting-started docs are updated to use the registry install script and to direct users to the /consumption/qvd_status API and ClickHouse tracking table for monitoring and troubleshooting.

Written by Cursor Bugbot for commit e60783b. This will update automatically on new commits.

…ript Updated installation instructions across all QVD to ClickHouse pipeline READMEs to use the correct registry install command with --type pipeline flag instead of manual directory navigation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

vercel · 2026-02-03T17:40:23Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
registry	Ready	Preview, Comment	Feb 3, 2026 6:09pm

boreal-cloud · 2026-02-03T17:40:43Z

The latest updates on your projects. Learn more about Boreal.

Project	Deployment	Logs	Base URL	Updated (UTC)
registry	❌ Error	Logs	`https://514-demos-registry-fix-qvd-p-87eec.boreal.cloud`	2026-02-03 18:09:11

linear · 2026-02-03T17:43:00Z

ENG-2069 Create a reusable registry pipeline to import QVD files from disk or s3.

This commit removes the JSON-based FileTracker and consolidates on ClickHouseFileTracker as the sole tracking mechanism for the QVD pipeline.

Changes:
- Implement get_status_summary(), list_errors(), reset_state() in ClickHouseFileTracker
- Update qvd_sync.py to use ClickHouseFileTracker with proper method calls
- Remove state_file and use_clickhouse_tracking fields from QvdConfig
- Update init_qvd.py CLI tool to use ClickHouse tracking
- Delete app/utils/file_tracker.py
- Remove .qvd_state.json from .gitignore
- Update all documentation to reference ClickHouse tracking instead of JSON files

Benefits:
- Single source of truth for tracking state
- Better querying capabilities via SQL
- Complete audit trail in database
- No local file dependencies
- Real-time status via API endpoint

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-02-03T18:14:28Z

pipeline-registry/qvd_to_clickhouse/v1/514-labs/python/default/app/workflows/qvd_sync.py

            print(f"  ✗ Error processing {file_name}: {e}")
-            tracker.mark_error(file_path, str(e))
+            processing_duration = time.time() - start_time
+            tracker.mark_failed(file_path, file_info, table_name, str(e), processing_duration)


Undefined table_name in exception handler causes crash

High Severity

The exception handler at line 123 references table_name, but this variable is first assigned inside the try block at line 100. If derive_table_name() or any earlier code in the try block raises an exception, table_name won't be defined, causing a NameError that crashes the error handling instead of properly recording the failure. The old code used mark_error() which didn't require table_name, but the new mark_failed() method does.
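
One way to address this finding is to bind table_name before entering the try block, so the handler can always read it. The sketch below is illustrative only: StubTracker, the failing derive_table_name, and process_file are hypothetical stand-ins, and whether the real mark_failed() accepts None for table_name is an assumption that would need checking against ClickHouseFileTracker.

```python
import time


class StubTracker:
    """Hypothetical stand-in that records mark_failed calls."""

    def __init__(self):
        self.failures = []

    def mark_failed(self, file_path, file_info, table_name, error, duration):
        self.failures.append((file_path, table_name, error))


def derive_table_name(file_path):
    # Simulates a failure before table_name is ever assigned.
    raise ValueError(f"cannot derive table name from {file_path}")


def process_file(tracker, file_path, file_info):
    start_time = time.time()
    table_name = None  # bound before the try so the handler can always read it
    try:
        table_name = derive_table_name(file_path)
        # ... load the file into ClickHouse here ...
    except Exception as e:
        processing_duration = time.time() - start_time
        tracker.mark_failed(
            file_path, file_info, table_name, str(e), processing_duration
        )
```

With the placeholder in place, a failure in derive_table_name() is recorded as a failed file rather than surfacing as a NameError inside the handler.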

cursor · 2026-02-03T18:14:28Z

pipeline-registry/qvd_to_clickhouse/v1/514-labs/python/default/app/workflows/qvd_sync.py

        print(f"\n[{idx}/{len(files_to_process)}] Processing: {file_name}")

+        start_time = time.time()
+        file_info = reader.get_file_info(file_path)


Uncaught exception from get_file_info outside try block

Medium Severity

The call to reader.get_file_info(file_path) at line 96 is now outside the try block, whereas previously it was inside. If this call fails (network timeout, permission denied, file deleted between listing and processing), the exception propagates uncaught. This bypasses error tracking entirely and may terminate the sync loop, leaving remaining files unprocessed. The change was made to provide file_info to the error handler, but it now lacks exception handling of its own.
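
A possible shape for the fix is to move the metadata lookup back inside the try block while keeping a placeholder for file_info, so the handler still has something to pass along. This is a sketch, not the pipeline's code: FlakyReader, sync_files, and the on_failure callback are hypothetical names, and passing None as file_info to the failure recorder is an assumption.

```python
import time


class FlakyReader:
    """Hypothetical reader whose get_file_info fails for one path."""

    def get_file_info(self, file_path):
        if file_path == "missing.qvd":
            raise FileNotFoundError(file_path)
        return {"path": file_path}


def sync_files(reader, files_to_process, on_failure):
    """Process each file; one failing file must not stop the loop."""
    processed = []
    for file_path in files_to_process:
        start_time = time.time()
        file_info = None  # stays None if the metadata lookup itself fails
        try:
            file_info = reader.get_file_info(file_path)
            # ... read and load the file here ...
            processed.append(file_path)
        except Exception as e:
            on_failure(file_path, file_info, str(e), time.time() - start_time)
    return processed
```

In this arrangement a file deleted between listing and processing is recorded as a failure, and the remaining files are still processed.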

vercel bot deployed to Preview February 3, 2026 17:40 View deployment

Fix .env file display in getting started guide

6f29dda

Changed from showing shell commands to showing actual .env file contents with proper newlines for better readability. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

514Ben and others added 3 commits February 3, 2026 12:43

Fix Step 4 command display in getting started guide

298b9f2

Separated model generation commands into distinct sections with headers for better readability and proper line breaks. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix command display in troubleshooting sections

cf4db31

Separated all troubleshooting commands into distinct sections with headers for better readability and proper line breaks. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix additional options display in Step 2

85e9566

Separated each environment variable into its own code block under filtering and processing categories for better readability. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

vercel bot deployed to Preview February 3, 2026 17:46 View deployment

Fix Common Commands section display

4c611ef

Separated each command into its own code block with descriptive header for better readability and proper line breaks. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

vercel bot deployed to Preview February 3, 2026 17:54 View deployment

vercel bot deployed to Preview February 3, 2026 18:09 View deployment

514Ben merged commit 4e68786 into main Feb 3, 2026
7 checks passed

cursor bot reviewed Feb 3, 2026

514Ben merged 7 commits into main from fix-qvd-pipeline-instructions
