Skip to content

Added tiling for pending posts and uploads#2

Closed
clragon wants to merge 10 commits intomainfrom
tiling
Closed

Added tiling for pending posts and uploads#2
clragon wants to merge 10 commits intomainfrom
tiling

Conversation

@clragon
Copy link
Copy Markdown
Member

@clragon clragon commented Dec 26, 2025

No description provided.

@clragon clragon force-pushed the tiling branch 2 times, most recently from aab528c to 6653474 Compare December 28, 2025 14:28
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a tiling system for pending posts and uploads to improve query performance. The changes include:

  • Implementation of hourly tiles for upload counts and pending post counts
  • Enhancement of the job service to support multiple independent queues for parallel processing
  • Addition of a health monitoring system for tile coverage and status
  • Database migration to add updated_at tracking to manifests

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/upload/tiles/upload-tiles.entity.ts Defines the entity for storing hourly upload tile data
src/upload/tiles/upload-tiles.service.ts Implements service for generating and managing upload tiles
src/upload/tiles/upload-tiles.worker.ts Cron worker that periodically generates upload tiles
src/upload/tiles/upload-tiles.module.ts Module configuration for upload tiles feature
src/upload/metric/upload-metric.service.ts Updated to use tiles for count queries, with separate method for per-uploader counts
src/upload/metric/upload-metric.module.ts Added UploadTilesEntity to TypeORM imports
src/upload/metric/upload-metric.controller.ts Routes upload count queries based on whether uploaderId is specified
src/upload/upload.module.ts Imports the new UploadTilesModule
src/post/tiles/post-pending-tiles.entity.ts Defines the entity for storing hourly pending post tile data
src/post/tiles/post-pending-tiles.service.ts Implements complex SQL query to calculate pending post counts over time
src/post/tiles/post-pending-tiles.worker.ts Cron worker that periodically generates pending post tiles
src/post/tiles/post-pending-tiles.module.ts Module configuration for pending post tiles feature
src/post/metric/post-metric.service.ts Simplified pending series query by using pre-computed tiles
src/post/metric/post-metric.module.ts Added PostPendingTilesEntity to TypeORM imports
src/post/post.module.ts Imports the new PostPendingTilesModule
src/job/job.service.ts Enhanced to support multiple independent queues for parallel job processing
src/job/job.entity.ts Added optional queue property to Job entity
src/health/tiles/tile-health.utils.ts Utility functions for generating tile health slices
src/health/tiles/tile-health.service.ts Service for monitoring tile coverage and health across tile types
src/health/tiles/tile-health.module.ts Module configuration for tile health monitoring
src/health/tiles/tile-health.dto.ts DTOs for tile health responses
src/health/tiles/tile-health.controller.ts REST endpoints for viewing and managing tile health
src/health/tiles/index.ts Exports for tile health module
src/health/health.module.ts Imports the new TileHealthModule
src/common/tile.ts Core tile utilities including range calculation and missing tile detection
src/common/index.ts Exports the new tile utilities and cron expressions
src/common/date/date-buckets.ts Added tile-specific count aggregation functions
src/common/cron.ts Defines custom cron expressions for scheduled jobs
src/migration/1763760586158-AddManifestUpdatedAt.ts Adds updated_at column to manifests table for tracking data freshness

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +83 to +114
const query = `
WITH pending_posts AS (
SELECT
post_version.post_id,
date_trunc('hour', post_version.updated_at) AS upload_hour,
COALESCE(
date_trunc('hour', MIN(approval_event.created_at)),
date_trunc('hour', MIN(deletion_event.created_at)),
date_trunc('hour', NOW()) + interval '1 hour'
) AS handled_hour
FROM post_versions post_version
LEFT JOIN post_events approval_event
ON post_version.post_id = approval_event.post_id
AND approval_event.action = 'approved'
AND approval_event.created_at >= $3
AND approval_event.created_at < $4
LEFT JOIN post_events deletion_event
ON post_version.post_id = deletion_event.post_id
AND deletion_event.action = 'deleted'
AND deletion_event.created_at >= $3
AND deletion_event.created_at < $4
LEFT JOIN permits permit
ON post_version.post_id = permit.id
AND permit.created_at >= $3
AND permit.created_at < $4
WHERE post_version.version = 1
AND post_version.updated_at >= $3
AND post_version.updated_at < $4
AND permit.id IS NULL
AND (approval_event.created_at IS NULL OR approval_event.created_at > $1)
AND (deletion_event.created_at IS NULL OR deletion_event.created_at > $1)
GROUP BY post_version.post_id, upload_hour
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This complex SQL query with multiple LEFT JOINs and filtering conditions could have performance issues, especially on large datasets. The query joins post_events twice (for approval and deletion) and permits, each with date range filters. Consider adding appropriate indexes on post_events.post_id, post_events.action, post_events.created_at, and permits.id to optimize these joins. Additionally, the GROUP BY on line 114 after filtering could benefit from reviewing the query execution plan.

Copilot uses AI. Check for mistakes.
(intervalHours * 60 * 60 * 1000),
);
available = Math.max(0, expectedInSlice - unavailable);
const none = tilesPerSlice - (available + unavailable);
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for counting available vs unavailable tiles appears to be inverted. The allTimes parameter contains missing tiles, and these are being counted as unavailable (line 46). However, at line 55, available is calculated as expectedInSlice - unavailable, which would mean available tiles are those that are expected minus the missing ones. This is semantically confusing. Then at line 56, none is calculated from tilesPerSlice, which is a different basis than expectedInSlice. The relationship between available, unavailable, and none is unclear and may not sum correctly to the expected total.

Suggested change
const none = tilesPerSlice - (available + unavailable);
const none = Math.max(0, expectedInSlice - (available + unavailable));

Copilot uses AI. Check for mistakes.
Comment on lines +102 to +103
this.processingFlags.delete(queueKey);
this.queues.delete(queueKey);
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting the queue and processing flag immediately after processing completes (lines 102-103) could lead to race conditions. If a new job is added to the same queue between when the while loop exits and when these deletions occur, the new job's queue will be deleted, and it will never be processed. Consider keeping the queue structure in place even when empty, or using more sophisticated synchronization.

Suggested change
this.processingFlags.delete(queueKey);
this.queues.delete(queueKey);
// Mark processing as finished but keep the queue entry to avoid
// deleting a queue that may have received new jobs concurrently.
this.processingFlags.set(queueKey, false);

Copilot uses AI. Check for mistakes.
import { TileSlice } from './tile-health.dto';

export interface TileSliceProps {
allTimes: { time: Date }[];
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name allTimes is misleading. Based on the usage in tile-health.service.ts line 66, this parameter receives missing tiles, not all tiles. Consider renaming to missingTimes to accurately reflect what data is being passed.

Copilot uses AI. Check for mistakes.
@clragon clragon closed this Dec 28, 2025
@clragon clragon deleted the tiling branch December 28, 2025 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants