@katydidnot (Contributor) commented Sep 18, 2025

  • Created a generic command to fix a given org's monthly aggregates using existing archived data
  • This lets us fix or backfill aggregates when link events were archived before they were aggregated

Bug: T404879
Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c

Description

  • Created a new command for fixing aggregates when link events were archived before aggregation succeeded
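
As a rough illustration of the idea (not the code in this PR), rebuilding totals from archived events could look like the sketch below. The field names `organisation_id`, `timestamp`, and `change` (1 for added, anything else for removed) and the `reaggregate` helper are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def reaggregate(events, organisation_id, month):
    """Count added/removed link events for one organisation in one month.

    `events` is an iterable of dicts whose field names are assumed for
    this sketch. `month` is a "YYYYMM" string, like the --month flag.
    """
    totals = Counter()
    for event in events:
        if event["organisation_id"] != organisation_id:
            continue  # event belongs to another organisation
        stamp = datetime.fromisoformat(event["timestamp"])
        if stamp.strftime("%Y%m") != month:
            continue  # event falls outside the requested month
        kind = "added" if event["change"] == 1 else "removed"
        # Key by calendar day so daily totals can be rebuilt too.
        totals[(stamp.date().isoformat(), kind)] += 1
    return totals
```

Summing the per-day counts gives the monthly total, so one pass over the archive can feed both granularities.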

Rationale

  • To fix LinkAggregate totals from archived data for December 2024 through September 2025 for The Wall Street Journal
  • You can run the command using:

Monthly:
python manage.py reaggregate_link_archives --month 202503 --organisation {ORG_ID} --dir "./backup/"

Daily:
python manage.py reaggregate_link_archives --day 20250301 --organisation {ORG_ID} --dir "./backup/"
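
For readers unfamiliar with how such flags are usually parsed, here is a minimal standalone sketch using `argparse` (the same parser type Django hands to a command's `add_arguments`). Treating `--month` and `--day` as mutually exclusive, and marking `--organisation` and `--dir` as required, are assumptions for this example; the merged command may behave differently.

```python
import argparse
from datetime import datetime


def build_parser():
    """Build a parser that mirrors the flags shown above (sketch only)."""
    parser = argparse.ArgumentParser(prog="reaggregate_link_archives")
    # Assumption: exactly one of --month / --day must be given.
    period = parser.add_mutually_exclusive_group(required=True)
    period.add_argument("--month", help="month to reaggregate, as YYYYMM")
    period.add_argument("--day", help="day to reaggregate, as YYYYMMDD")
    parser.add_argument("--organisation", required=True, help="organisation ID")
    parser.add_argument("--dir", required=True, help="directory holding the archives")
    return parser


def parse_period(args):
    """Return (granularity, date); raises ValueError on a malformed date."""
    if args.month:
        return "month", datetime.strptime(args.month, "%Y%m").date()
    return "day", datetime.strptime(args.day, "%Y%m%d").date()
```

Validating the date string up front means a typo such as `--month 20250301` fails before any production data is touched.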

Phabricator Ticket

https://phabricator.wikimedia.org/T404879

How Has This Been Tested?

  • I wrote some unit tests
  • I smoke tested locally
  • Since this modifies production data, we should be extra thorough in testing and reviewing it locally.

Screenshots of your changes (if appropriate):

Types of changes

What types of changes does your code introduce? Add an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

- Created a command to fix a given orgs monthly aggregates using already existing archived data
- This fixes an issue where we have archived link events before they were aggregated

Bug: T404879
Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
@katydidnot marked this pull request as draft September 18, 2025 21:14
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data
- This fixes an issue where we have archived link events before they were aggregated

Bug: T404879
Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
@katydidnot marked this pull request as ready for review September 24, 2025 14:34
@jsnshrmn (Member) left a comment

Approved, pending the linkevents count == 0 check that we discussed to guard against trying to load linkevents that already exist. Great work! Thanks!
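
One way to picture the guard described above is the sketch below. The `load_archived_events` function, the `LinkEventsAlreadyLoaded` exception, and the two callables are hypothetical names invented for illustration; the actual check in the PR may be structured differently.

```python
class LinkEventsAlreadyLoaded(Exception):
    """Raised when the target period already has link events in the database."""


def load_archived_events(archived_events, count_existing, insert_event):
    """Load archived events only if none exist yet for the period.

    `count_existing` returns how many link events are already stored and
    `insert_event` stores one event; both are stand-ins for database calls.
    """
    existing = count_existing()
    if existing != 0:
        # Loading on top of existing rows would double count the aggregates.
        raise LinkEventsAlreadyLoaded(
            f"{existing} link events already exist; refusing to load the archive"
        )
    loaded = 0
    for event in archived_events:
        insert_event(event)
        loaded += 1
    return loaded
```

Failing loudly instead of silently skipping makes an accidental second run visible to whoever is operating the command.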

@suecarmol (Contributor) left a comment

Thank you for all of your work on this! I have a few nits and changes that need to be made before we can merge this.

@katydidnot marked this pull request as draft September 29, 2025 21:54
@katydidnot marked this pull request as ready for review September 30, 2025 18:25
@suecarmol (Contributor) left a comment

We need to change the _get_existing_link_aggregates function to accept a string as a parameter so we can check if user aggregates and page project aggregates exist in the object store. Right now, it's only checking for link aggregates. Other than that, it looks good, although I haven't tested the newest iteration of the code because of problems with restoring a backup in my local environment.

@jsnshrmn (Member) left a comment

approved pending addressing @suecarmol's feedback

Before running against March 25:
[image]

After running against March 25:
[image]

@jsnshrmn requested a review from suecarmol October 1, 2025 20:12
@katydidnot (Contributor, Author) commented

> We need to change the _get_existing_link_aggregates function to accept a string as a parameter so we can check if user aggregates and page project aggregates exist in the object store. Right now, it's only checking for link aggregates. Other than that, it looks good, although I haven't tested the newest iteration of the code because of problems with restoring a backup in my local environment.

This should be resolved now. I modified the prefix search param to be more generic to include all aggregate types.
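
To illustrate what a generic prefix search might look like, here is a small sketch. The object naming scheme (`aggregates_<type>_<org>_<period>...`), the `AGGREGATE_TYPES` tuple, and both helper names are assumptions made up for this example and are not taken from the merged code.

```python
# Hypothetical aggregate type names used to build object-store prefixes.
AGGREGATE_TYPES = ("linkaggregate", "useraggregate", "pageprojectaggregate")


def build_prefix(aggregate_type, organisation_id, period):
    """Build the object-name prefix for one aggregate type (assumed scheme)."""
    if aggregate_type not in AGGREGATE_TYPES:
        raise ValueError(f"unknown aggregate type: {aggregate_type}")
    return f"aggregates_{aggregate_type}_{organisation_id}_{period}"


def find_existing_aggregates(object_names, aggregate_type, organisation_id, period):
    """Return the object names that match the prefix for the given type."""
    prefix = build_prefix(aggregate_type, organisation_id, period)
    return sorted(name for name in object_names if name.startswith(prefix))
```

Passing the aggregate type as a string lets one function cover link, user, and page project aggregates instead of needing one lookup per type.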

@suecarmol (Contributor) left a comment

Thank you for this extensive work!

@suecarmol merged commit 36d9894 into master Oct 2, 2025
3 checks passed