-
Notifications
You must be signed in to change notification settings - Fork 10
Backfill missing Wall Street Journal aggregate data #471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Created a command to fix a given orgs monthly aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
- Created a command to fix a given orgs monthly aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
jsnshrmn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, pending the linkevents count == 0 check that we discussed to guard against trying to load linkevents that already exist. Great work! Thanks!
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
suecarmol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all of your work on this! I have a few nits and changes that need to be made before we can merge this.
extlinks/aggregates/management/commands/reaggregate_link_archives.py
Outdated
Show resolved
Hide resolved
extlinks/aggregates/management/commands/reaggregate_link_archives.py
Outdated
Show resolved
Hide resolved
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to change the _get_existing_link_aggregates function to accept a string as a parameter so we can check if user aggregates and page project aggregates exist in the object store. Right now, it's only checking for link aggregates. Other than that, it looks good, although I haven't tested the newest iteration of the code because of problems with restoring a backup in my local environment.
jsnshrmn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved pending addressing @suecarmol's feedback
- Created a command to fix a given orgs monthly/daily aggregates using already existing archived data - This fixes an issue where we have archived link events before they were aggregated Bug: T404879 Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
This should be resolved now. I modified the prefix search param to be more generic to include all aggregate types. |
suecarmol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this extensive work!


Bug: T404879
Change-Id: Ia9805aee9f7dc9a707df3b50847f34bd07401b6c
Description
Rationale
Monthly:
python manage.py reaggregate_link_archives --month 202503 --organisation {ORG_ID} --dir "./backup/"Daily:
python manage.py reaggregate_link_archives --day 20250301 --organisation {ORG_ID} --dir "./backup/"Phabricator Ticket
(https://phabricator.wikimedia.org/T404879)
How Has This Been Tested?
Since this is modifying production data, we should make sure to be extra thorough in testing and reviewing it locally.
Screenshots of your changes (if appropriate):
Types of changes
What types of changes does your code introduce? Add an
xin all the boxes that apply: