Skip to content

MODSOURMAN-1395: Optimize getJobLogEntries query#1015

Open
Aliaksandr-Fedasiuk wants to merge 2 commits intomasterfrom
MODSOURMAN-1395
Open

MODSOURMAN-1395: Optimize getJobLogEntries query#1015
Aliaksandr-Fedasiuk wants to merge 2 commits intomasterfrom
MODSOURMAN-1395

Conversation

@Aliaksandr-Fedasiuk
Copy link
Contributor

Purpose

Optimization get_job_log_entries function

Approach

The limit operator is used as early as possible and windows functions was added

Learn

MODSOURMAN-1395

@sonarqubecloud
Copy link

Copy link
Contributor

@okolawole-ebsco okolawole-ebsco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A record with action_type='CREATE' that errored becomes DISCARDED. Pre-pagination puts the order near the front(because 'CREATE') then the final sort puts it in the middle(because 'DISCARDED'). The sort order is [CREATED, DISCARDED, UPDATED]. Users will see odd results.

There is an OR condition that was removed in originalSQL for the rec_titles subquery:
OR (journal_records.entity_type IN ('MARC_AUTHORITY', 'MARC_BIBLIOGRAPHIC', 'MARC_HOLDINGS') AND journal_records.title IS NOT NULL). This OR condition in the master branch has a bug due to missing parenthesis. topSQL has a corrected version of the OR condition. The removal means that MARC records with titles but no entity_id (and non-error, non-non-match status) lose their titles on the originalSQL path.

This function is very difficult to comprehend and debug. I suggest we go with the incremental summary table solution we discussed at our weekly sync. The solution should follow this direction:

  • Maintain one summary row per (job_execution_id, source_id) and upsert on every journal-record insert.
  • if possible, recompute only affected fields
  • Add some "last updated date" and the sort keys in the summary row.

This will hurt write performance for journal-record insert but might be acceptable. An alternative is async processing but then we need a background job and possibly coordination between background job workers(our current pattern is that each SRM instance will have a background job worker).

Contact me outside this PR if further discussion is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants