MODSOURMAN-1395: Optimize getJobLogEntries query#1015
MODSOURMAN-1395: Optimize getJobLogEntries query#1015Aliaksandr-Fedasiuk wants to merge 2 commits intomasterfrom
Conversation
|
There was a problem hiding this comment.
A record with action_type='CREATE' that errored becomes DISCARDED. Pre-pagination puts the order near the front(because 'CREATE') then the final sort puts it in the middle(because 'DISCARDED'). The sort order is [CREATED, DISCARDED, UPDATED]. Users will see odd results.
There is an OR condition that was removed in originalSQL for the rec_titles subquery:
OR (journal_records.entity_type IN ('MARC_AUTHORITY', 'MARC_BIBLIOGRAPHIC', 'MARC_HOLDINGS') AND journal_records.title IS NOT NULL). This OR condition in the master branch has a bug due to missing parenthesis. topSQL has a corrected version of the OR condition. The removal means that MARC records with titles but no entity_id (and non-error, non-non-match status) lose their titles on the originalSQL path.
This function is very difficult to comprehend and debug. I suggest we go with the incremental summary table solution we discussed at our weekly sync. The solution should follow this direction:
- Maintain one summary row per
(job_execution_id, source_id)and upsert on every journal-record insert. - if possible, recompute only affected fields
- Add some "last updated date" and the sort keys in the summary row.
This will hurt write performance for journal-record insert but might be acceptable. An alternative is async processing but then we need a background job and possibly coordination between background job workers(our current pattern is that each SRM instance will have a background job worker).
Contact me outside this PR if further discussion is needed.



Purpose
Optimization get_job_log_entries function
Approach
The limit operator is used as early as possible and windows functions was added
Learn
MODSOURMAN-1395