
Updating pipeline: pln_service_bus #2429

Closed
KranthiRayipudi wants to merge 15 commits into main from
Feat/2905_Refactorpln_service_busPipelinetoSupportServiceBustrigger

Conversation

@KranthiRayipudi
Collaborator

https://pins-ds.atlassian.net/browse/THEODW-2905

Updated Pipeline Flow: pln_service_bus

  1. Initialise variables and build pipeline name
    • Extract entity_name
    • Construct: pln_service_bus - <entity_name>
  2. Log start to Application Insights
    • Stage: OnProgress
    • Message: “Progressing to load service bus: <entity_name>”
  3. Create tables in Standardised, Harmonised, Curated layers
    • Runs create_table_from_schema for all 3 layers
    • Includes pipeline metadata
  4. Raw → Standardised notebook execution
    • Notebook: py_sb_raw_to_std
    • Handles success + failure logging
  5. Standardised → Harmonised notebook execution
    • Notebook: py_sb_std_to_hrm
    • Handles success + failure logging
  6. Load orchestration config for entity-specific harmonised logic
    • Loads orchestration_ServiceBus.json
    • Filters on Source_Folder = ServiceBus and Source_Frequency_Folder = <entity_name>
    • Extracts harmonised_notebook_name
  7. Run configured harmonised notebook(s) via a Switch condition:
    • NOT_EMPTY → run configured notebook
    • NOT_EMPTY_NSIP_PROJECT → run notebooks for nsip-project
    • EMPTY → skip harmonised step
  8. Log completion or failure status
    • On success: Stage = Completion
    • On failure: Stage = Fail
    • Switch error captured if applicable
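The lookup-then-Switch logic described above can be sketched in plain Python. This is a hypothetical illustration of the routing decision, not the actual ADF pipeline definition: the field names (Source_Folder, Source_Frequency_Folder, harmonised_notebook_name) come from the PR description, but the shape of orchestration_ServiceBus.json and the helper function are assumptions.

```python
def derive_switch_case(orchestration: list, entity_name: str) -> str:
    """Sketch of which Switch case the pipeline would take for an entity.

    Mirrors the described flow: filter the orchestration config on
    Source_Folder = ServiceBus and Source_Frequency_Folder = <entity_name>,
    extract harmonised_notebook_name, then branch on the result.
    """
    matches = [
        row for row in orchestration
        if row.get("Source_Folder") == "ServiceBus"
        and row.get("Source_Frequency_Folder") == entity_name
    ]
    notebook = matches[0].get("harmonised_notebook_name") if matches else None
    if not notebook:
        return "EMPTY"  # no configured notebook -> skip harmonised step
    if entity_name == "nsip-project":
        return "NOT_EMPTY_NSIP_PROJECT"  # hardcoded special case
    return "NOT_EMPTY"  # run the configured notebook

# Example with a single (assumed) config entry:
config = [
    {"Source_Folder": "ServiceBus",
     "Source_Frequency_Folder": "service-user",
     "harmonised_notebook_name": "py_sb_horizon_harmonised_service_user"},
]
print(derive_switch_case(config, "service-user"))   # NOT_EMPTY
print(derive_switch_case(config, "appeal-event"))   # EMPTY
```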

PR Template

Note: Run the correct ADO pipeline for this PR - check the list here:
ODW Repositories

  1. JIRA Ticket Reference:

    [Enter JIRA ticket number and title here]

  2. Summary of the work:

    [Enter summary here]

  3. New Source-to-Raw Datasets

    • New source data has been added
      • A trigger has been attached at the appropriate frequency
  4. New Tables in Standardised Layer

    • New standardised tables have been created
      • orchestration.json is updated and tested in Dev, and PR is open or merged to main
      • Schema exists in odw-config/standardised-table-definitions or is about to be PR'd
  5. New Tables in Harmonised or Curated Layers

    • New harmonised or curated tables have been created
      • Script is configured in the pipeline pln_post_deployments
      • Schema exists in odw-config/harmonised-table-definitions or curated-table-definitions or is about to be PR'd
  6. Schema or Column Changes
    (Only new columns or columns with changed data types are in scope)

    • Changes to table structure or columns
      • py_change_table is set to run in pln_post_deployments
      • A script has been created to backfill or populate new column(s) in Test and Prod
        • Avoid dropping and recreating tables unless strictly necessary
  7. Script Execution in Build

    • Scripts have run in isolation in Build
      • Script has been added to pln_post_deployments
      • Script is now part of a scheduled pipeline with correct triggers
    • No scripts have run or no action required in Test/Prod
  8. Table Creation and Schema Validation

    • All required tables have been created
    • Schema has been validated against the requirements
  9. Deployment and Schema Change Documentation

    • Deployment steps and rollback procedures are documented
    • Schema change handling is outlined and tested
  10. Archiving Process Review

    • Automatic archiving logic has been reviewed
    • Archiving schedules and retention policies are validated

Collaborator

@Fred83200 left a comment


Thanks for the update — removing the old Service Bus polling / message-count logic is definitely the right direction now that all entities have SB triggers in place.

That said, I don’t think this PR matches the approach we discussed, so I don’t think it should be merged in its current state.

Main concern

We agreed not to use the orchestration file for this routing, and instead to simplify the pipeline by replacing the large set of IfConditions with a single Switch on entity_name.

This PR does not do that. Instead it:

  • adds a lookup to orchestration_ServiceBus.json
  • filters metadata to find a notebook name
  • introduces a Switch based on EMPTY / NOT_EMPTY / NOT_EMPTY_NSIP_PROJECT
  • still keeps nsip-project as a hardcoded special case anyway

So this is not really the clean Switch-based routing we discussed. It adds extra complexity and introduces metadata dependency into the pipeline when the intention was to keep the routing explicit and simple inside the pipeline itself.

Orchestration file point:
We also discussed not adding anything to the existing orchestration file.

The current orchestration file is already being used for broader source / ingestion metadata, and we didn’t want to start mixing in extra pipeline-routing metadata there as well.

If we ever decide to go down the orchestration-driven route in future, then it should be done via a separate orchestration/config file built specifically for this pipeline, rather than extending the current shared orchestration metadata and mixing responsibilities together.

So even if we were to accept metadata-driven routing later, it still shouldn’t be implemented by adding more logic to the current orchestration structure.

Why this is a problem

1 - The Switch is not routing by entity

  • The case names (EMPTY, NOT_EMPTY, NOT_EMPTY_NSIP_PROJECT) are not clear processing routes.
  • We wanted a Switch on the actual entity, which would be much easier to read and maintain.

2 - The orchestration file should not be involved here

  • This adds unnecessary lookup/filter/set-variable steps.
  • It also creates another place where the pipeline can silently break if metadata is missing or inconsistent.

3 - nsip-project is still hardcoded separately

  • If we are moving to a Switch-based approach, then the entity routing should be explicit and consistent for all entities.

What I think the pipeline should do instead:

After the generic notebooks, the pipeline should go to a single Switch activity on entity_name, for example:

service-user → py_sb_horizon_harmonised_service_user
nsip-project → py_sb_horizon_harmonised_nsip_project then py_sb_harmonised_nsip_invoice and py_sb_harmonised_nsip_meeting
nsip-representation → py_sb_horizon_harmonised_nsip_representation
nsip-document → py_sb_horizon_harmonised_nsip_document
nsip-exam-timetable → py_sb_horizon_harmonised_nsip_exam_timetable
s51-advice → py_sb_horizon_harmonised_nsip_s51_advice
appeal-document → py_sb_horizon_harmonised_appeal_document
appeal-event → py_sb_horizon_harmonised_appeal_event
appeal-has → py_sb_horizon_harmonised_appeal_has
appeal-s78 → py_sb_horizon_harmonised_appeal_s78
default → generic py_sb_std_to_hrm

That would:

  • remove the need for the orchestration lookup entirely
  • make the pipeline easier to understand
  • match the approach we agreed
  • keep all entity routing in one visible place
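The proposed Switch-based routing could be sketched as a plain mapping from entity_name to notebook(s), with a default branch. This is an illustrative sketch only: the notebook names are taken from the comment above, but the dispatch function and list structure are assumptions, not the actual pipeline definition.

```python
# Explicit entity -> notebook routing, mirroring the proposed Switch cases.
ENTITY_NOTEBOOKS = {
    "service-user": ["py_sb_horizon_harmonised_service_user"],
    "nsip-project": [
        "py_sb_horizon_harmonised_nsip_project",
        "py_sb_harmonised_nsip_invoice",
        "py_sb_harmonised_nsip_meeting",
    ],
    "nsip-representation": ["py_sb_horizon_harmonised_nsip_representation"],
    "nsip-document": ["py_sb_horizon_harmonised_nsip_document"],
    "nsip-exam-timetable": ["py_sb_horizon_harmonised_nsip_exam_timetable"],
    "s51-advice": ["py_sb_horizon_harmonised_nsip_s51_advice"],
    "appeal-document": ["py_sb_horizon_harmonised_appeal_document"],
    "appeal-event": ["py_sb_horizon_harmonised_appeal_event"],
    "appeal-has": ["py_sb_horizon_harmonised_appeal_has"],
    "appeal-s78": ["py_sb_horizon_harmonised_appeal_s78"],
}

def notebooks_for(entity_name: str) -> list:
    """Return the notebook(s) to run for an entity.

    The dict lookup plays the role of the Switch; the fallback plays
    the role of the default branch (generic py_sb_std_to_hrm).
    """
    return ENTITY_NOTEBOOKS.get(entity_name, ["py_sb_std_to_hrm"])
```

Like an ADF Switch activity, the mapping keeps every route visible in one place, and adding a new entity is a one-line change rather than a new IfCondition.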

Other cleanup points:
There are also still a few leftover items from the old design that should be removed if we are refactoring this pipeline properly, for example:

  • Number_of_messages
  • any old naming/messages that still refer to checking/loading Service Bus rather than processing already-landed data

Suggested action:
I’d recommend reworking this PR so it:

  • removes the orchestration lookup/filter/variable steps
  • replaces them with one Switch directly on entity_name
  • routes each entity explicitly
  • uses default branch for generic

At the moment this feels like a partial redesign in a different direction than what we agreed.

@KranthiRayipudi
Collaborator Author

@frederic Jonquieres Thanks for the feedback. I wanted to clarify one point: the use of the orchestration lookup was an intentional design decision. After discussing the options with Ram, we agreed to keep the orchestration lookup because it will support our future maintenance strategy. It provides a central place to manage notebook routing as the number of entities grows, rather than hardcoding the routing logic directly into the pipeline.
During that meeting, Ramana Varagani also mentioned that if this approach didn’t take too much time to implement, we should proceed with it — which is why I created the ticket. He’s happy with this direction, as it aligns the pipeline with the longer term architecture we’re aiming for.

@Fred83200
Collaborator

Fred83200 commented Mar 30, 2026


Thanks for the clarification, but I want to be clear that we did not agree to use the existing orchestration file.

What we discussed were two possible approaches:

1 - Use a Switch on entity_name

  • quick
  • simple
  • easy to implement
  • keeps the routing explicit in the pipeline

2 - If we wanted to go down the orchestration/config route

  • create a new config/orchestration file specifically for this pipeline
  • keep it separate from the existing shared orchestration metadata
  • avoid mixing pipeline routing concerns with other source metadata such as SAP HR and the rest of the current orchestration content

That is not what has been done here.

In this PR, routing has been added by using the existing orchestration file, which is specifically what we said not to do.

So just to be clear, the two agreed options were:

  • a Switch
  • or a new dedicated orchestration/config file for this pipeline

At the moment, this implementation is neither of those, so I don’t think this PR can be merged in its current form.

@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).
4 pipeline(s) were filtered out due to trigger conditions.


