
Optimised Stream and Dashboard creator job, added logging mechanism #159

Open

prashanthShiksha wants to merge 22 commits into ELEVATE-Project:release-3.1.0 from prashanthShiksha:release-3.1.0-combined

Conversation

@prashanthShiksha
Contributor

prashanthShiksha commented Feb 16, 2026

Summary by CodeRabbit

  • New Features

    • Combined Dashboard Creator: automated Metabase dashboard generation for projects, surveys, observations, users and mentoring.
    • Combined Stream Processor: real-time ingestion and processing pipeline that persists events and emits dashboard updates.
    • Multi-tenant support with configurable topics, dashboards and parameter mappings.
  • Documentation

    • Added comprehensive logging configurations and runtime/test configs.
  • Tests

    • Added extensive test fixtures, sources, sinks and integration specs to validate end-to-end flows.
  • Chores

    • New Maven modules wired into the build.

@coderabbitai

coderabbitai bot commented Feb 16, 2026

📝 Walkthrough


Adds two new Flink-based modules: a Combined Dashboard Creator (Metabase orchestration) and a Combined Stream Processor (real-time ingestion). Both include new domain models, Flink ProcessFunctions, Postgres integration, Kafka wiring, Maven modules, extensive logging, and numerous test fixtures and specs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Repo / Build & infra**<br>`Dockerfile`, `.gitignore`, `pom.xml` | Updated Docker JAVA_HOME setup; minor .gitignore touch; removed scala-tools repo and added jar/shade plugins in root POM. |
| **Jobs core & Kafka connector**<br>`jobs-core/pom.xml`, `jobs-core/src/main/resources/base-config.conf`, `jobs-core/src/main/scala/.../BaseJobConfig.scala`, `jobs-core/src/main/scala/.../FlinkKafkaConnector.scala` | Added plexus-utils dependency; switched base-config to kafka: kafka:9092 and added consumer props; BaseJobConfig reads optional groupId and optional max poll settings; FlinkKafkaConnector gained kafkaJobRequestSource overload accepting groupId. |
| **Global logging config**<br>`Documentation/log-conf-file/log4j2.properties` | New comprehensive Log4j2 configuration covering per-dashboard rolling appenders and retention policies. |
| **Combined Dashboard Creator module (new)**<br>`metabase-jobs/combined-dashboard-creator/**`, `metabase-jobs/pom.xml` | New Maven module with Scala/Flink dependencies, many new source files: configs, CombinedDashboardCreatorConfig/Task, extensive Metabase dashboard functions (project, survey, observation, mentoring, user, userMapping), utilities (Utils, UpdateParameters), JSON processors, and many tests/fixtures. |
| **Dashboard creator domain models**<br>`metabase-jobs/combined-dashboard-creator/src/main/scala/.../domain/*.scala` | Added event classes: MentoringEvent, ObservationEvent, ProjectEvent, SurveyEvent, UserEvent, UserMappingEvent (typed accessors). |
| **Dashboard creator processing & helpers**<br>`.../functions/*/**`, `.../task/*` | Many new ProcessFunction implementations (ProjectMetabaseDashboardFunction, SurveyMetabaseDashboardFunction, ObservationMetabaseDashboardFunction, MentoringMetabaseDashboardFunction, UserMetabaseDashboardFunction, ProgramServiceFunction, UserServiceFunction) plus dozens of helper objects to build/clean/update Metabase JSON, create collections/dashboards/tabs/cards, update parameters, and persist metadata. |
| **Dashboard creator tests**<br>`metabase-jobs/combined-dashboard-creator/src/test/**` | Large set of test fixtures, event sources, sinks, and ScalaTest specs using Flink MiniCluster for integration-style tests. |
| **Combined Stream Processor module (new)**<br>`stream-jobs/combined-stream-processor/**`, `stream-jobs/pom.xml` | New Maven module for real-time ingestion: UnifiedStreamConfig/Task, domain models (MentoringEvent, ObservationEvent, ProjectEvent, SurveyEvent, UserEvent), five stream ProcessFunctions (MentoringStreamFunction, ObservationStreamFunction, ProjectStreamFunction, SurveyStreamFunction, UserStreamFunction), many Postgres DDL strings, utilities, and tests/fixtures/sources/sinks. |
| **Stream processor configs & logging**<br>`stream-jobs/combined-stream-processor/src/main/resources/combined-stream.conf`, `.../log4j2.properties`, `.../test.conf` | New stream config and test config, and Log4j2 files for per-stream appenders and retention. |
| **Tests & fixtures for stream processor**<br>`stream-jobs/.../src/test/...` | Many test fixtures, event sources, sinks, and ScalaTest specs mirroring the new stream functions. |
| **Areas needing focused review**<br>(multiple files across modules) | Metabase API interactions and JSON manipulations (many helpers); SQL DDL and upsert queries; concurrency/resource handling in Flink open/close; added test scaffolding; Docker JAVA setup and environment export; kafka consumer property defaults and effects on consumers. |

Sequence Diagram(s)

sequenceDiagram
    participant Kafka as Kafka (Events)
    participant Flink as Flink Processor
    participant Postgres as PostgreSQL
    participant Metabase as Metabase API
    participant KafkaOut as Kafka (Dashboard Events)

    Kafka->>Flink: Event (Project/Survey/Observation/User/Mentoring)
    Note over Flink: Parse event → domain model
    Flink->>Postgres: Upsert domain tables / fetch report_config
    Postgres-->>Flink: Config / metadata
    alt Needs dashboard creation
        Flink->>Metabase: Check/Create collection/dashboard/tabs
        Metabase-->>Flink: IDs
        loop For each report row
            Flink->>Metabase: Create/Update question card
            Metabase-->>Flink: card_id
            Flink->>Metabase: Update dashboard (append card)
        end
        Flink->>Postgres: Update dashboard metadata
        Flink->>KafkaOut: Publish dashboard event
    end
sequenceDiagram
    participant FlinkIn as Flink Stream Processor
    participant Postgres as PostgreSQL
    participant Metabase as Metabase API

    FlinkIn->>Postgres: Ensure tenant tables / insert status/questions
    Postgres-->>FlinkIn: OK
    FlinkIn->>Postgres: Query filter existence
    Postgres-->>FlinkIn: filter present?
    alt filter present
        FlinkIn->>Kafka: Emit dashboard event (to dashboard-creator)
    else
        FlinkIn-->>FlinkIn: Skip dashboard publish
    end
    Kafka->>DashboardCreator: Dashboard event triggers creator job (see previous flow)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • Vivek-M-08

Poem

"I'm a busy rabbit, hopping through code and logs,
I stitched dashboards, cards, and many config frogs.
With hops and thumps I pushed events through the night,
Now dashboards bloom and log files glow bright.
🐇✨ Hooray — new streams and Metabase in sight!"

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title partially relates to the changeset but is vague and generic, using non-descriptive phrases like 'Optimised' without specifying what was optimized or how. | Replace with a more specific and concise title that clearly highlights the primary changes, such as 'Add Combined Dashboard Creator and Stream Processor jobs with logging' or 'Implement mentoring, observation, project, survey, and user event processing pipelines'. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into release-3.1.0. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

prashanthShiksha changed the title from "Release 3.1.0 combined" to "Optimised Stream and Dashboard creator job, added logging mechanism" on Feb 16, 2026

coderabbitai bot left a comment


Actionable comments posted: 51

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
jobs-core/src/main/resources/base-config.conf (1)

11-20: ⚠️ Potential issue | 🟠 Major

Distributed checkpointing enabled but state backend points to a local filesystem.

enable.distributed.checkpointing = true on Line 11 expects a shared/distributed storage backend (e.g., HDFS, S3, Azure Blob). However, base.url on Line 20 is set to file:///opt/flink/checkpoints, which is a local path. In a multi-TaskManager Flink deployment, each TaskManager has its own local filesystem, so checkpoints written by one TM won't be visible to others — checkpoint recovery will fail.

If this is intentional for local dev only, set enable.distributed.checkpointing = false. Otherwise, restore the distributed storage URL (e.g., the previous wasbs:// path) for the release branch.

🤖 Fix all issues with AI agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/resources/combined-dashboard-creator.conf`:
- Around line 82-107: The config contains hardcoded secrets (postgres.password,
metabase.username, metabase.password, metabase.metabaseApiKey) which must be
replaced with environment-driven values or secret references; update the entries
for postgres.password, metabase.username, metabase.password and metabaseApiKey
to use environment variable placeholders (e.g., optional HOCON env substitution)
or a secret manager reference, and provide non-working placeholder defaults so
no real credentials are committed, keeping the same keys (postgres, metabase)
and table names (tables.*) intact.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/domain/UserMappingEvent.scala`:
- Line 5: UserMappingEvent's constructor parameters partition and offset must
override the parent JobRequest's vals; update the class signature for
UserMappingEvent to declare override val partition: Int and override val offset:
Long (matching the pattern used in ObservationEvent) so the subclass properly
overrides JobRequest's members and compiles.
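  A minimal sketch of the corrected signature; the stand-in `JobRequest` below exists only to make the example self-contained (the real one lives in jobs-core and may differ):

  ```scala
  import java.util

  // Stand-in parent, shown only to illustrate the override pattern.
  class JobRequest(val eventMap: util.Map[String, Any], val partition: Int, val offset: Long)

  // The subclass declares its constructor parameters as overrides of the parent vals,
  // mirroring the ObservationEvent pattern referenced above.
  class UserMappingEvent(eventMap: util.Map[String, Any], override val partition: Int, override val offset: Long)
    extends JobRequest(eventMap, partition, offset)
  ```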

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/MentoringMetabaseDashboardFunction.scala`:
- Around line 126-150: The bug is orgId.toInt can throw NumberFormatException
when orgId is empty in createOverviewTabInsideTenantDashboard (and similarly in
createComparisionTabInsideTenantDashboard) because
createCollectionAndDashboardForTenant calls these before the orgId.nonEmpty
guard; fix by defensively parsing orgId once (e.g., val maybeOrgId = if
(orgId.nonEmpty) Some(orgId.toInt) else None) in the caller or at the start of
createOverviewTabInsideTenantDashboard/createComparisionTabInsideTenantDashboard
and pass an Option[Int] (or a safe default like 0) into
ProcessTenantConstructor.ProcessAndUpdateJsonFiles and any other uses instead of
calling orgId.toInt directly; update all references to orgId.toInt to use the
safe parsed value or handle None accordingly.
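  A small sketch of the defensive parse, assuming `orgId` is the raw string from the event (the helper name is hypothetical):

  ```scala
  import scala.util.Try

  // Parse once, thread as Option[Int]; empty or non-numeric strings become None instead of throwing.
  def parseOrgId(orgId: String): Option[Int] = Try(orgId.trim.toInt).toOption

  // Usage sketch where a plain Int is still required downstream:
  // val safeOrgId = parseOrgId(orgId).getOrElse(0)
  ```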

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/ProcessOrgConstructor.scala`:
- Around line 55-72: The pattern match on row.get("config") in
ProcessOrgConstructor.scala currently only handles case Some(queryValue:
PGobject) and can throw MatchError for None or other types; update the match to
exhaustively handle all cases by adding branches for case None (log a warning or
skip) and case Some(other) (log unexpected type with its class), ensuring the
existing logic for queryValue: PGobject remains unchanged (keep
jsonString/objectMapper.readTree and dashCards handling in the case
Some(queryValue: PGobject)).
- Around line 107-140: The method updateQuestionCardJsonValues returns a
questionCard JsonNode on success but the full configJson on error, causing the
caller updatePostgresDatabaseQuery to operate on the wrong structure; fix by
making the error path return the same shape as the success path (i.e., the
questionCard node) — in the catch block return configJson.get("questionCard")
(or null/Option if you prefer) instead of configJson, and keep the existing
logger warning; update the catch to safely handle absent questionCard (e.g.,
return null or Option) so callers of updatePostgresDatabaseQuery get a
consistent type.
- Around line 38-40: In ProcessOrgConstructor, guard the metabase card creation
and JSON parsing: wrap the metabaseUtil.createQuestionCard(...) +
mapper.readTree(...) call in a Try (or try/catch), verify the parsed JsonNode
has a non-null "id" field and that .isInt (or parseable) before calling
.asInt(), and only call questionCardId.append(cardId) when the id exists and >
0; on any failure (exception, non-JSON response, missing/invalid "id") log the
full response/error with context (including chartName and requestBody) and
skip/continue processing this row instead of letting the exception bubble or
appending a zero id so downstream dashCards are not polluted.
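  A hedged sketch of the guarded card creation; `extractValidCardId` is a hypothetical helper, and the usage comment assumes `metabaseUtil`, `requestBody`, `chartName`, `questionCardId` and `logger` are in scope as in the function above:

  ```scala
  import com.fasterxml.jackson.databind.ObjectMapper
  import scala.util.Try

  // Parse a Metabase create-card response and return the id only when it is present, numeric and positive.
  def extractValidCardId(response: String): Option[Int] = {
    val mapper = new ObjectMapper()
    Try(mapper.readTree(response)).toOption
      .filter(json => json.hasNonNull("id") && json.get("id").isInt && json.get("id").asInt() > 0)
      .map(_.get("id").asInt())
  }

  // Usage sketch: append only a valid id, otherwise log the failure and skip the row.
  // extractValidCardId(metabaseUtil.createQuestionCard(requestBody)) match {
  //   case Some(cardId) => questionCardId.append(cardId)
  //   case None         => logger.warn(s"Skipping '$chartName': card creation returned no usable id")
  // }
  ```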

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/ProcessTenantConstructor.scala`:
- Around line 54-71: The match on row.get("config") in
ProcessTenantConstructor's heading branch is non-exhaustive and will throw a
MatchError if the key is absent; update the match for row.get("config") to
include a case None (or a default case) that safely handles missing config
(e.g., log a warning similar to the non-heading branch) and avoid calling
objectMapper.readTree when queryValue is absent, ensuring dashcardsArray is only
updated when a PGobject is present and parsed successfully.
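  For illustration, a self-contained sketch of an exhaustive match on the config column, assuming each result row is a `Map[String, Any]` (real code would use the class logger rather than `println`):

  ```scala
  import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
  import org.postgresql.util.PGobject

  // Covers every shape of row.get("config") so missing or unexpected values are skipped, not thrown.
  def readConfigJson(row: Map[String, Any], objectMapper: ObjectMapper): Option[JsonNode] =
    row.get("config") match {
      case Some(queryValue: PGobject) => Option(queryValue.getValue).map(v => objectMapper.readTree(v))
      case Some(other) =>
        println(s"WARN: unexpected config type ${other.getClass.getName}, skipping row"); None
      case None =>
        println("WARN: row has no 'config' column, skipping row"); None
    }
  ```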

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/UpdateParameters.scala`:
- Around line 63-65: The code unsafely casts dashboardJson.path("parameters") to
ArrayNode which will throw ClassCastException when the key is missing; update
the block around dashboardResponse/dashboardJson/currentParametersJson (use the
same safe pattern matching used in updateAdminParameterFunction) to match
dashboardJson.path("parameters") to an ArrayNode and handle MissingNode or other
types gracefully (e.g., treat as empty ArrayNode or bail with a clear error),
ensuring downstream logic uses the safely extracted ArrayNode.
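  A minimal sketch of the safe extraction (helper name hypothetical); the same shape can be reused wherever `parameters` is read:

  ```scala
  import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
  import com.fasterxml.jackson.databind.node.ArrayNode

  // Returns the "parameters" array when present, otherwise an empty ArrayNode, so callers never cast a MissingNode.
  def safeParametersArray(dashboardJson: JsonNode, mapper: ObjectMapper): ArrayNode =
    dashboardJson.path("parameters") match {
      case arr: ArrayNode => arr
      case _              => mapper.createArrayNode()
    }
  ```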

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/ObservationMetabaseDashboardFunction.scala`:
- Around line 75-98: Replace string-interpolated SQL usage with
parameterized/prepared statements: stop building queries that embed
targetedSolutionId and targetedProgramId (and any event-derived values) directly
in postgresUtil.fetchData calls (e.g., the queries that reference
metaDataTable/entity_id, solutions/solution_id, metaDataTable/entity_id for
targetedProgramId, and solutions/program_id); instead use the postgresUtil
method that accepts a SQL string with placeholders and a parameter list or
create a PreparedStatement to bind targetedSolutionId/targetedProgramId (and
other event values) safely, then fetch results into resultMap/programName/orgId
the same way. Ensure all occurrences around variables targetedSolutionId,
targetedProgramId, postgresUtil.fetchData, metaDataTable, solutions, and
solutionTable are updated to use parameter binding to eliminate SQL injection
risk.
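  A sketch of the bound-parameter approach using plain JDBC; it assumes access to a `java.sql.Connection` (the existing `postgresUtil` may need a new `fetchDataPrepared`-style method to expose this), and the query and column names are illustrative:

  ```scala
  import java.sql.Connection

  def fetchEntityName(connection: Connection, metaDataTable: String, targetedSolutionId: String): Option[String] = {
    // Identifiers (table names) cannot be bound, so metaDataTable must come from config, never from the event.
    val stmt = connection.prepareStatement(s"SELECT entity_name FROM $metaDataTable WHERE entity_id = ?")
    try {
      stmt.setString(1, targetedSolutionId) // value is bound, not interpolated into the SQL text
      val rs = stmt.executeQuery()
      if (rs.next()) Option(rs.getString("entity_name")) else None
    } finally stmt.close()
  }
  ```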

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/UpdateParameters.scala`:
- Line 68: In UpdateParameters.scala replace the unsafe cast of
dashboardJson.path("parameters").asInstanceOf[ArrayNode] (the
currentParametersJson binding) with a pattern-match on the JsonNode returned by
dashboardJson.path("parameters") to handle ArrayNode and non-array cases safely
(e.g., case arr: ArrayNode => arr and fallback to an empty ArrayNode or creating
one via the ObjectMapper used elsewhere); mirror the same pattern-match approach
used in project/UpdateParameters.scala so UpdateParameters.scala’s logic that
operates on currentParametersJson only runs when you have a real ArrayNode and a
safe fallback is provided.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/UpdateQuestionDomainJsonFiles.scala`:
- Around line 85-100: The code in cleanDashboardJson unsafely casts path()
results (e.g., the template-tags, parameters, dashCards, and parameter_mappings
nodes) to ObjectNode/ArrayNode which will throw ClassCastException when keys are
missing; update each location (template-tags, parameters, dashCards,
parameter_mappings) to first check the node type (e.g., use
root.path(...).isObject / .isArray or use root.get(...) and null-check) before
casting or operating on it, and skip removal/iteration when the node is missing
or of the wrong type so the code safely handles MissingNode from
mapper.readTree(jsonStr).

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/UpdateQuestionJsonFiles.scala`:
- Around line 130-141: The match in formatQuestionText is using
questionType.capitalize but the case patterns are lowercase so they never match;
update the match to use questionType.toLowerCase (or match against capitalized
literals) so the explicit cases fire (e.g., change the match subject from
questionType.capitalize to questionType.toLowerCase) and keep the existing case
branches (multiselect, slider, radio, number, text, date) to produce the
intended labels.
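  A sketch of the corrected match; the returned labels are illustrative, not the exact strings in the file:

  ```scala
  def formatQuestionText(questionType: String): String =
    questionType.toLowerCase match { // normalise so the lowercase case literals actually fire
      case "multiselect" => "Multi select"
      case "slider"      => "Slider"
      case "radio"       => "Radio"
      case "number"      => "Number"
      case "text"        => "Text"
      case "date"        => "Date"
      case other         => other.capitalize
    }
  ```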

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/UpdateStatusJsonFiles.scala`:
- Around line 62-79: The code currently adds both the root JSON (optJsonNode)
and its "dashCards" child into dashcardsArray, causing duplicates; update the
logic around row.get("config")/queryValue: PGobject/jsonString/rootNode so you
do not call optJsonNode.foreach(dashcardsArray.add). Instead only inspect
optJsonNode (or node) and, if a non-missing/non-null dashCardsNode exists, add
that dashCardsNode to dashcardsArray (using dashCardsNode.add) and remove the
earlier root add to avoid double insertion.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/UpdateWithoutRubricQuestionJsonFiles.scala`:
- Around line 216-228: In cleanDashboardJson, avoid unsafe .asInstanceOf casts
on nodes returned by root.path(...): the .path(...) chain can yield a
MissingNode and casting it to ObjectNode/ArrayNode (e.g., templateTags,
parametersPath, dashCards) will throw ClassCastException; fix by checking
node.isMissingNode (or node.isObject/isArray) before casting and only call
.asInstanceOf[ObjectNode]/[ArrayNode] when the node exists, otherwise skip
removal/iteration logic or create a safe empty ObjectNode/ArrayNode to operate
on; apply this pattern for templateTags, parametersPath, and any other
.path(...) casts in cleanDashboardJson.
- Around line 99-127: processHeading currently mutates the shared JSON nodes
returned by results("heading"), causing state leakage across invocations; to fix
it, create a deep copy of the parsed heading node immediately after reading it
(e.g., use ObjectNode/JsonNode deepCopy or re-parse node.toString into a fresh
JsonNode) and perform all modifications (updating visualization_settings.text,
col, row, dashboard_tab_id) on that copy, then add the copy to dashcardsArray
instead of the original node so the shared template in results("heading")
remains unmodified; ensure any uses of toOption(node) and subsequent logic
operate on the copied node.
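  A sketch of the deep-copy approach, assuming the parsed heading template is an `ObjectNode` (helper name hypothetical):

  ```scala
  import com.fasterxml.jackson.databind.node.ObjectNode

  // Mutate a deep copy so the shared template in results("heading") is never modified across invocations.
  def copiedHeading(template: ObjectNode, text: String, col: Int, row: Int, tabId: Int): ObjectNode = {
    val copy = template.deepCopy() // independent tree, safe to mutate
    copy.path("visualization_settings") match {
      case vs: ObjectNode => vs.put("text", text)
      case _              => () // template has no visualization_settings, nothing to update
    }
    copy.put("col", col)
    copy.put("row", row)
    copy.put("dashboard_tab_id", tabId)
    copy
  }
  ```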

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/DistrictPage.scala`:
- Around line 356-358: dashboardJson.path("parameters") is being unsafely cast
to ArrayNode causing a ClassCastException when "parameters" is missing; replace
the direct cast when computing currentParametersJson with a safe check (use
isArray or instanceOf[ArrayNode]) and fall back to creating an empty ArrayNode
via objectMapper.createArrayNode() so downstream code can safely iterate; locate
the code around metabaseUtil.getDashboardDetailsById(dashboardId),
objectMapper.readTree(...), and the currentParametersJson assignment and apply
this safe pattern.
- Around line 60-69: The match on row.get("config") in DistrictPage.scala
currently only handles case Some(queryValue: PGobject) and will throw a
MatchError for None or non-PGobject values; update the match to mirror the other
branch by handling case Some(queryValue: PGobject) => (process as now), case
Some(_) => (skip or log and continue), and case None => (skip or log and
continue) before calling appendDashCardToDashboard; ensure you keep using
objectMapper.readTree on PGobject values and still call
appendDashCardToDashboard(metabaseUtil, optJsonNode, dashboardId) only when a
valid JSON node is produced.
- Around line 155-161: The SQL in DistrictPage.scala currently builds a raw
query with string interpolation (postgresUtil.fetchData(...)) using
targetedDistrictName and reportConfig; replace this by adding a parameterized
fetch method to PostgresUtil (e.g., fetchDataPrepared(sql: String, params:
Seq[Any])) and call that from the DistrictPage logic instead of fetchData,
passing targetedDistrictName as a bind parameter; additionally
validate/whitelist the reportConfig table name before interpolating it into the
SQL (reject or map unknown values) so only allowed table names are used; update
any callers/tests to use the new fetchDataPrepared and keep the query text with
a placeholder for the district name.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/HomePage.scala`:
- Around line 59-69: The match on row.get("config") in the HomePage code only
handles case Some(queryValue: PGobject) and can throw a MatchError for None or
non-PGobject values; update the match to be exhaustive by adding cases for None
and a fallback (e.g., case Some(other) => ...) that safely skips or logs
unexpected types before returning, keeping the existing behavior of parsing
using objectMapper.readTree and calling appendDashCardToDashboard with
toOption(rootNode) only when a valid PGobject is present; reference
row.get("config"), PGobject, objectMapper.readTree, toOption, and
appendDashCardToDashboard when locating the code to modify.
- Around line 348-355: The code unsafely casts dashboardJson.path("parameters")
to ArrayNode causing ClassCastException when parameters is missing or not an
array; change the logic that constructs currentParametersJson to safely handle
missing/non-array nodes (like the pattern used earlier around lines 181–184):
call dashboardJson.path("parameters"), check node.isArray (or use Option/when)
and if true convert to a Scala Iterable/ArrayNode, otherwise treat it as an
empty ArrayNode/list; then use that safe currentParametersJson when building
finalParametersJson (keep references to metabaseUtil.getDashboardDetailsById,
objectMapper.readTree, currentParametersJson, finalParametersJson,
slugNameToStateIdMap and updatedParameterJson).

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/StatePage.scala`:
- Around line 159-162: The SQL string in StatePage (the map_unique_id_query)
interpolates targetedStateName and reportConfig directly which is vulnerable to
SQL injection; change the call to use a parameterized/prepared query via
PostgresUtil (e.g., a fetchData/preparedFetch that accepts SQL with placeholders
and a Seq of parameters) instead of string interpolation, and do the same for
the other query that uses targetedStateId/updateQuery; ensure you pass
targetedStateName and reportConfig (or its safe identifier) as bound parameters
and retrieve stateMapId the same way into
visualizationSettings.put("map.region", stateMapId).
- Around line 60-68: The match on row.get("config") in StatePage is
non-exhaustive and can throw MatchError; update the match around
row.get("config") to handle None and non-PGobject cases (e.g., add a case None
and a default case _), only proceed to call objectMapper.readTree, toOption, and
appendDashCardToDashboard when you have a Some(queryValue: PGobject) with a
non-null queryValue.getValue; for other cases log or skip safely so
appendDashCardToDashboard and toOption are never called with invalid inputs.
- Line 359: The code unsafely casts dashboardJson.path("parameters") to
ArrayNode (assigned to currentParametersJson) which can be a MissingNode and
cause ClassCastException; change the assignment to safely obtain and validate
the node (e.g., use dashboardJson.get("parameters") or pattern match on
dashboardJson.path("parameters")) and handle the non-array case by providing an
empty ArrayNode or early return; ensure you reference the currentParametersJson
variable and the dashboardJson.path("parameters") expression when making the
fix.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/ProcessStateConstructor.scala`:
- Around line 121-126: The current extractCardId method can return Some(0)
because jsonResponse.get("id").asInt() yields 0 when "id" is missing; update
extractCardId to explicitly check for the presence and type of the "id" node
before converting it: after parsing with objectMapper.readTree(response) verify
jsonResponse.hasNonNull("id") (or jsonResponse.get("id") != null &&
jsonResponse.get("id").isInt) and only then call asInt(), otherwise return None;
keep the Try to handle parse errors and return None on exceptions.
- Around line 64-81: The pattern match on row.get("config") in
ProcessStateConstructor (the branch handling question_type == "heading" and the
case Some(queryValue: PGobject)) is non-exhaustive and can throw a MatchError;
update the match to handle the other possibilities (at minimum add case None =>
{ log a warning/info and skip processing } and case Some(other) => { log the
unexpected type and skip } ), or replace the match with a safe extraction (e.g.,
check row.get("config").collect { case q: PGobject => ... } or use
Option/isInstanceOf guards) so that non-PGobject or missing configs don’t crash
the loop; keep the existing handling for the PGobject case (queryValue.getValue,
objectMapper.readTree, toOption, dashCards checks) and ensure you log useful
context (e.g., question id/type) when skipping.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/ProjectMetabaseDashboardFunction.scala`:
- Line 319: ProjectMetabaseDashboardFunction currently interpolates JSON strings
(e.g., projectMetadataJson, mainMetadataJson, districtMetadataJson,
homeMetadataJson, adminMetadataJson, stateMetadataJson) directly into SQL passed
to postgresUtil.insertData, which risks SQL injection and broken queries; change
these UPDATE calls to use parameterized queries via postgresUtil (bind the JSON
as a parameter) or, if the util cannot accept params, consistently escape single
quotes on every JSON variable before interpolation (e.g., apply .replace("'",
"''") to projectMetadataJson, mainMetadataJson, districtMetadataJson,
homeMetadataJson, adminMetadataJson, stateMetadataJson wherever they are
embedded) and ensure the SQL uses COALESCE(...::jsonb, '[]'::jsonb) ||
'<escaped-json>'::jsonb to preserve behavior — update all occurrences in
ProjectMetabaseDashboardFunction that call postgresUtil.insertData listed in the
review (lines referencing the variables above).
- Around line 99-282: The match on event.reportType only handles case "Project"
and can throw a MatchError for other values; add a wildcard/default case to the
existing event.reportType match (the one in
ProjectMetabaseDashboardFunction.scala where case "Project" => is defined) that
logs the unexpected reportType (e.g., logger.warn or logger.info) and
returns/continues without processing so non-"Project" events are safely ignored;
ensure the default case is the last case (case _ => ...) and does not change
existing flow for the "Project" branch.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/UpdateParameters.scala`:
- Around line 67-68: The code unsafely casts dashboardJson.path("parameters") to
ArrayNode (variable currentParametersJson) which will throw if the key is
missing; change to a safe pattern match or conditional check similar to
UpdateAdminParameterFunction: inspect dashboardJson.path("parameters") (or
dashboardJson.get("parameters")), verify it is an ArrayNode (or not MissingNode)
and only then assign/convert to ArrayNode, otherwise handle the missing case
(e.g., treat as empty ArrayNode or return/skip) to avoid ClassCastException.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/SurveyMetabaseDashboardFunction.scala`:
- Around line 286-292: The catch blocks in SurveyMetabaseDashboardFunction (in
SurveyMetabaseDashboardFunction.scala) interpolate e.getMessage directly into
SQL (e.g., the postgresUtil.insertData call) which breaks on quotes and allows
injection; update those catch blocks (all three occurrences around the handlers
that currently call postgresUtil.insertData with s"...error_message =
'${e.getMessage}'...") to either use a parameterized/prepared statement API on
postgresUtil (preferred) or at minimum escape single quotes in e.getMessage via
.replace("'", "''") before interpolation, and apply the same fix to each catch
block referenced (the ones near the current lines 288, 337, 366).

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateCsvDownloadJsonFiles.scala`:
- Around line 52-68: The pattern match on row.get("config") in
UpdateCsvDownloadJsonFiles.scala is non-exhaustive and can throw MatchError; add
a case None (and a default case if other unexpected types are possible) to
safely log/warn and continue without throwing. Also fix
updateQuestionCardJsonValues so that on error it returns
configJson.path("questionCard") (the questionCard node) instead of returning the
full configJson; ensure the method references configJson.path("questionCard") in
its error/exception branches and returns that node consistently.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateParameters.scala`:
- Around line 66-68: The code unsafely casts dashboardJson.path("parameters") to
ArrayNode (currentParametersJson) which can be a MissingNode and cause a
ClassCastException; change the logic that reads parameters (after
metabaseUtil.getDashboardDetailsById and objectMapper.readTree) to pattern-match
the result of dashboardJson.path("parameters") and handle cases: if it's an
ArrayNode use it, otherwise use objectMapper.createArrayNode() (or an empty
ArrayNode) so downstream code expects a valid ArrayNode; mirror the
pattern-matching approach used elsewhere in this file for parameter handling to
ensure safe behavior.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateQuestionJsonFiles.scala`:
- Around line 94-123: processHeading currently iterates results("heading") for
every question causing duplication; change it to locate or build a single
heading dashCard per invocation (e.g., pick the appropriate heading row once —
first or matched by tabId/template) instead of looping all rows: in
processHeading (and the code paths that call it) replace the
results("heading").foreach loop with a single lookup (headOption or filter by
tabId) then parse that PGobject config into rootNode, update
visualization_settings.text, col, row, dashboard_tab_id and add only that one
dashCardsNode to dashcardsArray; ensure you remove the duplicated add/loop logic
and preserve the existing field updates (newRow/newCol/tabId) when creating the
single dashCard.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateStatusJsonFiles.scala`:
- Around line 53-70: The match on row.get("config") in
UpdateStatusJsonFiles.scala is non-exhaustive and can throw a MatchError when
config is missing; update the match on row.get("config") to handle the
None/default case (in addition to case Some(queryValue: PGobject)) — e.g., add a
case None or a wildcard case that logs a warning via logger.warn and safely
continues so dashcardsArray, toOption, and downstream logic only run when config
is present and valid.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/user/ProcessTenantConstructor.scala`:
- Around line 67-84: The match on row.get("config") in ProcessTenantConstructor
is non-exhaustive and can throw MatchError; update the match on
row.get("config") (the block that currently handles case Some(queryValue:
PGobject)) to include fallback branches (at least case None and a case for
Some(other)) that safely handle missing or unexpected types: log an informative
message via logger (including the raw value/type), skip processing (do not call
objectMapper.readTree/toOption/dashcardsArray.add) when config is absent or not
a PGobject, and preserve current behavior for case Some(queryValue: PGobject) so
objectMapper and toOption usage remains unchanged.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/user/UserMetabaseDashboardFunction.scala`:
- Around line 132-134: The UPDATE builds SQL by interpolating the serialized
ArrayNode (userMetadataJson) directly into updateMetadataQuery and passing it to
postgresUtil.insertData, which allows SQL injection if any field contains
quotes; change this to use a parameterized/prepared statement or properly
escape/encode the JSON payload: serialize userMetadataJson to a compact JSON
string, then execute the UPDATE with a parameter placeholder binding that JSON
as a jsonb value (referencing updateMetadataQuery, userMetadataJson,
postgresUtil.insertData and metaDataTable), and apply the same fix to the other
occurrences at the reported locations (the similar concatenations at lines
~171/173).
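  A sketch of binding the serialized JSON as a parameter and casting it to jsonb in SQL; it assumes a `java.sql.Connection` is reachable (the existing `postgresUtil.insertData` may need a parameterized overload):

  ```scala
  import java.sql.Connection

  def appendMainMetadata(connection: Connection, metaDataTable: String, metadataJson: String, entityId: String): Int = {
    // metaDataTable comes from config; the JSON payload and entity id are bound, never spliced into the text.
    val sql =
      s"""UPDATE $metaDataTable
         |SET main_metadata = COALESCE(main_metadata::jsonb, '[]'::jsonb) || ?::jsonb,
         |    status = 'Success'
         |WHERE entity_id = ?""".stripMargin
    val stmt = connection.prepareStatement(sql)
    try {
      stmt.setString(1, metadataJson) // e.g. objectMapper.writeValueAsString(userMetadataJson)
      stmt.setString(2, entityId)
      stmt.executeUpdate()
    } finally stmt.close()
  }
  ```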

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/userMapping/ProgramServiceFunction.scala`:
- Around line 154-163: The code in validateUserRemoval computes membershipId by
unconditionally calling .get on the Option returned by .find which throws
NoSuchElementException for non-members; change the logic so you only resolve the
found node after confirming isUserInGroup (or use pattern matching/foreach on
the Option) — e.g., keep the isUserInGroup check (computed from
groupDetails.get("members").elements().asScala.exists(...)) and move the
find(...).get call (or better, use find(...).foreach { node => val membershipId
= node.get("membership_id").asInt(); metabaseUtil.removeFromGroup(membershipId)
}) inside the true branch so metabaseUtil.removeFromGroup is only called with a
valid membershipId derived from the found node.
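  A compact sketch of the Option-based flow; `removeFromGroup` stands in for `metabaseUtil.removeFromGroup`:

  ```scala
  import scala.collection.JavaConverters._
  import com.fasterxml.jackson.databind.JsonNode

  // Resolve the membership node once and only act when it exists, so .get is never called on None.
  def removeUserIfMember(groupDetails: JsonNode, userId: Int, removeFromGroup: Int => Unit): Unit =
    Option(groupDetails.get("members"))
      .map(_.elements().asScala)
      .getOrElse(Iterator.empty)
      .find(_.get("user_id").asInt() == userId)
      .foreach(member => removeFromGroup(member.get("membership_id").asInt()))
  ```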

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/userMapping/UserServiceFunction.scala`:
- Around line 99-114: The current printlns in UserServiceFunction.scala (the
block printing entity, eventType, name, uniqueUserName, tenantCode, orgIdOpt,
email, password, phone, stateId, districtId, status, isUserDeleted, orgDetails,
userRoles) expose PII and credentials; remove any prints that output email,
phone, password, uniqueUserName, orgDetails, or other PII and replace them with
non-sensitive indicators (e.g., log counts, existence flags, masked IDs, or
fixed labels like "<REDACTED>") or omit entirely, keeping only safe operational
context (entity, eventType, status, isUserDeleted, and orgId presence) in logs;
update the println calls within the function in UserServiceFunction.scala
accordingly and ensure no plaintext credentials remain in any log statements.
- Around line 538-547: validateUserRemoval currently computes isUserInGroup then
calls .find(...).get which can throw NoSuchElementException when the user is not
present; instead, call .find(...) once (e.g., val membershipNode =
groupDetails.get("members").elements().asScala.find(_.get("user_id").asInt() ==
userId)) and pattern-match or use foreach/onDefined to handle Some(node) ->
extract membership_id and call metabaseUtil.removeFromGroup(membershipId) and
None -> log "User is not a member"; update the function (validateUserRemoval) to
avoid .get on None and avoid iterating members twice.

In
`@stream-jobs/combined-stream-processor/src/main/resources/combined-stream.conf`:
- Line 71: The config sets test.mode = true while Kafka topics are named with
-prod suffix, which risks shipping test behavior into production; change
test.mode to false by default in combined-stream.conf (or make it driven by an
environment override) so production deployments use test.mode = false unless
explicitly overridden, and update any startup/config loading that reads
test.mode to prefer an environment variable (e.g., TEST_MODE) over the config
file to allow safe runtime overrides.
- Around line 73-78: The postgres config block currently hardcodes
host/port/username/password/database; replace these literal values in the
postgres block with environment-variable substitutions (e.g., use
${POSTGRES_HOST}, ${POSTGRES_PORT}, ${POSTGRES_USER}, ${POSTGRES_PASSWORD},
${POSTGRES_DB}) and ensure defaults are not sensitive; update any code expecting
those keys to read from the config (the postgres block and its keys host, port,
username, password, database) and document that deployment must supply the
corresponding env vars or overrides instead of committing secrets.

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/domain/ProjectEvent.scala`:
- Around line 95-97: The current eager val taskCount calls tasks.size but tasks
defaults to null, causing an NPE during construction; change the tasks default
to an empty list and keep taskCount as-is (or make taskCount a def) so it never
operates on null — specifically update the tasks accessor
(readOrDefault[List[Map[String, Any]]]("tasks", ...)) to return
List.empty[Map[String, Any]] when the key is missing, ensuring taskCount =
tasks.size is safe.
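  A sketch of the safer accessors, assuming `readOrDefault` behaves as in the other event classes in this module:

  ```scala
  // Default to an empty list so taskCount never dereferences null when "tasks" is absent.
  def tasks: List[Map[String, Any]] =
    readOrDefault[List[Map[String, Any]]]("tasks", List.empty[Map[String, Any]])

  def taskCount: Int = tasks.size
  ```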

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/domain/SurveyEvent.scala`:
- Around line 62-74: The completedDate getter currently parses dateString with
Instant.parse and then with SimpleDateFormat, but if the second parse throws it
bubbles up; update completedDate to wrap the fallback
formatter.parse(dateString) call in its own try/catch (or a single outer
try/catch around both parsing attempts) so that any exception from the fallback
returns new Timestamp(System.currentTimeMillis()) instead of propagating;
reference the completedDate method and use
readOrDefault[String]("completedDate", "") and the existing
Timestamp/Instant.parse/SimpleDateFormat symbols when making the change, and
optionally log the parse failure for observability per fault-tolerance
guidelines.

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/MentoringStreamFunction.scala`:
- Around line 91-94: The tenant-derived table name construction in
MentoringStreamFunction (variables tenantSessionTable,
tenantSessionAttendanceTable, tenantConnectionsTable,
tenantOrgMentorRatingTable) uses tenantCode directly and risks SQL injection;
validate/sanitize tenantCode first (e.g., enforce /^[A-Za-z0-9_]+$/ and
reject/log invalid values) before building identifiers and never allow arbitrary
characters, and replace any inline value interpolation in SQL statements
(including those referenced around the function where tenantCode-derived
identifiers are used) with safe, parameterized queries for all
comparisons/values; add validation early in the processing path of
MentoringStreamFunction and centralize a sanitizeTenantCode helper to ensure all
identifier construction uses the validated token and all dynamic values use
prepared/parameterized statements.
- Around line 42-331: Wrap the entire body of processElement (in
MentoringStreamFunction.processElement) in a top-level try/catch so any
exception from DB calls (postgresUtil.*), helper methods (checkAndCreateTable,
deleteSessionAndAttendance, checkAndInsert, pushMentoringDashboardEvents) or
null/parse errors is caught; in the catch log the full exception and sufficient
event context (tenantCode, eventType, entity,
sessionId/attendanceId/connectionId), emit the failed event to a configured
dead-letter sink or metrics counter, and avoid rethrowing so the exception does
not propagate to Flink unhandled; ensure existing helper methods like
deleteSessionAndAttendance remain callable inside the try and that
resources/side-effects remain consistent in the finally block if needed.
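  The shape of the guard only, as a sketch; the real version would log through the class logger and route the failed event to a dead-letter sink or metric rather than just printing:

  ```scala
  // `handle` stands in for the existing processElement body.
  def safeProcess(tenantCode: String, eventType: String, entity: String)(handle: => Unit): Unit =
    try handle
    catch {
      case e: Exception =>
        // Log context and swallow so the exception does not propagate to Flink and fail the job.
        println(s"ERROR: failed to process event (tenant=$tenantCode, type=$eventType, entity=$entity): ${e.getMessage}")
    }
  ```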

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/ObservationStreamFunction.scala`:
- Around line 348-349: In the case Some(subdomains) branch of
ObservationStreamFunction (where criteriaEvent.find(_("_id") ==
criteriaId).foreach { crit => } is currently empty), implement the same domain
insertion performed in the case None branch: when a matching crit is found,
build the domain object from crit and subdomains and call the existing domain
insertion logic/service used in the None branch (reuse the same field mappings
and any helper methods or functions that create/insert domain records) so
subdomain criteria get inserted into the domain table just like the fallback
path.

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/SurveyStreamFunction.scala`:
- Around line 495-547: checkAndInsert currently injects event-derived values
directly into SQL (see variables query, getEntityNameQuery, upsertMetaDataQuery
and calls to postgresUtil.fetchData/insertData), causing SQL injection risk; fix
by parameterizing all SQL values (use prepared statements for the
EXISTS/select/insert/upsert queries and pass targetedId as a parameter) and stop
interpolating entityType/raw names into SQL structure — instead whitelist
allowed entityType values in checkAndInsert (e.g., map "program" -> table/column
names, "solution" -> table/column names, "admin" handled separately) and use
those safe, verified identifiers for table/column selection and aliases; keep
aliases/static SQL text out of string interpolation and call
postgresUtil.executePreparedUpdate / fetchData with parameter lists for
targetedId and any other dynamic values.
- Around line 549-581: The current checkExistenceOfFilterData builds SQL by
manually replacing "?" which is prone to SQL injection; replace this with a real
prepared-statement flow: construct the WHERE clause with "col = ?" placeholders
(keep using checkExistenceOfFilterData and columnValuePairs), collect params as
a Seq[String], and call a postgresUtil method that executes a prepared statement
with bound parameters (e.g., postgresUtil.executePreparedQuery(query, params){
rs => ... }); do the same for the count query (or validate/safely
quote/whitelist tableName since identifiers can't be parameterized) and remove
any use of string replaceFirst("\\?")/queryWithParams to ensure user values are
bound, not interpolated.
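  A sketch of the bound-parameter existence check with plain JDBC; it assumes a `java.sql.Connection` is available (for instance via a new `executePreparedQuery` on `postgresUtil`), and table/column names must still be allowlisted because identifiers cannot be bound:

  ```scala
  import java.sql.Connection

  def filterDataExists(connection: Connection, tableName: String, columnValuePairs: Map[String, String]): Boolean = {
    val columns = columnValuePairs.keys.toSeq
    val whereClause = columns.map(col => s"$col = ?").mkString(" AND ")
    val stmt = connection.prepareStatement(s"SELECT EXISTS (SELECT 1 FROM $tableName WHERE $whereClause) AS data_exists")
    try {
      columns.zipWithIndex.foreach { case (col, i) => stmt.setString(i + 1, columnValuePairs(col)) } // values bound, not interpolated
      val rs = stmt.executeQuery()
      rs.next() && rs.getBoolean("data_exists")
    } finally stmt.close()
  }
  ```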

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/UserStreamFunction.scala`:
- Around line 345-366: The checkAndInsert function currently builds SQL by
string interpolation causing SQL injection; update checkAndInsert to use
parameterized queries for both the SELECT and INSERT instead of embedding
targetedId, entityType, and event.tenantCode directly into SQL: change the
SELECT to use a placeholder (e.g., "SELECT EXISTS (SELECT 1 FROM
${config.dashboard_metadata} WHERE entity_id = ?) AS is_present") and call
postgresUtil.fetchData with the query plus a params list/array containing
targetedId, and likewise change the INSERT to "INSERT INTO
${config.dashboard_metadata} (entity_type, entity_name, entity_id) VALUES (?, ?,
?)" and call postgresUtil.insertData with params [entityType, event.tenantCode,
targetedId]; if postgresUtil lacks parameterized methods add overloads that
accept query + params to safely bind values and ensure tenantCode is
validated/trimmed before use.
- Around line 300-321: The userMetric function currently interpolates tenantCode
directly into SQL causing SQL injection; update userMetric to pass tenantCode as
a bound parameter to postgresUtil.executePreparedUpdate (use a parameter
placeholder in the query instead of '$tenantCode') and ensure any other
user-derived values used in SQL are passed as parameters; for dynamic
identifiers like tenantUserTable (and similar uses in checkAndInsert and
checkAndCreateTable) implement strict validation/allowlist (e.g., match against
^[a-zA-Z0-9_]+$) before using them in the query string, and keep using
executePreparedUpdate for parameterized values via
postgresUtil.executePreparedUpdate.
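  A sketch of the allowlist validation for tenant-derived identifiers; the helper and the table-name shape in the usage comment are hypothetical:

  ```scala
  // Only [A-Za-z0-9_] survives; anything else is rejected before it can reach a table name or SQL string.
  def sanitizeTenantCode(tenantCode: String): Either[String, String] = {
    val trimmed = Option(tenantCode).map(_.trim).getOrElse("")
    if (trimmed.matches("^[A-Za-z0-9_]+$")) Right(trimmed)
    else Left(s"Invalid tenant code '$tenantCode', refusing to build identifiers from it")
  }

  // Usage sketch:
  // sanitizeTenantCode(event.tenantCode) match {
  //   case Right(code) => val tenantUserTable = s"${code}_users" // illustrative name; values still go through bound parameters
  //   case Left(msg)   => logger.error(msg) // skip the event
  // }
  ```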

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamConfig.scala`:
- Around line 13-14: The file defines implicit TypeInformation for ProjectEvent
and MentoringEvent (mapTypeInfo, eventTypeInfo) but is missing implicits for
UserEvent, SurveyEvent, and ObservationEvent which causes implicit resolution
failures when instantiating UserStreamFunction, SurveyStreamFunction, and
ObservationStreamFunction; add implicit vals named e.g. userTypeInfo,
surveyTypeInfo, observationTypeInfo using
TypeExtractor.getForClass(classOf[UserEvent]),
TypeExtractor.getForClass(classOf[SurveyEvent]), and
TypeExtractor.getForClass(classOf[ObservationEvent]) respectively so the
compiler can find TypeInformation when constructing those stream functions.
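  The missing implicits would follow the existing `mapTypeInfo`/`eventTypeInfo` pattern, for example:

  ```scala
  import org.apache.flink.api.common.typeinfo.TypeInformation
  import org.apache.flink.api.java.typeutils.TypeExtractor
  import org.shikshalokam.job.combined.stream.processor.domain.{ObservationEvent, SurveyEvent, UserEvent}

  implicit val userTypeInfo: TypeInformation[UserEvent] = TypeExtractor.getForClass(classOf[UserEvent])
  implicit val surveyTypeInfo: TypeInformation[SurveyEvent] = TypeExtractor.getForClass(classOf[SurveyEvent])
  implicit val observationTypeInfo: TypeInformation[ObservationEvent] = TypeExtractor.getForClass(classOf[ObservationEvent])
  ```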

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala`:
- Around line 164-166: The user sink is reading from the wrong side output tag:
replace the call to userStream.getSideOutput(config.eventOutputTag) with
userStream.getSideOutput(config.userOutputTag) so the UserStreamFunction's user
events are consumed correctly; update the line that chains
.addSink(kafkaConnector.kafkaStringSink(config.userOutputTopic)).name("user-sink").setParallelism(config.userSinkParallelism)
to use config.userOutputTag instead of config.eventOutputTag (locate the usage
of userStream.getSideOutput and the config.* tags).

Comment on lines 126 to 150
def createOverviewTabInsideTenantDashboard(parentCollectionId: Int, databaseId: Int, dashboardId: Int, tabIdMap: Map[String, Int], metaDataTable: String, reportConfig: String, metabaseDatabase: String, metabaseApiKey: String): Unit = {
  try {
    val dashboardName: String = s"Overview"
    val safeTenantCode = tenantCode.replace("'", "''")
    val createDashboardQuery = s"UPDATE $metaDataTable SET status = 'Failed',error_message = 'errorMessage' WHERE entity_id = '${safeTenantCode}_tenant_admin';"
    val tabId: Int = tabIdMap.getOrElse(dashboardName, -1)
    val orgNameId: Int = getTheColumnId(databaseId, tenantOrgRolesTable, "org_name", metabaseUtil, metabasePostgresUtil, metabaseApiKey, createDashboardQuery)
    val reportConfigQuery: String = s"SELECT question_type, config FROM $reportConfig WHERE dashboard_name = 'Mentoring-Reports' AND report_name = 'Tenant-Overview' AND question_type IN ('big-number', 'graph');"
    val questionCardIdList = ProcessTenantConstructor.ProcessAndUpdateJsonFiles(reportConfigQuery, parentCollectionId, databaseId, dashboardId, 0, orgNameId, 0, tenantUserTable,
      tenantSessionTable, tenantSessionAttendanceTable, tenantConnectionsTable, tenantOrgMentorRatingTable,
      tenantOrgRolesTable, Map.empty, orgId.toInt, tabId, metabaseUtil, postgresUtil)
    val questionIdsString = "[" + questionCardIdList.mkString(",") + "]"
    val parametersQuery = s"SELECT config FROM $reportConfig WHERE report_name='Tenant-Overview' AND question_type='overview-parameter'"
    UpdateParameters.updateAdminParameterFunction(metabaseUtil, parametersQuery, dashboardId, postgresUtil)
    val objectMapper = new ObjectMapper()
    val userMetadataJson = objectMapper.createArrayNode().add(objectMapper.createObjectNode().put("collectionId", parentCollectionId).put("dashboardName", dashboardName).put("dashboardId", dashboardId).put("collectionFor", "Tenant Admin").put("questionIds", questionIdsString))
    val updateMetadataQuery = s" UPDATE $metaDataTable SET main_metadata = COALESCE(main_metadata::jsonb, '[]'::jsonb) || '$userMetadataJson'::jsonb, status = 'Success' WHERE entity_id = '${safeTenantCode}_tenant_admin';"
    postgresUtil.insertData(updateMetadataQuery)
  }
  catch {
    case e: Exception =>
      postgresUtil.insertData(s"UPDATE $metaDataTable SET status = 'Failed',error_message = '${e.getMessage}' WHERE entity_id = '${tenantCode}_tenant_admin';")
      logger.error(s"An error occurred: ${e.getMessage}")
      e.printStackTrace()
  }

@coderabbitai coderabbitai bot Feb 16, 2026


⚠️ Potential issue | 🔴 Critical

orgId.toInt will throw NumberFormatException when orgId is empty.

createOverviewTabInsideTenantDashboard (and createComparisionTabInsideTenantDashboard) are called from createCollectionAndDashboardForTenant at Line 121-122, which executes before the orgId.nonEmpty guard at Line 104. When orgId is an empty string, orgId.toInt at Line 136 (and Line 164) throws, causing the tenant dashboard to be marked as "Failed" and leaving a partially-created dashboard.

🐛 Proposed fix — pass orgId safely or use a default
-            tenantOrgRolesTable, Map.empty, orgId.toInt, tabId, metabaseUtil, postgresUtil)
+            tenantOrgRolesTable, Map.empty, if (orgId.nonEmpty) orgId.toInt else 0, tabId, metabaseUtil, postgresUtil)

Apply the same fix at Line 164. Alternatively, refactor so orgId is parsed once and threaded as Option[Int].

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def createOverviewTabInsideTenantDashboard(parentCollectionId: Int, databaseId: Int, dashboardId: Int, tabIdMap: Map[String, Int], metaDataTable: String, reportConfig: String, metabaseDatabase: String, metabaseApiKey: String): Unit = {
  try {
    val dashboardName: String = s"Overview"
    val safeTenantCode = tenantCode.replace("'", "''")
    val createDashboardQuery = s"UPDATE $metaDataTable SET status = 'Failed',error_message = 'errorMessage' WHERE entity_id = '${safeTenantCode}_tenant_admin';"
    val tabId: Int = tabIdMap.getOrElse(dashboardName, -1)
    val orgNameId: Int = getTheColumnId(databaseId, tenantOrgRolesTable, "org_name", metabaseUtil, metabasePostgresUtil, metabaseApiKey, createDashboardQuery)
    val reportConfigQuery: String = s"SELECT question_type, config FROM $reportConfig WHERE dashboard_name = 'Mentoring-Reports' AND report_name = 'Tenant-Overview' AND question_type IN ('big-number', 'graph');"
    val questionCardIdList = ProcessTenantConstructor.ProcessAndUpdateJsonFiles(reportConfigQuery, parentCollectionId, databaseId, dashboardId, 0, orgNameId, 0, tenantUserTable,
      tenantSessionTable, tenantSessionAttendanceTable, tenantConnectionsTable, tenantOrgMentorRatingTable,
      tenantOrgRolesTable, Map.empty, if (orgId.nonEmpty) orgId.toInt else 0, tabId, metabaseUtil, postgresUtil)
    val questionIdsString = "[" + questionCardIdList.mkString(",") + "]"
    val parametersQuery = s"SELECT config FROM $reportConfig WHERE report_name='Tenant-Overview' AND question_type='overview-parameter'"
    UpdateParameters.updateAdminParameterFunction(metabaseUtil, parametersQuery, dashboardId, postgresUtil)
    val objectMapper = new ObjectMapper()
    val userMetadataJson = objectMapper.createArrayNode().add(objectMapper.createObjectNode().put("collectionId", parentCollectionId).put("dashboardName", dashboardName).put("dashboardId", dashboardId).put("collectionFor", "Tenant Admin").put("questionIds", questionIdsString))
    val updateMetadataQuery = s" UPDATE $metaDataTable SET main_metadata = COALESCE(main_metadata::jsonb, '[]'::jsonb) || '$userMetadataJson'::jsonb, status = 'Success' WHERE entity_id = '${safeTenantCode}_tenant_admin';"
    postgresUtil.insertData(updateMetadataQuery)
  }
  catch {
    case e: Exception =>
      postgresUtil.insertData(s"UPDATE $metaDataTable SET status = 'Failed',error_message = '${e.getMessage}' WHERE entity_id = '${tenantCode}_tenant_admin';")
      logger.error(s"An error occurred: ${e.getMessage}")
      e.printStackTrace()
  }
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/MentoringMetabaseDashboardFunction.scala`
around lines 126 - 150, The bug is orgId.toInt can throw NumberFormatException
when orgId is empty in createOverviewTabInsideTenantDashboard (and similarly in
createComparisionTabInsideTenantDashboard) because
createCollectionAndDashboardForTenant calls these before the orgId.nonEmpty
guard; fix by defensively parsing orgId once (e.g., val maybeOrgId = if
(orgId.nonEmpty) Some(orgId.toInt) else None) in the caller or at the start of
createOverviewTabInsideTenantDashboard/createComparisionTabInsideTenantDashboard
and pass an Option[Int] (or a safe default like 0) into
ProcessTenantConstructor.ProcessAndUpdateJsonFiles and any other uses instead of
calling orgId.toInt directly; update all references to orgId.toInt to use the
safe parsed value or handle None accordingly.

Contributor Author


added


@prashanthShiksha, thank you for acknowledging this! Please ensure the fix handles the empty orgId case defensively to prevent the NumberFormatException. The same pattern should be applied at Line 164 in createComparisionTabInsideTenantDashboard as well.

Let me know if you need any clarification on the suggested approach! 🐰


🧠 Learnings used
Learnt from: prashanthShiksha
Repo: ELEVATE-Project/data-pipeline PR: 124
File: metabase-jobs/config-data-loader/projectJson/Observation-Domain/Domain-Report/json/table/1_domain_criteria_report.json:9-9
Timestamp: 2025-09-04T14:25:42.797Z
Learning: In the ELEVATE-Project/data-pipeline repository, when the maintainer prashanthShiksha confirms that entity_param and criteria_param field ID configurations have been corrected in Metabase template-tags, this indicates the intended behavior has been properly implemented for the customized entity type filter creation logic.

Comment on lines +549 to +581
private def checkExistenceOfFilterData(filterList: List[Map[String, String]], context: ProcessFunction[SurveyEvent, SurveyEvent]#Context, solutionId: String): Unit = {
logger.info(">>>>>>>>>>>>>> Checking existence of filter data...")
filterList.foreach { filter =>
val tableName = filter.getOrElse("table_name", "")
val columnValuePairs = filter.filterKeys(_ != "table_name").filter { case (_, v) => v != null && v.nonEmpty }
if (tableName.nonEmpty && columnValuePairs.nonEmpty) {
val whereClause = columnValuePairs.map { case (col, _) => s"""$col = ?""" }.mkString(" AND ")
val params = columnValuePairs.values.toSeq
val queryWithParams = params.foldLeft(s"SELECT EXISTS (SELECT 1 FROM $tableName WHERE $whereClause) AS data_exists") {
case (q, v) => q.replaceFirst("\\?", s"'${v.replace("'", "''")}'")
}
val dataExists = postgresUtil.executeQuery(queryWithParams) { rs =>
if (rs.next()) rs.getBoolean("data_exists") else false
}
val rowCountQuery = s"SELECT COUNT(*) AS row_count FROM $tableName"
val rowCount = postgresUtil.executeQuery(rowCountQuery) { rs =>
if (rs.next()) rs.getLong("row_count") else 0
}
if (!dataExists && rowCount > 0) {
val eventData = new java.util.HashMap[String, String]()
eventData.put("targetedSolution", solutionId)
eventData.put("filterTable", tableName.stripPrefix("\"").stripSuffix("\""))
eventData.put("filterSync", "Yes")
logger.info(s"As dataExits = $dataExists, rowCount = $rowCount hence pushing the message to kafka.")
pushSurveyDashboardEvents(eventData, context)
} else {
logger.info(s"As dataExits = $dataExists, rowCount = $rowCount hence stopping function.")
}
} else {
logger.info(s"table doesn't exits or filter json is empty, hence stopping function. ")
}
}
logger.info(s">>>>>>>>>>>>>> Completed checking existence of filter data...")

⚠️ Potential issue | 🔴 Critical

Manual ? placeholder replacement reconstructs an injectable query string.

Lines 557–558 use replaceFirst("\\?", s"'${v.replace("'", "''")}'") to build the final query from user-controlled filter values. This hand-rolled escaping is fragile and does not cover all injection vectors (e.g., backslash sequences in some SQL dialects). Use actual prepared statements through postgresUtil instead.

Proposed direction

Instead of manually substituting placeholders, execute the query with actual JDBC prepared statement parameters:

-        val queryWithParams = params.foldLeft(s"SELECT EXISTS (SELECT 1 FROM $tableName WHERE $whereClause) AS data_exists") {
-          case (q, v) => q.replaceFirst("\\?", s"'${v.replace("'", "''")}'")
-        }
-        val dataExists = postgresUtil.executeQuery(queryWithParams) { rs =>
+        val baseQuery = s"SELECT EXISTS (SELECT 1 FROM $tableName WHERE $whereClause) AS data_exists"
+        val dataExists = postgresUtil.executePreparedQuery(baseQuery, params.toSeq) { rs =>
           if (rs.next()) rs.getBoolean("data_exists") else false
         }
🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/SurveyStreamFunction.scala`
around lines 549 - 581, The current checkExistenceOfFilterData builds SQL by
manually replacing "?" which is prone to SQL injection; replace this with a real
prepared-statement flow: construct the WHERE clause with "col = ?" placeholders
(keep using checkExistenceOfFilterData and columnValuePairs), collect params as
a Seq[String], and call a postgresUtil method that executes a prepared statement
with bound parameters (e.g., postgresUtil.executePreparedQuery(query, params){
rs => ... }); do the same for the count query (or validate/safely
quote/whitelist tableName since identifiers can't be parameterized) and remove
any use of string replaceFirst("\\?")/queryWithParams to ensure user values are
bound, not interpolated.
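
For reference, a minimal sketch of the prepared-statement flow through plain JDBC (PostgresUtil may not expose a parameterized query helper yet, so this goes through java.sql.Connection directly; identifiers cannot be bound, so they are allowlisted first):

import java.sql.Connection

// Sketch only: bind the filter values as parameters instead of splicing them into the SQL string.
def filterDataExists(conn: Connection, tableName: String, columnValuePairs: Map[String, String]): Boolean = {
  def isSafeIdentifier(s: String): Boolean = s.matches("""[A-Za-z0-9_".]+""")
  require(isSafeIdentifier(tableName), s"Unsafe table name: $tableName")
  val pairs = columnValuePairs.toSeq // caller already guarantees this is non-empty
  pairs.foreach { case (col, _) => require(isSafeIdentifier(col), s"Unsafe column name: $col") }
  val whereClause = pairs.map { case (col, _) => s"$col = ?" }.mkString(" AND ")
  val stmt = conn.prepareStatement(s"SELECT EXISTS (SELECT 1 FROM $tableName WHERE $whereClause) AS data_exists")
  try {
    pairs.zipWithIndex.foreach { case ((_, value), idx) => stmt.setString(idx + 1, value) }
    val rs = stmt.executeQuery()
    rs.next() && rs.getBoolean("data_exists")
  } finally stmt.close()
}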

Comment on lines +300 to +321
def userMetric(tenantUserTable: String): Unit = {
val upsertQuery =
s"""
|INSERT INTO $userMetrics (tenant_code, total_users, active_users, deleted_users, last_updated)
|SELECT
| '$tenantCode',
| COUNT(user_id) AS total_users,
| COUNT(*) FILTER (WHERE status = 'ACTIVE') AS active_users,
| COUNT(*) FILTER (WHERE is_deleted = true) AS deleted_users,
| CURRENT_TIMESTAMP
|FROM $tenantUserTable
|ON CONFLICT (tenant_code) DO UPDATE SET
| total_users = EXCLUDED.total_users,
| active_users = EXCLUDED.active_users,
| deleted_users = EXCLUDED.deleted_users,
| last_updated = EXCLUDED.last_updated;
|""".stripMargin

postgresUtil.executePreparedUpdate(upsertQuery, Seq.empty, userMetrics, tenantUserTable)
println(s"User metrics updated for tenant: $tenantUserTable")

}

⚠️ Potential issue | 🔴 Critical

SQL injection risk — tenantCode is interpolated directly into SQL.

tenantCode originates from the incoming event and is spliced directly into the query string ('$tenantCode' at Line 305). An attacker-controlled event could inject arbitrary SQL. The same pattern exists in checkAndInsert (Lines 350 and 359) and checkAndCreateTable (Line 373).

Use parameterized queries (? placeholders) for all user-derived values. For dynamic table names (which can't be parameterized), validate/allowlist tenantCode against ^[a-zA-Z0-9_]+$ before use.

As per coding guidelines, "No credential exposure" and safe DB handling are required for stream-jobs.

Proposed fix for userMetric — use a parameterized approach
     def userMetric(tenantUserTable: String): Unit = {
+      // Validate tenantCode to prevent SQL injection
+      require(tenantCode.matches("^[a-zA-Z0-9_]+$"), s"Invalid tenantCode: $tenantCode")
       val upsertQuery =
         s"""
-           |INSERT INTO $userMetrics (tenant_code, total_users, active_users, deleted_users, last_updated)
+           |INSERT INTO $userMetrics (tenant_code, total_users, active_users, deleted_users, last_updated)
            |SELECT
-           |  '$tenantCode',
+           |  ?,
            |  COUNT(user_id) AS total_users,
            |  COUNT(*) FILTER (WHERE status = 'ACTIVE') AS active_users,
            |  COUNT(*) FILTER (WHERE is_deleted = true) AS deleted_users,
            |  CURRENT_TIMESTAMP
            |FROM $tenantUserTable
            |ON CONFLICT (tenant_code) DO UPDATE SET
            |  total_users = EXCLUDED.total_users,
            |  active_users = EXCLUDED.active_users,
            |  deleted_users = EXCLUDED.deleted_users,
            |  last_updated = EXCLUDED.last_updated;
            |""".stripMargin
 
-      postgresUtil.executePreparedUpdate(upsertQuery, Seq.empty, userMetrics, tenantUserTable)
+      postgresUtil.executePreparedUpdate(upsertQuery, Seq(tenantCode), userMetrics, tenantUserTable)
🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/UserStreamFunction.scala`
around lines 300 - 321, The userMetric function currently interpolates
tenantCode directly into SQL causing SQL injection; update userMetric to pass
tenantCode as a bound parameter to postgresUtil.executePreparedUpdate (use a
parameter placeholder in the query instead of '$tenantCode') and ensure any
other user-derived values used in SQL are passed as parameters; for dynamic
identifiers like tenantUserTable (and similar uses in checkAndInsert and
checkAndCreateTable) implement strict validation/allowlist (e.g., match against
^[a-zA-Z0-9_]+$) before using them in the query string, and keep using
executePreparedUpdate for parameterized values via
postgresUtil.executePreparedUpdate.

Comment on lines +345 to +366
def checkAndInsert(entityType: String, targetedId: String, dashboardData: java.util.HashMap[String, String], dashboardKey: String): Unit = {
if (tenantCode.isEmpty) {
println(s"Tenant code is empty, skipping insertion.")
return
}
val query = s"SELECT EXISTS (SELECT 1 FROM ${config.dashboard_metadata} WHERE entity_id = '$targetedId') AS is_present"
val result = postgresUtil.fetchData(query)

result.foreach { row =>
row.get(s"is_present") match {
case Some(isPresent: Boolean) if isPresent =>
println(s"$entityType details already exist.")
case _ =>
if (entityType == "User Dashboard") {
val insertQuery = s"INSERT INTO ${config.dashboard_metadata} (entity_type, entity_name, entity_id) VALUES ('$entityType', '${event.tenantCode}', '$targetedId')"
val affectedRows = postgresUtil.insertData(insertQuery)
println(s"Inserted userDashboard details. Affected rows: $affectedRows")
dashboardData.put("tenantCode", targetedId)
}
}
}
}

⚠️ Potential issue | 🔴 Critical

SQL injection in checkAndInsert — entity values interpolated directly.

Lines 350 and 359 splice targetedId, entityType, and event.tenantCode directly into SQL strings without parameterization. This is exploitable via crafted event data.

Proposed fix — use parameterized queries
-      val query = s"SELECT EXISTS (SELECT 1 FROM ${config.dashboard_metadata} WHERE entity_id = '$targetedId') AS is_present"
-      val result = postgresUtil.fetchData(query)
+      val query = s"SELECT EXISTS (SELECT 1 FROM ${config.dashboard_metadata} WHERE entity_id = ?) AS is_present"
+      // Use a parameterized fetch variant
-              val insertQuery = s"INSERT INTO ${config.dashboard_metadata} (entity_type, entity_name, entity_id) VALUES ('$entityType', '${event.tenantCode}', '$targetedId')"
+              val insertQuery = s"INSERT INTO ${config.dashboard_metadata} (entity_type, entity_name, entity_id) VALUES (?, ?, ?)"
+              // Pass params: Seq(entityType, event.tenantCode, targetedId)
🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/UserStreamFunction.scala`
around lines 345 - 366, The checkAndInsert function currently builds SQL by
string interpolation causing SQL injection; update checkAndInsert to use
parameterized queries for both the SELECT and INSERT instead of embedding
targetedId, entityType, and event.tenantCode directly into SQL: change the
SELECT to use a placeholder (e.g., "SELECT EXISTS (SELECT 1 FROM
${config.dashboard_metadata} WHERE entity_id = ?) AS is_present") and call
postgresUtil.fetchData with the query plus a params list/array containing
targetedId, and likewise change the INSERT to "INSERT INTO
${config.dashboard_metadata} (entity_type, entity_name, entity_id) VALUES (?, ?,
?)" and call postgresUtil.insertData with params [entityType, event.tenantCode,
targetedId]; if postgresUtil lacks parameterized methods add overloads that
accept query + params to safely bind values and ensure tenantCode is
validated/trimmed before use.
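
If PostgresUtil does not yet have parameterized variants, a sketch of what such overloads could look like (the names fetchExists and insertDashboardMetadata are hypothetical; connection handling is simplified to plain JDBC):

import java.sql.Connection

// Sketch: existence check and insert with bound parameters; only the table name is interpolated,
// and it comes from configuration rather than from the event.
def fetchExists(conn: Connection, metadataTable: String, entityId: String): Boolean = {
  val stmt = conn.prepareStatement(s"SELECT EXISTS (SELECT 1 FROM $metadataTable WHERE entity_id = ?) AS is_present")
  try {
    stmt.setString(1, entityId)
    val rs = stmt.executeQuery()
    rs.next() && rs.getBoolean("is_present")
  } finally stmt.close()
}

def insertDashboardMetadata(conn: Connection, metadataTable: String, entityType: String, entityName: String, entityId: String): Int = {
  val stmt = conn.prepareStatement(s"INSERT INTO $metadataTable (entity_type, entity_name, entity_id) VALUES (?, ?, ?)")
  try {
    stmt.setString(1, entityType)
    stmt.setString(2, entityName)
    stmt.setString(3, entityId)
    stmt.executeUpdate() // affected row count
  } finally stmt.close()
}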

Comment on lines +13 to +14
implicit val mapTypeInfo: TypeInformation[ProjectEvent] = TypeExtractor.getForClass(classOf[ProjectEvent])
implicit val eventTypeInfo: TypeInformation[MentoringEvent] = TypeExtractor.getForClass(classOf[MentoringEvent])

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, read the UnifiedStreamConfig.scala file to see what's actually declared
echo "=== UnifiedStreamConfig.scala content ==="
cat stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamConfig.scala | head -100

Repository: ELEVATE-Project/data-pipeline

Length of output: 6577


🏁 Script executed:

#!/bin/bash
# Find the stream function classes and check their constructors
echo "=== Looking for stream function class definitions ==="
find stream-jobs -name "*.scala" -type f | xargs grep -l "class.*StreamFunction" | head -10

Repository: ELEVATE-Project/data-pipeline

Length of output: 1525


🏁 Script executed:

#!/bin/bash
# Search for uses of implicit TypeInformation to understand the pattern
echo "=== All implicit TypeInformation declarations in stream-jobs ==="
rg 'implicit.*TypeInformation' stream-jobs/ --type scala

Repository: ELEVATE-Project/data-pipeline

Length of output: 14665


🏁 Script executed:

#!/bin/bash
# Check where the stream functions are instantiated
echo "=== Checking where UserStreamFunction is instantiated ==="
rg 'new UserStreamFunction|UserStreamFunction(' stream-jobs/combined-stream-processor/src/main/scala/ -A2 -B2

echo "=== Checking where SurveyStreamFunction is instantiated ==="
rg 'new SurveyStreamFunction|SurveyStreamFunction(' stream-jobs/combined-stream-processor/src/main/scala/ -A2 -B2

echo "=== Checking where ObservationStreamFunction is instantiated ==="
rg 'new ObservationStreamFunction|ObservationStreamFunction(' stream-jobs/combined-stream-processor/src/main/scala/ -A2 -B2

Repository: ELEVATE-Project/data-pipeline

Length of output: 713


🏁 Script executed:

#!/bin/bash
# Check the actual stream function class signatures
echo "=== UserStreamFunction constructor ==="
cat stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/UserStreamFunction.scala | head -20

echo "=== SurveyStreamFunction constructor ==="
cat stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/SurveyStreamFunction.scala | head -20

echo "=== ObservationStreamFunction constructor ==="
cat stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/ObservationStreamFunction.scala | head -20

Repository: ELEVATE-Project/data-pipeline

Length of output: 3160


🏁 Script executed:

#!/bin/bash
# Fix the regex escaping and search for function instantiation
echo "=== Where UserStreamFunction is instantiated ==="
rg 'UserStreamFunction\(' stream-jobs/combined-stream-processor/src/main/scala/ -B2 -A2

echo "=== Where SurveyStreamFunction is instantiated ==="
rg 'SurveyStreamFunction\(' stream-jobs/combined-stream-processor/src/main/scala/ -B2 -A2

echo "=== Where ObservationStreamFunction is instantiated ==="
rg 'ObservationStreamFunction\(' stream-jobs/combined-stream-processor/src/main/scala/ -B2 -A2

Repository: ELEVATE-Project/data-pipeline

Length of output: 8780


Add missing implicit TypeInformation declarations for UserEvent, SurveyEvent, and ObservationEvent.

Only ProjectEvent and MentoringEvent have implicit TypeInformation declarations (lines 13-14). When UserStreamFunction, SurveyStreamFunction, and ObservationStreamFunction are instantiated in UnifiedStreamTask.scala with new UserStreamFunction(config), etc., the Scala compiler will fail to resolve the required implicit TypeInformation[UserEvent], TypeInformation[SurveyEvent], and TypeInformation[ObservationEvent] parameters, causing compilation errors.

Add:

implicit val userTypeInfo: TypeInformation[UserEvent] = TypeExtractor.getForClass(classOf[UserEvent])
implicit val surveyTypeInfo: TypeInformation[SurveyEvent] = TypeExtractor.getForClass(classOf[SurveyEvent])
implicit val observationTypeInfo: TypeInformation[ObservationEvent] = TypeExtractor.getForClass(classOf[ObservationEvent])
🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamConfig.scala`
around lines 13 - 14, The file defines implicit TypeInformation for ProjectEvent
and MentoringEvent (mapTypeInfo, eventTypeInfo) but is missing implicits for
UserEvent, SurveyEvent, and ObservationEvent which causes implicit resolution
failures when instantiating UserStreamFunction, SurveyStreamFunction, and
ObservationStreamFunction; add implicit vals named e.g. userTypeInfo,
surveyTypeInfo, observationTypeInfo using
TypeExtractor.getForClass(classOf[UserEvent]),
TypeExtractor.getForClass(classOf[SurveyEvent]), and
TypeExtractor.getForClass(classOf[ObservationEvent]) respectively so the
compiler can find TypeInformation when constructing those stream functions.

Comment on lines +164 to +166
userStream.getSideOutput(config.eventOutputTag) // Note: User stream uses eventOutputTag for user dashboard
.addSink(kafkaConnector.kafkaStringSink(config.userOutputTopic))
.name("user-sink").setParallelism(config.userSinkParallelism)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's locate the UnifiedStreamTask.scala file and examine the critical lines
find . -name "UnifiedStreamTask.scala" -type f

Repository: ELEVATE-Project/data-pipeline

Length of output: 203


🏁 Script executed:

#!/bin/bash
# Search for eventOutputTag and userOutputTag across the codebase
rg -n 'eventOutputTag|userOutputTag' --type=scala -C2

Repository: ELEVATE-Project/data-pipeline

Length of output: 23390


🏁 Script executed:

#!/bin/bash
# Let's also check the config class definitions to understand these tags
rg -n 'eventOutputTag|userOutputTag' --type=scala | head -30

Repository: ELEVATE-Project/data-pipeline

Length of output: 5914


Replace config.eventOutputTag with config.userOutputTag at line 164.

The UserStreamFunction outputs user events to config.userOutputTag, not config.eventOutputTag. The single-stream variant (line 164) currently reads from the wrong tag—one reserved for observation events. This causes user events to be lost and observation data to be misrouted to the user sink.

Proposed fix
-        userStream.getSideOutput(config.eventOutputTag) // Note: User stream uses eventOutputTag for user dashboard
+        userStream.getSideOutput(config.userOutputTag)
🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala`
around lines 164 - 166, The user sink is reading from the wrong side output tag:
replace the call to userStream.getSideOutput(config.eventOutputTag) with
userStream.getSideOutput(config.userOutputTag) so the UserStreamFunction's user
events are consumed correctly; update the line that chains
.addSink(kafkaConnector.kafkaStringSink(config.userOutputTopic)).name("user-sink").setParallelism(config.userSinkParallelism)
to use config.userOutputTag instead of config.eventOutputTag (locate the usage
of userStream.getSideOutput and the config.* tags).

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +38 to +40
val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
questionCardId.append(cardId)

⚠️ Potential issue | 🔴 Critical

Unguarded card creation — crashes on failure and silently accepts cardId = 0.

Line 38 has two compounding problems:

  1. If metabaseUtil.createQuestionCard throws or returns non-JSON, mapper.readTree(...) throws an unhandled exception that aborts the entire processing loop — no remaining rows will be processed.
  2. .path("id").asInt() returns 0 when "id" is absent or null, so a Metabase error response (e.g., {"error": "..."}) silently yields cardId = 0, which is appended to the result list and wired into dashCards.

Compare with ProcessStateConstructor which at least wraps extraction in Try/Option (though it has the same asInt() zero-default issue).

Proposed fix
-val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
-logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
-questionCardId.append(cardId)
-val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, cardId)
+val responseJson = Try(mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)))
+responseJson match {
+  case Success(json) =>
+    val idNode = json.get("id")
+    if (idNode == null || idNode.isNull) {
+      logger.error(s"Metabase response missing 'id' for chart: $chartName. Skipping.")
+    } else {
+      val cardId = idNode.asInt()
+      logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
+      questionCardId.append(cardId)
+      val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, cardId)
+      // ... rest of dashCards logic ...
+    }
+  case Failure(ex) =>
+    logger.error(s"Failed to create question card for $chartName: ${ex.getMessage}", ex)
+}
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/ProcessOrgConstructor.scala`
around lines 38 - 40, In ProcessOrgConstructor, guard the metabase card creation
and JSON parsing: wrap the metabaseUtil.createQuestionCard(...) +
mapper.readTree(...) call in a Try (or try/catch), verify the parsed JsonNode
has a non-null "id" field and that .isInt (or parseable) before calling
.asInt(), and only call questionCardId.append(cardId) when the id exists and >
0; on any failure (exception, non-JSON response, missing/invalid "id") log the
full response/error with context (including chartName and requestBody) and
skip/continue processing this row instead of letting the exception bubble or
appending a zero id so downstream dashCards are not polluted.
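
Since ProcessStateConstructor has the same zero-default behaviour, one option is a small shared helper that both constructors can call; a sketch, assuming Jackson's ObjectMapper (the name parseCardId is illustrative):

import com.fasterxml.jackson.databind.ObjectMapper
import scala.util.Try

// Sketch: return the card id only when the response parses and carries a positive numeric "id".
def parseCardId(mapper: ObjectMapper, response: String): Option[Int] =
  Try(mapper.readTree(response)).toOption
    .flatMap(json => Option(json.get("id")))
    .filter(node => node.isInt || node.isLong)
    .map(_.asInt())
    .filter(_ > 0)

Callers would append to questionCardId and build dashCards only on Some(cardId), and log the raw Metabase response on None.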

Comment on lines +75 to +98
val solutionName = postgresUtil.fetchData(s"""SELECT entity_name FROM $metaDataTable WHERE entity_id = '$targetedSolutionId'""").collectFirst { case map: Map[_, _] => map.getOrElse("entity_name", "").toString }.getOrElse("")
val targetedProgramId: String = {
val id = Option(event.targetedProgram).map(_.trim).getOrElse("")
if (id.nonEmpty) id
else {
val query = s"SELECT program_id FROM $solutions WHERE solution_id = '$targetedSolutionId'"
postgresUtil.fetchData(query) match {
case List(map: Map[_, _]) => map.get("program_id").map(_.toString).getOrElse("")
case _ => ""
}
}
}
val storedTableIds = TrieMap.empty[(Int, String), Int]
val storedColumnIds = TrieMap.empty[(Int, String), Int]
val programName = postgresUtil.fetchData(s"""SELECT entity_name from $metaDataTable where entity_id = '$targetedProgramId'""").collectFirst { case map: Map[_, _] => map.get("entity_name").map(_.toString).getOrElse("") }.getOrElse("").take(80)
val orgId = postgresUtil.fetchData(s"""SELECT org_id FROM $solutions WHERE program_id = '$targetedProgramId' AND org_id IS NOT NULL AND TRIM(org_id) <> '' LIMIT 1 """).collectFirst { case map: Map[_, _] => map.getOrElse("org_id", "").toString }.getOrElse("")
val query =
s"SELECT program_description, external_id, program_external_id, description FROM $solutionTable WHERE solution_id = '$targetedSolutionId'"
val resultMap = postgresUtil.fetchData(query).collectFirst { case map: Map[_, _] => map }.getOrElse(Map.empty)

val programDescription = resultMap.get("program_description").map(_.toString).getOrElse("")
val solutionExternalId = resultMap.get("external_id").map(_.toString).getOrElse("")
val programExternalId = resultMap.get("program_external_id").map(_.toString).getOrElse("")
val solutionDescription = resultMap.get("description").map(_.toString).getOrElse("")

⚠️ Potential issue | 🔴 Critical

SQL injection risk: event-derived values interpolated directly into SQL queries without parameterization.

Multiple queries embed targetedSolutionId, targetedProgramId, and other event-sourced values via string interpolation:

  • Line 75: WHERE entity_id = '$targetedSolutionId'
  • Line 80: WHERE solution_id = '$targetedSolutionId'
  • Line 89: WHERE entity_id = '$targetedProgramId'
  • Line 90: WHERE program_id = '$targetedProgramId'

If any of these values contain a single quote (e.g., a malicious or malformed event payload), this breaks the query or enables SQL injection. Use parameterized queries via PreparedStatement instead of string interpolation.

As per coding guidelines, verify "No credential exposure" and correct DB handling — string-interpolated SQL with external input is a security risk.

🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/ObservationMetabaseDashboardFunction.scala`
around lines 75 - 98, Replace string-interpolated SQL usage with
parameterized/prepared statements: stop building queries that embed
targetedSolutionId and targetedProgramId (and any event-derived values) directly
in postgresUtil.fetchData calls (e.g., the queries that reference
metaDataTable/entity_id, solutions/solution_id, metaDataTable/entity_id for
targetedProgramId, and solutions/program_id); instead use the postgresUtil
method that accepts a SQL string with placeholders and a parameter list or
create a PreparedStatement to bind targetedSolutionId/targetedProgramId (and
other event values) safely, then fetch results into resultMap/programName/orgId
the same way. Ensure all occurrences around variables targetedSolutionId,
targetedProgramId, postgresUtil.fetchData, metaDataTable, solutions, and
solutionTable are updated to use parameter binding to eliminate SQL injection
risk.

Comment on lines +121 to +126
def extractCardId(response: String): Option[Int] = {
Try {
val jsonResponse = objectMapper.readTree(response)
jsonResponse.get("id").asInt()
}.toOption
}

⚠️ Potential issue | 🔴 Critical

extractCardId returns Some(0) when the Metabase response has no "id" field.

JsonNode.asInt() returns 0 for a missing or null node, so a failed card creation that returns a valid JSON body without "id" will silently produce Some(0) — a bogus card ID that gets appended to questionCardId and wired into dashCards.

Proposed fix
 def extractCardId(response: String): Option[Int] = {
   Try {
     val jsonResponse = objectMapper.readTree(response)
-    jsonResponse.get("id").asInt()
-  }.toOption
+    val idNode = jsonResponse.get("id")
+    if (idNode == null || idNode.isNull) throw new NoSuchElementException("No 'id' in response")
+    idNode.asInt()
+  }.toOption
 }
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/ProcessStateConstructor.scala`
around lines 121 - 126, The current extractCardId method can return Some(0)
because jsonResponse.get("id").asInt() yields 0 when "id" is missing; update
extractCardId to explicitly check for the presence and type of the "id" node
before converting it: after parsing with objectMapper.readTree(response) verify
jsonResponse.hasNonNull("id") (or jsonResponse.get("id") != null &&
jsonResponse.get("id").isInt) and only then call asInt(), otherwise return None;
keep the Try to handle parse errors and return None on exceptions.

Comment on lines +52 to +68
else {
row.get("config") match {
case Some(queryValue: PGobject) =>
val jsonString = queryValue.getValue
val rootNode = objectMapper.readTree(jsonString)
if (rootNode != null) {
val optJsonNode = toOption(rootNode)
optJsonNode.foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.warn("No 'dashCards' key found in the JSON.")
}
}
}
}

⚠️ Potential issue | 🔴 Critical

Same non-exhaustive match and inconsistent error return bugs as sibling files.

  • Lines 52-68: Missing case None/default in the heading branch — MatchError risk.
  • Line 135: updateQuestionCardJsonValues returns configJson (full config) on error instead of configJson.path("questionCard").

These are the same issues flagged in UpdateStatusJsonFiles.scala, ProcessTenantConstructor.scala, and UpdateQuestionDomainJsonFiles.scala.

🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateCsvDownloadJsonFiles.scala`
around lines 52 - 68, The pattern match on row.get("config") in
UpdateCsvDownloadJsonFiles.scala is non-exhaustive and can throw MatchError; add
a case None (and a default case if other unexpected types are possible) to
safely log/warn and continue without throwing. Also fix
updateQuestionCardJsonValues so that on error it returns
configJson.path("questionCard") (the questionCard node) instead of returning the
full configJson; ensure the method references configJson.path("questionCard") in
its error/exception branches and returns that node consistently.
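
A sketch of the exhaustive match shape (object and method names are illustrative; row is assumed to be the Map produced by postgresUtil):

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ArrayNode
import org.postgresql.util.PGobject
import org.slf4j.{Logger, LoggerFactory}

// Sketch: handle missing or unexpected "config" values instead of letting a MatchError escape.
object ConfigRowHandler {
  private val logger: Logger = LoggerFactory.getLogger(getClass)
  private val objectMapper = new ObjectMapper()

  def appendDashCards(row: Map[String, Any], dashcardsArray: ArrayNode): Unit =
    row.get("config") match {
      case Some(queryValue: PGobject) =>
        val dashCardsNode = objectMapper.readTree(queryValue.getValue).path("dashCards")
        if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) dashcardsArray.add(dashCardsNode)
        else logger.warn("No 'dashCards' key found in the JSON.")
      case Some(other) =>
        logger.warn(s"Unexpected type for 'config': ${other.getClass.getName}; skipping row.")
      case None =>
        logger.warn("Key 'config' not found in the result row; skipping.")
    }
}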

Comment on lines +94 to +123
def processHeading(questionText: String): Unit = {
results("heading").foreach { headingRow =>
headingRow.get("config") match {
case Some(queryValue: PGobject) =>
val jsonString = queryValue.getValue
val rootNode = objectMapper.readTree(jsonString)
Option(rootNode).foreach { node =>
val dashCardsNode = node.path("dashCards")
val visualizationSettingsNode = dashCardsNode.path("visualization_settings")
if (visualizationSettingsNode.has("text")) {
visualizationSettingsNode.asInstanceOf[ObjectNode].put("text", questionText)
}
newRow += 3
dashCardsNode.asInstanceOf[ObjectNode].put("col", newCol)
dashCardsNode.asInstanceOf[ObjectNode].put("row", newRow)
dashCardsNode.asInstanceOf[ObjectNode].put("row", newRow)
dashCardsNode.asInstanceOf[ObjectNode].put("dashboard_tab_id", tabId)
toOption(node).foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.warn("No 'dashCards' key found in the JSON.")
}
}
}
case None => logger.warn("Key 'config' not found in the heading result row.")
}
}
}

⚠️ Potential issue | 🔴 Critical

processHeading iterates over ALL heading config rows on every call — headings will be duplicated on the dashboard.

processHeading is called once per question (Lines 193, 207, 215). Each call iterates over results("heading") (all heading rows from the DB) and appends every heading's dashCards to dashcardsArray. If there are N questions and M heading rows, M×N heading dashCards will be added to the dashboard instead of just one per question.

This should likely construct a single heading dashCard from the template rather than iterating all heading configs.

🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateQuestionJsonFiles.scala`
around lines 94 - 123, processHeading currently iterates results("heading") for
every question causing duplication; change it to locate or build a single
heading dashCard per invocation (e.g., pick the appropriate heading row once —
first or matched by tabId/template) instead of looping all rows: in
processHeading (and the code paths that call it) replace the
results("heading").foreach loop with a single lookup (headOption or filter by
tabId) then parse that PGobject config into rootNode, update
visualization_settings.text, col, row, dashboard_tab_id and add only that one
dashCardsNode to dashcardsArray; ensure you remove the duplicated add/loop logic
and preserve the existing field updates (newRow/newCol/tabId) when creating the
single dashCard.
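
A sketch of the single-heading variant (helper name and row shape are assumptions; the field updates mirror the original function):

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.{ArrayNode, ObjectNode}
import org.postgresql.util.PGobject

// Sketch: build exactly one heading dashCard per question from the first heading template row.
def addHeadingDashCard(headingRows: List[Map[String, Any]], dashcardsArray: ArrayNode,
                       questionText: String, newCol: Int, newRow: Int, tabId: Int): Unit = {
  val mapper = new ObjectMapper()
  headingRows.headOption.flatMap(_.get("config")).collect { case pg: PGobject => pg.getValue }.foreach { json =>
    val dashCardsNode = mapper.readTree(json).path("dashCards")
    if (dashCardsNode.isObject) {
      val card = dashCardsNode.asInstanceOf[ObjectNode].deepCopy() // copy so the template row is not mutated
      val viz = card.path("visualization_settings")
      if (viz.isObject) viz.asInstanceOf[ObjectNode].put("text", questionText)
      card.put("col", newCol)
      card.put("row", newRow)
      card.put("dashboard_tab_id", tabId)
      dashcardsArray.add(card)
    }
  }
}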

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 37

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
jobs-core/src/main/resources/base-config.conf (1)

11-20: ⚠️ Potential issue | 🟠 Major

Distributed checkpointing with a local file:// path is inconsistent.

enable.distributed.checkpointing = true (Line 11) paired with base.url = "file:///opt/flink/checkpoints" (Line 20) is contradictory. Distributed checkpointing requires a shared storage backend (HDFS, S3, WASB, etc.) accessible by all TaskManagers. A local file:// path means checkpoints are only on the local node, breaking recovery in any multi-node deployment.

Additionally, the blob storage settings on lines 14–18 (account, container, checkpointing.dir) are now effectively dead configuration since base.url no longer references them. Either remove the stale blob config or restore a distributed storage URL.

🤖 Fix all issues with AI agents
In `@Documentation/log-conf-file/log4j2.properties`:
- Around line 36-40: The Delete strategy for the rolling appender named
"project" currently targets basePath = logs and will purge all files older than
7d because it lacks an IfFileName filter; update the delete rule for
appender.project.strategy.delete to include an IfFileName condition (set
appender.project.strategy.delete.ifFileName.type = IfFileName and a file name
glob like appender.project.strategy.delete.ifFileName.glob =
project-stream-*.log* or whatever pattern matches the project's archived
filenames) so only the intended subsystem files are deleted, and apply the same
pattern-scoped IfFileName change to every other appender's delete strategy to
avoid cross-subsystem deletions.

In
`@jobs-core/src/main/scala/org/shikshalokam/job/connector/FlinkKafkaConnector.scala`:
- Around line 35-39: The code mutates a shared java.util.Properties from
config.kafkaConsumerProperties in kafkaJobRequestSource (and similarly in other
source factories), which corrupts group.id for all consumers; fix by creating a
defensive copy of the Properties before calling setProperty("group.id", groupId)
(e.g., new Properties() and putAll(config.kafkaConsumerProperties)) and then
pass that copied Properties to the FlinkKafkaConsumer; update
kafkaJobRequestSource and the other factory methods (kafkaMapSource,
kafkaStringSource, single-arg kafkaJobRequestSource variants) to use the copied
Properties instead of mutating config.kafkaConsumerProperties directly.
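
A minimal sketch of the defensive copy (method name and schema are illustrative; the real factory methods in FlinkKafkaConnector may use a different deserialization schema):

import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// Sketch: copy the shared consumer properties before overriding group.id,
// so other sources built from the same config object are unaffected.
def kafkaStringSourceWithGroup(kafkaTopic: String, groupId: String, shared: Properties): FlinkKafkaConsumer[String] = {
  val props = new Properties()
  props.putAll(shared)                   // defensive copy
  props.setProperty("group.id", groupId) // per-source override stays local to this copy
  new FlinkKafkaConsumer[String](kafkaTopic, new SimpleStringSchema(), props)
}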

In
`@metabase-jobs/combined-dashboard-creator/src/main/resources/combined-dashboard-creator.conf`:
- Around line 99-107: The config currently hardcodes sensitive values
(metabase.username, metabase.password, metabase.metabaseApiKey and related
fields like domainName/database/evidenceBaseUrl); remove the real email and
secrets and change these keys to load from environment variables or a secret
manager (e.g., replace username/password/metabaseApiKey with env-based
placeholders and fallback checks), update the code that reads this config to
validate presence (throw helpful error if missing) and document the required env
vars so no PII or plaintext credentials remain in the committed
combined-dashboard-creator.conf.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/MentoringMetabaseDashboardFunction.scala`:
- Around line 210-272: The queries in getTheTableId and getTheColumnId are
vulnerable to SQL injection because tableName, columnName and similar values are
interpolated into SQL strings; change these methods to use parameterized queries
in PostgresUtil (e.g., add/use a fetchDataWithParams or prepareAndFetch method
that accepts SQL with ? placeholders and a Seq[Any] of parameters) and replace
usages in getTheTableId (the SELECT id FROM metabase_table ...) and
getTheColumnId (the SELECT id FROM metabase_field ...) to call that
parameterized API with databaseId, tableName and columnName as bound params;
also update the metaTableQuery handling to use parameter binding (or a safe
update method) instead of string replace for errorMessage and ensure any
postgresUtil.insertData variant accepts parameters to avoid concatenating
escaped strings.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/UpdateParameters.scala`:
- Line 65: The code unsafely casts dashboardJson.path("parameters") to ArrayNode
(currentParametersJson) which will throw a ClassCastException when "parameters"
is missing (MissingNode); change UpdateParameters.scala to safely handle this by
checking the node type or pattern-matching the result of
dashboardJson.path("parameters") (similar to updateAdminParameterFunction) and
fall back to an empty ArrayNode or handle the missing case appropriately before
using currentParametersJson, ensuring no direct cast from MissingNode to
ArrayNode is performed.
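
A sketch of the safe extraction, mirroring the fallback pattern the comment attributes to updateAdminParameterFunction:

import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import com.fasterxml.jackson.databind.node.ArrayNode

// Sketch: use the existing array when present, otherwise an empty ArrayNode,
// instead of casting a MissingNode.
def currentParameters(mapper: ObjectMapper, dashboardJson: JsonNode): ArrayNode =
  dashboardJson.path("parameters") match {
    case arr: ArrayNode => arr
    case _              => mapper.createArrayNode()
  }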

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/UpdateQuestionDomainJsonFiles.scala`:
- Around line 83-150: The cleanDashboardJson method uses unsafe
.path(...).asInstanceOf[ObjectNode/ArrayNode] casts (e.g., templateTags,
parametersPath, dashCards, paramMappings, questionCard, datasetQuery,
nativeNode, queryNode) which will throw ClassCastException when keys are
missing; update each JSON access to first check presence and type (use get(...)
and guard with node.isObject / node.isArray or node.isMissingNode / node.isNull)
before casting, and when a node is absent supply a safe default (e.g., skip
removal logic or use mapper.createObjectNode()/createArrayNode()) so the code
never casts a MissingNode; ensure you adjust logic around template-tags,
parameters, parameter_mappings and native.query handling to be conditional on
the validated node types.
- Around line 182-206: The method updateQuestionCardJsonValues can return null
when the "questionCard" key is missing because it returns
configObjectNode.get("questionCard"); change the return to the full updated JSON
node (configObjectNode) so callers always receive a non-null JsonNode
(preserving any updates made); specifically, in updateQuestionCardJsonValues
replace the final return of configObjectNode.get("questionCard") with returning
configObjectNode (leaving the exception handler as-is).

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/UpdateStatusJsonFiles.scala`:
- Around line 61-79: The code currently adds the entire parsed JSON root and
then also its "dashCards" child, causing duplicates; in UpdateStatusJsonFiles
(the block handling row.get("config") => Some(queryValue: PGobject)) remove the
addition of the root node (the optJsonNode.foreach(dashcardsArray.add) call that
adds rootNode) and only add the nested dashCards node when present (i.e., keep
the logic that extracts node.path("dashCards") and adds
dashcardsArray.add(dashCardsNode)); update any related logging if needed to
reflect only dashCards being appended and ensure symbols to change include
rootNode, optJsonNode, dashcardsArray and dashCardsNode.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/DistrictPage.scala`:
- Around line 60-69: The pattern match on row.get("config") in DistrictPage is
non-exhaustive and can throw MatchError when the "config" key is missing; update
the match in the block that calls
toOption/objectMapper/appendDashCardToDashboard to include a fallback (e.g.,
case None or case _ ) that safely handles missing or non-PGobject values—log a
warning or skip processing for that heading row and avoid calling
queryValue.getValue or appendDashCardToDashboard when config is absent or of the
wrong type.
- Around line 155-162: The SQL is built via string interpolation using
targetedDistrictName and reportConfig which allows injection (e.g., "St.
John's"); change DistrictPage.scala to use a parameterized query through
postgresUtil instead of building map_unique_id_query with string interpolation:
pass targetedDistrictName as a query parameter to postgresUtil.fetchData (or a
new prepared statement method) and avoid directly interpolating reportConfig
unless it is a compile-time/whitelisted identifier—if reportConfig must be
dynamic, validate/whitelist it first; update the block that reads
questionCard/visualizationSettings and replace the raw s"...WHERE
config_item->>'state_name' = '$targetedDistrictName'..." construction with a
parameterized call that binds targetedDistrictName and uses a safe table name.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/HomePage.scala`:
- Around line 60-68: The pattern match on row.get("config") in HomePage
currently only handles Some(queryValue: PGobject) and will throw a MatchError
for None or other types; update the match in the block that calls
objectMapper.readTree / toOption / appendDashCardToDashboard so it includes a
safe default case (e.g., case _ => or explicit None/other-type cases) that does
nothing (or logs) to avoid runtime exceptions while keeping the PGobject branch
unchanged; reference symbols: row.get("config"), PGobject,
objectMapper.readTree, toOption, appendDashCardToDashboard.
- Around line 348-355: The code unsafely casts dashboardJson.path("parameters")
to ArrayNode which will throw ClassCastException when "parameters" is missing;
change the creation of currentParametersJson in HomePage (the variable named
currentParametersJson used to build finalParametersJson) to check if
dashboardJson.has("parameters") and use objectMapper.createArrayNode() as a safe
fallback (same pattern used in appendDashCardToDashboard), then rebuild
finalParametersJson from that ArrayNode; also replace any e.printStackTrace() in
this scope with logger.error("...", e) to use the logging framework and include
the exception.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/ProcessAdminConstructor.scala`:
- Around line 51-68: The pattern match on row.get("config") in
ProcessAdminConstructor is non-exhaustive and can throw MatchError; update the
match to handle missing or non-PGobject values by adding a case None and a
fallback case (e.g., case Some(other)) that logs or safely ignores unexpected
types. Specifically, modify the match on row.get("config") (the branch that
currently has case Some(queryValue: PGobject)) to include a case None to handle
absent config keys and a default case to handle non-PGobject values, ensuring
you do not assume queryValue is always a PGobject before calling getValue and
objectMapper.readTree.
- Around line 104-111: The method updateQuestionCardJsonValues currently calls
configObjectNode.get("questionCard") which can return null when the key is
missing and thus reintroduce NPEs; change the code to check for the presence and
non-null value (e.g., if (configObjectNode.has("questionCard") &&
configObjectNode.get("questionCard") != null) ...) and return configJson
immediately when the key is absent instead of returning null. Also replace
logger.info with logger.warn in the catch block (logger.warn(s"...", e) or
logger.warn(s"... ${e.getMessage}")) so the exception is logged at the correct
severity; keep the existing behavior of returning configJson on error.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/ProcessDistrictConstructor.scala`:
- Around line 64-81: The pattern match on row.get("config") in
ProcessDistrictConstructor currently only handles case Some(queryValue:
PGobject) and will throw a MatchError when config is None or a different type;
update the match to include a fallback (e.g., case None => logger.info("No
'config' column for this row, skipping") and/or case other =>
logger.warn(s"Unexpected config value: $other")) so that rows without a PGobject
are skipped safely and processing continues; ensure the changes are made where
row.get("config") is matched (the match block shown) and that logging is used
rather than letting an exception propagate.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/UpdateParameters.scala`:
- Line 67: The unsafe cast to ArrayNode at the site assigning
currentParametersJson using dashboardJson.path("parameters") can throw
ClassCastException when "parameters" is missing; change the assignment in
UpdateParameters (the currentParametersJson initialization) to pattern-match the
result of dashboardJson.path("parameters") and use the ArrayNode when present,
otherwise substitute an empty ArrayNode (e.g., create one from JsonNodeFactory
or similar), so downstream code always gets a safe ArrayNode instance.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateParameters.scala`:
- Around line 66-68: Replace the unsafe cast of
dashboardJson.path("parameters").asInstanceOf[ArrayNode] with a safe check
similar to UpdateAdminParameterFunction: call
metabaseUtil.getDashboardDetailsById and objectMapper.readTree as before, then
pattern-match the result of dashboardJson.path("parameters") (or use
isArray/isMissing) and if it is an ArrayNode assign it to currentParametersJson,
otherwise use objectMapper.createArrayNode() (or an empty ArrayNode) so you
never cast a MissingNode and avoid ClassCastException; reference the variables
dashboardJson and currentParametersJson and follow the pattern used in
UpdateAdminParameterFunction for locating the exact logic to change.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/user/AddQuestionCards.scala`:
- Around line 23-26: The code is adding the entire dashCards JsonNode as one
nested element into existingDashcards; update the logic around
readJsonFile/dashCardsNode handling in AddQuestionCards so that if dashCardsNode
(value) is an ArrayNode you merge its elements into existingDashcards (use
existingDashcards.addAll(value) or iterate and call existingDashcards.add(elem)
for each element), otherwise add the single node as before; ensure you reference
readJsonFile, dashCardsNode (value) and existingDashcards when implementing the
check and the addAll/iteration.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/user/ProcessAdminConstructor.scala`:
- Around line 128-132: The method updateQuestionCardJsonValues can return null
because configObjectNode.get("questionCard") may be null; change the logic in
updateQuestionCardJsonValues to null-safe check the questionCard node (e.g.,
wrap configObjectNode.get("questionCard") with Option(...) or test for null)
before calling foreach/updates and ensure the method always returns the original
configJson when the questionCard key is absent or when no updates occur;
reference the updateQuestionCardJsonValues method, the
configObjectNode.get("questionCard") call, and the Option(...).foreach usage so
you add the guard and explicit return of configJson instead of allowing a null
to propagate.
- Around line 52-69: The pattern match on row.get("config") in
ProcessAdminConstructor (the block starting with case Some(queryValue:
PGobject)) is non-exhaustive and will throw MatchError when config is None;
update the match to include a case None (or a wildcard case) that safely handles
missing config (e.g., log a debug/info message and skip processing) so the code
mirrors the safe handling used in ProcessDistrictConstructor.scala; ensure you
reference the same variables (row.get("config"), queryValue, jsonString,
rootNode) and only attempt objectMapper.readTree when a PGobject is present.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/user/Utils.scala`:
- Around line 51-62: The JSON built in createDashboard serializes collection_id
and collection_position as strings because they are wrapped in quotes; change
the request body to emit numeric values (e.g., remove quotes around
$collectionId and 1) or build the payload with a JSON builder like ujson.Obj to
ensure collection_id and collection_position are numbers before calling
metabaseUtil.createDashboard; update the construction in createDashboard
accordingly and keep the rest of the function (mapper.readTree(...) and logging)
unchanged.
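
A sketch of building the body with Jackson so the numeric fields are emitted as JSON numbers (field names follow the item above; the helper name is illustrative):

import com.fasterxml.jackson.databind.ObjectMapper

// Sketch: collection_id and collection_position become JSON numbers, not quoted strings.
def dashboardRequestBody(dashboardName: String, collectionId: Int): String = {
  val mapper = new ObjectMapper()
  val body = mapper.createObjectNode()
  body.put("name", dashboardName)
  body.put("collection_id", collectionId)
  body.put("collection_position", 1)
  mapper.writeValueAsString(body)
}

The resulting string can then be passed to metabaseUtil.createDashboard exactly as the current code does.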

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/userMapping/ProgramServiceFunction.scala`:
- Around line 154-163: The function validateUserRemoval currently computes
membershipId by calling .find(...).get before checking membership existence,
causing a NoSuchElementException; change the logic to first determine
isUserInGroup (or directly use the Option from .find) and only extract
membership_id when the Option is defined, e.g. use the result of
groupDetails.get("members").elements().asScala.find(...) in a match or fold to
safely handle None vs Some and call metabaseUtil.removeFromGroup with the
extracted membershipId only when present; avoid using .get on the Option so the
else branch (user not a member) is reachable and safe.
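
A sketch of the Option-based flow (the members/membership_id/user_id field names are taken from the item above; removeFromGroup stands in for metabaseUtil.removeFromGroup):

import com.fasterxml.jackson.databind.JsonNode
import scala.collection.JavaConverters._

// Sketch: resolve the membership first, then act only when it exists,
// so .get is never called on an empty Option.
def removeUserIfMember(groupDetails: JsonNode, userId: Int, removeFromGroup: Int => Unit): Unit = {
  val memberOpt = Option(groupDetails.get("members"))
    .map(_.elements().asScala)
    .getOrElse(Iterator.empty)
    .find(member => member.path("user_id").asInt() == userId)
  memberOpt match {
    case Some(member) => removeFromGroup(member.path("membership_id").asInt())
    case None         => println(s"User $userId is not a member of the group; nothing to remove.")
  }
}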

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/userMapping/UserServiceFunction.scala`:
- Around line 538-547: In validateUserRemoval, avoid calling .find(...).get
before checking membership; instead first search for the member element (e.g.,
val memberOpt = groupDetails.get("members").elements().asScala.find(...)), check
memberOpt.isDefined (or use memberOpt match) to determine isUserInGroup, then
only extract membershipId via memberOpt.get.get("membership_id").asInt() when
present and call metabaseUtil.removeFromGroup(membershipId); otherwise execute
the "not a member" branch. This ensures NoSuchElementException is prevented and
keeps logic in validateUserRemoval, referencing the existing isUserInGroup,
membershipId extraction, and metabaseUtil.removeFromGroup usage.
- Around line 99-114: The current debug block in UserServiceFunction
(UserServiceFunction.scala) prints sensitive PII and credentials (password,
email, phone, orgDetails, orgId) using println; remove the plaintext password
print entirely, stop printing raw email/phone/org details and either omit them
or replace with masked values (e.g., show only domain or last 4 chars) before
logging, and switch from println to the project logger (e.g., use the existing
logger instance) for non-sensitive diagnostic info like
entity/eventType/name/status; ensure no variable named password or full PII
appears in any log message or string interpolation.

In `@metabase-jobs/combined-dashboard-creator/src/test/resources/test.conf`:
- Around line 99-107: Replace hardcoded PII and secrets in the metabase config
block (keys: username, password, metabaseApiKey, evidenceBaseUrl, domainName,
url) with non-sensitive placeholders or environment variable references (e.g.,
test@example.com, changeme, 00000000-0000-0000-0000-000000000000 or ${ENV_VAR})
so no real email, password, API key or external service URLs remain in the test
configuration; update any code/tests that read these values to fall back to safe
defaults or ENV variables if needed.

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/domain/MentoringEvent.scala`:
- Around line 72-87: The parseDate method currently replaces empty or
unparseable date strings with the current time; change parseDate to return null
or Option[Timestamp] (prefer Option[Timestamp]) instead of defaulting to now,
and add proper parsing for the actual input format (e.g., handle fractional
seconds and timezone offsets such as "yyyy-MM-dd HH:mm:ss.SSSSSSXXX" or use
java.time.format.DateTimeFormatter with optional fraction and offset patterns)
by attempting: 1) Instant/ISO parsing with
DateTimeFormatter.ISO_OFFSET_DATE_TIME, 2) a custom DateTimeFormatter that
accepts fractional seconds and offsets, and 3) legacy SimpleDateFormat only as a
last resort; on final failure log a warning via the existing logging facility
and return None (or null if you choose) so callers of parseDate (refer to
parseDate) must handle missing timestamps instead of silently inserting
System.currentTimeMillis().
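A sketch of the layered parsing described above; the concrete patterns for fractional seconds and offsets are assumptions about the incoming data, and logger is assumed to be in scope:

import java.sql.Timestamp
import java.time.{LocalDateTime, OffsetDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Sketch: return None on empty or unparseable input instead of substituting the current time.
def parseDate(raw: String): Option[Timestamp] = {
  val value = Option(raw).map(_.trim).getOrElse("")
  if (value.isEmpty) return None

  val withOffset    = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSSSSS]XXX") // assumed input shape
  val withoutOffset = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSS]")

  val parsed =
    Try(Timestamp.from(OffsetDateTime.parse(value, DateTimeFormatter.ISO_OFFSET_DATE_TIME).toInstant))
      .orElse(Try(Timestamp.from(OffsetDateTime.parse(value, withOffset).toInstant)))
      .orElse(Try(Timestamp.valueOf(LocalDateTime.parse(value, withoutOffset))))

  if (parsed.isFailure) logger.warn(s"Unparseable date '$value'; emitting no timestamp")
  parsed.toOption
}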

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/domain/ProjectEvent.scala`:
- Around line 95-97: The val taskCount eagerly calls tasks.size but tasks is
defined with readOrDefault[List[Map[String, Any]]]("tasks", null), which causes
an NPE when tasks is null; fix by returning a safe empty default from
readOrDefault (e.g., use Nil or List.empty instead of null) or by guarding the
size computation (change taskCount to compute
Option(tasks).map(_.size).getOrElse(0) or make it a lazy val/def that handles
null). Update the symbols tasks, readOrDefault, and taskCount accordingly so
construction cannot throw when "tasks" is absent.
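Either fix, sketched against the symbols named above, keeps construction from throwing:

// Option A: safe default so .size never sees null
val tasks: List[Map[String, Any]] = readOrDefault[List[Map[String, Any]]]("tasks", List.empty)
val taskCount: Int = tasks.size

// Option B (if the null default must stay): guard the count instead
// val taskCount: Int = Option(tasks).map(_.size).getOrElse(0)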

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/MentoringStreamFunction.scala`:
- Around line 148-163: The checkAndCreateTable function is vulnerable because
tableName is directly interpolated into the SQL string; update
checkAndCreateTable to use a parameterized query via postgresUtil.executeQuery
(avoid string interpolation of tableName) or validate/sanitize tenant-derived
identifiers before constructing tableName (e.g., allow only [A-Za-z0-9_]+) and
reject/escape anything else; ensure you reference the same tableName and
createTableQuery variables and keep using postgresUtil.createTable only after a
safe existence check.
- Around line 225-232: The INSERT into tenantOrgMentorRatingTable in
MentoringStreamFunction is not idempotent and must include an ON CONFLICT
clause; change the insert SQL (the insertOrgMentorRatingQuery) to use ON
CONFLICT (org_id, mentor_id) DO UPDATE SET rating = EXCLUDED.rating,
rating_updated_at = EXCLUDED.rating_updated_at (or the equivalent columns your
schema uses) and ensure the orgMentorRatingParams/order match the prepared
statement so the postgresUtil.executePreparedUpdate call still supplies rating
and rating_updated_at appropriately.
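A sketch of the idempotent upsert, assuming (org_id, mentor_id) carries a unique constraint and reusing the four-argument executePreparedUpdate(query, params, table, id) shape seen elsewhere in this PR; orgId, mentorId, rating and ratingUpdatedAt stand in for values already computed in the function:

// Sketch: replaying the same Kafka event updates the row instead of duplicating it.
val insertOrgMentorRatingQuery =
  s"""INSERT INTO $tenantOrgMentorRatingTable (org_id, mentor_id, rating, rating_updated_at)
     |VALUES (?, ?, ?, ?)
     |ON CONFLICT (org_id, mentor_id) DO UPDATE SET
     |  rating            = EXCLUDED.rating,
     |  rating_updated_at = EXCLUDED.rating_updated_at
     |""".stripMargin

val orgMentorRatingParams = Seq(orgId, mentorId, rating, ratingUpdatedAt)
postgresUtil.executePreparedUpdate(insertOrgMentorRatingQuery, orgMentorRatingParams, tenantOrgMentorRatingTable, mentorId)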

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/ObservationStreamFunction.scala`:
- Around line 289-319: The current pattern in ObservationStreamFunction (around
statusTable: deleteQuery + postgresUtil.executePreparedDelete followed by
insertStatusQuery + postgresUtil.executePreparedUpdate) is non-atomic and can
cause data loss if the insert fails; change this to perform the delete+insert
inside a single DB transaction (begin/commit/rollback) or replace with a single
atomic upsert (INSERT ... ON CONFLICT ... DO UPDATE) so the operation is
idempotent and safe; apply the same change for the domainTable and questionTable
flows (the corresponding delete/insert calls) and ensure you use postgresUtil's
transactional API or add transaction wrappers around
checkAndCreateTable/checkExistenceOfFilterData +
executePreparedDelete/executePreparedUpdate (a transaction sketch follows this list).
- Around line 340-352: The subdomain branch currently finds matching criteria
but does nothing; implement the same insert/processing logic used in the
flat-criteria branch so subdomain-level criteria are persisted. Inside the
foreach that iterates criteriaEvent.find(_("_id") == criteriaId) (located in the
maybeChildren -> subdomains -> maybeSubCriteria -> criteria loop), build the
event record the same way the None branch does (include assembled fields such as
ruleId/criteriaId, observation details, path/subdomain context,
createdBy/createdAt, and any derived counts/metrics) and call the same
insert/update routine used for flat criteria to persist it; ensure you reference
the existing variables criteriaId, crit (the matched criteria map),
criteriaEvent, maybeChildren, and reuse the flat-criteria insert helper or logic
to avoid duplication.
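For the first item in this list (the non-atomic delete+insert), a minimal transaction sketch assuming direct access to a JDBC Connection; if postgresUtil already exposes a transactional API, that is the better hook:

import java.sql.Connection

// Sketch: delete and insert commit together, so a failed insert cannot leave the row missing.
def replaceRowAtomically(conn: Connection, deleteSql: String, deleteParams: Seq[Any],
                         insertSql: String, insertParams: Seq[Any]): Unit = {
  val previousAutoCommit = conn.getAutoCommit
  conn.setAutoCommit(false)
  try {
    val del = conn.prepareStatement(deleteSql)
    deleteParams.zipWithIndex.foreach { case (p, i) => del.setObject(i + 1, p.asInstanceOf[AnyRef]) }
    del.executeUpdate()
    del.close()

    val ins = conn.prepareStatement(insertSql)
    insertParams.zipWithIndex.foreach { case (p, i) => ins.setObject(i + 1, p.asInstanceOf[AnyRef]) }
    ins.executeUpdate()
    ins.close()

    conn.commit()
  } catch {
    case e: Exception =>
      conn.rollback()
      throw e
  } finally {
    conn.setAutoCommit(previousAutoCommit)
  }
}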

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/ProjectStreamFunction.scala`:
- Around line 330-403: The checkAndInsert method builds SQL by
string-interpolating event-derived values (e.g., targetedId, entityType,
entityName) which enables SQL injection; replace all such interpolated queries
in checkAndInsert (including the EXISTS check using postgresUtil.fetchData, the
admin insert using postgresUtil.insertData, the getEntityNameQuery, and the
default insert) with parameterized/prepared statements using
postgresUtil.executePreparedUpdate or a prepared fetch variant, passing
targetedId/entityType/entityName as parameters instead of embedding them in the
SQL string; keep the existing upsert for solutions as a model
(executePreparedUpdate), update calls to use parameter lists and adjust how
results are read (e.g., parameterized WHERE entity_id = ?), and ensure
dashboardData.put(dashboardKey, ...) behavior remains the same after switching
to prepared statements.

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/SurveyStreamFunction.scala`:
- Around line 45-46: The code is using SurveyEvent.solutionId directly to build
PostgreSQL table identifiers (surveyQuestionTable and surveyStatusTable), which
is a SQL injection risk; update the code in SurveyStreamFunction to validate
and/or sanitize SurveyEvent.solutionId before any use: implement a helper like
validateIdentifier or sanitizeIdentifier that checks the value against a strict
allowlist regex (e.g., /^[A-Za-z0-9_]+$/) and either rejects the event (log and
skip processing) or returns a cleaned identifier (replace disallowed chars with
'_'), then use that validated/sanitized value when constructing
surveyQuestionTable and surveyStatusTable; ensure all downstream uses of
SurveyEvent.solutionId in DDL/DML are replaced with the validated/sanitized
identifier and log an error when an invalid id is encountered.

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/UserStreamFunction.scala`:
- Around line 42-211: The processElement method in UserStreamFunction currently
has no top-level error handling, so wrap the entire body of processElement (the
code inside override def processElement) in a try-catch that catches Throwable,
logs the full error and the offending event (use event.userId and event/JSON),
increments/emits a failure metric, and optionally forwards the event to the same
dead-letter/error handling used by ObservationStreamFunction; ensure you
reference processElement and use the same logger/metrics pattern as
ObservationStreamFunction to avoid crashing the Flink operator.
- Line 179: The call to processOrgRolesTable uses organizationsId.toInt which
will throw NumberFormatException for non-numeric org IDs; update the code where
organizationsId is parsed (the call site before processOrgRolesTable and the
organizationsId local value) to safely parse the string (e.g., use toIntOption
or try/catch) and handle invalid values by logging a warning and either skipping
the record or providing a safe fallback (null/None/default id) before invoking
processOrgRolesTable; ensure any downstream code that expects an Int can accept
the fallback or short-circuit processing to avoid crashing the Flink operator.

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala`:
- Around line 181-186: The default case in UnifiedStreamTask currently calls
runJob(config, kafkaConnector) which creates its own Flink execution environment
via FlinkUtil.getExecutionContext(config) and calls env.execute(...), and then
control returns to the original env.execute(...) causing a double submission;
update the default branch (case _) to either log a warning and return (or throw
an informative exception) instead of calling runJob, or refactor so runJob is
invoked only on the primary execution path (i.e., ensure only one Flink
environment is created/executed by adjusting UnifiedStreamTask's case handling
and removing the secondary runJob call that duplicates
FlinkUtil.getExecutionContext and env.execute).
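A sketch of the guarded fallback; the stream-type literals are taken from the streams named in this PR, and logger is the instance discussed later in this review:

// Sketch: the default branch only logs; it never builds or executes a second Flink environment.
streamType match {
  case "project" | "survey" | "observation" | "user" | "mentoring" =>
    () // existing single-stream wiring stays as-is (elided here)
  case other =>
    logger.warn(s"Unknown stream type '$other'; nothing wired. runJob runs only on the primary path")
}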

Comment on lines +35 to +39
def kafkaJobRequestSource[T <: JobRequest](kafkaTopic: String, groupId: String)(implicit m: Manifest[T]): SourceFunction[T] = {
val properties = config.kafkaConsumerProperties
properties.setProperty("group.id", groupId)
new FlinkKafkaConsumer[T](kafkaTopic, new JobRequestDeserializationSchema[T], properties)
}

⚠️ Potential issue | 🔴 Critical

Mutating shared kafkaConsumerProperties corrupts config for all other consumers.

config.kafkaConsumerProperties likely returns the same mutable java.util.Properties instance. Calling setProperty("group.id", groupId) on it permanently overwrites the group ID for every subsequent consumer created from this FlinkKafkaConnector, including the single-arg kafkaJobRequestSource, kafkaMapSource, kafkaStringSource, etc.

Clone the properties before mutating:

Proposed fix
 def kafkaJobRequestSource[T <: JobRequest](kafkaTopic: String, groupId: String)(implicit m: Manifest[T]): SourceFunction[T] = {
-    val properties = config.kafkaConsumerProperties
+    val properties = config.kafkaConsumerProperties.clone().asInstanceOf[java.util.Properties]
     properties.setProperty("group.id", groupId)
     new FlinkKafkaConsumer[T](kafkaTopic, new JobRequestDeserializationSchema[T], properties)
 }
#!/bin/bash
# Verify whether kafkaConsumerProperties returns a shared mutable instance
ast-grep --pattern $'def kafkaConsumerProperties$$$'
🤖 Prompt for AI Agents
In
`@jobs-core/src/main/scala/org/shikshalokam/job/connector/FlinkKafkaConnector.scala`
around lines 35 - 39, The code mutates a shared java.util.Properties from
config.kafkaConsumerProperties in kafkaJobRequestSource (and similarly in other
source factories), which corrupts group.id for all consumers; fix by creating a
defensive copy of the Properties before calling setProperty("group.id", groupId)
(e.g., new Properties() and putAll(config.kafkaConsumerProperties)) and then
pass that copied Properties to the FlinkKafkaConsumer; update
kafkaJobRequestSource and the other factory methods (kafkaMapSource,
kafkaStringSource, single-arg kafkaJobRequestSource variants) to use the copied
Properties instead of mutating config.kafkaConsumerProperties directly.

Comment on lines 99 to 107
metabase {
url = "http://localhost:3000/api"
username = "vivek@shikshalokam.org"
password = "Test@1234"
database = "test"
domainName = "http://localhost:3000/dashboard/"
metabaseApiKey = "d3f4ult-api-key"
evidenceBaseUrl = "https://elevate-api.sunbirdsaas.com/project/v1/cloud-services/files/download?file="
}
@coderabbitai coderabbitai bot Feb 16, 2026

⚠️ Potential issue | 🔴 Critical

Hardcoded Metabase credentials and real email address in source-controlled config.

This config ships with the artifact and contains:

  • A real email address (vivek@shikshalokam.org) — PII in source control
  • A plaintext password (Test@1234)
  • An API key (d3f4ult-api-key)

Even for dev/test environments, credentials and PII should not be committed to version control. Use environment variable substitution or external secret management.

As per coding guidelines, "No credential exposure."

🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/resources/combined-dashboard-creator.conf`
around lines 99 - 107, The config currently hardcodes sensitive values
(metabase.username, metabase.password, metabase.metabaseApiKey and related
fields like domainName/database/evidenceBaseUrl); remove the real email and
secrets and change these keys to load from environment variables or a secret
manager (e.g., replace username/password/metabaseApiKey with env-based
placeholders and fallback checks), update the code that reads this config to
validate presence (throw helpful error if missing) and document the required env
vars so no PII or plaintext credentials remain in the committed
combined-dashboard-creator.conf.

Contributor Author

corrected


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current state of the combined-dashboard-creator.conf file
# to verify if credentials have been corrected

FILE="metabase-jobs/combined-dashboard-creator/src/main/resources/combined-dashboard-creator.conf"

if [ -f "$FILE" ]; then
  echo "=== Current metabase section in $FILE ==="
  # Extract the metabase section (from "metabase {" to the closing brace)
  sed -n '/^metabase {/,/^}/p' "$FILE"
  echo ""
  echo "=== Checking for hardcoded credentials ==="
  # Check for the specific problematic values
  grep -n "vivek@shikshalokam.org\|Test@1234\|d3f4ult-api-key" "$FILE" || echo "No hardcoded credentials found"
else
  echo "File not found: $FILE"
fi

Length of output: 607


@prashanthShiksha, I've verified the file and the hardcoded credentials are still present in combined-dashboard-creator.conf. The file currently contains:

username = "vivek@shikshalokam.org"     # Line 101
password = "Test@1234"                   # Line 102
metabaseApiKey = "d3f4ult-api-key"      # Line 105

These need to be replaced with environment variable references, for example:

metabase {
  url = ${?METABASE_URL}
  username = ${?METABASE_USERNAME}
  password = ${?METABASE_PASSWORD}
  database = ${?METABASE_DATABASE}
  domainName = ${?METABASE_DOMAIN_NAME}
  metabaseApiKey = ${?METABASE_API_KEY}
  evidenceBaseUrl = ${?EVIDENCE_BASE_URL}
}

The ${?VAR} syntax in Typesafe Config makes the variable optional. You should then:

  1. Document these required environment variables in your README or deployment docs
  2. Update the code that reads this config to validate presence and fail fast with helpful error messages if critical values are missing
  3. Remove the real email address and credentials from version control

This is a critical security issue per your coding guidelines: "No credential exposure."


🧠 Learnings used
Learnt from: prashanthShiksha
Repo: ELEVATE-Project/data-pipeline PR: 139
File: Documentation/migration-scripts/alter-solution-table.sh:3-3
Timestamp: 2025-10-30T05:26:19.945Z
Learning: In the ELEVATE-Project/data-pipeline repository, the maintainer prashanthShiksha prefers to use a common-config.env file to store configuration values for migration scripts, rather than hardcoding them in the script files.

Learnt from: prashanthShiksha
Repo: ELEVATE-Project/data-pipeline PR: 124
File: metabase-jobs/config-data-loader/projectJson/Observation-Domain/Domain-Report/json/table/1_domain_criteria_report.json:9-9
Timestamp: 2025-09-04T14:25:42.797Z
Learning: In the ELEVATE-Project/data-pipeline repository, when the maintainer prashanthShiksha confirms that entity_param and criteria_param field ID configurations have been corrected in Metabase template-tags, this indicates the intended behavior has been properly implemented for the customized entity type filter creation logic.

Comment on lines +210 to +272
def getTheTableId(databaseId: Int, tableName: String, metabaseUtil: MetabaseUtil, metabasePostgresUtil: PostgresUtil, metabaseApiKey: String): Int = {
storedTableIds.get((databaseId, tableName)) match {
case Some(tableId) =>
logger.info(s"[CACHE-HIT] Table ID for '$tableName' in DB $databaseId = $tableId")
tableId
case None =>
val tableQuery =
s"SELECT id FROM metabase_table WHERE name = '$tableName' AND db_id = $databaseId;"
val tableIdOpt = metabasePostgresUtil.fetchData(tableQuery) match {
case List(map: Map[_, _]) =>
map.get("id").flatMap(id => scala.util.Try(id.toString.toInt).toOption)
case _ => None
}
val tableId = tableIdOpt.getOrElse {
logger.warn(s"[WARN] Table '$tableName' not found in DB $databaseId. Trying to sync with Metabase API.")
val tableJson = metabaseUtil.syncNewTable(databaseId, tableName, metabaseApiKey)
val newTableId = tableJson("id").num.toInt
logger.info(s"[SYNCED] Table '$tableName' synced with Metabase. New table_id = $newTableId")
newTableId
}
storedTableIds.put((databaseId, tableName), tableId)
logger.info(s"[INFO] Using table_id = $tableId for table '$tableName' in DB $databaseId")
tableId
}
}

def getTheColumnId(databaseId: Int, tableName: String, columnName: String, metabaseUtil: MetabaseUtil, metabasePostgresUtil: PostgresUtil, metabaseApiKey: String, metaTableQuery: String): Int = {
try {
val tableId = getTheTableId(databaseId, tableName, metabaseUtil, metabasePostgresUtil, metabaseApiKey)
storedColumnIds.get((tableId, columnName)) match {
case Some(columnId) =>
logger.info(s"[CACHE-HIT] Column '$columnName' found in table '$tableName' (tableId: $tableId) = $columnId")
columnId
case None =>
val columnQuery = s"SELECT id FROM metabase_field WHERE table_id = $tableId AND name = '$columnName';"
val columnIdOpt = metabasePostgresUtil.fetchData(columnQuery) match {
case List(map: Map[_, _]) =>
map.get("id").flatMap(id => scala.util.Try(id.toString.toInt).toOption)
case _ => None
}
val columnId = columnIdOpt.getOrElse(-1)
if (columnId != -1) {
storedColumnIds.put((tableId, columnName), columnId)
logger.info(s"[INFO] Using column_id = $columnId for column '$columnName' in table '$tableName' (tableId: $tableId)")
columnId
} else {
val errorMessage = s"Column '$columnName' not found in table '$tableName' (tableId: $tableId)"
val escapedError = errorMessage.replace("'", "''")
val updateTableQuery = metaTableQuery.replace("'errorMessage'", s"'$escapedError'")
postgresUtil.insertData(updateTableQuery)
logger.warn(s"[WARN] $errorMessage")
-1
}
}
} catch {
case e: Exception =>
val escapedError = e.getMessage.replace("'", "''")
val updateTableQuery = metaTableQuery.replace("'errorMessage'", s"'$escapedError'")
postgresUtil.insertData(updateTableQuery)
logger.error(s"[ERROR] Failed to get column ID for '$columnName' in table '$tableName': ${e.getMessage}")
-1
}
}

⚠️ Potential issue | 🔴 Critical

SQL injection risk: all SQL queries use string interpolation with untrusted input.

getTheTableId (line 217) and getTheColumnId (line 244) build SQL queries via string interpolation with values like tableName and columnName that derive from event data (e.g., tenantCode, orgName). If any of these contain a single quote or other SQL metacharacters, it enables SQL injection against the Metabase internal database.

This pattern is repeated across all dashboard function files (ObservationMetabaseDashboardFunction, ProjectMetabaseDashboardFunction, UserMetabaseDashboardFunction) and their constructor objects.

Use parameterized queries (PreparedStatement / ? placeholders) in PostgresUtil instead of string interpolation for all user-derived values.

As per coding guidelines, metabase-jobs/**: "Correct DB connections & cursor handling" and "No credential exposure" — while not credential exposure per se, SQL injection is a critical security risk that must be addressed.

🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/MentoringMetabaseDashboardFunction.scala`
around lines 210 - 272, The queries in getTheTableId and getTheColumnId are
vulnerable to SQL injection because tableName, columnName and similar values are
interpolated into SQL strings; change these methods to use parameterized queries
in PostgresUtil (e.g., add/use a fetchDataWithParams or prepareAndFetch method
that accepts SQL with ? placeholders and a Seq[Any] of parameters) and replace
usages in getTheTableId (the SELECT id FROM metabase_table ...) and
getTheColumnId (the SELECT id FROM metabase_field ...) to call that
parameterized API with databaseId, tableName and columnName as bound params;
also update the metaTableQuery handling to use parameter binding (or a safe
update method) instead of string replace for errorMessage and ensure any
postgresUtil.insertData variant accepts parameters to avoid concatenating
escaped strings.
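A sketch of the parameterized variant the prompt suggests; fetchDataWithParams is a hypothetical addition to PostgresUtil, shown with plain JDBC and simplified connection handling:

import java.sql.Connection
import scala.collection.mutable.ListBuffer

// Sketch: bind tableName/databaseId instead of interpolating them into the SQL string.
def fetchDataWithParams(conn: Connection, sql: String, params: Seq[Any]): List[Map[String, Any]] = {
  val stmt = conn.prepareStatement(sql)
  try {
    params.zipWithIndex.foreach { case (p, i) => stmt.setObject(i + 1, p.asInstanceOf[AnyRef]) }
    val rs = stmt.executeQuery()
    val meta = rs.getMetaData
    val rows = ListBuffer.empty[Map[String, Any]]
    while (rs.next()) {
      rows += (1 to meta.getColumnCount).map(c => meta.getColumnLabel(c) -> rs.getObject(c)).toMap
    }
    rows.toList
  } finally stmt.close()
}

// Call-site shape for getTheTableId / getTheColumnId:
// fetchDataWithParams(conn, "SELECT id FROM metabase_table WHERE name = ? AND db_id = ?", Seq(tableName, databaseId))
// fetchDataWithParams(conn, "SELECT id FROM metabase_field WHERE table_id = ? AND name = ?", Seq(tableId, columnName))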

Comment on lines +330 to +403
def checkAndInsert(entityType: String, targetedId: String, dashboardData: java.util.HashMap[String, String], dashboardKey: String): Unit = {
val query = s"SELECT EXISTS (SELECT 1 FROM ${config.dashboard_metadata} WHERE entity_id = '$targetedId') AS is_${entityType}_present"
val result = postgresUtil.fetchData(query)

result.foreach { row =>
row.get(s"is_${entityType}_present") match {
case Some(isPresent: Boolean) if isPresent =>
logger.info(s"$entityType details already exist.")
case _ =>
if (entityType == "admin") {
val insertQuery = s"INSERT INTO ${config.dashboard_metadata} (entity_type, entity_name, entity_id) VALUES ('$entityType', 'Admin', '$targetedId')"
val affectedRows = postgresUtil.insertData(insertQuery)
logger.info(s"Inserted Admin details. Affected rows: $affectedRows")
dashboardData.put(dashboardKey, "1")
} else {
val (entityColumn, sourceTable) = entityType match {
case "program" => (s"${entityType}_name", config.solutions)
case "solution" => ("name", config.solutions)
case "state" => (s"${entityType}_name", config.projects)
case "district" => (s"${entityType}_name", config.projects)
}

val getEntityNameQuery =
s"""
|SELECT DISTINCT $entityColumn AS ${entityType}_name
|FROM $sourceTable
|WHERE ${entityType}_id = '$targetedId'
|""".stripMargin.replaceAll("\n", " ")

val result = postgresUtil.fetchData(getEntityNameQuery)
result.foreach { id =>
val entityName = id.get(s"${entityType}_name").map(_.toString).getOrElse("")

if (entityType == "solution") {
// Special insert/upsert logic for solution only
val upsertQuery =
s"""INSERT INTO ${config.dashboard_metadata} (
| entity_type, entity_name, entity_id,
| report_type, is_rubrics, parent_name, linked_to
|) VALUES (?, ?, ?, ?, ?, ?, ?)
|ON CONFLICT (entity_id) DO UPDATE SET
| entity_type = EXCLUDED.entity_type,
| entity_name = EXCLUDED.entity_name,
| report_type = EXCLUDED.report_type,
| is_rubrics = EXCLUDED.is_rubrics,
| parent_name = EXCLUDED.parent_name,
| linked_to = EXCLUDED.linked_to
|""".stripMargin

val params = Seq(
entityType, entityName, targetedId,
"improvementProject", false, event.entityType, event.programId
)
postgresUtil.executePreparedUpdate(upsertQuery, params, config.dashboard_metadata, targetedId)
logger.info(s"Inserted [$entityName : $targetedId] with reportType=improvementProject, isRubric=false, parent_name=${event.entityType}, linked_to=${event.programId}.")
} else {
// Default logic for program or others
val insertQuery =
s"""INSERT INTO ${config.dashboard_metadata} (
| entity_type, entity_name, entity_id
|) VALUES ('$entityType', '$entityName', '$targetedId')
|ON CONFLICT (entity_id) DO NOTHING
|""".stripMargin.replaceAll("\n", " ")

val affectedRows = postgresUtil.insertData(insertQuery)
logger.info(s"Inserted [$entityName : $targetedId] with default metadata. Affected rows: $affectedRows")
}

dashboardData.put(dashboardKey, targetedId)
}
}
}
}
}

⚠️ Potential issue | 🔴 Critical

SQL injection in checkAndInsert — event-derived values interpolated directly into SQL.

Multiple queries in checkAndInsert use string interpolation with event-sourced data:

  • Line 331: WHERE entity_id = '$targetedId'
  • Line 340: VALUES ('$entityType', 'Admin', '$targetedId')
  • Line 390: VALUES ('$entityType', '$entityName', '$targetedId')

targetedId comes from event fields like event.stateId, event.programId, event.solutionId etc. If any of these contain a single quote or malicious payload, this breaks or exploits the query. Use prepared statements consistently (as done for the upsert queries above).

As per coding guidelines, stream-jobs/**: "Kafka consumer safety (deserialization errors, dead-letter handling)" — untrusted event data must not be interpolated into SQL.

🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/ProjectStreamFunction.scala`
around lines 330 - 403, The checkAndInsert method builds SQL by
string-interpolating event-derived values (e.g., targetedId, entityType,
entityName) which enables SQL injection; replace all such interpolated queries
in checkAndInsert (including the EXISTS check using postgresUtil.fetchData, the
admin insert using postgresUtil.insertData, the getEntityNameQuery, and the
default insert) with parameterized/prepared statements using
postgresUtil.executePreparedUpdate or a prepared fetch variant, passing
targetedId/entityType/entityName as parameters instead of embedding them in the
SQL string; keep the existing upsert for solutions as a model
(executePreparedUpdate), update calls to use parameter lists and adjust how
results are read (e.g., parameterized WHERE entity_id = ?), and ensure
dashboardData.put(dashboardKey, ...) behavior remains the same after switching
to prepared statements.
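A sketch of the same queries with bound parameters, reusing the existing executePreparedUpdate(query, params, table, id) call shape; the prepared fetch variant is hypothetical, as in the earlier Metabase comment:

// Sketch: keep the config-driven table name, bind every event-derived value.
val existsQuery =
  s"SELECT EXISTS (SELECT 1 FROM ${config.dashboard_metadata} WHERE entity_id = ?) AS is_present"
val existsRows = postgresUtil.fetchDataWithParams(existsQuery, Seq(targetedId)) // hypothetical helper
val alreadyPresent = existsRows.headOption.exists(_.get("is_present").contains(true))

val adminInsertQuery =
  s"INSERT INTO ${config.dashboard_metadata} (entity_type, entity_name, entity_id) VALUES (?, ?, ?)"
postgresUtil.executePreparedUpdate(adminInsertQuery, Seq(entityType, "Admin", targetedId),
  config.dashboard_metadata, targetedId)

val defaultInsertQuery =
  s"""INSERT INTO ${config.dashboard_metadata} (entity_type, entity_name, entity_id)
     |VALUES (?, ?, ?) ON CONFLICT (entity_id) DO NOTHING""".stripMargin
postgresUtil.executePreparedUpdate(defaultInsertQuery, Seq(entityType, entityName, targetedId),
  config.dashboard_metadata, targetedId)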

Comment on lines +45 to +46
val surveyQuestionTable = SurveyEvent.solutionId
val surveyStatusTable = s""""${SurveyEvent.solutionId}_survey_status""""

⚠️ Potential issue | 🔴 Critical

solutionId used directly as a PostgreSQL table name — high-risk injection vector.

Line 45 assigns SurveyEvent.solutionId directly as a table name:

val surveyQuestionTable = SurveyEvent.solutionId
val surveyStatusTable = s""""${SurveyEvent.solutionId}_survey_status""""

If solutionId contains characters like "; DROP TABLE --, this becomes a SQL injection attack surface when used in DDL and DML statements throughout the method. At minimum, validate that solutionId matches an expected pattern (e.g., ^[a-zA-Z0-9_]+$).

As per coding guidelines, stream-jobs/**: "Kafka consumer safety" — event-derived data used as table identifiers must be validated/sanitized.

🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/SurveyStreamFunction.scala`
around lines 45 - 46, The code is using SurveyEvent.solutionId directly to build
PostgreSQL table identifiers (surveyQuestionTable and surveyStatusTable), which
is a SQL injection risk; update the code in SurveyStreamFunction to validate
and/or sanitize SurveyEvent.solutionId before any use: implement a helper like
validateIdentifier or sanitizeIdentifier that checks the value against a strict
allowlist regex (e.g., /^[A-Za-z0-9_]+$/) and either rejects the event (log and
skip processing) or returns a cleaned identifier (replace disallowed chars with
'_'), then use that validated/sanitized value when constructing
surveyQuestionTable and surveyStatusTable; ensure all downstream uses of
SurveyEvent.solutionId in DDL/DML are replaced with the validated/sanitized
identifier and log an error when an invalid id is encountered.
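A sketch of that allowlist check; the helper name, the skip-on-failure decision, and processSurvey (standing in for the existing DDL/DML flow) are assumptions, and logger is assumed in scope:

// Sketch: accept only [A-Za-z0-9_]+ before using an event value as a table identifier.
val IdentifierPattern = "^[A-Za-z0-9_]+$".r

def validateIdentifier(raw: String): Option[String] = IdentifierPattern.findFirstIn(raw)

validateIdentifier(SurveyEvent.solutionId) match {
  case Some(solutionId) =>
    val surveyQuestionTable = solutionId
    val surveyStatusTable   = s""""${solutionId}_survey_status""""
    processSurvey(surveyQuestionTable, surveyStatusTable)
  case None =>
    logger.error(s"Rejecting event: solutionId '${SurveyEvent.solutionId}' is not a safe table identifier")
}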

Comment on lines +42 to +211
override def processElement(event: UserEvent, context: ProcessFunction[UserEvent, UserEvent]#Context, metrics: Metrics): Unit = {

println(s"***************** Start of Processing the User Event with User Id = ${event.userId} *****************")

val userId = event.userId
val tenantCode = event.tenantCode
val username = event.username
val name = event.name
val status = event.status
val isDeleted = event.isDeleted
val createdBy = event.createdBy
val createdAt = event.createdAt
val updatedAt = event.updatedAt
val userProfileOneId = event.userProfileOneId
val userProfileOneName = event.userProfileOneName
val userProfileOneExternalId = event.userProfileOneExternalId
val userProfileTwoId = event.userProfileTwoId
val userProfileTwoName = event.userProfileTwoName
val userProfileTwoExternalId = event.userProfileTwoExternalId
val userProfileThreeId = event.userProfileThreeId
val userProfileThreeName = event.userProfileThreeName
val userProfileThreeExternalId = event.userProfileThreeExternalId
val userProfileFourId = event.userProfileFourId
val userProfileFourName = event.userProfileFourName
val userProfileFourExternalId = event.userProfileFourExternalId
val userProfileFiveId = event.userProfileFiveId
val userProfileFiveName = event.userProfileFiveName
val userProfileFiveExternalId = event.userProfileFiveExternalId
val professionalRoleName = event.professionalRoleName
val professionalRoleId = event.professionalRoleId
val professionalSubroles = event.professionalSubroles
val tenantUserMetadataTable: String = s""""${tenantCode}_users_metadata""""
val orgRolesTable: String = s""""${tenantCode}_org_roles""""
val tenantUserTable: String = s""""${tenantCode}_users""""
val userMetrics: String = config.userMetrics
val eventType = event.eventType
val userDashboardReportType = "User Dashboard"
val mentoringDashboardReportType = "Mentoring"

println(s"userId: $userId")
println(s"tenantCode: $tenantCode")
println(s"username: $username")
println(s"name: $name")
println(s"status: $status")
println(s"isDeleted: $isDeleted")
println(s"createdBy: $createdBy")
println(s"createdAt: $createdAt")
println(s"updatedAt: $updatedAt")
println(s"userProfileOneId : $userProfileOneId")
println(s"userProfileOneName : $userProfileOneName")
println(s"userProfileOneExternalId : $userProfileOneExternalId")
println(s"userProfileTwoId : $userProfileTwoId")
println(s"userProfileTwoName : $userProfileTwoName")
println(s"userProfileTwoExternalId : $userProfileTwoExternalId")
println(s"userProfileThreeId : $userProfileThreeId")
println(s"userProfileThreeName : $userProfileThreeName")
println(s"userProfileThreeExternalId : $userProfileThreeExternalId")
println(s"userProfileFourId : $userProfileFourId")
println(s"userProfileFourName : $userProfileFourName")
println(s"userProfileFourExternalId : $userProfileFourExternalId")
println(s"userProfileFiveId : $userProfileFiveId")
println(s"userProfileFiveName : $userProfileFiveName")
println(s"userProfileFiveExternalId : $userProfileFiveExternalId")

val userDashboardFilters: List[Map[String, String]] = List(
Map(
"user_profile_one_name" -> userProfileOneName,
"user_profile_two_name" -> userProfileTwoName,
"user_profile_three_name" -> userProfileThreeName,
"user_profile_four_name" -> userProfileFourName,
"user_profile_five_name" -> userProfileFiveName
)
)

val userActivityDashboardFilters: List[Map[String, String]] = List(
Map(
"tenant_code" -> tenantCode
)
)

checkAndCreateTable(tenantUserMetadataTable, config.createTenantUserMetadataTable.replace("@tenantTable", tenantUserMetadataTable))
checkAndCreateTable(tenantUserTable, config.createTenantUserTable.replace("@usersTable", tenantUserTable))
checkAndCreateTable(userMetrics, config.createUserMetricsTable)

if (tenantCode.nonEmpty) {
val createTenantTable = config.createTenantUserMetadataTable.replace("@tenantTable", tenantUserMetadataTable)
checkAndCreateTable(tenantUserMetadataTable, createTenantTable)

val createOrgRolesTable = config.createOrgRolesTable.replace("@orgRolesTable", orgRolesTable)
checkAndCreateTable(orgRolesTable, createOrgRolesTable)

if (eventType == "update" || eventType == "bulk-update" || eventType == "create" || eventType == "bulk-create") {
/** checking existance of filter data for tenant user table */
val userDashboardFiltersList = ListBuffer[String]()
userDashboardFilters.foreach { filterMap =>
filterMap.foreach { case (key, value) =>
userDashboardFiltersList += checkIfValueExists(tenantUserTable.replace("\"", ""), key, value)
}
}
processUsers(tenantUserTable, userId)
val finalUserDashboardFilterList: List[String] = userDashboardFiltersList.toList
checkExistenceOfDataAndPushMessageToKafka(userDashboardReportType, finalUserDashboardFilterList, context, tenantUserTable, event.tenantCode)
checkExistenceOfDataAndPushMessageToKafka(mentoringDashboardReportType, finalUserDashboardFilterList, context, tenantUserTable, event.tenantCode, true)
val resultList = ListBuffer[String]()
resultList += checkIfValueExists(tenantUserMetadataTable.replace("\"", ""), "attribute_value", professionalRoleName)
processUserMetadata(tenantUserMetadataTable, userId, "Professional Role", professionalRoleName, professionalRoleId)
event.organizations.foreach { org =>
println(s"Organization ID: ${org.get("id")}")
val organizationsName = org.get("name").map(_.toString).getOrElse("")
val organizationsId = org.get("id").map(_.toString).getOrElse("")
/**
* Processing for Orgs data
*/
if (organizationsName.nonEmpty && organizationsId.nonEmpty) {
println(s"Upserting for attribute_code: Organizations, attribute_value: $organizationsName, attribute_label: $organizationsId")
resultList += checkIfValueExists(tenantUserMetadataTable.replace("\"", ""), "attribute_value", organizationsName)
processUserMetadata(tenantUserMetadataTable, userId, "Organizations", organizationsName, organizationsId)

val orgDashboardFiltersList = ListBuffer[String]()
orgDashboardFiltersList += checkIfValueExists(orgRolesTable, "org_name", organizationsName)
val finalOrgDashboardFilterList: List[String] = orgDashboardFiltersList.toList
checkExistenceOfDataAndPushMessageToKafka(mentoringDashboardReportType, finalOrgDashboardFilterList, context, orgRolesTable, event.tenantCode, true)
} else {
println("Org name or Org Id is empty")
}

/**
* Processing for Roles data
*/
val roles = org.get("roles").map(_.asInstanceOf[List[Map[String, Any]]]).getOrElse(List.empty)
if (roles.nonEmpty) {
val rolePairs = extractUserRolesPerRow(roles)
val subrolePairs = extractProfessionalSubrolesPerRow(professionalSubroles)
rolePairs.foreach { case (userRoleName, userRoleId) =>
resultList += checkIfValueExists(tenantUserMetadataTable.replace("\"", ""), "attribute_value", userRoleName)
println(s"Upserting for attribute_code: Platform Role, attribute_value: $userRoleName, attribute_label: $userRoleId")
processUserMetadata(tenantUserMetadataTable, userId, "Platform Role", userRoleName, userRoleId)
processOrgRolesTable(orgRolesTable, userId, organizationsId.toInt, organizationsName, userRoleId.toInt, userRoleName)
if (subrolePairs.nonEmpty) {
subrolePairs.foreach { case (professionalSubrolesName, professionalSubrolesId) =>
resultList += checkIfValueExists(tenantUserMetadataTable.replace("\"", ""), "attribute_value", professionalSubrolesName)
println(s"Upserting for attribute_code: Professional Subroles, attribute_value: $professionalSubrolesName, attribute_label: $professionalSubrolesId")
processUserMetadata(tenantUserMetadataTable, userId, "Professional Subroles", professionalSubrolesName, professionalSubrolesId)
}
} else {
println(s"No professional subroles found for user $userId in organization $organizationsId")
}
}
} else {
println("Roles object is empty")
}
}

val finalResultList: List[String] = resultList.toList
checkExistenceOfDataAndPushMessageToKafka(userDashboardReportType, finalResultList, context, tenantUserMetadataTable, event.tenantCode)
} else if (eventType == "delete") {
deleteData(tenantUserTable, userId)
}
val userMetricDashboardFiltersList = ListBuffer[String]()
userActivityDashboardFilters.foreach { filterMap =>
filterMap.foreach { case (key, value) =>
userMetricDashboardFiltersList += checkIfValueExists(userMetrics.replace("\"", ""), key, value)
}
}
userMetric(tenantUserTable)
val finalUserMetricDashboardFiltersList: List[String] = userMetricDashboardFiltersList.toList
checkExistenceOfDataAndPushMessageToKafka(userDashboardReportType, finalUserMetricDashboardFiltersList, context, userMetrics, event.tenantCode)
} else {
println("Tenant Code is Empty")
}

⚠️ Potential issue | 🔴 Critical

Critical: processElement lacks top-level error handling — any exception will crash the Flink operator.

Unlike ObservationStreamFunction which wraps processElement in a try-catch, this function has no error handling. A single malformed event (e.g., bad org ID, null field) will crash the entire Flink task and stop processing all subsequent events.

Proposed fix: wrap body in try-catch
   override def processElement(event: UserEvent, context: ProcessFunction[UserEvent, UserEvent]#Context, metrics: Metrics): Unit = {
+    try {
 
     println(s"...")
     ...
     println(s"...")
+    } catch {
+      case e: Exception =>
+        logger.error(s"Error processing user event: ${event.userId}", e)
+    }
   }

As per coding guidelines, "Fault tolerance & logging of failed events".

🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/UserStreamFunction.scala`
around lines 42 - 211, The processElement method in UserStreamFunction currently
has no top-level error handling, so wrap the entire body of processElement (the
code inside override def processElement) in a try-catch that catches Throwable,
logs the full error and the offending event (use event.userId and event/JSON),
increments/emits a failure metric, and optionally forwards the event to the same
dead-letter/error handling used by ObservationStreamFunction; ensure you
reference processElement and use the same logger/metrics pattern as
ObservationStreamFunction to avoid crashing the Flink operator.

resultList += checkIfValueExists(tenantUserMetadataTable.replace("\"", ""), "attribute_value", userRoleName)
println(s"Upserting for attribute_code: Platform Role, attribute_value: $userRoleName, attribute_label: $userRoleId")
processUserMetadata(tenantUserMetadataTable, userId, "Platform Role", userRoleName, userRoleId)
processOrgRolesTable(orgRolesTable, userId, organizationsId.toInt, organizationsName, userRoleId.toInt, userRoleName)

⚠️ Potential issue | 🔴 Critical

organizationsId.toInt will throw NumberFormatException if the org ID is not a valid integer.

organizationsId is extracted as a string from the event map. If it contains a UUID, empty string, or non-numeric value, .toInt will crash and — given there's no try-catch — take down the Flink operator.

Proposed fix
-              processOrgRolesTable(orgRolesTable, userId, organizationsId.toInt, organizationsName, userRoleId.toInt, userRoleName)
+              val orgIdInt = scala.util.Try(organizationsId.toInt).getOrElse {
+                logger.warn(s"Non-integer org ID: $organizationsId for user $userId, skipping org roles")
+                return
+              }
+              val roleIdInt = scala.util.Try(userRoleId.toInt).getOrElse {
+                logger.warn(s"Non-integer role ID: $userRoleId for user $userId, skipping org roles")
+                return
+              }
+              processOrgRolesTable(orgRolesTable, userId, orgIdInt, organizationsName, roleIdInt, userRoleName)
🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/functions/UserStreamFunction.scala`
at line 179, The call to processOrgRolesTable uses organizationsId.toInt which
will throw NumberFormatException for non-numeric org IDs; update the code where
organizationsId is parsed (the call site before processOrgRolesTable and the
organizationsId local value) to safely parse the string (e.g., use toIntOption
or try/catch) and handle invalid values by logging a warning and either skipping
the record or providing a safe fallback (null/None/default id) before invoking
processOrgRolesTable; ensure any downstream code that expects an Int can accept
the fallback or short-circuit processing to avoid crashing the Flink operator.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 17

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
metabase-jobs/mentoring-dashboard-creator/src/main/resources/metabase-mentoring-dashboard.conf (1)

11-29: ⚠️ Potential issue | 🔴 Critical

Hardcoded credentials and PII in source-controlled config.

This file contains plaintext Postgres credentials (Lines 14–15), a real email address (Line 26), a plaintext password (Line 27), and an API key (Line 29). These ship with the artifact and get committed to version control.

Use environment variable substitution (HOCON ${?ENV_VAR} syntax) or external secret management, consistent with the approach that was already addressed for combined-dashboard-creator.conf.

Proposed fix
 postgres{
-  host = "172.17.0.1"
-  port = "5432"
-  username = "postgres"
-  password = "postgres"
-  database = "test"
+  host = ${?POSTGRES_HOST}
+  port = ${?POSTGRES_PORT}
+  username = ${?POSTGRES_USERNAME}
+  password = ${?POSTGRES_PASSWORD}
+  database = ${?POSTGRES_DATABASE}
   metabaseDb= "metabase"
   tables = {
     dashboardMetadataTable = ${job.env}"_dashboard_metadata"
     reportConfigTable = ${job.env}"_report_config"
     }
 }
 
 metabase {
-  url = "http://172.17.0.1:3000/api"
-  username = "vivek@shikshalokam.org"
-  password = "Test@1234"
-  database = "test"
-  metabaseApiKey = "d3f4ult-api-key"
+  url = ${?METABASE_URL}
+  username = ${?METABASE_USERNAME}
+  password = ${?METABASE_PASSWORD}
+  database = ${?METABASE_DATABASE}
+  metabaseApiKey = ${?METABASE_API_KEY}
 }

As per coding guidelines, metabase-jobs/**: "No credential exposure".

🤖 Fix all issues with AI agents
In `@jobs-core/src/main/resources/base-config.conf`:
- Line 4: The base-config.conf defines a job-specific Kafka consumer group
("combined-dashboard-creator-group") which is unsafe as a global default; change
the default key/value (groupId) in base-config.conf to a generic placeholder
like "default-consumer-group" so jobs must intentionally override kafka.groupId
in their job-specific configs, and verify any code reading the config still uses
the same key name (groupId / kafka.groupId) so no lookup changes are needed.

In
`@metabase-jobs/combined-dashboard-creator/src/main/resources/combined-dashboard-creator.conf`:
- Around line 113-133: The notify block currently embeds cleartext passwords in
email.template and sms.template via {password} and hardcodes the notify API URL
in api.url; update email.template and sms.template to remove {password} and
instead include a secure password-reset or one-time-token placeholder (e.g.,
{resetLink} or {tempToken}) that your user-creation flow will generate, and
modify the api.url to be externalized (replace the literal URL with a config/env
placeholder such as ${NOTIFY_API_URL} or a top-level notify.apiUrl property) so
the endpoint isn’t hardcoded in this config file.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/MentoringMetabaseDashboardFunction.scala`:
- Around line 134-136: The call to
ProcessTenantConstructor.ProcessAndUpdateJsonFiles has a syntax error because
the ternary-like expression `if (orgId.nonEmpty) orgId.toInt else` is missing
the else branch; update that argument to provide a default integer (e.g., `0`)
or a safe conversion (e.g., use toIntOption.getOrElse(0) if available) so the
parameter passed for orgId is always an Int; locate the call to
ProcessTenantConstructor.ProcessAndUpdateJsonFiles in
MentoringMetabaseDashboardFunction.scala and replace the incomplete expression
with a complete, safe fallback value for orgId.
- Around line 145-150: The catch block is vulnerable to second-order SQL
injection because it interpolates e.getMessage and the raw tenantCode directly
into the UPDATE SQL; change the error-path to use the same safeTenantCode used
earlier (or otherwise sanitize/escape tenantCode) and pass both the tenant
identifier and the error message as parameters to postgresUtil.insertData (or
use a parameterized/escaped update function) when updating metaDataTable in the
catch of MentoringMetabaseDashboardFunction; locate usages of
postgresUtil.insertData, metaDataTable, tenantCode, safeTenantCode and
e.getMessage in this file (the two catch blocks) and replace the
string-interpolated SQL with a parameterized update or escaped values consistent
with the earlier safeTenantCode usage.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/ProcessOrgConstructor.scala`:
- Around line 29-53: The pattern match on row.get("config") in
ProcessOrgConstructor.scala only handles Some(queryValue: PGobject) and None,
causing a MatchError for Some of other types; change the match to include a
catch-all branch (e.g., case Some(other) =>) or restructure to first match
Some(x) and then type-match inside
updateQuestionCardJsonValues/createQuestionCard logic, and in that new
Some(other) branch log a warning (including the runtime class of the value and
the dashboardId/row context) and skip processing so the code no longer throws on
non-PGobject values.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/ProcessTenantConstructor.scala`:
- Around line 110-160: The method updateQuestionCardJsonValues currently returns
configObjectNode.get("questionCard") which can be null if the key is missing and
later causes an NPE in updatePostgresDatabaseQuery; change the final return to a
safe lookup (e.g., use configObjectNode.path("questionCard") or wrap get in
Option and return the original configJson.path("questionCard") when absent) so
the method always returns a non-null JsonNode (an empty/missing node) and
callers can skip rows when the questionCard is missing.
- Around line 27-52: The match on row.get("config") in
ProcessTenantConstructor.scala only handles Some(queryValue: PGobject) and None,
which will throw a MatchError for Some of any other type; add a fallback case
(e.g., case Some(other) => ...) that safely extracts a string/JSON (e.g.,
other.toString), attempts to parse it with objectMapper.readTree inside the same
null/exception-safe flow used for PGobject, or logs a warning and skips
processing if parsing fails—update references to config handling in the block
that calls updateQuestionCardJsonValues, updatePostgresDatabaseQuery, and
updateQuestionIdInDashCard so the non-PGobject path follows the same safe
behavior without raising runtime errors.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/observation/UpdateQuestionDomainJsonFiles.scala`:
- Around line 26-77: In UpdateQuestionDomainJsonFiles.scala, the else branch
(the heading branch handling row.get("config")) only matches case
Some(queryValue: PGobject) and can throw a MatchError for None or non-PGobject
values; add a case None => logger.info("Key 'config' not found in the result
row.") and a fallback case (e.g., case Some(other) => logger.warn(s"Unexpected
config type: ${other.getClass.getName}")) to the same match so missing or
wrong-typed config values are handled gracefully in the method where this match
occurs.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/HomePage.scala`:
- Around line 160-162: The code builds a raw SQL string using the dynamic table
name reportConfig in map_unique_id_query and passes it to
postgresUtil.fetchData; validate reportConfig against a safe allowlist or a
mapping enum before embedding it into the query (e.g., check reportConfig is one
of the expected table identifiers or map a logical name to a hardcoded table
name), and if validation fails throw/log an error or use a safe default so that
map_unique_id_query never contains an unchecked value; update the call site
where map_unique_id_query is constructed (referenced by map_unique_id_query and
stateMapId) to validate/translate reportConfig first and only then build the SQL
for postgresUtil.fetchData.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/ProcessAdminConstructor.scala`:
- Around line 25-49: The pattern match on row.get("config") currently only
handles case Some(queryValue: PGobject) and None, which causes a MatchError if
config is Some of another type; update the match in ProcessAdminConstructor
(where row.get("config") is matched and case Some(queryValue: PGobject) is used)
to add a fallback case such as case Some(other) => logger.warn(...) (or handle
conversion/skip) so unexpected types are safely logged/skipped and processing
continues without throwing; ensure the new case uses the same logger and does
not assume a PGobject before calling
updateQuestionCardJson/updatePostgresDatabaseQuery.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/ProcessDistrictConstructor.scala`:
- Around line 104-111: The code injects targetedStateId and targetedDistrictId
directly into SQL fragments in ProcessDistrictConstructor (see use in
updateTableFilter and updatedTableName with districtIdRegex/stateIdRegex and
replacements of ${state_id}/${district_id}); validate that both IDs match an
allowed pattern (e.g., numeric or UUID) and reject or throw if they don't, or
sanitize by escaping single quotes (replace "'" with "''") before performing the
.replaceAll/.replace calls; ensure the sanitization/validation happens
immediately after computing targetedStateId/targetedDistrictId and before any
use in queryNode, updatedTableName, or additions of projectsTable/solutionsTable
so the injected values cannot break or alter the SQL.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/ProcessStateConstructor.scala`:
- Around line 92-107: The updateQuery function currently injects targetedStateId
directly into the native SQL via updateTableFilter/replace calls (see
updateQuery, stateIdRegex, updateTableFilter), which risks breaking the query or
SQL injection if the ID contains quotes; before performing replacements, escape
any single quotes in targetedStateId (e.g., replace ' with ''), then use the
escaped value when building the replacement strings (the calls that insert
'$targetedStateId' / "${state_id}" into the query) so the generated SQL remains
syntactically correct and safe.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/SurveyMetabaseDashboardFunction.scala`:
- Around line 69-96: The code concatenates event/config values into SQL (see
targetedSolutionId, targetedProgramId, metaDataTable, solutions and
surveyStatusTable) which risks SQL injection and malformed queries; change all
fetchData calls to use parameterized/prepared statements (bind
targetedSolutionId and targetedProgramId as query parameters instead of string
interpolation) and validate/allowlist any value used as an identifier
(metaDataTable, solutions, surveyStatusTable/surveyQuestionTable) with a strict
regex like ^[A-Za-z0-9_]+$ before using it as a table/entity name,
returning/logging an error or rejecting the event when validation fails so only
safe identifiers are used.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateStatusJsonFiles.scala`:
- Around line 27-51: The pattern match on row.get("config") in
UpdateStatusJsonFiles.scala only handles Some(queryValue: PGobject) and None
which will throw a MatchError for Some values of other types; add a fallback
case (e.g., case Some(other) => ...) after the PGobject case that logs a warning
via logger.warn (including the unexpected type via other.getClass or
other.toString) and safely skips processing so the loop continues without
throwing, keeping existing behavior for updateQuestionCardJsonValues,
updatePostgresDatabaseQuery, and updateQuestionIdInDashCard intact.
- Around line 108-141: The method updateQuestionCardJsonValues currently returns
configObjectNode.get("questionCard") which can be null when "questionCard" is
missing; change the final return to a safe fallback by returning the
copied/possibly-updated configObjectNode (or configJson) instead of the raw get
result, and ensure processing is skipped when
Option(configObjectNode.get("questionCard")) is empty; update the function
updateQuestionCardJsonValues so that after the Option(...) processing it returns
configObjectNode (or configJson) rather than
configObjectNode.get("questionCard") to avoid nulls from questionCard.

In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/user/ProcessAdminConstructor.scala`:
- Around line 25-50: The pattern match on row.get("config") in
ProcessAdminConstructor.scala currently only handles case Some(queryValue:
PGobject) and case None, which can throw a MatchError for other types (e.g.,
String); update the match to include a wildcard case (e.g., case Some(other))
that safely handles unexpected types by converting other values to a string/JSON
or logging and skipping processing, ensuring the logic around
updateQuestionCardJsonValues, updatePostgresDatabaseQuery,
metabaseUtil.createQuestionCard and updating dashCards only runs for valid
JSON/PGobject inputs.

In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala`:
- Around line 181-184: The object UnifiedStreamTask references logger but none
is defined; add a logger field and the import for your logging backend (e.g.
SLF4J) at the top of the object: import org.slf4j.LoggerFactory and define a
private val logger = LoggerFactory.getLogger(getClass) (classOf of an object's
singleton type does not compile, so getClass is the safe choice) so the case fallback that
calls logger.warn(...) compiles and logs correctly while keeping runJob and
streamType usage unchanged.
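Two recurring fixes from this list, sketched together: the missing SLF4J logger on the object, and the catch-all branch for the repeated row.get("config") MatchError reports (names mirror the prompts; PGobject is org.postgresql.util.PGobject):

import org.slf4j.{Logger, LoggerFactory}
import org.postgresql.util.PGobject

object UnifiedStreamTask {
  // getClass works for an object; classOf[UnifiedStreamTask.type] does not compile.
  private val logger: Logger = LoggerFactory.getLogger(getClass)
  // ... existing main / runJob wiring ...
}

// Catch-all branch for the config matches flagged across the dashboard constructors:
def handleConfig(row: Map[String, Any], dashboardId: Int, logger: Logger): Unit =
  row.get("config") match {
    case Some(_: PGobject) =>
      () // existing PGobject handling stays as-is (elided)
    case Some(other) =>
      logger.warn(s"Unexpected config type ${other.getClass.getName} for dashboard $dashboardId; skipping row")
    case None =>
      logger.info("Key 'config' not found in the result row.")
  }
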
🟡 Minor comments (5)
metabase-jobs/combined-dashboard-creator/src/main/resources/combined-dashboard-creator.conf-1-46 (1)

1-46: ⚠️ Potential issue | 🟡 Minor

Kafka topic naming uses prod prefix — verify environment alignment.

All topics (Lines 7, 11, 15, 19, 23, 27, 31, 35) use prod. or -prod suffixes, yet the base config sets job.env = "dev" and the Postgres/Metabase endpoints point to localhost. This mismatch suggests the config may not be correctly parameterized for different environments.

Consider using ${job.env} interpolation in topic names (e.g., "sl-metabase-project-dashboard-"${job.env}) to avoid deploying dev configs that consume from production topics or vice versa.

stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala-118-127 (1)

118-127: ⚠️ Potential issue | 🟡 Minor

Project stream parallelism is hardcoded to 1 in single-stream mode, unlike all other streams.

Lines 120 and 123 hardcode parallelism to 1, whereas every other stream case in this method (survey, observation, user, mentoring) reads from config (config.*Parallelism). The multi-stream runJob also uses config values for project. This looks like a copy-paste oversight.

🐛 Proposed fix
         val projectStream = env.addSource(projectSource)
           .name("project-consumer").uid("project-consumer")
-          .setParallelism(1).rebalance
+          .setParallelism(config.projectConsumerParallelism).rebalance
           .process(new ProjectStreamFunction(config))
           .name("project-processor").uid("project-processor")
-          .setParallelism(1)
+          .setParallelism(config.projectProcessParallelism)
stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala-138-153 (1)

138-153: ⚠️ Potential issue | 🟡 Minor

Sink names contain "test-" prefix — likely unintended in production code.

Lines 140 and 153 use "test-survey-sink" and "test-observation-sink" respectively, while the multi-stream runJob uses "survey-sink" and "observation-sink". These look like leftover test artifacts and will appear in Flink's web UI and metrics, making monitoring confusing.

🐛 Proposed fix
-          .name("test-survey-sink").setParallelism(config.surveySinkParallelism)
+          .name("survey-sink").setParallelism(config.surveySinkParallelism)
-          .name("test-observation-sink").setParallelism(config.observationSinkParallelism)
+          .name("observation-sink").setParallelism(config.observationSinkParallelism)
stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala-48-50 (1)

48-50: ⚠️ Potential issue | 🟡 Minor

Project sink parallelism hardcoded to 1 in the multi-stream path.

All other stream sinks (survey, observation, user, mentoring) use their respective config.*SinkParallelism values. Project sink is the only one with a hardcoded 1, which is inconsistent and may become a bottleneck.

🐛 Proposed fix
       projectStream.getSideOutput(config.projectOutputTag)
         .addSink(kafkaConnector.kafkaStringSink(config.projectOutputTopic))
-        .name("project-sink").setParallelism(1)
+        .name("project-sink").setParallelism(config.projectSinkParallelism)
metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/MentoringMetabaseDashboardFunction.scala-87-87 (1)

87-87: ⚠️ Potential issue | 🟡 Minor

Typo: "exits" → "exists".

-          logger.info(s"Table does not exits in the metabase DB")
+          logger.info(s"Table does not exist in the metabase DB")
🧹 Nitpick comments (10)
Documentation/log-conf-file/log4j2.properties (1)

104-203: Consider extracting shared properties to reduce duplication.
These appenders repeat the same layout pattern, retention age, and basePath; a small set of properties (e.g., log.pattern, log.basePath, log.retention) would make future updates less error‑prone.

jobs-core/src/main/resources/base-config.conf (1)

2-3: Kafka/ZooKeeper addresses assume Docker networking.

kafka:9092 and zookeeper:2181 resolve only inside a Docker/Compose network. This is fine for containerized deployments, but be aware that local or bare-metal development will require overrides. Consider documenting this or providing a local profile.

metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/domain/UserMappingEvent.scala (1)

23-25: Minor style inconsistency: missing explicit type parameter on some readOrDefault calls.

Lines 23, 25, and 35 omit the explicit [String] type parameter that all other String accessors include. While Scala infers the type correctly, being consistent aids readability.

Proposed fix for consistency
-  def stateId: String = readOrDefault("state.id", null)
+  def stateId: String = readOrDefault[String]("state.id", null)

-  def districtId: String = readOrDefault("district.id", null)
+  def districtId: String = readOrDefault[String]("district.id", null)

-  def programId: String = readOrDefault("meta.programInformation.id", null)
+  def programId: String = readOrDefault[String]("meta.programInformation.id", null)

Also applies to: 35-35

stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala (2)

28-29: params.get("streamType") — consider using params.has for consistency and clarity.

Line 19 uses params.has("config.file.path") before accessing the value, but line 28 relies on params.get returning null. While ParameterTool.get(String) does return null for missing keys, using has + get would be consistent with the pattern already established above and makes the intent clearer.

♻️ Suggested consistency fix
-    val streamType = params.get("streamType")
-    if (streamType != null && streamType.nonEmpty) {
+    if (params.has("streamType") && params.get("streamType").nonEmpty) {
+      val streamType = params.get("streamType")
       runJob(unifiedStreamConfig, kafkaConnector, streamType)
     } else {
       runJob(unifiedStreamConfig, kafkaConnector)
     }

112-114: Default case abandons the env created at line 113.

While the return on line 184 correctly prevents double execution (addressing the prior review), the fallback still creates a full Flink StreamExecutionEnvironment on line 113 that is immediately discarded when case _ triggers. Consider moving env creation after the match or throwing on unknown streamType instead of silently falling back.

♻️ Suggested restructure
   def runJob(config: UnifiedStreamConfig, kafkaConnector: FlinkKafkaConnector, streamType: String): Unit = {
-    val env = FlinkUtil.getExecutionContext(config)
-
     streamType match {
-      case "project" =>
+      case "project" | "survey" | "observation" | "user" | "mentoring" => // valid
+      case _ =>
+        logger.warn(s"Unknown streamType '$streamType', falling back to all-streams mode.")
+        runJob(config, kafkaConnector)
+        return
+    }
+
+    val env = FlinkUtil.getExecutionContext(config)
+    streamType match {
+      case "project" =>
         // ... wire project stream

Or simply throw an IllegalArgumentException for unknown stream types instead of a silent fallback.
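
For illustration, a minimal fail-fast sketch; validStreamTypes and validateStreamType below are assumed names, not existing project code:

// Sketch only: validate the streamType up front and fail fast instead of
// silently falling back to all-streams mode.
val validStreamTypes = Set("project", "survey", "observation", "user", "mentoring")

def validateStreamType(streamType: String): String =
  if (validStreamTypes.contains(streamType)) streamType
  else throw new IllegalArgumentException(
    s"Unknown streamType '$streamType'; expected one of ${validStreamTypes.mkString(", ")}")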

metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/DistrictPage.scala (1)

182-196: Make dashcard appends idempotent to avoid duplicates on re-runs.

appendDashCardToDashboard unconditionally appends to existing dashcards. Re-running the job will duplicate cards. Consider de‑duplication by card_id (or a stable dashcard identifier) before adding.

♻️ Proposed fix (dedupe by card_id)
-    val dashCardsNode = readJsonFile(jsonFile)
-    dashCardsNode.foreach { value =>
-      existingDashcards.add(value)
-    }
+    val existingIds = existingDashcards.elements().asScala
+      .flatMap(n => Option(n.path("card_id")).filter(_.isInt).map(_.asInt))
+      .toSet
+    val dashCardsNode = readJsonFile(jsonFile)
+    dashCardsNode.foreach { value =>
+      val cardId = Option(value.path("card_id")).filter(_.isInt).map(_.asInt)
+      if (cardId.forall(id => !existingIds.contains(id))) {
+        existingDashcards.add(value)
+      }
+    }

As per coding guidelines: "Idempotency of ETL runs".

metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/HomePage.scala (1)

182-196: Prevent duplicate dashcards on repeated runs.

appendDashCardToDashboard always appends to the existing list. This will duplicate dashcards if the job re-runs. Please de‑duplicate by card_id (or another stable identifier) before appending.

♻️ Proposed fix (dedupe by card_id)
-    val dashCardsNode = readJsonFile(jsonFile)
-    dashCardsNode.foreach { value =>
-      existingDashcards.add(value)
-    }
+    val existingIds = existingDashcards.elements().asScala
+      .flatMap(n => Option(n.path("card_id")).filter(_.isInt).map(_.asInt))
+      .toSet
+    val dashCardsNode = readJsonFile(jsonFile)
+    dashCardsNode.foreach { value =>
+      val cardId = Option(value.path("card_id")).filter(_.isInt).map(_.asInt)
+      if (cardId.forall(id => !existingIds.contains(id))) {
+        existingDashcards.add(value)
+      }
+    }

As per coding guidelines: "Idempotency of ETL runs".

metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/project/miDashboard/StatePage.scala (2)

356-359: Avoid printStackTrace; use structured logging.

Line 358 prints the stack trace directly. Use logger.error(..., e) so errors are captured by the logging pipeline.

📝 Proposed fix
-        logger.info(s"Error during JSON processing: ${e.getMessage}")
-        e.printStackTrace()
+        logger.error(s"Error during JSON processing: ${e.getMessage}", e)

185-199: Keep dashboard updates idempotent.

Appending dashcards without deduplication will duplicate cards on retries/re‑runs. Consider skipping entries whose card_id already exists in the dashboard.

♻️ Proposed fix (dedupe by card_id)
-    val dashCardsNode = readJsonFile(jsonFile)
-    dashCardsNode.foreach { value =>
-      existingDashcards.add(value)
-    }
+    val existingIds = existingDashcards.elements().asScala
+      .flatMap(n => Option(n.path("card_id")).filter(_.isInt).map(_.asInt))
+      .toSet
+    val dashCardsNode = readJsonFile(jsonFile)
+    dashCardsNode.foreach { value =>
+      val cardId = Option(value.path("card_id")).filter(_.isInt).map(_.asInt)
+      if (cardId.forall(id => !existingIds.contains(id))) {
+        existingDashcards.add(value)
+      }
+    }

As per coding guidelines: "Idempotency of ETL runs".

metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/ProcessOrgConstructor.scala (1)

17-19: Method signature has 19 parameters — consider using a case class to group related parameters.

ProcessAndUpdateJsonFiles accepts 19 parameters, making it hard to read and error-prone at call sites. Grouping related parameters (e.g., all table names, all column IDs) into case classes would improve clarity.
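
For illustration, a minimal sketch of such a grouping; the case class names and fields below are hypothetical, not the project's existing types:

// Hypothetical grouping of the table-name and dashboard-target arguments.
case class MentoringTables(
  userTable: String,
  sessionTable: String,
  sessionAttendanceTable: String,
  connectionsTable: String,
  orgMentorRatingTable: String,
  orgRolesTable: String
)

case class DashboardTarget(collectionId: Int, databaseId: Int, dashboardId: Int, tabId: Int)

// The 19-argument signature could then shrink to a handful of grouped parameters:
def processAndUpdateJsonFiles(
  reportConfigQuery: String,
  target: DashboardTarget,
  tables: MentoringTables,
  orgId: Int
): Unit = ???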

zookeeper = "10.148.0.38:2181"
broker-servers = "kafka:9092"
zookeeper = "zookeeper:2181"
groupId = "combined-dashboard-creator-group"

⚠️ Potential issue | 🟠 Major

Job-specific groupId in the shared base config is misleading and risky.

base-config.conf is inherited by every Flink job in the project. Setting groupId = "combined-dashboard-creator-group" here means any job that forgets to override kafka.groupId will silently join this consumer group, potentially stealing messages intended for the combined-dashboard-creator or vice versa.

Use a generic default (e.g., "default-consumer-group") here and let each job override it in its own config.

Proposed fix
-  groupId = "combined-dashboard-creator-group"
+  groupId = "default-consumer-group"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
groupId = "combined-dashboard-creator-group"
groupId = "default-consumer-group"
🤖 Prompt for AI Agents
In `@jobs-core/src/main/resources/base-config.conf` at line 4, The
base-config.conf defines a job-specific Kafka consumer group
("combined-dashboard-creator-group") which is unsafe as a global default; change
the default key/value (groupId) in base-config.conf to a generic placeholder
like "default-consumer-group" so jobs must intentionally override kafka.groupId
in their job-specific configs, and verify any code reading the config still uses
the same key name (groupId / kafka.groupId) so no lookup changes are needed.

Comment on lines +113 to +133
notify {
type = "api"
api.url = "https://saas-qa-interface.tekdinext.com/interface/v1/notification/send-raw"
email.template = """
{
"email": {
"to": ["{email}"],
"subject": "Welcome to Metabase Analytics Platform",
"body": "<p>Dear {name},</p><p>Please find the URL and login credentials for Metabase.</p><p>URL: <a href=\"{dashboardLink}\">Click here to access Metabase</a></p><p>Username: {email}</p><p>Password: {password}</p>"
}
}
"""
sms.template = """
{
"sms": {
"to": ["{mobile}"],
"body": "Dear {name},\nPlease find the URL and login credentials for Metabase.\nURL: {dashboardLink}\nUsername: {email}\nPassword: {password}"
}
}
"""
}

⚠️ Potential issue | 🟠 Major

Notification templates send Metabase passwords in cleartext.

The email and SMS templates include {password} in the body, meaning newly created Metabase user passwords are transmitted in plaintext over email/SMS. Consider using a password-reset link or temporary token flow instead, so credentials are never exposed in transit.

Also, the notification API URL on Line 115 (https://saas-qa-interface.tekdinext.com/...) appears to be a QA/staging endpoint hardcoded in what looks like a production config (topics use prod.* prefixes). This should be externalized.
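
For the URL, one hedged option is resolving it from the environment or an overridable config key; NOTIFY_API_URL and notify.api.url below are assumed names, not existing settings, and Typesafe Config is assumed to be on the classpath (as the HOCON .conf files suggest):

// Sketch only: read the notification endpoint from the environment first,
// then from config; fail fast if neither is set. Key names are assumptions.
import com.typesafe.config.ConfigFactory

val appConfig = ConfigFactory.load()
val notifyApiUrl: String = sys.env.get("NOTIFY_API_URL")
  .orElse(if (appConfig.hasPath("notify.api.url")) Some(appConfig.getString("notify.api.url")) else None)
  .getOrElse(throw new IllegalStateException("notify.api.url is not configured"))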

🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/resources/combined-dashboard-creator.conf`
around lines 113 - 133, The notify block currently embeds cleartext passwords in
email.template and sms.template via {password} and hardcodes the notify API URL
in api.url; update email.template and sms.template to remove {password} and
instead include a secure password-reset or one-time-token placeholder (e.g.,
{resetLink} or {tempToken}) that your user-creation flow will generate, and
modify the api.url to be externalized (replace the literal URL with a config/env
placeholder such as ${NOTIFY_API_URL} or a top-level notify.apiUrl property) so
the endpoint isn’t hardcoded in this config file.

Comment on lines +134 to +136
val questionCardIdList = ProcessTenantConstructor.ProcessAndUpdateJsonFiles(reportConfigQuery, parentCollectionId, databaseId, dashboardId, 0, orgNameId, 0, tenantUserTable,
tenantSessionTable, tenantSessionAttendanceTable, tenantConnectionsTable, tenantOrgMentorRatingTable,
tenantOrgRolesTable, Map.empty, if (orgId.nonEmpty) orgId.toInt else , tabId, metabaseUtil, postgresUtil)

⚠️ Potential issue | 🔴 Critical

Syntax error: missing value after else on Line 136.

if (orgId.nonEmpty) orgId.toInt else is incomplete — there's no expression after else. This won't compile.

This appears to be a partial fix attempt for the orgId.toInt issue flagged previously.

🐛 Proposed fix
-            tenantOrgRolesTable, Map.empty, if (orgId.nonEmpty) orgId.toInt else , tabId, metabaseUtil, postgresUtil)
+            tenantOrgRolesTable, Map.empty, if (orgId.nonEmpty) orgId.toInt else 0, tabId, metabaseUtil, postgresUtil)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-val questionCardIdList = ProcessTenantConstructor.ProcessAndUpdateJsonFiles(reportConfigQuery, parentCollectionId, databaseId, dashboardId, 0, orgNameId, 0, tenantUserTable,
-  tenantSessionTable, tenantSessionAttendanceTable, tenantConnectionsTable, tenantOrgMentorRatingTable,
-  tenantOrgRolesTable, Map.empty, if (orgId.nonEmpty) orgId.toInt else , tabId, metabaseUtil, postgresUtil)
+val questionCardIdList = ProcessTenantConstructor.ProcessAndUpdateJsonFiles(reportConfigQuery, parentCollectionId, databaseId, dashboardId, 0, orgNameId, 0, tenantUserTable,
+  tenantSessionTable, tenantSessionAttendanceTable, tenantConnectionsTable, tenantOrgMentorRatingTable,
+  tenantOrgRolesTable, Map.empty, if (orgId.nonEmpty) orgId.toInt else 0, tabId, metabaseUtil, postgresUtil)
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/MentoringMetabaseDashboardFunction.scala`
around lines 134 - 136, The call to
ProcessTenantConstructor.ProcessAndUpdateJsonFiles has a syntax error because
the ternary-like expression `if (orgId.nonEmpty) orgId.toInt else` is missing
the else branch; update that argument to provide a default integer (e.g., `0`)
or a safe conversion (e.g., use toIntOption.getOrElse(0) if available) so the
parameter passed for orgId is always an Int; locate the call to
ProcessTenantConstructor.ProcessAndUpdateJsonFiles in
MentoringMetabaseDashboardFunction.scala and replace the incomplete expression
with a complete, safe fallback value for orgId.

Comment on lines +145 to +150
catch {
case e: Exception =>
postgresUtil.insertData(s"UPDATE $metaDataTable SET status = 'Failed',error_message = '${e.getMessage}' WHERE entity_id = '${tenantCode}_tenant_admin';")
logger.error(s"An error occurred: ${e.getMessage}")
e.printStackTrace()
}

⚠️ Potential issue | 🟠 Major

Second-order SQL injection in error handler: e.getMessage and unescaped tenantCode interpolated into SQL.

Line 147 interpolates both e.getMessage and tenantCode directly into a SQL UPDATE statement. Exception messages can contain arbitrary text (including SQL metacharacters from malformed table/column names). Unlike the safeTenantCode used at Line 130, this catch block uses the raw tenantCode.

The same pattern appears at Line 175.

🐛 Proposed fix (minimal) — escape both values
         case e: Exception =>
-            postgresUtil.insertData(s"UPDATE $metaDataTable SET status = 'Failed',error_message = '${e.getMessage}' WHERE entity_id = '${tenantCode}_tenant_admin';")
+            val safeError = Option(e.getMessage).getOrElse("Unknown error").replace("'", "''")
+            val safeTenant = tenantCode.replace("'", "''")
+            postgresUtil.insertData(s"UPDATE $metaDataTable SET status = 'Failed',error_message = '$safeError' WHERE entity_id = '${safeTenant}_tenant_admin';")

Apply the same pattern at Line 175. Ideally, migrate to parameterized queries throughout.
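
For illustration, a minimal JDBC-style sketch of the parameterized form, assuming direct access to a java.sql.Connection; the actual PostgresUtil API may expose this differently:

// Sketch only: bind values with a PreparedStatement. Table names cannot be
// bound as parameters, so metaDataTable still needs separate validation.
import java.sql.Connection

def markEntityFailed(conn: Connection, metaDataTable: String, tenantCode: String, error: String): Unit = {
  val sql = s"UPDATE $metaDataTable SET status = 'Failed', error_message = ? WHERE entity_id = ?"
  val stmt = conn.prepareStatement(sql)
  try {
    stmt.setString(1, Option(error).getOrElse("Unknown error"))
    stmt.setString(2, s"${tenantCode}_tenant_admin")
    stmt.executeUpdate()
  } finally stmt.close()
}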

As per coding guidelines, metabase-jobs/**: "No credential exposure" and "Correct DB connections & cursor handling" — SQL injection through error messages is a security risk that must be addressed.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-catch {
-  case e: Exception =>
-    postgresUtil.insertData(s"UPDATE $metaDataTable SET status = 'Failed',error_message = '${e.getMessage}' WHERE entity_id = '${tenantCode}_tenant_admin';")
-    logger.error(s"An error occurred: ${e.getMessage}")
-    e.printStackTrace()
-}
+catch {
+  case e: Exception =>
+    val safeError = Option(e.getMessage).getOrElse("Unknown error").replace("'", "''")
+    val safeTenant = tenantCode.replace("'", "''")
+    postgresUtil.insertData(s"UPDATE $metaDataTable SET status = 'Failed',error_message = '$safeError' WHERE entity_id = '${safeTenant}_tenant_admin';")
+    logger.error(s"An error occurred: ${e.getMessage}")
+    e.printStackTrace()
+}
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/MentoringMetabaseDashboardFunction.scala`
around lines 145 - 150, The catch block is vulnerable to second-order SQL
injection because it interpolates e.getMessage and the raw tenantCode directly
into the UPDATE SQL; change the error-path to use the same safeTenantCode used
earlier (or otherwise sanitize/escape tenantCode) and pass both the tenant
identifier and the error message as parameters to postgresUtil.insertData (or
use a parameterized/escaped update function) when updating metaDataTable in the
catch of MentoringMetabaseDashboardFunction; locate usages of
postgresUtil.insertData, metaDataTable, tenantCode, safeTenantCode and
e.getMessage in this file (the two catch blocks) and replace the
string-interpolated SQL with a parameterized update or escaped values consistent
with the earlier safeTenantCode usage.

Comment on lines +29 to +53
row.get("config") match {
case Some(queryValue: PGobject) =>
val configJson = objectMapper.readTree(queryValue.getValue)
if (configJson != null) {
val originalQuestionCard = configJson.path("questionCard")
val chartName = Option(originalQuestionCard.path("name").asText()).getOrElse("Unknown Chart")
val updatedQuestionCard = updateQuestionCardJsonValues(configJson, collectionId, statenameId, districtnameId, blocknameId, clusternameId, schoolnameId, databaseId)
val finalQuestionCard = updatePostgresDatabaseQuery(updatedQuestionCard, tenantUserTable, tenantSessionTable, tenantSessionAttendanceTable, tenantConnectionsTable, tenantOrgMentorRatingTable, tenantOrgRolesTable, orgId)
val requestBody = finalQuestionCard.asInstanceOf[ObjectNode]
val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
questionCardId.append(cardId)
val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, cardId)
updatedQuestionIdInDashCard.foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.warn("No 'dashCards' key found in the JSON.")
}
}
}
case None =>
logger.warn("Key 'config' not found in the result row.")
}

⚠️ Potential issue | 🟠 Major

Non-exhaustive pattern match for non-heading rows — MatchError if config is Some(nonPGobject).

The heading branch (Line 71) correctly uses case _ => as a catch-all, but the non-heading branch at Line 29 only handles Some(queryValue: PGobject) and None. If the value is Some of another type, Scala throws MatchError.

🐛 Proposed fix
             case None =>
               logger.warn("Key 'config' not found in the result row.")
+            case _ =>
+              logger.warn("Unexpected type for 'config' in the result row. Skipping.")
           }
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/mentoring/ProcessOrgConstructor.scala`
around lines 29 - 53, The pattern match on row.get("config") in
ProcessOrgConstructor.scala only handles Some(queryValue: PGobject) and None,
causing a MatchError for Some of other types; change the match to include a
catch-all branch (e.g., case Some(other) =>) or restructure to first match
Some(x) and then type-match inside
updateQuestionCardJsonValues/createQuestionCard logic, and in that new
Some(other) branch log a warning (including the runtime class of the value and
the dashboardId/row context) and skip processing so the code no longer throws on
non-PGobject values.

Comment on lines +69 to +96
val solutionName = postgresUtil.fetchData(s"""SELECT entity_name FROM $metaDataTable WHERE entity_id = '$targetedSolutionId'""").collectFirst { case map: Map[_, _] => map.getOrElse("entity_name", "").toString }.getOrElse("")
val surveyStatusTable = s"""${targetedSolutionId}_survey_status"""
val targetedProgramId: String = {
val id = Option(event.targetedProgram).map(_.trim).getOrElse("")
if (id.nonEmpty) id
else {
val query = s"SELECT program_id FROM $solutions WHERE solution_id = '$targetedSolutionId'"
logger.info(query)
postgresUtil.fetchData(query) match {
case List(map: Map[_, _]) => Option(map.get("program_id")).flatten.map(_.toString).getOrElse("")
case _ => ""
}
}
}
val storedTableIds = TrieMap.empty[(Int, String), Int]
val storedColumnIds = TrieMap.empty[(Int, String), Int]

val query = s"SELECT external_id, program_external_id, program_description, description FROM $solutions WHERE solution_id = '$targetedSolutionId'"
val resultMap = postgresUtil.fetchData(query).collectFirst { case map: Map[_, _] => map }.getOrElse(Map.empty)

val solutionExternalId = resultMap.get("external_id").map(_.toString).getOrElse("")
val programExternalId = resultMap.get("program_external_id").map(_.toString).getOrElse("")
val programDescription = resultMap.get("program_description").map(_.toString).getOrElse("")
val solutionDescription = resultMap.get("description").map(_.toString).getOrElse("")

val programName = postgresUtil.fetchData(s"""SELECT entity_name from $metaDataTable where entity_id = '$targetedProgramId'""").collectFirst { case map: Map[_, _] => map.get("entity_name").map(_.toString).getOrElse("") }.getOrElse("")

val orgId = postgresUtil.fetchData(s"""SELECT org_id FROM $solutions WHERE program_id = '$targetedProgramId' AND org_id IS NOT NULL AND TRIM(org_id) <> '' LIMIT 1 """).collectFirst { case map: Map[_, _] => map.getOrElse("org_id", "").toString }.getOrElse("")

⚠️ Potential issue | 🔴 Critical

Use prepared statements for event-sourced IDs and validate table identifiers.

Lines 69–96 build SQL with targetedSolutionId, targetedProgramId, metaDataTable, and solutions directly in the query string. These values are event/config‑sourced and can contain quotes or malicious input. Also, surveyQuestionTable is derived from targetedSolutionId and later used as a table identifier. Please:

  • bind IDs via prepared statements, and
  • allowlist/validate identifier inputs (e.g., only [A-Za-z0-9_]+) before using them as table names.

This avoids SQL breakage and injection vectors.
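
For the identifier check, a minimal sketch; the helper name and usage below are assumptions, not existing project utilities:

// Sketch only: accept identifiers built from event data only when they match
// a strict allowlist pattern; otherwise reject the event.
def requireSafeIdentifier(name: String, label: String): String =
  if (name != null && name.matches("^[A-Za-z0-9_]+$")) name
  else throw new IllegalArgumentException(s"Unsafe $label identifier: '$name'")

// Hypothetical usage before deriving a table name:
// val surveyStatusTable = s"${requireSafeIdentifier(targetedSolutionId, "solution id")}_survey_status"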

🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/SurveyMetabaseDashboardFunction.scala`
around lines 69 - 96, The code concatenates event/config values into SQL (see
targetedSolutionId, targetedProgramId, metaDataTable, solutions and
surveyStatusTable) which risks SQL injection and malformed queries; change all
fetchData calls to use parameterized/prepared statements (bind
targetedSolutionId and targetedProgramId as query parameters instead of string
interpolation) and validate/allowlist any value used as an identifier
(metaDataTable, solutions, surveyStatusTable/surveyQuestionTable) with a strict
regex like ^[A-Za-z0-9_]+$ before using it as a table/entity name,
returning/logging an error or rejecting the event when validation fails so only
safe identifiers are used.

Comment on lines +27 to +51
row.get("config") match {
case Some(queryValue: PGobject) =>
val configJson = objectMapper.readTree(queryValue.getValue)
if (configJson != null) {
val originalQuestionCard = configJson.path("questionCard")
val chartName = Option(originalQuestionCard.path("name").asText()).getOrElse("Unknown Chart")
val updatedQuestionCard = updateQuestionCardJsonValues(configJson, collectionId, statenameId, districtnameId, blocknameId, clusternameId, schoolNameId, orgnameId, databaseId)
val finalQuestionCard = updatePostgresDatabaseQuery(updatedQuestionCard, surveyStatusTableName)
val requestBody = finalQuestionCard.asInstanceOf[ObjectNode]
val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
questionCardId.append(cardId)
val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, tabId, cardId)
updatedQuestionIdInDashCard.foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.warn("No 'dashCards' key found in the JSON.")
}
}
}
case None =>
logger.warn("Key 'config' not found in the result row.")
}

⚠️ Potential issue | 🟠 Major

Handle unexpected config types in non‑heading rows.

Line 27 only covers Some(PGobject) and None. A Some with a non‑PGobject value will throw MatchError. Add a fallback case to skip/log.

🐛 Proposed fix
          row.get("config") match {
            case Some(queryValue: PGobject) =>
              val configJson = objectMapper.readTree(queryValue.getValue)
              if (configJson != null) {
                ...
              }
+           case Some(other) =>
+             logger.warn(s"Unexpected type for 'config': ${other.getClass.getName}. Skipping.")
            case None =>
              logger.warn("Key 'config' not found in the result row.")
          }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
row.get("config") match {
case Some(queryValue: PGobject) =>
val configJson = objectMapper.readTree(queryValue.getValue)
if (configJson != null) {
val originalQuestionCard = configJson.path("questionCard")
val chartName = Option(originalQuestionCard.path("name").asText()).getOrElse("Unknown Chart")
val updatedQuestionCard = updateQuestionCardJsonValues(configJson, collectionId, statenameId, districtnameId, blocknameId, clusternameId, schoolNameId, orgnameId, databaseId)
val finalQuestionCard = updatePostgresDatabaseQuery(updatedQuestionCard, surveyStatusTableName)
val requestBody = finalQuestionCard.asInstanceOf[ObjectNode]
val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
questionCardId.append(cardId)
val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, tabId, cardId)
updatedQuestionIdInDashCard.foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.warn("No 'dashCards' key found in the JSON.")
}
}
}
case None =>
logger.warn("Key 'config' not found in the result row.")
}
row.get("config") match {
case Some(queryValue: PGobject) =>
val configJson = objectMapper.readTree(queryValue.getValue)
if (configJson != null) {
val originalQuestionCard = configJson.path("questionCard")
val chartName = Option(originalQuestionCard.path("name").asText()).getOrElse("Unknown Chart")
val updatedQuestionCard = updateQuestionCardJsonValues(configJson, collectionId, statenameId, districtnameId, blocknameId, clusternameId, schoolNameId, orgnameId, databaseId)
val finalQuestionCard = updatePostgresDatabaseQuery(updatedQuestionCard, surveyStatusTableName)
val requestBody = finalQuestionCard.asInstanceOf[ObjectNode]
val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
questionCardId.append(cardId)
val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, tabId, cardId)
updatedQuestionIdInDashCard.foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.warn("No 'dashCards' key found in the JSON.")
}
}
}
case Some(other) =>
logger.warn(s"Unexpected type for 'config': ${other.getClass.getName}. Skipping.")
case None =>
logger.warn("Key 'config' not found in the result row.")
}
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateStatusJsonFiles.scala`
around lines 27 - 51, The pattern match on row.get("config") in
UpdateStatusJsonFiles.scala only handles Some(queryValue: PGobject) and None
which will throw a MatchError for Some values of other types; add a fallback
case (e.g., case Some(other) => ...) after the PGobject case that logs a warning
via logger.warn (including the unexpected type via other.getClass or
other.toString) and safely skips processing so the loop continues without
throwing, keeping existing behavior for updateQuestionCardJsonValues,
updatePostgresDatabaseQuery, and updateQuestionIdInDashCard intact.

Comment on lines +108 to +141
def updateQuestionCardJsonValues(configJson: JsonNode, collectionId: Int, statenameId: Int, districtnameId: Int, blocknameId: Int, clusternameId: Int, schoolNameId: Int, orgnameId: Int, databaseId: Int): JsonNode = {
try {
val configObjectNode = configJson.deepCopy().asInstanceOf[ObjectNode]
Option(configObjectNode.get("questionCard")).foreach { questionCard =>
questionCard.asInstanceOf[ObjectNode].put("collection_id", collectionId)

Option(questionCard.get("dataset_query")).foreach { datasetQuery =>
datasetQuery.asInstanceOf[ObjectNode].put("database", databaseId)

Option(datasetQuery.get("native")).foreach { nativeNode =>
Option(nativeNode.get("template-tags")).foreach { templateTags =>
val params = Map(
"state_param" -> statenameId,
"district_param" -> districtnameId,
"block_param" -> blocknameId,
"cluster_param" -> clusternameId,
"school_param" -> schoolNameId,
"org_param" -> orgnameId
)
params.foreach { case (paramName, paramId) =>
Option(templateTags.get(paramName)).foreach { paramNode =>
updateDimension(paramNode.asInstanceOf[ObjectNode], paramId)
}
}
}
}
}
}
configObjectNode.get("questionCard")
} catch {
case e: Exception =>
logger.warn(s"Warning: JSON node could not be updated. Error: ${e.getMessage}")
configJson
}

⚠️ Potential issue | 🟠 Major

Avoid returning null when questionCard is missing.

Line 136 returns configObjectNode.get("questionCard") directly, which can be null and cause downstream failures. Return a safe fallback (and skip processing when missing).

🛡️ Proposed fix
-        configObjectNode.get("questionCard")
+        Option(configObjectNode.get("questionCard")).getOrElse {
+          logger.warn("'questionCard' key not found in config JSON.")
+          configJson
+        }
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/survey/UpdateStatusJsonFiles.scala`
around lines 108 - 141, The method updateQuestionCardJsonValues currently
returns configObjectNode.get("questionCard") which can be null when
"questionCard" is missing; change the final return to a safe fallback by
returning the copied/possibly-updated configObjectNode (or configJson) instead
of the raw get result, and ensure processing is skipped when
Option(configObjectNode.get("questionCard")) is empty; update the function
updateQuestionCardJsonValues so that after the Option(...) processing it returns
configObjectNode (or configJson) rather than
configObjectNode.get("questionCard") to avoid nulls from questionCard.

Comment on lines +25 to +50
if (row.get("question_type").map(_.toString).getOrElse("") != "heading") {
row.get("config") match {
case Some(queryValue: PGobject) =>
val configJson = objectMapper.readTree(queryValue.getValue)
if (configJson != null) {
val originalQuestionCard = configJson.path("questionCard")
val chartName = Option(originalQuestionCard.path("name").asText()).getOrElse("Unknown Chart")
val updatedQuestionCard = updateQuestionCardJsonValues(configJson, collectionId, tenantCodeId, databaseId)
val finalQuestionCard = updatePostgresDatabaseQuery(updatedQuestionCard, userMetrics)
val requestBody = finalQuestionCard.asInstanceOf[ObjectNode]
val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
questionCardIds.append(cardId)
val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, cardId)
updatedQuestionIdInDashCard.foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.info("No 'dashCards' key found in the JSON.")
}
}
}
case None =>
logger.info("Key 'config' not found in the result row.")
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and read the full file to understand context
find . -path "*metabase-jobs*" -name "ProcessAdminConstructor.scala" -type f

Repository: ELEVATE-Project/data-pipeline

Length of output: 651


🏁 Script executed:

cat -n "./metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/user/ProcessAdminConstructor.scala"

Repository: ELEVATE-Project/data-pipeline

Length of output: 9268


🏁 Script executed:

# Find PostgresUtil class
find . -path "*metabase-jobs*" -name "PostgresUtil.scala" -o -name "PostgresUtil.java" | head -5

Repository: ELEVATE-Project/data-pipeline

Length of output: 55


🏁 Script executed:

# Search more broadly for PostgresUtil
find . -name "PostgresUtil.scala" -o -name "PostgresUtil.java" 2>/dev/null

Repository: ELEVATE-Project/data-pipeline

Length of output: 144


🏁 Script executed:

cat -n "./jobs-core/src/main/scala/org/shikshalokam/job/util/PostgresUtil.scala"

Repository: ELEVATE-Project/data-pipeline

Length of output: 8714


🏁 Script executed:

# Let me check the schema or query to understand if config is guaranteed to be PGobject
# First, search for where ProcessAdminConstructor is called and what query is passed
rg -A 2 -B 2 "ProcessAdminConstructor.processAdminJsonFiles" --type scala | head -40

Repository: ELEVATE-Project/data-pipeline

Length of output: 3246


Handle unexpected config types to avoid MatchError.

The non-heading branch (lines 26–50) only handles PGobject and None cases. Since row.get("config") returns Option[Any] (from JDBC's resultSet.getObject()), any other type—such as a String—will cause a MatchError at runtime. The heading branch correctly uses a wildcard pattern (line 68); the non-heading branch should do the same.

✅ Suggested fix
         row.get("config") match {
           case Some(queryValue: PGobject) =>
             val configJson = objectMapper.readTree(queryValue.getValue)
             if (configJson != null) {
               ...
             }
+          case Some(other) =>
+            logger.warn(s"Unexpected type for 'config': ${other.getClass.getSimpleName}. Skipping.")
           case None =>
             logger.info("Key 'config' not found in the result row.")
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (row.get("question_type").map(_.toString).getOrElse("") != "heading") {
row.get("config") match {
case Some(queryValue: PGobject) =>
val configJson = objectMapper.readTree(queryValue.getValue)
if (configJson != null) {
val originalQuestionCard = configJson.path("questionCard")
val chartName = Option(originalQuestionCard.path("name").asText()).getOrElse("Unknown Chart")
val updatedQuestionCard = updateQuestionCardJsonValues(configJson, collectionId, tenantCodeId, databaseId)
val finalQuestionCard = updatePostgresDatabaseQuery(updatedQuestionCard, userMetrics)
val requestBody = finalQuestionCard.asInstanceOf[ObjectNode]
val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
questionCardIds.append(cardId)
val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, cardId)
updatedQuestionIdInDashCard.foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.info("No 'dashCards' key found in the JSON.")
}
}
}
case None =>
logger.info("Key 'config' not found in the result row.")
}
if (row.get("question_type").map(_.toString).getOrElse("") != "heading") {
row.get("config") match {
case Some(queryValue: PGobject) =>
val configJson = objectMapper.readTree(queryValue.getValue)
if (configJson != null) {
val originalQuestionCard = configJson.path("questionCard")
val chartName = Option(originalQuestionCard.path("name").asText()).getOrElse("Unknown Chart")
val updatedQuestionCard = updateQuestionCardJsonValues(configJson, collectionId, tenantCodeId, databaseId)
val finalQuestionCard = updatePostgresDatabaseQuery(updatedQuestionCard, userMetrics)
val requestBody = finalQuestionCard.asInstanceOf[ObjectNode]
val cardId = mapper.readTree(metabaseUtil.createQuestionCard(requestBody.toString)).path("id").asInt()
logger.info(s">>>>>>>>> Successfully created question card with card_id: $cardId for $chartName")
questionCardIds.append(cardId)
val updatedQuestionIdInDashCard = updateQuestionIdInDashCard(configJson, dashboardId, cardId)
updatedQuestionIdInDashCard.foreach { node =>
val dashCardsNode = node.path("dashCards")
if (!dashCardsNode.isMissingNode && !dashCardsNode.isNull) {
dashcardsArray.add(dashCardsNode)
} else {
logger.info("No 'dashCards' key found in the JSON.")
}
}
}
case Some(other) =>
logger.warn(s"Unexpected type for 'config': ${other.getClass.getSimpleName}. Skipping.")
case None =>
logger.info("Key 'config' not found in the result row.")
}
🤖 Prompt for AI Agents
In
`@metabase-jobs/combined-dashboard-creator/src/main/scala/org/shikshalokam/job/combined/dashboard/creator/functions/user/ProcessAdminConstructor.scala`
around lines 25 - 50, The pattern match on row.get("config") in
ProcessAdminConstructor.scala currently only handles case Some(queryValue:
PGobject) and case None, which can throw a MatchError for other types (e.g.,
String); update the match to include a wildcard case (e.g., case Some(other))
that safely handles unexpected types by converting other values to a string/JSON
or logging and skipping processing, ensuring the logic around
updateQuestionCardJsonValues, updatePostgresDatabaseQuery,
metabaseUtil.createQuestionCard and updating dashCards only runs for valid
JSON/PGobject inputs.

Comment on lines +181 to +184
case _ =>
logger.warn(s"Unknown streamType '$streamType', falling back to all-streams mode.")
runJob(config, kafkaConnector)
return // runJob already calls env.execute

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for logger references and imports in UnifiedStreamTask.scala
echo "=== Searching for logger-related patterns ==="
rg -n 'logger|Logger|LazyLogging|StrictLogging|Logging' \
  "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

echo -e "\n=== Checking first 50 lines for imports and object definition ==="
head -50 "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

Repository: ELEVATE-Project/data-pipeline

Length of output: 2536


🏁 Script executed:

#!/bin/bash
# Check lines 170-200 for context around line 182
echo "=== Context around line 182 (lines 170-200) ==="
sed -n '170,200p' "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

echo -e "\n=== Check for any logger val/var definition in entire file ==="
rg -n '^\s*(private\s+)?(val|var)\s+logger' \
  "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

echo -e "\n=== Check if object UnifiedStreamTask has any trait extends ==="
head -20 "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala" | grep -A 5 "object UnifiedStreamTask"

Repository: ELEVATE-Project/data-pipeline

Length of output: 1381


🏁 Script executed:

#!/bin/bash
# Get total file length
echo "=== Total lines in file ==="
wc -l "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

# Show ALL imports
echo -e "\n=== All imports ==="
sed -n '1,30p' "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

# Check for any logger pattern (broader search)
echo -e "\n=== Broader search for logger definition ==="
rg -n 'logger\s*=' "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

# Check if there's a trait that might provide logger
echo -e "\n=== Check for implicit or method providing logger ==="
rg -n 'implicit|given' "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

Repository: ELEVATE-Project/data-pipeline

Length of output: 1683


🏁 Script executed:

#!/bin/bash
# Show exact lines 181-188 (end of file)
echo "=== Lines 181-188 (end of file) ==="
sed -n '181,188p' "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

# Verify no SLF4J or any logging framework in imports
echo -e "\n=== Search for any logging framework import in entire file ==="
rg -i 'slf4j|logging|logger|log4j' "stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala"

Repository: ELEVATE-Project/data-pipeline

Length of output: 520


logger is not defined — this will fail to compile.

object UnifiedStreamTask does not extend any trait that provides a logger field, nor is one declared locally. The file lacks any logging framework imports (SLF4J, Log4j, etc.). Line 182 will produce a compilation error.

Add a logger to the object:

+import org.slf4j.LoggerFactory
+
 object UnifiedStreamTask {
+  private val logger = LoggerFactory.getLogger(getClass)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      case _ =>
-        logger.warn(s"Unknown streamType '$streamType', falling back to all-streams mode.")
-        runJob(config, kafkaConnector)
-        return // runJob already calls env.execute
+import org.slf4j.LoggerFactory
+object UnifiedStreamTask {
+  private val logger = LoggerFactory.getLogger(getClass)
+  // ... rest of the object definition ...
+  def main(args: Array[String]): Unit = {
+    // ... earlier code ...
+    streamType match {
+      // ... other cases ...
+      case _ =>
+        logger.warn(s"Unknown streamType '$streamType', falling back to all-streams mode.")
+        runJob(config, kafkaConnector)
+        return // runJob already calls env.execute
+    }
+    env.execute("Stream Processor Job")
+  }
+}
🤖 Prompt for AI Agents
In
`@stream-jobs/combined-stream-processor/src/main/scala/org/shikshalokam/job/combined/stream/processor/task/UnifiedStreamTask.scala`
around lines 181 - 184, The object UnifiedStreamTask references logger but none
is defined; add a logger field and the import for your logging backend (e.g.
SLF4J) at the top of the object: import org.slf4j.LoggerFactory and define a
private val logger = LoggerFactory.getLogger(classOf[UnifiedStreamTask.type])
(or LoggerFactory.getLogger(getClass) if preferred) so the case fallback that
calls logger.warn(...) compiles and logs correctly while keeping runJob and
streamType usage unchanged.
