optimization of master key initialization by zane-neo · Pull Request #1 · zane-neo/ml-commons

zane-neo · 2025-11-27T06:07:25Z

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

zane-neo · 2025-11-27T07:26:22Z

@CodiumAI-Agent /improve

…t get error when executing PER agent (opensearch-project#4579)
* use dedicated thread pool in response handler
  Signed-off-by: zane-neo <zaniu@amazon.com>
* address comments
  Signed-off-by: zane-neo <zaniu@amazon.com>
* optimize code
  Signed-off-by: zane-neo <zaniu@amazon.com>

---------

Signed-off-by: zane-neo <zaniu@amazon.com>

…pensearch-project#4586)
* [Gemini Model Support] Filter Agent final response, address comments opensearch-project#4570
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
* address comment
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
* Address comment + add coverage
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>

---------

Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>

…rch-project#4591)
* Support OpenAI Chat Completions API with new Agent Interface
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
* Address comments
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>

---------

Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>

…ntic memory queries (opensearch-project#4597)
* Add feature flag for remote agentic memory type
  Signed-off-by: Sicheng Song <sicheng.song@outlook.com>
* fix: get message in agentic memory not working
  Signed-off-by: Sicheng Song <sicheng.song@outlook.com>
* address comments
  Signed-off-by: Sicheng Song <sicheng.song@outlook.com>

---------

Signed-off-by: Sicheng Song <sicheng.song@outlook.com>

* use global variables and add validations
  Signed-off-by: Mingshi Liu <mingshl@amazon.com>
* change exception type
  Signed-off-by: Mingshi Liu <mingshl@amazon.com>
* fix client stashContext
  Signed-off-by: Mingshi Liu <mingshl@amazon.com>
* consolidate test
  Signed-off-by: Mingshi Liu <mingshl@amazon.com>

---------

Signed-off-by: Mingshi Liu <mingshl@amazon.com>

… of llm connectors (opensearch-project#4394)
* [FEATURE] Add an option to turn on and off the certificate validation in ML Commons
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* [FEATURE] Add an option to turn on and off the certificate validation in ML Commons
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Fixed coderabbitai comments
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Added test cases
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Fixed review comments
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Fixed review comments
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Fixed review comments
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Converting log based tests to argument check
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Converting log based tests to argument check
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Converting log based tests to argument check
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* Fixing test failures
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
* changing info logs to warn
  Resolves opensearch-project#4371
  Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>

---------

Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>
Signed-off-by: Muneer Kolarkunnu <33829651+akolarkunnu@users.noreply.github.com>
Co-authored-by: Dhrubo Saha <dhrubo@amazon.com>
Co-authored-by: Mingshi Liu <mingshl@amazon.com>

* fix previous tool results missing
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* add tool message support in agent revamp + update AGUI processing
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* support image in streaming
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* add/fix tests
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* add support for tool messages for OpenAI
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>

---------

Signed-off-by: Jiaping Zeng <jpz@amazon.com>

… Access (opensearch-project#4608)
* Fix: Restore Thread Context in MLAgentExecutor properly to fix Memory Access
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
* Fix Rebase issue
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>

---------

Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>

…oject#4626)
* add overload constructor to unblock skills plugin
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* add test
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>

---------

Signed-off-by: Jiaping Zeng <jpz@amazon.com>

…uring agent register (opensearch-project#4637)
* add more tests
  Signed-off-by: Mingshi Liu <mingshl@amazon.com>
  add test notations
  Signed-off-by: Mingshi Liu <mingshl@amazon.com>
* apply spotless
  Signed-off-by: Mingshi Liu <mingshl@amazon.com>
* force run CI
  Signed-off-by: Mingshi Liu <mingshl@amazon.com>

---------

Signed-off-by: Mingshi Liu <mingshl@amazon.com>

* Add 3.5.0 release notes
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* Update release notes with latest changes
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* update release notes with latest bug fixes
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>

---------

Signed-off-by: Jiaping Zeng <jpz@amazon.com>

…make ml-common fips build param aware (opensearch-project#4654)
* Fix ML build with 1) adapt to gradle shadow plugin v9 upgrade and 2) make ml-common fips build param aware
  Signed-off-by: Craig Perkins <cwperx@amazon.com>
* Add to build tasks as well
  Signed-off-by: Craig Perkins <cwperx@amazon.com>
* Add FipsBuildParam in plugin/build.gradle
  Signed-off-by: Craig Perkins <cwperx@amazon.com>

---------

Signed-off-by: Craig Perkins <cwperx@amazon.com>

…h-project#4659)
Use "-Pcrypto.standard=FIPS-140-3" (quoted) instead of -Pcrypto.standard=FIPS-140-3 for consistency with other OpenSearch plugin repositories (e.g. flow-framework PR opensearch-project#1322).
Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>

…tion (opensearch-project#4656)
* fix: fix integ test for ML inference range query rewrite
  - RestMLInferenceSearchRequestProcessorIT: Remove pre/post process functions from Bedrock connector so raw response is available as dataAsMap. Use embedding.length() to get embedding dimension as integer for the range query. Use diary_embedding_size_int (integer field) instead of diary_embedding_size (keyword field).
  - RestMLRAGSearchProcessorIT: Update Cohere model from command-a-03-2025 (v2 API only) to command-r-08-2024 (v1 API).
  - plugin/build.gradle: Add bc-fips to unit test classpath in FIPS mode via detached configuration to fix NoClassDefFoundError.
  Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
* fix: revert manual substitution, restore StringSubstitutor
  The manual substitution was unnecessary. The correct fix is removing the post_process_function from the connector so the raw Bedrock response is available as dataAsMap, allowing embedding.length() to work directly.
  Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
* test: remove unnecessary unit test for numeric type preservation
  The test was added when the fix was in MLInferenceSearchRequestProcessor but since the fix is now in the integration test (removing post-process function from connector), this unit test adds no value.
  Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
* test: restore missing javadoc for testExecute_rewriteListFromTermQueryToGeometryQuerySuccess
  Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>

---------

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>

…asters (opensearch-project#4665)
* fix: improve integration test stability
  - SearchModelGroupITTests: add supportsDedicatedMasters=false to prevent suite timeout when test framework randomly adds dedicated cluster-manager nodes based on random seed
  - BedRockConnectorBodies.json, RestMLInferenceSearchResponseProcessorIT, RestMLRAGSearchProcessorIT: increase max_connection to 200 in Bedrock connector configs to prevent connection pool exhaustion under high concurrent request rates in CI
  - RestMLRAGSearchProcessorIT: update Cohere connector model from command-a-03-2025 (v2 API only) to command-r-08-2024 (v1 API)
  - MLCommonsRestTestCase: add isServiceReachable(hostname) helper for skipping tests when external services are unreachable
  - RestMLInferenceIngestProcessorIT: skip OpenAI tests when api.openai.com is not reachable in addition to OPENAI_KEY null check
  Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
* fix: fail fast in waitForTask when task reaches terminal failure state
  When a model download fails (e.g. network error), the task goes to FAILED state. The previous waitForTask only checked for the target state (COMPLETED), so it would loop until CUSTOM_MODEL_TIMEOUT (20,000 seconds), causing the 20-minute suite timeout to trigger first. Now waitForTask also exits immediately on FAILED or CANCELLED states, allowing the test to fail with a clear assertion error instead of a suite timeout.
  Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>

---------

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
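The fail-fast behaviour described in the second commit above can be sketched roughly as follows. This is a minimal illustration and not the actual `waitForTask` in ml-commons: the `TaskState` enum, the `Supplier`-based poller, and all parameter names are assumptions made for the example.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

final class WaitForTaskSketch {
    // Hypothetical task states; the real plugin defines its own enum.
    enum TaskState { CREATED, RUNNING, COMPLETED, FAILED, CANCELLED }

    // Once one of these is reached, the target state can no longer be observed.
    static final Set<TaskState> TERMINAL_FAILURES = Set.of(TaskState.FAILED, TaskState.CANCELLED);

    /**
     * Polls until the task reaches the target state, reaches a terminal failure
     * state, or the timeout elapses. Returns the last state observed so the
     * caller can assert on it.
     */
    static TaskState waitForTask(Supplier<TaskState> poll, TaskState target,
                                 long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        TaskState state = poll.get();
        while (state != target
                && !TERMINAL_FAILURES.contains(state)
                && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            state = poll.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // Simulated task that fails on the third poll.
        Iterator<TaskState> states =
            List.of(TaskState.CREATED, TaskState.RUNNING, TaskState.FAILED).iterator();
        TaskState last = waitForTask(states::next, TaskState.COMPLETED, 20_000, 1);
        System.out.println(last); // prints FAILED
    }
}
```

With this shape, a caller that asserts the returned state equals `COMPLETED` gets an assertion failure as soon as the task fails, rather than waiting for the timeout.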

…x flaky IndexUtilsTests (opensearch-project#4668)
* fix: skip OpenAI RAG tests when api.openai.com is unreachable
  When api.openai.com is not reachable on CI, model registration tasks fail silently returning model_id=null, causing deployRemoteModel(null) to hit /_plugins/_ml/models/null/_deploy and fail with 404. Reuse the existing isServiceReachable() helper to skip all four OpenAI tests when the service is not reachable.
  Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
* fix: make testGetNumberOfDocumentsInIndex_SearchQuery synchronous
  The assertion was running inside an async ActionListener callback on a search thread. If it threw AssertionError, it was caught as an uncaught exception on that thread rather than propagated to the test thread, causing the test to appear to pass or fail non-deterministically. Use PlainActionFuture to block the test thread until the result is available, then assert on the test thread.
  Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>

---------

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
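The second fix above moves the assertion from the callback thread to the test thread. A minimal sketch of that pattern is shown below, using the JDK's `CompletableFuture` as a stand-in for OpenSearch's `PlainActionFuture`; the `countDocsAsync` method and the value `42` are hypothetical and exist only for the illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BlockingAssertSketch {
    // Hypothetical async API: delivers its result to a callback on a worker thread.
    static void countDocsAsync(ExecutorService pool, Consumer<Long> onResponse) {
        pool.execute(() -> onResponse.accept(42L));
    }

    // Bridges the callback into a future and blocks the calling thread on it.
    static long countDocsBlocking(long timeoutSeconds) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Long> future = new CompletableFuture<>();
            countDocsAsync(pool, future::complete);
            return future.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("no result within timeout", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long count = countDocsBlocking(5);
        // The check runs on the calling (test) thread, so a failure propagates
        // to the caller instead of dying on the worker thread.
        if (count != 42L) {
            throw new AssertionError("unexpected count: " + count);
        }
        System.out.println("count = " + count); // prints count = 42
    }
}
```

If the check lived inside the callback instead, an `AssertionError` would be thrown on the worker thread and the calling thread would never see it.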

…#4667)
* Optimize integration test setup to eliminate redundant per-test work
  - Transport IT: use @SuiteScopeTestCase to run expensive model training and data loading once per class instead of per test method
  - Memory IT: change scope=TEST to scope=SUITE to reuse cluster across tests instead of restarting a 2-node cluster per test method
  - REST IT: add static guard around disableClusterConnectorAccessControl() + Thread.sleep(20000) so cluster settings and sleep run once per class
  Measured 82% reduction across tested classes (776s → 140s). 16 files changed, 0 test regressions.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
* Fix REST IT: remove base class settings guard, only guard Thread.sleep
  OpenSearchRestTestCase.cleanUpCluster() calls wipeClusterSettings() after every test method, clearing all persistent cluster settings. The previous static guard in setupSettings() prevented re-applying them, causing failures in MLModelAutoReDeployerIT, RestMLDeleteTaskActionIT, RestMLMemoryCircuitBreakerIT, and others.
  Fix:
  - Remove baseSettingsInitialized guard from MLCommonsRestTestCase.setupSettings() (4 cheap REST calls, must run every test since settings get wiped)
  - In subclasses, move disableClusterConnectorAccessControl() outside the guard (must re-run after wipe), only guard Thread.sleep(20000) (expensive, only needed once for initial propagation)
  Verified: full integTest suite passes (same pre-existing OpenAI API key failures as baseline, 0 new regressions).
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
* Replace Thread.sleep(20000) with active cluster settings polling
  The 20-second sleep after disableClusterConnectorAccessControl() was a brute-force wait for cluster settings propagation. Since PUT _cluster/settings returns after master acknowledgment, the setting is available almost immediately on 1-2 node test clusters. Added waitForClusterSettingPropagation() utility in MLCommonsRestTestCase that polls GET _cluster/settings?flat_settings=true until the setting appears, with a 10-second timeout as safety net. Resolves the existing TODO comments asking whether the sleep could be replaced with a cluster state check.
  Measured: RestBedRockInferenceIT dropped from 32.6s to 15.4s. Full integTest suite dropped from 8m22s to 6m11s.
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
* Validate cluster setting value in polling check
  Update waitForClusterSettingPropagation to verify the setting has the expected value (not just that the key exists) for correctness.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
* Fix waitForTask timeout bug and skip redundant remote model deploy
  The CUSTOM_MODEL_TIMEOUT (20_000) was passed with TimeUnit.SECONDS, creating a 20,000-second (5.5 hour) effective timeout instead of the intended 20 seconds. This caused tests to hang until suite timeout killed them. Fixed by using TimeUnit.MILLISECONDS. Also skip the deploy step for remote model registration in integration tests since remote models do not require explicit deployment.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>

---------

Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
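Two ideas from the commits above lend themselves to a short sketch: polling for a setting's expected value with a bounded timeout, and the unit mix-up behind the `waitForTask` bug. The code below is an illustration under assumptions and not the actual `waitForClusterSettingPropagation()`: the `Supplier` stands in for the `GET _cluster/settings?flat_settings=true` call, and the key `example.setting.enabled` is a hypothetical placeholder.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class SettingPollSketch {
    /**
     * Polls fetch until the flat settings map holds the expected value for key,
     * or the timeout elapses. Returns true only if the expected value was seen;
     * a key that is present with a different value does not count.
     */
    static boolean waitForSetting(Supplier<Map<String, String>> fetch, String key, String expected,
                                  long timeout, TimeUnit unit, long intervalMillis) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        do {
            if (Objects.equals(expected, fetch.get().get(key))) {
                return true;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        } while (System.nanoTime() - deadline < 0);
        return false;
    }

    public static void main(String[] args) {
        // Simulated cluster: the setting becomes visible on the third poll.
        int[] calls = {0};
        boolean seen = waitForSetting(
            () -> ++calls[0] < 3
                ? Map.<String, String>of()
                : Map.of("example.setting.enabled", "false"),
            "example.setting.enabled", "false", 10, TimeUnit.SECONDS, 1);
        System.out.println(seen + " after " + calls[0] + " polls"); // prints true after 3 polls

        // The same number means very different waits depending on the unit.
        System.out.println(TimeUnit.SECONDS.toHours(20_000));        // prints 5
        System.out.println(TimeUnit.MILLISECONDS.toSeconds(20_000)); // prints 20
    }
}
```

Passing the unit alongside the value, as `waitForSetting` does here, keeps the two together at the call site, which makes a mismatch such as `20_000` with `TimeUnit.SECONDS` easier to spot in review.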

…nt (opensearch-project#4645)
* extend memory interface
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* clean up + support remote agentic memory
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* add tests
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* send MessagesSnapshot AGUI event
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* sort using only messageId
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* test: create memory session document using if provided sessionId does not exist
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* move initial memory saving logic from MLChatAgentRunner to MLAgentExecutor
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* fix streaming text accumulation
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* address comments
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* Revert "move initial memory saving logic from MLChatAgentRunner to MLAgentExecutor"
  This reverts commit a364488.
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* disable conversation index memory when using messages array input
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* add messageId in MessagesSnapshot and remove page context from memory
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* Revert "disable conversation index memory when using messages array input"
  This reverts commit 1e667ee.
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* refactor memory handling for unified interface
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* remove tool result from conv index memory
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>
* store text in message with image in conv index memory
  Signed-off-by: Jiaping Zeng <jpz@amazon.com>

---------

Signed-off-by: Jiaping Zeng <jpz@amazon.com>

zane-neo force-pushed the optimize-master-blocking-issue branch from 325878a to c46e970 · November 27, 2025 07:25

zane-neo force-pushed the optimize-master-blocking-issue branch from c46e970 to a77a844 · December 9, 2025 01:28

zane-neo and others added 27 commits February 2, 2026 13:47

fix: type mismatch in AgenticConversationMemory (opensearch-project#4578)

cd967fd

Signed-off-by: Sicheng Song <sicheng.song@outlook.com>

fix: set disable session by default to false (opensearch-project#4584)

4c43b79

Signed-off-by: Sicheng Song <sicheng.song@outlook.com>

Fix gemini function calling in tool failure case (opensearch-project#4599)

23f0aae

Signed-off-by: rithin-pullela-aws <rithinp@amazon.com>

move ssl configuration to client_config (opensearch-project#4616)

30704d7

Signed-off-by: zane-neo <zaniu@amazon.com>

fix: restore thread context in other agent related class to fix agentic memory (opensearch-project#4621)

49467e8

Signed-off-by: Sicheng Song <sicheng.song@outlook.com>

fix deserialization failing for models with built-in connector (opensearch-project#4627)

9b1274a

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

adapt summarization manager for unified interface (opensearch-project#4628)

b1646b0

Signed-off-by: Mingshi Liu <mingshl@amazon.com>

Increment version to 3.6.0-SNAPSHOT (opensearch-project#4609)

f78efd5

Signed-off-by: opensearch-ci-bot <opensearch-infra@amazon.com>
Co-authored-by: opensearch-ci-bot <opensearch-infra@amazon.com>

Add maintainer (opensearch-project#4660)

5a33fc3

Signed-off-by: Nathalie Jonathan <nathhjo@amazon.com>

Onboard code diff analyzer and reviewer (ml-commons) (opensearch-project#4666)

f919cfc

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
Co-authored-by: Dhrubo Saha <dhrubo@amazon.com>

Fix: Early exit in stats collector job (opensearch-project#4560)

283e8ec

* fix: stats job collector
  Signed-off-by: Pavan Yekbote <pybot@amazon.com>
* fix: test cases
  Signed-off-by: Pavan Yekbote <pybot@amazon.com>

---------

Signed-off-by: Pavan Yekbote <pybot@amazon.com>

rithin-pullela-aws and others added 5 commits February 26, 2026 16:50

optimization of master key initialization

0b904ef

Signed-off-by: zane-neo <zaniu@amazon.com>

Fix tests

68c2904

Signed-off-by: zane-neo <zaniu@amazon.com>

rebase main

b223c0a

Signed-off-by: zane-neo <zaniu@amazon.com>

zane-neo force-pushed the optimize-master-blocking-issue branch from a77a844 to b223c0a · February 27, 2026 12:14

zane-neo wants to merge 32 commits into main from optimize-master-blocking-issue


12 participants