[Context Security Apps] add performance optimization enhancements for transform #16904
opauloh left a comment
There are some adjustments needed. Also, can we test the new settings before we merge this?
You can do it by building the package locally, spinning up a new environment in Cloud, and uploading the integration from the zip file. Then install a new agent and verify that the findings are loaded properly. Ideally, with these settings, we should see findings being generated incrementally within a few seconds.
```yaml
docs_per_second: 100
max_page_search_size: 100
```
We need to be careful about setting docs_per_second and max_page_search_size to low values: the transform might start to lag behind and be unable to process everything between transform executions, which can reduce performance instead of improving it.
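For a rough sense of scale: docs_per_second caps how many source documents the transform reads per second. Assuming, as an idealized upper bound rather than a measurement, that the transform ran continuously and search latency were zero, a value of 100 limits it to:

$$
100\ \frac{\text{docs}}{\text{s}} \times 86\,400\ \frac{\text{s}}{\text{day}} = 8\,640\,000\ \frac{\text{docs}}{\text{day}}
$$

If a deployment ingested more findings than that per day, the transform could not keep up and would fall progressively further behind.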
There was a great discussion about these two settings in this Entity Store performance PR.
My suggestion is that we copy the Entity Store strategy and set only max_page_search_size, to 500. The Entity Store handles a heavy load of data, and these settings are already battle-tested.
cc @romulets, please correct me if I'm wrong, but it seems that the biggest performance improvement in https://github.com/elastic/sdh-security-team/issues/1476 came from excluding the frozen and cold tiers and adding the time range (added later in the investigation), not from setting docs per second and max page size to 100.
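A minimal sketch of the Entity Store-style suggestion, assuming the keys sit under the transform's standard Elasticsearch `settings` object in transform.yml (not copied from the package). Because 500 is already the Elasticsearch default for max_page_search_size, setting it explicitly mainly documents intent.

```yaml
# Sketch only: assumes these keys live under `settings` in transform.yml.
settings:
  # Page size for the composite aggregation on each checkpoint
  # (500 is also the Elasticsearch default).
  max_page_search_size: 500
  # docs_per_second is deliberately omitted: when it is unset,
  # Elasticsearch does not throttle the transform.
```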
Thanks, Paulo, for the thorough review. Okay, I can remove docs_per_second and max_page_search_size to test the query filters. I think max_page_search_size defaults to 500. We have a separate ticket, [CSPM] Consider having docs per second and page size by default in transform #14755, that can be used to investigate and explore the docs_per_second and max_page_search_size settings.
I agree with your assessment, @opauloh; we can hold off on docs per second and max page size until we have tested them further.
It's important to note, though, that in previous SDHs such a setting did have a big impact, so I wouldn't drop the work of understanding which default config is good enough for us. I do think some lag might be acceptable in our use case, since the cycle runs once per day.
Thanks for the clarification, @romulets! I think it would be great to work on it in the separate ticket then, so we can isolate the performance gains and battle-test a good default for misconfigurations and vulnerabilities.
packages/cloud_security_posture/elasticsearch/transform/misconfiguration/transform.yml
…figuration/transform.yml Co-authored-by: Paulo Silva <paulo.scape@gmail.com>
maxcold left a comment
A couple of questions regarding the version; otherwise, from a logic perspective, the change looks fine. But I agree with @opauloh that we need to verify it, ideally on an environment with data, to make sure the expected number of findings is present on the Findings page after a cycle.
Other points not directly related to this PR but to the task in general:
- We also need to update the vulnerability transform, which is still being installed from the Kibana code. With vulnerabilities, it's even more important, as the volume of vulnerabilities is usually bigger than the volume of misconfigurations.
- We have the old transform definition in the Kibana code, with logic to install it for older versions of the package. Can you check whether we need to backport this change to the old transform? Probably not, but I don't remember exactly from which version customers get the transform from the integration; the change was quite recent. Ido did it for the namespace epic.
```yaml
# 1.4.x - 8.9.x
# 1.3.x - 8.8.x
# 1.2.x - 8.7.x
- version: "3.3.0-preview01"
```
Why is 3.2.0 skipped?
I was confused by the versioning comment. I thought each package version should match a release version, and I didn't see 3.2.0, but maybe we bump to 3.2.0 for the 9.3.0 and 9.4.0 releases. WDYT, @maxcold?

Historically, during the release cycle, we add changes to the integration package under a -previewX version, and then, between the first and second QA cycles, we release that preview version of the package. During the 9.3.0 work there were only two changes, and neither of them was released under preview; there was no 3.2.0 release for 9.3.0. But that doesn't mean we need to skip this version. I will post in our QA channel to clarify whether the package release process has changed. If not, and we just skipped it for 9.1.1 and 9.1.2, you need to add your change under 3.2.0-preview01. If, from now on, we decide not to release under preview, then you can add it under just 3.2.0.
Ah, okay, thank you for the clarification!
```diff
 # Bump this version to delete, reinstall, and restart the transform during package.
 # Version bump is needed if there is any code change in transform.
-fleet_transform_version: 0.2.1
+fleet_transform_version: 0.2.2
```
If we bump the minor version of the package, it would be nice to bump the minor version here as well. I don't think it affects anything from a functionality standpoint, but it's good to keep things consistent.
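Applied to the version above, a minor bump would take fleet_transform_version from 0.2.1 to 0.3.0 instead of 0.2.2. A sketch, assuming the field sits in the transform's `_meta` block as it commonly does in Fleet-managed transform definitions (placement not confirmed by this thread):

```yaml
# Sketch only: placement under `_meta` is an assumption about transform.yml.
_meta:
  # Bump this version to delete, reinstall, and restart the transform during package.
  # Version bump is needed if there is any code change in transform.
  fleet_transform_version: 0.3.0  # minor bump, mirroring the package's minor bump
```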
@opauloh @maxcold Something is off with the query filter's timestamp range, but I don't think it's the array structure. There's an issue with the query filter syntax.
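For reference on the two points raised here: an Elasticsearch `bool` query accepts each occurrence clause (such as `filter`) either as a single query object or as a list of query objects, so switching between those structures should not by itself change behavior. A separate YAML pitfall, not necessarily the cause here, is that `@` is a reserved indicator in YAML and cannot start a plain scalar, so a key like `@timestamp` must be quoted. A sketch of both equivalent forms, using the 26-hour range as an assumed example:

```yaml
# Form 1: `filter` as a list of clauses.
query:
  bool:
    filter:
      - range:
          "@timestamp":  # quoted: `@` cannot start a plain YAML scalar
            gte: "now-26h"
---
# Form 2: `filter` as a single clause object; equivalent to Form 1.
query:
  bool:
    filter:
      range:
        "@timestamp":
          gte: "now-26h"
```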
Okay, I can create a separate ticket and PR for the vulnerabilities transform.
So are you saying we should apply these changes to the old transform? I believe the old transform was last updated
@Omolola-Akinleye No, I'm just saying that you need to check and decide whether the filter needs to be applied. If you think it's not needed, then no change to the old transform is needed.
```yaml
# newer versions go on top
# version map:
# IMPORTANT: this map doesn't apply to serverless where package availability depends on the spec version https://github.com/elastic/kibana/blob/main/config/serverless.yml#L14-L15
# 3.2.x - 9.3.x, 9.4.x
```
As there was no change to the Kibana condition (https://github.com/elastic/integrations/blob/main/packages/cloud_security_posture/manifest.yml#L26), 3.2.x will work for 9.2.x as well. Let's figure out the approach to package versioning first: https://elastic.slack.com/archives/C03E5KGNWT1/p1768212825336389
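For context, a package declares its Kibana compatibility under `conditions.kibana.version` in manifest.yml as a semver range. The actual range at manifest.yml#L26 is not reproduced here; the value below is hypothetical. With a caret range such as `^9.2.0`, any Kibana from 9.2.0 up to (but excluding) 10.0.0 satisfies the condition, so as long as the range is unchanged, a 3.2.x package installs on 9.2.x as well as on 9.3.x and 9.4.x.

```yaml
# Hypothetical excerpt of packages/cloud_security_posture/manifest.yml.
conditions:
  kibana:
    # Caret range: matches >= 9.2.0 and < 10.0.0 (value is illustrative only).
    version: "^9.2.0"
```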
I can now bump from 3.2.0 to 3.3.0-preview01 according to the docs, so I followed that approach, since these changes are for the 9.4.0 release.
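A sketch of the corresponding changelog.yml entry, following the `version` / `changes` / `description` / `type` / `link` layout used by packages in the elastic/integrations repository. The description wording is illustrative, and the link assumes this PR's number (#16904).

```yaml
# Illustrative entry; newer versions go on top of changelog.yml.
- version: "3.3.0-preview01"
  changes:
    - description: Add a 26-hour time window and cold/frozen tier exclusion to the misconfiguration transform.
      type: enhancement
      link: https://github.com/elastic/integrations/pull/16904
```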
romulets left a comment
The changes to the transform itself look good to me!
@opauloh The transform continues to generate findings with the latest queries.
💚 Build Succeeded
opauloh left a comment
LGTM - thanks for performing the tests with real data!



We experienced an SDH where Elasticsearch became overloaded following an agent upgrade. The CSPM misconfiguration transform does not enforce a maximum document age, which contributed to data overload.
To reduce the impact of resource over-consumption, we need to optimize the Cloud Security Posture Management (CSPM) misconfiguration transform. Proposed improvements include adding query filters and maximum constraints to limit the data processed by the transform, thereby improving transform efficiency and overall resource utilization.
Tickets Addressed:
- [CSPM] Add date filter to the transform to fetch only day fresher than 26 hours #14757
- [CSPM] Skip Cold and Frozen data in Transform #14754
Changes:
- **Add 26-hour time window:** Only query findings from the last 26 hours, matching the retention policy.
- **Add Data Tier Exclusion Filter:** Prevents the transform from querying cold/frozen storage tiers, which significantly improves query performance. Since the transform already uses a 26-hour retention window, there's no need to access archived data on slow storage tiers.
- **Bump Transform Version:** Triggers Fleet to delete, reinstall, and restart the transform during package upgrade, ensuring all users receive the performance improvements.
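A minimal sketch of how the two filters above could be expressed in the transform's source query. The index pattern and the exact layout of transform.yml are assumptions, not copied from the package. `_tier` is the Elasticsearch metadata field exposing the data tier of the index a document belongs to, and it supports `terms` queries.

```yaml
# Sketch only: source index pattern and layout are assumptions.
source:
  index:
    - "logs-cloud_security_posture.findings-*"
  query:
    bool:
      filter:
        # Time window: only findings from the last 26 hours.
        - range:
            "@timestamp":
              gte: "now-26h"
      must_not:
        # Data tier exclusion: never read from cold or frozen indices.
        - terms:
            _tier:
              - data_cold
              - data_frozen
```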