Contributor

Omolola-Akinleye commented Jan 8, 2026

We experienced an SDH where Elasticsearch became overloaded following an agent upgrade. The CSPM misconfiguration transform does not enforce a maximum document age, which contributed to data overload.

To reduce the impact of resource over-consumption, we need to optimize the Cloud Security Posture Management (CSPM) misconfiguration transform. Proposed improvements include adding query filters and maximum constraints to limit the data processed by the transform, thereby improving transform efficiency and overall resource utilization.

Tickets Addressed:

Changes:
**Add 26-hour time window**
Only query findings from the last 26 hours, matching the retention policy.

must:
  range:
    "@timestamp":
      gte: "now-26h"
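
To make the window concrete, here is a minimal Python sketch of what `gte: "now-26h"` selects. The helper names and the sample timestamp are hypothetical and only illustrate the arithmetic; the transform itself relies on Elasticsearch date math, not on this code.

```python
from datetime import datetime, timedelta, timezone

WINDOW_HOURS = 26  # window size taken from the PR description


def window_start(now: datetime) -> datetime:
    """Earliest @timestamp that a `gte: now-26h` filter would still match."""
    return now - timedelta(hours=WINDOW_HOURS)


def in_window(ts: datetime, now: datetime) -> bool:
    """True when a finding's timestamp falls inside the 26-hour window."""
    return ts >= window_start(now)


# Hypothetical evaluation time, chosen only for the example.
now = datetime(2026, 1, 8, 18, 0, tzinfo=timezone.utc)
print(window_start(now).isoformat())  # 2026-01-07T16:00:00+00:00
```

A finding stamped 25 hours before `now` is kept, and one stamped 27 hours before is skipped.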

**Add Data Tier Exclusion Filter**
Prevents the transform from querying the cold and frozen storage tiers, which significantly improves query performance. Since the transform already uses a 26-hour retention window, there's no need to access archived data on slow storage tiers.

must_not:
  terms:
    _tier:
      - data_frozen
      - data_cold
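
As a rough sketch, the time-window and tier-exclusion clauses from this description can be combined into one bool query and printed as JSON for inspection. `build_source_query` is a hypothetical helper, and where exactly this query nests inside transform.yml is not shown here, so treat the surrounding structure as an assumption.

```python
import json


def build_source_query(window: str = "now-26h",
                       excluded_tiers=("data_frozen", "data_cold")) -> dict:
    """Combine the two filters from the PR description into one bool query."""
    return {
        "bool": {
            # Only documents newer than the window start.
            "must": [{"range": {"@timestamp": {"gte": window}}}],
            # Skip documents stored on the slow tiers.
            "must_not": [{"terms": {"_tier": list(excluded_tiers)}}],
        }
    }


print(json.dumps(build_source_query(), indent=2))
```

Building the query from one function keeps the window and the tier list in a single place if either needs to change later.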

**Bump Transform Version**
Triggers Fleet to delete, reinstall, and restart the transform during package upgrade, so that all users receive the performance improvements.

fleet_transform_version: 0.3.0

@Omolola-Akinleye Omolola-Akinleye self-assigned this Jan 8, 2026
@Omolola-Akinleye Omolola-Akinleye requested a review from a team as a code owner January 8, 2026 17:26
@Omolola-Akinleye Omolola-Akinleye added enhancement New feature or request Team:Cloud Security Cloud Security team [elastic/cloud-security-posture] labels Jan 8, 2026
Contributor

opauloh left a comment

Some adjustments are needed. Also, can we test the new settings before we merge?

You can do that by building the package locally, spinning up a new environment in cloud, and uploading the integration from the zip file. Then install a new agent and verify that the findings are loaded properly. Ideally, with these settings we should see findings being generated incrementally within a few seconds.

Comment on lines 39 to 40
docs_per_second: 100
max_page_search_size: 100
Contributor

We need to be careful about setting docs_per_second and max_page_search_size to low values. The transform could start to lag behind and fail to process everything between executions, which would reduce performance instead of improving it.
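
A back-of-the-envelope sketch of the lag concern: if a run has more documents to read than the throttle lets through before the next run, the backlog grows. The document count and the one-hour interval below are made-up illustrative numbers, and the helper names are hypothetical; real throughput also depends on factors this arithmetic ignores.

```python
def seconds_to_drain(doc_count: int, docs_per_second: float) -> float:
    """Time needed to read doc_count documents at a fixed throttle."""
    return doc_count / docs_per_second


def falls_behind(doc_count: int, docs_per_second: float,
                 interval_seconds: float) -> bool:
    """True when one run cannot finish before the next one is due."""
    return seconds_to_drain(doc_count, docs_per_second) > interval_seconds


# Illustrative numbers only: 1M documents, throttle of 100 docs/s, hourly runs.
print(seconds_to_drain(1_000_000, 100))    # 10000.0
print(falls_behind(1_000_000, 100, 3600))  # True
```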

There was a great discussion about those two settings in this Entity Store Performance PR.

My suggestion is that we copy the Entity Store strategy and set only max_page_search_size, to 500. The Entity Store handles a heavy data load, so these settings are already battle-tested.

cc @romulets please correct me if I'm wrong here, but it seems that the larger performance improvement on https://github.com/elastic/sdh-security-team/issues/1476 came from excluding the frozen and cold tiers and from adding the time range, which was added later in the investigation, and not from setting docs_per_second and max_page_search_size to 100.

Contributor Author

Omolola-Akinleye Jan 8, 2026

Thanks, Paulo, for the thorough review. Okay, I can remove docs_per_second and max_page_search_size and test the query filters on their own. I think max_page_search_size defaults to 500. We have a separate ticket ([CSPM] Consider having docs per second and page size by default in transform, #14755) that can be used to investigate and explore the docs_per_second and max_page_search_size settings.

Member

I agree with your assessment @opauloh, we can hold off on docs_per_second and max_page_search_size until we have tested them further.

It's worth noting, though, that in previous SDHs such settings did have a big impact, so I wouldn't drop the effort to understand which default config is good enough for us. I do think some lag might be acceptable in our use case, since the cycle runs once per day.

Contributor

Thanks for the clarification @romulets! I think it will be great to work on it in the separate ticket then, so we can isolate the performance gains and battle-test a good default for misconfigurations and vulnerabilities.

@andrewkroh andrewkroh added the Integration:cloud_security_posture Security Posture Management label Jan 8, 2026
Omolola-Akinleye and others added 2 commits January 8, 2026 17:18
…figuration/transform.yml

Co-authored-by: Paulo Silva <paulo.scape@gmail.com>
…figuration/transform.yml

Co-authored-by: Paulo Silva <paulo.scape@gmail.com>
Contributor

maxcold left a comment

A couple of questions regarding the version. Otherwise, from a logic perspective the change looks fine, but I agree with @opauloh that we need to verify it, ideally on an environment with data, to make sure the expected number of findings is present on the findings page after a cycle.

Other points not directly related to this PR but to the task in general:

  • we also need to update the vulnerability transform, which is still being installed from the Kibana code. With vulnerabilities it's even more important, as the volume of vulnerabilities is usually bigger than the volume of misconfigurations
  • we have the old transform definition in the Kibana code, and there is logic to install it for older versions of the package. Can you check whether we need to backport this change to the old transform? Probably not, but I don't remember exactly from which version customers get the transform from the integration; the change was quite recent. Ido did it for the namespace epic

# 1.4.x - 8.9.x
# 1.3.x - 8.8.x
# 1.2.x - 8.7.x
- version: "3.3.0-preview01"
Contributor

Why is 3.2.0 skipped?

Contributor Author

I was confused by the versioning comment. I thought each package version should match the release version, and I didn't see 3.2.0, but maybe we bump to 3.2.0 for the 9.3.0 and 9.4.0 releases. Wdyt, @maxcold?

Contributor

Historically, during the release cycle, we add changes to the integration package in a -previewX version, and then, between the first and second QA cycles, we release the version of the package that was under preview. During the 9.3.0 work there were only two changes, neither of which was released under preview, so there was no 3.2.0 release for 9.3.0. But that doesn't mean we need to skip this version. I will post on our QA channel to clarify whether the package release process has changed. If not, and we just skipped it for 9.1.1 and 9.1.2, you need to add your change under 9.2.0-preview01. If, from now on, we decide not to release under preview, then you can add it under just 9.2.0.

Contributor Author

Ahh okay thank you for providing clarity!

# Bump this version to delete, reinstall, and restart the transform during package.
# Version bump is needed if there is any code change in transform.
fleet_transform_version: 0.2.1
fleet_transform_version: 0.2.2
Contributor

If we bump the minor version of the package, it would be nice to bump the minor version here as well. I don't think it affects anything functionally, but it's good to stay consistent.

Contributor Author

Omolola-Akinleye commented Jan 9, 2026

@opauloh @maxcold Something is off with the @timestamp range query filter, but I don't think it's the array structure. There is an issue with the query filter syntax:

  must:
        - range:
            '@timestamp':
            gte: "now-26h"
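
One possible reading of the snippet above, assuming the indentation in this comment matches the file and was not lost in pasting: `gte` sits at the same level as `'@timestamp'`, so a YAML parser would give `'@timestamp'` an empty (null) value and treat `gte` as a second field of `range`. The Python sketch below models both parsed shapes as plain dicts and flags the difference; `range_clause_problems` is a hypothetical helper, not part of any Elastic tooling.

```python
BOUNDS = {"gt", "gte", "lt", "lte"}


def range_clause_problems(clause: dict) -> list[str]:
    """Report structural problems in a parsed `range` clause."""
    problems = []
    for field, bounds in clause.get("range", {}).items():
        if field in BOUNDS:
            # A bound name used as a field name usually means bad indentation.
            problems.append(f"bound '{field}' sits at field level; indent it under the field name")
        elif not isinstance(bounds, dict):
            problems.append(f"field '{field}' has no bounds mapping")
    return problems


# Shape a YAML parser would produce for the snippet as pasted (assumption).
as_pasted = {"range": {"@timestamp": None, "gte": "now-26h"}}
# Shape with `gte` indented one level under '@timestamp'.
intended = {"range": {"@timestamp": {"gte": "now-26h"}}}

print(range_clause_problems(as_pasted))
print(range_clause_problems(intended))  # []
```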

> we also need to update the vulnerability transform, which is still being installed from the Kibana code. With vulnerabilities it's even more important, as the volume of vulnerabilities is usually bigger than the volume of misconfigurations

Okay, I can create a separate ticket and PR for the vulnerabilities transform.

> we have the old transform definition in the Kibana code, and there is logic to install it for older versions of the package. Can you check whether we need to backport this change to the old transform? Probably not, but I don't remember exactly from which version customers get the transform from the integration; the change was quite recent. Ido did it for the namespace epic

So you are saying we should apply these changes to the old transform? I believe the old transform was last updated in 8.15.0.

@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

Contributor

maxcold commented Jan 12, 2026

> So you are saying we should apply these changes to the old transform?

@Omolola-Akinleye no, I'm just saying that you need to check and decide whether the filter needs to be applied. If you think it isn't needed, then the old transform doesn't need to change.

# newer versions go on top
# version map:
# IMPORTANT: this map doesn't apply to serverless where package availability depends on the spec version https://github.com/elastic/kibana/blob/main/config/serverless.yml#L14-L15
# 3.2.x - 9.3.x, 9.4.x
Contributor

As there was no change to the Kibana condition (https://github.com/elastic/integrations/blob/main/packages/cloud_security_posture/manifest.yml#L26), 3.2.x will work for 9.2.x as well. Let's figure out the approach to package versioning first: https://elastic.slack.com/archives/C03E5KGNWT1/p1768212825336389

Contributor Author

According to the docs, I can now bump from 3.2.0 to 3.3.0-preview01, so I followed that approach, since these changes are for the 9.4.0 release.

Member

romulets left a comment

The changes to the transform itself look good to me!

@Omolola-Akinleye
Contributor Author

The transform continues to generate findings with the latest queries @opauloh

[three images attached]

@andrewkroh andrewkroh added the Integration:cloud_asset_inventory Cloud Asset Discovery label Jan 14, 2026
@elasticmachine

💚 Build Succeeded

History

cc @Omolola-Akinleye

Contributor

opauloh left a comment

LGTM - thanks for performing the tests with real data!

Labels

enhancement New feature or request Integration:cloud_asset_inventory Cloud Asset Discovery Integration:cloud_security_posture Security Posture Management Team:Cloud Security Cloud Security team [elastic/cloud-security-posture]

6 participants