
@anchi205 (Contributor) commented Jan 23, 2026

Changes

Add a `stepResources` field to the Build and BuildRun APIs to override resources for specific steps in a BuildStrategy or ClusterBuildStrategy.

Fixes shipwright-io/build#1894

Submitter Checklist

  • Includes tests if functionality changed/was added
  • Includes docs if changes are user-facing
  • Set a kind label on this PR
  • Release notes block has been filled in, or marked NONE

See the contributor guide for details on coding conventions, GitHub and Prow interactions, and the code review process.

Release Notes

@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 23, 2026

@adambkaplan (Member) left a comment

/approve

This is an excellent 1st SHIP, and definitely meets the requirements for a provisional proposal. We could upgrade this status to "implementable", however I think to do so warrants further discussion that would block merge. I don't think we should do that - a separate PR encourages contribution from others.

My top of mind for making this "implementable" and ready for merge:

  1. How should we use Tekton's computeResources API or Kubernetes pod resources? Should these be an implementation detail, or a separate feature?
  2. Should we add features to the shp CLI, or leave this as a pure "YAML-only" developer experience?
  3. Do we need to add anything to the operator? For example, does our implementation logic need to change depending on the version of Tekton or Kubernetes that is deployed?

I think it's fine to add these as "open questions" in the proposal document.


## Summary

This feature introduces a `stepResources` field in the Build and BuildRun APIs, enabling developers
to override CPU, memory, or ephemeral storage requirements for specific steps defined in a
BuildStrategy or ClusterBuildStrategy without duplicating and modifying the strategy itself.
Currently, resource requirements are hardcoded at the strategy level, so users must either
duplicate strategies for different resource needs or ask platform admins to modify shared
strategies. With this change, users can specify per-step resource overrides directly in their
Build or BuildRun specs, and the controller merges these overrides at runtime when generating the
Tekton TaskRun. BuildRun overrides take precedence over Build overrides, which in turn override
strategy defaults. This follows the existing Shipwright override patterns for parameters, volumes,
and environment variables, maintaining API consistency while enabling fine-grained, per-build
resource customization.

Member

nit: split to multiple lines (I recommend after 100 characters). Not all systems wrap text on a single line.

Contributor Author

done

### Goals

- Users can specify step-level resource overrides in Build spec and BuildRun spec, without creating duplicate BuildStrategies or requesting changes to shared ClusterBuildStrategies.
- When resources are specified at multiple levels, BuildRun overrides take precedence over Build overrides, which take precedence over strategy defaults.

Member

👍 - this lines up with our existing precedence patterns.
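A minimal Go sketch of the three-level precedence from the Goals (BuildRun over Build over strategy defaults), for a single step. It assumes overrides are merged per resource key, so an override that only sets a memory limit keeps the strategy's CPU values. `ResourceList`, `Resources`, `mergeList` and `effectiveResources` are hypothetical simplifications (plain integers instead of `resource.Quantity`), not the controller's actual code.

```go
package main

import "fmt"

// ResourceList is a hypothetical, simplified stand-in for corev1.ResourceList:
// resource name -> quantity ("cpu" in millicores, "memory" in bytes).
type ResourceList map[string]int64

// Resources mirrors the requests/limits pair of corev1.ResourceRequirements.
type Resources struct {
	Requests ResourceList
	Limits   ResourceList
}

// mergeList returns a fresh copy of base with every key set in override
// overwritten, so neither input map is mutated.
func mergeList(base, override ResourceList) ResourceList {
	out := ResourceList{}
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		out[k] = v
	}
	return out
}

// effectiveResources resolves one step's resources with increasing precedence:
// strategy defaults < Build override < BuildRun override. A nil override
// (nothing specified at that level) leaves the lower level untouched.
func effectiveResources(strategy Resources, build, buildRun *Resources) Resources {
	result := Resources{
		Requests: mergeList(strategy.Requests, nil),
		Limits:   mergeList(strategy.Limits, nil),
	}
	for _, o := range []*Resources{build, buildRun} {
		if o == nil {
			continue
		}
		result.Requests = mergeList(result.Requests, o.Requests)
		result.Limits = mergeList(result.Limits, o.Limits)
	}
	return result
}

func main() {
	strategy := Resources{
		Requests: ResourceList{"cpu": 250, "memory": 512 << 20},
		Limits:   ResourceList{"cpu": 500, "memory": 1 << 30},
	}
	build := &Resources{Limits: ResourceList{"memory": 2 << 30}}
	buildRun := &Resources{Limits: ResourceList{"memory": 4 << 30}}

	got := effectiveResources(strategy, build, buildRun)
	fmt.Println(got.Limits["memory"], got.Limits["cpu"], got.Requests["memory"])
}
```

Per-key merging is an assumption: the proposal says the controller merges overrides but does not specify whether a Build or BuildRun entry replaces a step's whole `resources` block or only the keys it sets.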

Comment on lines +96 to +120

    stepResources:
      - name: build
        resources:

Member

👍 - I think it makes sense to put this API in the strategy section.


## Drawbacks

- `stepResources` will be placed under `spec.strategy` while similar override fields (`paramValues`, `volumes`, `env`) are at the `spec` level. This inconsistency may confuse users.

Member

Good to call out here, but I think your instinct to place this in `strategy` is the right choice, since the step names are directly coupled to the selected build strategy.

- Allowing arbitrary overrides may lead to builds failing in unexpected ways (e.g., setting memory too low gets the step OOMKilled, while setting it too high wastes cluster resources).

Member

Likewise good to call out expected failure modes, including "silent" failures such as underutilization.


## Alternatives

- Pass step resource overrides directly to Tekton's `TaskRun.spec.stepSpecs[].computeResources` instead of implementing a Shipwright-native abstraction. Trade-off: this breaks Shipwright's abstraction over Tekton and creates inconsistency.

Member

💯

- Users create their own namespace-scoped `BuildStrategy` with custom resources. Trade-off: this is cumbersome, as users must duplicate the BuildStrategy just to add their custom resource requirements.
- Inject default resources into ClusterBuildStrategies at deployment time via the operator. Trade-off: this only sets global defaults, with no per-build control.
- Use a `LimitRange` to set default resources across a namespace. Trade-off: it applies to the entire namespace, not to a specific build.
- Use the Kubernetes-native pod-level resource specification (`Kubernetes v1.34 [beta]`, enabled by default), where resources are shared across all containers in the pod. Trade-off: it requires the `PodLevelResources` feature gate to be enabled; build steps run sequentially, so there is no concurrent-sharing benefit; and there is no per-step control.

Member

Congrats - you have convinced me the value of having per-step control!

One of the challenges of using Tekton is that, from the perspective of Kubernetes, all containers in a Tekton TaskRun pod execute simultaneously. As a result, the scheduler assigns the pod to a node with enough capacity for the sum of requested resources across all steps. See Tekton's article on compute resources for details.

Use of Tekton's computeResources and/or Kubernetes Pod Level Resources could be left as an implementation detail, or as a follow-up feature.
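A small Go sketch of the scheduling effect described above, assuming each step is an ordinary container in the TaskRun pod: the scheduler reserves the sum of step requests, while steps that actually run one after another never need more than the largest single request at once. `Step` and the step names are hypothetical, and init containers and pod overhead (which also affect the effective pod request) are ignored.

```go
package main

import "fmt"

// Step is a hypothetical, simplified view of a TaskRun step:
// its name and its CPU request in millicores.
type Step struct {
	Name      string
	CPUMillis int64
}

// scheduledCPU is what the scheduler reserves when every step is a regular
// container: the sum of all step requests.
func scheduledCPU(steps []Step) int64 {
	var total int64
	for _, s := range steps {
		total += s.CPUMillis
	}
	return total
}

// peakCPU is the largest single step request, i.e. the peak demand if the
// steps really run strictly one after another.
func peakCPU(steps []Step) int64 {
	var peak int64
	for _, s := range steps {
		if s.CPUMillis > peak {
			peak = s.CPUMillis
		}
	}
	return peak
}

func main() {
	steps := []Step{{"clone", 100}, {"build", 2000}, {"push", 250}}
	fmt.Printf("scheduler reserves %dm CPU; peak sequential demand is %dm\n",
		scheduledCPU(steps), peakCPU(steps))
}
```

With these example values the scheduler reserves 2350m CPU for a build whose busiest step uses 2000m, so raising one step's request through `stepResources` raises the whole pod's footprint.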

Contributor Author

Yes, agreed. I had included task-level computeResources as a Non-Goal (line 57) as well as in Alternatives (line 166). Since both Tekton's task-level computeResources and Kubernetes Pod Level Resources are in beta, I'll add them to Non-Goals to be considered as follow-up features.

- Feature works with both BuildStrategy and ClusterBuildStrategy.
- Invalid step names in `stepResources` reject the Build with a clear error.
- Existing builds continue to work unchanged (the feature is backward compatible): builds that do not specify `stepResources` keep using strategy-defined defaults with no behavioral change.
- The Tekton TaskRun created by the controller contains the merged resource values in `spec.taskSpec.steps[].computeResources`.

Member

I'd leave this out as a goal, and move it to a detail in the implementation notes.

Contributor Author

done
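The goal that invalid step names in `stepResources` reject the Build with a clear error could be checked by comparing each override against the step names of the selected strategy. A minimal sketch: `validateStepNames` is a hypothetical helper that works on plain name lists, and it reports every unknown name at once via `errors.Join` (Go 1.20+).

```go
package main

import (
	"errors"
	"fmt"
)

// validateStepNames returns nil if every overridden step exists in the
// strategy; otherwise it returns one joined error listing each unknown name.
func validateStepNames(strategySteps, overrides []string) error {
	known := make(map[string]bool, len(strategySteps))
	for _, s := range strategySteps {
		known[s] = true
	}
	var errs []error
	for _, name := range overrides {
		if !known[name] {
			errs = append(errs, fmt.Errorf(
				"stepResources: step %q not found in strategy (available: %v)", name, strategySteps))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	steps := []string{"build", "push"}
	if err := validateStepNames(steps, []string{"build"}); err == nil {
		fmt.Println("ok")
	}
	fmt.Println(validateStepNames(steps, []string{"biuld"}))
}
```

Listing the available step names in the message helps users fix typos such as `biuld` without having to read the strategy definition.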

- TBD
creation-date: 2026-01-23
last-updated: 2026-01-23
status: provisional

Member

This meets the criteria for an implementable proposal 🎉; however, to get it "over the finish line" we would need more community discussion and feedback. I would rather merge as "provisional" now and have a follow-up PR where those details are ironed out.


### Non-Goals

Member

Should we consider the resources for the Git clone and output image tasks as being in scope, or out of scope as a non-goal?

Contributor Author

As the resources for the Git clone step and the image-processing step are already configurable cluster-wide by administrators via container templates (GIT_CONTAINER_TEMPLATE, IMAGE_PROCESSING_CONTAINER_TEMPLATE) in the Shipwright controller configuration, I believe changing this would increase complexity and disturb the separation of concerns we have (admin-level cluster config vs. user-level per-build config). So I think we can keep it out of scope as a non-goal.


I don't think administrators will populate GIT_CONTAINER_TEMPLATE, overriding the default one provided by the code, just to take advantage of the resource quotas. It's good to provide this in the API itself, but I agree we can do it in a separate scope.

Contributor Author

Correct, makes sense. I think we can have it as a follow-up then.


openshift-ci bot commented Jan 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2026
@anchi205 (Contributor Author)


Thank you @adambkaplan for your feedback! This was very helpful.

Regarding the questions, this is what I think:

  1. Users will specify stepResources in the Build/BuildRun spec; internally, we set Tekton's Step.ComputeResources. Tekton's task-level computeResources and Kubernetes Pod Level Resources could be kept as follow-up features.

  2. We can have shp CLI support as a follow up feature. For now, we can keep it "YAML-only".

  3. No operator changes are required. The implementation will use Step.ComputeResources, which is a stable Tekton API available in all supported Tekton versions (https://pkg.go.dev/github.com/tektoncd/pipeline/pkg/apis/pipeline/v1#Step). When we later add support for the beta features (task-level resources, Kubernetes Pod Level Resources), the operator will need to detect versions.

I am still adding these as open questions in this SHIP, for further discussion.
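Following answer 1 (internally setting Tekton's `Step.ComputeResources`), applying already-resolved overrides to the strategy's steps might look like the sketch below. The types and `applyStepResources` are hypothetical simplifications of Tekton's `Step` and `corev1.ResourceRequirements`; steps without an override keep their strategy-defined values, which is what keeps existing Builds unchanged.

```go
package main

import "fmt"

// ResourceList is a hypothetical simplification: resource name -> quantity string.
type ResourceList map[string]string

// Resources mirrors the requests/limits pair of corev1.ResourceRequirements.
type Resources struct {
	Requests ResourceList
	Limits   ResourceList
}

// Step keeps only the fields of a Tekton step that matter here.
type Step struct {
	Name             string
	ComputeResources Resources
}

// applyStepResources returns a copy of steps in which each step named in
// overrides gets that override as its ComputeResources; all other steps,
// and the input slice itself, are left untouched.
func applyStepResources(steps []Step, overrides map[string]Resources) []Step {
	out := make([]Step, len(steps))
	copy(out, steps)
	for i := range out {
		if r, ok := overrides[out[i].Name]; ok {
			out[i].ComputeResources = r
		}
	}
	return out
}

func main() {
	steps := []Step{
		{Name: "build", ComputeResources: Resources{Limits: ResourceList{"memory": "1Gi"}}},
		{Name: "push", ComputeResources: Resources{Limits: ResourceList{"memory": "256Mi"}}},
	}
	overrides := map[string]Resources{"build": {Limits: ResourceList{"memory": "4Gi"}}}
	for _, s := range applyStepResources(steps, overrides) {
		fmt.Println(s.Name, s.ComputeResources.Limits["memory"])
	}
}
```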

Add stepResources field in Build and BuildRun APIs, to override resources for specific steps in a BuildStrategy or ClusterBuildStrategy.

Signed-off-by: Anchita Borah <anborah@redhat.com>
@sayan-biswas

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2026
@openshift-merge-bot openshift-merge-bot bot merged commit 3eff72e into shipwright-io:main Jan 29, 2026
2 checks passed


Development

Successfully merging this pull request may close these issues.

[FEATURE] Override Strategy Step Resources in Build and BuildRuns

3 participants