forked from concourse/concourse
Merge upstream 2025 03 07 #810
Merged
Conversation
The maxValidFromFile function was incorrectly using the hardcoded uidMap path regardless of the filename parameter passed to it. This caused the MaxValidIds method to always read from /proc/self/uid_map twice instead of reading from both uid_map and gid_map files. Fixed by modifying maxValidFromFile to properly use the fname parameter both when opening the file and when generating error messages. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
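A minimal sketch of the corrected shape of that function. Only the function and parameter names (`maxValidFromFile`, `fname`) come from the commit; the package name, id_map parsing, and return type are illustrative assumptions:

```go
package idmap // hypothetical package name

import (
	"bufio"
	"fmt"
	"os"
)

// maxValidFromFile reads an id_map-style file (uid_map or gid_map) and
// returns the highest valid id it describes. The key point of the fix:
// open the file and build error messages from fname, not from a
// hardcoded /proc/self/uid_map path.
func maxValidFromFile(fname string) (int, error) {
	f, err := os.Open(fname) // previously this always opened the uid_map path
	if err != nil {
		return 0, fmt.Errorf("open %s: %w", fname, err)
	}
	defer f.Close()

	maxID := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		var insideID, hostID, size int
		if _, err := fmt.Sscanf(scanner.Text(), "%d %d %d", &insideID, &hostID, &size); err != nil {
			return 0, fmt.Errorf("parse %s: %w", fname, err)
		}
		if last := insideID + size - 1; last > maxID {
			maxID = last
		}
	}
	return maxID, scanner.Err()
}
```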
In the case where we failed to create a container, we would log the error twice in the web node and also mark the container as failed twice. The second call to mark the container as failed would itself fail and return an error, because the container is already marked as failed. Signed-off-by: Taylor Silva <dev@taydev.net>
We found UUID collisions occurring in our cluster. We thought it was impossible, but I decided to look into the library we were using for UUID generation and found this issue: nu7hatch/gouuid#28, which would explain some of the errors I've seen where the only possible explanation was that we got the same UUID twice. Signed-off-by: Taylor Silva <dev@taydev.net>
Fix bug in maxValidFromFile using hardcoded path
On systems where the test was slow to run, this testing of the batch duration would flake a lot. I think there was also a general race condition from accessing the NewRelicBatch array, since it does get cleared out in a goroutine. I left NewRelicBatch public because it makes the test setup easier, but now there's a Batch() function that can be used to safely access the array. Signed-off-by: Taylor Silva <dev@taydev.net>
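A minimal sketch of the kind of accessor described above. Only `NewRelicBatch` and `Batch()` are named in the commit; the emitter type, mutex field, and batch element type are assumptions:

```go
package emitter // hypothetical package name

import "sync"

type Event map[string]any

type NewRelicEmitter struct {
	mu            sync.Mutex
	NewRelicBatch []Event // still exported so test setup stays simple
}

// Batch returns a copy of the pending batch so callers (and tests) can
// inspect it without racing the goroutine that flushes and clears
// NewRelicBatch.
func (e *NewRelicEmitter) Batch() []Event {
	e.mu.Lock()
	defer e.mu.Unlock()
	out := make([]Event, len(e.NewRelicBatch))
	copy(out, e.NewRelicBatch)
	return out
}
```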
Use github.com/google/uuid to generate UUIDv4
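For reference, a short sketch of generating a version 4 UUID with github.com/google/uuid, the library this commit switches to:

```go
package main

import (
	"fmt"

	"github.com/google/uuid"
)

func main() {
	// NewRandom returns a random (version 4) UUID backed by crypto/rand,
	// avoiding the collision-prone generation in the old library.
	id, err := uuid.NewRandom()
	if err != nil {
		panic(err)
	}
	fmt.Println(id.String())
}
```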
Makes process killing concurrent by launching Kill operations in separate goroutines. The buffered error channel ensures we don't block goroutines while waiting for errors to be processed. Returns on first error to maintain original behavior. Each Kill operation executes independently, allowing multiple processes to be killed simultaneously while still respecting grace periods and proper error handling. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
worker runtime: concurrent process killing
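A minimal sketch of the pattern described in that commit message; the Process interface and killProcesses helper here are simplified stand-ins for the worker runtime's actual types:

```go
package worker // hypothetical package name

import "context"

type Process interface {
	Kill(ctx context.Context) error
}

func killProcesses(ctx context.Context, procs []Process) error {
	// Buffered so every goroutine can send its result without blocking,
	// even if we return early on the first error.
	errs := make(chan error, len(procs))

	// Each Kill runs in its own goroutine, so processes are killed
	// concurrently rather than one after another.
	for _, p := range procs {
		go func(p Process) {
			errs <- p.Kill(ctx)
		}(p)
	}

	// Preserve the original behavior: return on the first error seen.
	for range procs {
		if err := <-errs; err != nil {
			return err
		}
	}
	return nil
}
```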
…om-go-jose-go-jose-v3-vulnerability Update module github.com/go-jose/go-jose/v3 to v3.0.4 [SECURITY]
We can use the built-in `min` and `max` functions since Go 1.21. Reference: https://go.dev/ref/spec#Min_and_max Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
We found that if a container wasn't correctly made, specifically if the `/etc/hosts` file of the container wasn't populated with the container's IP, container cleanup would fail with this error: ``` resume container traffic: error getting container IP: ip not found for container handle: 0ce036f8-aa2a-4bea-acbd-8ee48e5fa63b ``` We can ignore this error when deleting a container because it tells us there are no iptables rules to remove anyway, so we can continue deleting the container and expect the deletion to succeed. Signed-off-by: Taylor Silva <dev@taydev.net>
…lookup-errors worker runtime: make container deletion more robust
Replace min/max helpers with built-in min/max
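A tiny illustration of the change, using the generic built-ins available since Go 1.21:

```go
package main

import "fmt"

func main() {
	fmt.Println(min(3, 7))       // 3
	fmt.Println(max(2.5, 1.0))   // 2.5
	fmt.Println(max(1, 5, 3, 4)) // variadic: 5
}
```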
- Fix variable source diffs printing to stdout instead of provided writer
- Enhance practicallyDifferent to properly handle nil values
- nit: change interface{} to any, slowly modernizing go code
Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com>
Co-authored-by: Taylor Silva <dev@taydev.net>
The error message when encountering an invalid placement strategy incorrectly referenced the strategy variable (the accumulating result) rather than the specific invalid strategy string that caused the error. This resulted in misleading error messages that showed all strategies added so far instead of just the invalid one. Changed the error message to reference the specific invalid strategy string for clearer troubleshooting. nit: used newer `range int` feature in Go to make loop more readable Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
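A minimal sketch of both changes; the strategy names, error wording, and parse function below are illustrative assumptions, not the actual atc code:

```go
package main

import "fmt"

func parseStrategies(names []string) ([]string, error) {
	var strategies []string
	for _, name := range names {
		switch name {
		case "volume-locality", "fewest-build-containers", "random":
			strategies = append(strategies, name)
		default:
			// Reference the specific invalid entry, not the accumulated result.
			return nil, fmt.Errorf("invalid container placement strategy %s", name)
		}
	}
	return strategies, nil
}

func main() {
	// `range` over an int (Go 1.22+) keeps simple counted loops readable.
	for i := range 3 {
		fmt.Println("attempt", i)
	}

	if _, err := parseStrategies([]string{"random", "bogus"}); err != nil {
		fmt.Println(err) // names "bogus", not the strategies gathered so far
	}
}
```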
Replace mutex-based implementation with Go's atomic types to reduce contention and improve performance under high concurrency. This change eliminates lock overhead while maintaining thread safety. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
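A minimal sketch of the mutex-to-atomic swap; the Counter type here is illustrative, not the actual Concourse type:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Counter struct {
	n atomic.Int64 // replaces a sync.Mutex guarding a plain int64
}

func (c *Counter) Inc()         { c.n.Add(1) }
func (c *Counter) Value() int64 { return c.n.Load() }

func main() {
	var c Counter
	c.Inc()
	c.Inc()
	fmt.Println(c.Value()) // 2, with no lock acquired on either path
}
```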
…reds
Ensure the read lock is properly released using defer to guarantee thread safety
when iterating over credentials. This prevents a possible race condition if
multiple goroutines access the Tracker concurrently.
nit: modernize code by using any instead of interface{}
Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com>
Co-authored-by: Taylor Silva <dev@taydev.net>
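A minimal sketch of the deferred-unlock pattern described above; the Tracker fields and the iteration callback are assumptions for illustration:

```go
package creds // hypothetical package name

import "sync"

type Tracker struct {
	mu    sync.RWMutex
	creds map[string]string
}

// Each holds the read lock for the whole iteration and releases it with
// defer, so the lock is freed even if fn panics or an early return is
// added later.
func (t *Tracker) Each(fn func(name, value string)) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	for name, value := range t.creds {
		fn(name, value)
	}
}
```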
The scanner.scanResources method was creating a goroutine for each resource without any concurrency limits. For deployments with many resources, this could lead to resource exhaustion on the web node and possibly the database. This commit introduces a concurrency limit of 50 simultaneous resource checks. The implementation spawns 50 goroutines and works through resources passed down a channel. Additionally, improved context handling was added to ensure proper cleanup when the scanner is being shut down or the context is otherwise canceled. This prevents orphaned goroutines from continuing to run after their parent process has been signaled to stop. These changes make the scanner more robust and predictable under high load while maintaining the existing parallel checking behavior. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
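A minimal worker-pool sketch of that approach; the Resource type, checkResource function, and overall wiring are simplified stand-ins for the scanner's real code, and only the limit of 50 comes from the commit:

```go
package scanner // hypothetical package name

import (
	"context"
	"sync"
)

const maxInFlightChecks = 50

type Resource struct{ Name string }

func checkResource(ctx context.Context, r Resource) { /* issue the check */ }

func scanResources(ctx context.Context, resources []Resource) {
	work := make(chan Resource)

	// Spawn a fixed number of workers instead of one goroutine per resource.
	var wg sync.WaitGroup
	for range maxInFlightChecks {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range work {
				checkResource(ctx, r)
			}
		}()
	}

	// Feed resources to the workers, bailing out promptly if the scanner
	// is shut down or the context is otherwise canceled.
	for _, r := range resources {
		select {
		case work <- r:
		case <-ctx.Done():
			close(work)
			wg.Wait()
			return
		}
	}
	close(work)
	wg.Wait()
}
```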
atc: Various small fixes
Signed-off-by: Taylor Silva <dev@taydev.net>
…-test CI: only run baggageclaimcmd test on linux
1. Fixes improper use of defer Close(rows) inside loops in the JobsToSchedule method, which would cause resource leaks, as deferred calls would only execute when the function returns, not at the end of loop iterations.
2. Removes a redundant database query in the GetPendingBuilds method that was re-fetching job information already available in the job struct.
3. Adds missing defer Close(rows) calls in multiple methods.
4. Ensures proper error handling when scanning database rows.
Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
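A minimal sketch of the defer-in-loop fix from point 1; the query and method shape below are illustrative, not the actual db package code:

```go
package db // hypothetical package name

import "database/sql"

func jobsToSchedule(conn *sql.DB, jobIDs []int) error {
	for _, id := range jobIDs {
		// Wrapping each iteration in a function lets defer close the rows
		// per iteration instead of piling up until the outer function returns.
		if err := func() error {
			rows, err := conn.Query("SELECT id FROM builds WHERE job_id = $1", id)
			if err != nil {
				return err
			}
			defer rows.Close()

			for rows.Next() {
				var buildID int
				if err := rows.Scan(&buildID); err != nil {
					return err
				}
			}
			return rows.Err()
		}(); err != nil {
			return err
		}
	}
	return nil
}
```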
This commit adds implementations for the context-aware methods in the logConn wrapper that were previously missing. The DbConn interface requires several context-aware methods (ExecContext, QueryContext, QueryRowContext, PrepareContext, BeginTx) which were not explicitly implemented in logConn. Without these implementations, any context-aware database operations would bypass the logging functionality, leading to inconsistent logging behavior. The fix ensures that all database operations, both with and without context, are properly logged. Additionally, this commit completes the logDbTx implementation by adding missing context methods to maintain consistency throughout the entire logging wrapper system. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
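A minimal sketch of one such wrapper method; in the real code logConn wraps the DbConn interface rather than *sql.DB, and the logging call shown is an assumption:

```go
package db // hypothetical package name

import (
	"context"
	"database/sql"
	"log"
)

type logConn struct {
	conn *sql.DB
}

// ExecContext logs the statement and delegates to the wrapped connection,
// so context-aware callers get the same logging as the non-context path
// instead of silently bypassing it.
func (c *logConn) ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error) {
	log.Println("exec", query)
	return c.conn.ExecContext(ctx, query, args...)
}
```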
Refactored the CleanUpInvalidTaskCaches function to use a single SQL operation instead of two separate queries. This change improves code readability and efficiency by using a subquery directly within the DELETE statement to identify and remove inactive task caches in a single database operation. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
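A hedged sketch of the single-statement shape described above; the table and column names below are assumptions, not Concourse's actual schema:

```go
package db // hypothetical package name

import "database/sql"

func cleanUpInvalidTaskCaches(conn *sql.DB) error {
	// One DELETE with a subquery replaces the previous select-then-delete
	// round trip.
	_, err := conn.Exec(`
		DELETE FROM task_caches
		WHERE job_id IN (
			SELECT id FROM jobs WHERE NOT active
		)
	`)
	return err
}
```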
- Apply early continue pattern in constrainingCandidates() to reduce nesting and improve code clarity
- Pre-allocate slice capacity in orderJobs() to avoid multiple allocations during append operations
These changes maintain identical functionality while making the code more maintainable and slightly more efficient. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
The Get and Set methods of ContainerSpec incorrectly used strings.SplitN with arguments in the wrong order (separator and string were swapped). This bug would prevent proper parsing of environment variables for tracing propagation.
- Fixed strings.SplitN call in Get method
- Fixed strings.SplitN call in Set method
Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
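A tiny illustration of the bug class: strings.SplitN takes the string first and the separator second, so swapping them silently returns the separator unchanged. The TRACEPARENT value here is made up:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	env := "TRACEPARENT=00-abc-def-01"

	wrong := strings.SplitN("=", env, 2) // swapped arguments
	right := strings.SplitN(env, "=", 2) // string, then separator

	fmt.Println(wrong) // prints [=]: the separator never matches
	fmt.Println(right) // prints [TRACEPARENT 00-abc-def-01]
}
```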
Fix several issues in pipeline cycle detection:
- Reset visited state for each root job to prevent false positives
- Replace numeric constants with JobState enum for better clarity
- Ensure immediate error propagation when a cycle is detected
Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>
fly: improve performance of fly watch
web: Update frontend graphics dependencies
Signed-off-by: Maria Shaldybin <maria.shaldybin@broadcom.com>
Mount /sys/fs/cgroup as cgroup2 type if supported
We found that when the worker is restarted, all containers would be
present, but any tasks running in them would disappear. This would result
in TaskNotFound errors, mostly in resource checks.
Resource checks are more prone to this type of error because we re-use
their containers over a long period of time. The other types of
containers we make (get, put, task) usually don't hang around that long.
We initially create a Task during container creation and assume that the task
will still be there when we go to exec the actual executable that we
wanted to run (e.g. /opt/resource/{in,out,check}). For long-running
containers this may not be true, but we can gracefully recover in this
scenario.
Signed-off-by: Taylor Silva <dev@taydev.net>
Signed-off-by: Taylor Silva <dev@taydev.net>
Gracefully recover from containerd TaskNotFound errors
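A hedged sketch of that recovery path, assuming containerd's 1.x client packages (containerd, cio, errdefs); the actual wiring in the worker runtime differs, and cio.NullIO stands in for the real IO setup:

```go
package worker // hypothetical package name

import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/errdefs"
)

// taskFor returns the container's task, recreating it if the task created
// at container-creation time vanished across a worker restart, instead of
// surfacing a TaskNotFound error.
func taskFor(ctx context.Context, container containerd.Container) (containerd.Task, error) {
	task, err := container.Task(ctx, nil)
	if err == nil {
		return task, nil
	}
	if !errdefs.IsNotFound(err) {
		return nil, err
	}
	// The original task is gone; make a new one so the exec we actually
	// wanted (e.g. /opt/resource/check) can still run.
	return container.NewTask(ctx, cio.NullIO)
}
```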
marco-m-pix4d approved these changes on Mar 7, 2025
We've had a few PRs that are specific to our linux-containerd implementation: concourse#9017 and concourse#9094. After these PRs, folks in the community found that they couldn't compile Concourse for non-Linux platforms. This commit tries to make it easier for future contributors to make changes to the containerd-linux runtime without accidentally breaking things for non-Linux platforms. Signed-off-by: Taylor Silva <dev@taydev.net>
Signed-off-by: Taylor Silva <dev@taydev.net>
worker: split linux and non-linux features
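A minimal illustration of the build-constraint split: a stub like this is what non-Linux builds would compile in place of the containerd-backed implementation. The package name, Backend type, and constructor are illustrative assumptions:

```go
//go:build !linux

package runtime // hypothetical package name

import "errors"

type Backend any

// newBackend is the non-Linux stand-in; the Linux variant lives in a
// sibling file guarded by `//go:build linux` and does the containerd setup.
func newBackend() (Backend, error) {
	return nil, errors.New("the containerd runtime is only available on linux")
}
```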
Signed-off-by: IvanChalukov <ichalukov@gmail.com>
Previously we could spawn any number of in-memory checks, and they wouldn't be checked for duplication until the atc/db/lock acquired a Mutex and tried to get a lock from the database. I'm _guessing_, based on the info from concourse#8638, that the contention for this lock would get really high. I'm guessing this and other users' clusters are very high-usage clusters. It's possible that lidar may send off multiple checks for the same resource, which would result in multiple goroutines running and fighting over the same lock. Lidar would send off multiple in-memory checks if, every time it runs, it still finds that a resource has exceeded its check interval. I'm not 100% sure this is the problem and I don't have an easy way to test this. This is all based on my reading of the code as it is today and the pprof memory graph reported from users that showed a lot of in-memory checks waiting on the db/lock Mutex. Signed-off-by: Taylor Silva <dev@taydev.net>
This code isn't being used anywhere. Dead code. Signed-off-by: Taylor Silva <dev@taydev.net>
Also removed deprecated calls to rand.Seed(), which is a no-op starting in Go 1.24. Signed-off-by: Taylor Silva <dev@taydev.net>
web: avoid duplicate in-memory checks
…laim worker: fix baggageclaim tests on windows
Fix `--team` flag in `order-pipelines` command
Signed-off-by: IvanChalukov <ichalukov@gmail.com>
…che-team Add `--team` flag to `clear-resource-cache` command
SpectreVert approved these changes on Mar 10, 2025
Let's sync one more time...