Merge upstream 2025 03 07 #810

aliculPix4D · 2025-03-07T10:17:28Z

lets sync one more time...

The maxValidFromFile function was incorrectly using the hardcoded uidMap path regardless of the filename parameter passed to it. This caused the MaxValidIds method to always read from /proc/self/uid_map twice instead of reading from both uid_map and gid_map files. Fixed by modifying maxValidFromFile to properly use the fname parameter both when opening the file and when generating error messages. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

In the case where we failed to create a container we would log the error twice in the web node and also mark it as failed twice. The second call to mark the container as failed would itself fail and return an error because the container is already marked as failed. Signed-off-by: Taylor Silva <dev@taydev.net>

We found UUID collisions occuring in our cluster. We thought it was impossible but I decided to look into the library we were using for UUID generation and found this issue: nu7hatch/gouuid#28 Which would explain some of the errors I've seen where the only possible explanation was that we got the same UUID twice. Signed-off-by: Taylor Silva <dev@taydev.net>

Fix bug in maxValidFromFile using hardcoded path

on systems where the test was slow to run it seemed that this testing of the batch duration would flake a lot. I think there was also a general race condition from accessing the NewRelicBatch array since it does get cleared out in a go routine. I left NewRelicBatch public still because it makes the testing setup easier, but now there's a Batch() function that can be used to safely access the array. Signed-off-by: Taylor Silva <dev@taydev.net>

Use github.com/google/uuid to generate UUIDv4

Makes process killing concurrent by launching Kill operations in separate goroutines. The buffered error channel ensures we don't block goroutines while waiting for errors to be processed. Returns on first error to maintain original behavior. Each Kill operation executes independently, allowing multiple processes to be killed simultaneously while still respecting grace periods and proper error handling. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

worker runtime: concurrent process killing

…om-go-jose-go-jose-v3-vulnerability Update module github.com/go-jose/go-jose/v3 to v3.0.4 [SECURITY]

We can use the built-in `min` and `max` functions since Go 1.21. Reference: https://go.dev/ref/spec#Min_and_max Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

We found that if a container wasn't correctly made, specifically if the `/etc/hosts` file of the container wasn't populated with the containers IP, container cleanup would fail with this error: ``` resume container traffic: error getting container IP: ip not found for container handle: 0ce036f8-aa2a-4bea-acbd-8ee48e5fa63b ``` We can ignore this error when deleting a container because it tells us there are no iptables rules to remove anyways. We can probably continue deleting the container and will delete it successfully. Signed-off-by: Taylor Silva <dev@taydev.net>

…lookup-errors worker runtime: make container deletion more robust

Replace min/max helpers with built-in min/max

- Fix variable source diffs printing to stdout instead of provided writer - Enhance practicallyDifferent to properly handle nil values - nit: change interface{} to any, slowly modernizing go code Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net> Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

The error message when encountering an invalid placement strategy incorrectly referenced the strategy variable (the accumulating result) rather than the specific invalid strategy string that caused the error. This resulted in misleading error messages that showed all strategies added so far instead of just the invalid one. Changed the error message to reference the specific invalid strategy string for clearer troubleshooting. nit: used newer `range int` feature in Go to make loop more readable Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

Replace mutex-based implementation with Go's atomic types to reduce contention and improve performance under high concurrency. This change eliminates lock overhead while maintaining thread safety. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

…reds Ensure the read lock is properly released using defer to guarantee thread safety when iterating over credentials. This prevents a possible race condition if multiple goroutines access the Tracker concurrently. nit: modernize code by using any instead of interface{} Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

The scanner.scanResources method was creating a goroutine for each resource without any concurrency limits. For deployments with many resources, this could lead to resource exhaustion on the web node and possibly the database. This commit introduces a concurrency limit of 50 simultaneous resource checks. The implementation spawns 50 go routines and works through resources passed down a channel. Additionally, improved context handling was added to ensure proper cleanup when the scanner is being shut down or the context is otherwise canceled. This prevents orphaned goroutines from continuing to run after their parent process has been signaled to stop. These changes make the scanner more robust and predictable under high load while maintaining the existing parallel checking behavior. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

atc: Various small fixes

Signed-off-by: Taylor Silva <dev@taydev.net>

…-test CI: only run baggageclaimcmd test on linux

1. Fixes improper use of defer Close(rows) inside loops in the JobsToSchedule method which would cause resource leaks, as deferred calls would only execute when the function returns, not at the end of loop iterations. 2. Removes redundant database query in GetPendingBuilds method that was re-fetching job information already available in the job struct. 3. Adds missing defer Close(rows) calls in multiple methods 4. Ensures proper error handling when scanning database rows. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

This commit adds implementations for the context-aware methods in the logConn wrapper that were previously missing. The DbConn interface requires several context-aware methods (ExecContext, QueryContext, QueryRowContext, PrepareContext, BeginTx) which were not explicitly implemented in logConn. Without these implementations, any context-aware database operations would bypass the logging functionality, leading to inconsistent logging behavior. The fix ensures that all database operations, both with and without context, are properly logged. Additionally, this commit completes the logDbTx implementation by adding missing context methods to maintain consistency throughout the entire logging wrapper system. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

Refactored the CleanUpInvalidTaskCaches function to use a single SQL operation instead of two separate queries. This change improves code readability and efficiency by using a subquery directly within the DELETE statement to identify and remove inactive task caches in a single database operation. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

- Apply early continue pattern in constrainingCandidates() to reduce nesting and improve code clarity - Pre-allocate slice capacity in orderJobs() to avoid multiple allocations during append operations These changes maintain identical functionality while making the code more maintainable and slightly more efficient. Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

The Get and Set methods of ContainerSpec incorrectly used strings.SplitN with arguments in the wrong order (separator and string were swapped). This bug would prevent proper parsing of environment variables for tracing propagation. - Fixed strings.SplitN call in Get method - Fixed strings.SplitN call in Set method Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

Fix several issues in pipeline cycle detection: - Reset visited state for each root job to prevent false positives - Replace numeric constants with JobState enum for better clarity - Ensure immediate error propagation when cycle is detected Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

fly: improve performance of fly watch

web: Update frontend graphics dependencies

Signed-off-by: Maria Shaldybin <maria.shaldybin@broadcom.com>

Mount /sys/fs/cgroup as cgroup2 type if supported

We found that when the worker is restarted that all containers would be present, but any tasks running in them would disappear. This would result in TaskNotFound errors, mostly in resource checks. Resource checks are more prone to this type of error because we re-use their containers over a long period of time. The other types of containers we make (get, put, task) usually don't hang around that long. We initially create a Task during container creation and assume that task would still be there when we go to exec the actual executable that we wanted to run (e.g. /opt/resource/{in,out,check}). For long running containers this may not be true but we can gracefully recover in this scenario. Signed-off-by: Taylor Silva <dev@taydev.net>

Signed-off-by: Taylor Silva <dev@taydev.net>

Gracefully recover from containerd TaskNotFound errors

…25-03-07

We've had a few PR's that are specific to our linux-containerd implementation: concourse#9017 and concourse#9094 After these PR's folks in the community found that they couldn't compile Concourse for non-Linux platforms. This commit tries to make it easier for future contributors to make changes to the containerd-linux runtime and not accidently break things for non-linux platforms. Signed-off-by: Taylor Silva <dev@taydev.net>

Signed-off-by: Taylor Silva <dev@taydev.net>

worker: split linux and non-linux features

Signed-off-by: IvanChalukov <ichalukov@gmail.com>

Previously we could spawn as many in-memory checks and they wouldn't be checked for duplication until the atc/db/lock acquired a Mutex and tried to get a lock from the database. I'm _guessing_, based on the info from concourse#8638 that the contention for this lock would get really high. I'm guessing this and other users cluster are very high usage clusters. It's possible that lidar may send off multiple checks for the same resource which would result in multiple go routines running and fighting over the same lock. Lidar would send off multiple in-memory checks if every time it runs it still finds that a resource has exceeded it's check interval. I'm not 100% sure this is the problem and I don't have an easy way to test this. This is all based on my reading of the code as it is today and the pprof memory graph reported from users that showed a lot of in-memory checks waiting on the db/lock Mutex. Signed-off-by: Taylor Silva <dev@taydev.net>

this code isn't being used anywhere. Dead code. Signed-off-by: Taylor Silva <dev@taydev.net>

and removed deprecated calls to rand.Seed() which is a no-op starting in Go 1.24 Signed-off-by: Taylor Silva <dev@taydev.net>

web: avoid duplicate in-memory checks

…laim worker: fix baggageclaim tests on windows

Fix `--team` flag in `order-pipelines` command

Signed-off-by: IvanChalukov <ichalukov@gmail.com>

…che-team Add `--team` flag to `clear-resource-cache` command

…25-03-07

analytically and others added 30 commits February 25, 2025 10:30

worker filestore: remove unreachable code

0a1d585

Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

Merge pull request concourse#9082 from concourse/maxValidFromFile

4b953e1

Fix bug in maxValidFromFile using hardcoded path

Merge pull request concourse#9083 from concourse/replace-uuid-lib

8dc27d0

Use github.com/google/uuid to generate UUIDv4

Merge pull request concourse#9084 from concourse/runtime-kill-concurrent

8d5063f

worker runtime: concurrent process killing

Update module github.com/go-jose/go-jose/v3 to v3.0.4 [SECURITY]

5774646

Merge pull request concourse#9087 from concourse/renovate/go-github.c…

47fc760

…om-go-jose-go-jose-v3-vulnerability Update module github.com/go-jose/go-jose/v3 to v3.0.4 [SECURITY]

Replace min/max helpers with built-in min/max

523238a

We can use the built-in `min` and `max` functions since Go 1.21. Reference: https://go.dev/ref/spec#Min_and_max Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

Merge pull request concourse#9090 from concourse/ignore-container-ip-…

59c5e67

…lookup-errors worker runtime: make container deletion more robust

Merge pull request concourse#9089 from Juneezee/minmax

5fdede7

Replace min/max helpers with built-in min/max

Merge pull request concourse#9092 from concourse/small-fixes

2cfed36

atc: Various small fixes

only run baggageclaimcmd test on linux

e0c6f11

Signed-off-by: Taylor Silva <dev@taydev.net>

Merge pull request concourse#9093 from concourse/fix-baggageclaim-cmd…

3b664c5

…-test CI: only run baggageclaimcmd test on linux

web: improve index template asset caching implementation

73b3996

Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com> Co-authored-by: Taylor Silva <dev@taydev.net>

taylorsilva and others added 8 commits March 5, 2025 19:07

Merge pull request concourse#9097 from concourse/fly-writer

2c80e78

fly: improve performance of fly watch

Merge pull request concourse#9098 from concourse/update-web-deps

c8ce937

web: Update frontend graphics dependencies

Mount /sys/fs/cgroup as type cgroup2 if supported

ae402e3

Signed-off-by: Maria Shaldybin <maria.shaldybin@broadcom.com>

Merge pull request concourse#9094 from mariash/cgroup2-mount

8503568

Mount /sys/fs/cgroup as cgroup2 type if supported

use newer slices.ContainsFunc

9a29e29

Signed-off-by: Taylor Silva <dev@taydev.net>

Merge pull request concourse#9100 from concourse/issue/8172

f9174a2

Gracefully recover from containerd TaskNotFound errors

Merge remote-tracking branch 'upstream/master' into merge-upstream-20…

fe0fbeb

…25-03-07

aliculPix4D requested a review from a team as a code owner March 7, 2025 10:17

aliculPix4D requested review from SpectreVert, iAmoric, marco-m-pix4d and odormond March 7, 2025 10:17

marco-m-pix4d approved these changes Mar 7, 2025

View reviewed changes

taylorsilva and others added 13 commits March 7, 2025 11:39

atc: reduce chance of race condition in test setup

209bc72

Signed-off-by: Taylor Silva <dev@taydev.net>

Merge pull request concourse#9101 from concourse/9094-follow-up

55ff4b2

worker: split linux and non-linux features

fix flag in command

89f3150

Signed-off-by: IvanChalukov <ichalukov@gmail.com>

remove in-memory lock

324d27f

this code isn't being used anywhere. Dead code. Signed-off-by: Taylor Silva <dev@taydev.net>

worker: fix baggageclaim tests on windows

552332e

and removed deprecated calls to rand.Seed() which is a no-op starting in Go 1.24 Signed-off-by: Taylor Silva <dev@taydev.net>

Merge pull request concourse#9103 from concourse/issue/8638

8c0fcad

web: avoid duplicate in-memory checks

Merge pull request concourse#9104 from concourse/fix-windows-baggagec…

25db616

…laim worker: fix baggageclaim tests on windows

Merge pull request concourse#9102 from IvanChalukov/master

fc16cbd

Fix `--team` flag in `order-pipelines` command

Add --team flag to clear-resource-cache command

824e266

Signed-off-by: IvanChalukov <ichalukov@gmail.com>

Merge pull request concourse#9106 from IvanChalukov/clear-resource-ca…

755216b

…che-team Add `--team` flag to `clear-resource-cache` command

Merge remote-tracking branch 'upstream/master' into merge-upstream-20…

c479316

…25-03-07

SpectreVert approved these changes Mar 10, 2025

View reviewed changes

aliculPix4D merged commit 83194c2 into master Mar 10, 2025
10 checks passed

aliculPix4D deleted the merge-upstream-2025-03-07 branch March 10, 2025 11:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge upstream 2025 03 07 #810

Merge upstream 2025 03 07 #810

Uh oh!

aliculPix4D commented Mar 7, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Merge upstream 2025 03 07 #810

Merge upstream 2025 03 07 #810

Uh oh!

Conversation

aliculPix4D commented Mar 7, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants