Conversation
Leave a TODO here; we should try to limit the amount of data we scrape. I'd start with a label selector, which should help with this.
The CronJob spec does not currently seem to have a selector field. Are you suggesting we add that to the spec and then use it to filter the jobs for the CronJob?
We could have a specific label such as apps.k8s.io=cronjob_ns/cronjob_name to filter out the jobs immediately, but I wonder whether that is reliable: a user can easily (sometimes unintentionally) change it, and the reconciliation loop would then skip that job (a side effect I'd prefer not to introduce).
Yeah, it doesn't have to be here, but it would be nice from a performance point of view.
scheduledTime will be a time in the past, if I'm not mistaken, so I'm not sure that's what you want to use in the next two conditions. You'll just want to re-queue it for the next execution time, and getRecentUnmetScheduleTimes doesn't return that; it only returns past times. You'll need some kind of getNextScheduleTime.
My bad, I misunderstood our last conversation. I'll revert that change to what I had before. Good catch, +1
@soltysh the way I am trying to think about the problem here is that, if the change in schedule results in the next requeue having to be sooner than it already was, it will be handled here by the queue. If the next requeue is further out than the previous schedule, the sync loop will essentially be a no-op for the already-queued key with the old schedule.
…ional-probe-protocol Support customize load balancer health probe protocol and request path
fix golint, fix gofmt
…ndency remove label dependency on k8s api in Azure
local-up-cluster.sh: Pass CLUSTER_CIDR to kube-proxy
Add --experimental-logging-sanitization flag to control plane components
…rs_without_vpc_native Forbid creating clusters with more than 100 nodes without vpc-native
Remove ready directory which is created in empty volumeMounter setUp func
…te-evt Ignore some update Pod events in scheduler
e2e test for PodFsGroupChangePolicy feature
Remove duplicate import
…er-version Add vCenter info metric
…err-conn-killed stop logging killing connection/stream because serving request timed out and response had been started
…ere/update_windows_container_resources Add WindowsContainerResources to UpdateContainerResourcesRequest
…y-beta Move fsGroupChangePolicy feature to beta
Fix command and arg in NPD e2e
…v2-docs cloud-provider: update docs and guidance for InstanceV2 and Zones
Remove Const IPVSProxyMode
…ymlink-fix fixing issue where SMB share paths cannot resolve with CRI-containerD on Windows
The code path for handling non-JSON output from etcd was broken:
- It did not skip over already parsed JSON output.
- It got stuck in the wrong for loop and repeatedly tried
parsing the same non-JSON output. This prevented test shutdown.
Not sure why yet, but in the branch with DRA v1 graduation the following error
started to show up for the first time (?!):
2025/07/18 19:24:48 WARNING: [core] [Server #3]grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
DRA depends on the assume cache having invoked all event handlers before
Assume() returns, because DRA maintains state that is relevant for scheduling
through those event handlers.
This log snippet shows how this went wrong during PreBind:
dynamicresources.go:1150: I0115 10:35:29.264437] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636"
dra_manager.go:198: I0115 10:35:29.264448] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 version="287"
dynamicresources.go:1157: I0115 10:35:29.264463] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636" allocation=<
...
allocateddevices.go:189: I0115 10:35:29.267315] scheduler: Observed device allocation device="testdra-all-usesallresources-kqjpj.driver/worker-1/worker-1-device-096" claim="testdra-all-usesallresources-kqjpj/claim-0091"
- goroutine #1: UpdateStatus result delivered via informer.
AssumeCache updates cache, pushes event A, emitEvents pulls event A from queue.
*Not* done with delivering it yet!
- goroutine #2: AssumeCache.Assume called. Updates cache, pushes event B, emits it.
Old and new claim have allocation, so no "Observed device allocation".
- goroutine #3: Schedules next pod, without considering device as allocated (not in the log snippet).
- goroutine #1: Finally delivers event A: "Observed device allocation", but too late.
Also, events are delivered out-of-order.
The fix is to make emitEvents, when called by Assume, wait for a potentially running emitEvents in some other goroutine, thus ensuring that an event pulled from the queue by that other goroutine has been delivered before Assume itself checks the queue one more time and returns.
The time window where things go wrong is small. An E2E test covering this flaked only rarely, and only in CI. An integration test (separate commit) with a higher number of pods finally made it possible to reproduce locally. It also uncovered a second race (fix in a separate commit).
The unit test fails without the fix:
=== RUN TestAssumeConcurrency
assume_cache_test.go:311: FATAL ERROR:
Assume should have blocked and didn't.
--- FAIL: TestAssumeConcurrency (0.00s)