

@gbhat618 (Contributor) commented Dec 29, 2025

With a high-frequency cron job (running every 2 minutes), after about 2.5 hours the connections get stuck in com.nirima.jenkins.plugins.docker.DockerCloud.countContainersInDocker, and from then on no new builds that use an agent from this Docker cloud work.
Restarting the Jenkins controller makes the builds work again.

This indicates a stuck socket connection. A possible fix is to support SO_TIMEOUT > 0 at https://github.com/docker-java/docker-java/blob/faa88e16460a8cb321c9695cdbc34cb7a662458e/docker-java-transport-httpclient5/src/main/java/com/github/dockerjava/httpclient5/ApacheDockerHttpClientImpl.java#L117-L122
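To illustrate what SO_TIMEOUT > 0 changes, here is a minimal JDK-only sketch using plain java.net sockets (not the docker-java or Apache HttpClient code path): a blocking read from a peer that never answers throws SocketTimeoutException once the timeout elapses, instead of hanging the calling thread. The silent local ServerSocket is an assumption standing in for an unresponsive Docker daemon. Note that SO_TIMEOUT only bounds reads; the connect phase is bounded separately by the timeout passed to Socket.connect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SoTimeoutDemo {

    /**
     * Connects to a local peer that never sends anything and performs one blocking read.
     * Returns true if the read was aborted by SO_TIMEOUT instead of blocking forever.
     */
    static boolean readTimesOut(int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The OS completes the TCP handshake via the listen backlog, but this server
        // never calls accept() or writes a byte: a stand-in for an unresponsive peer.
        try (ServerSocket silentPeer = new ServerSocket(0, 50, loopback);
             Socket client = new Socket()) {
            // The connect phase has its own bound; SO_TIMEOUT does not apply to connect().
            client.connect(new InetSocketAddress(loopback, silentPeer.getLocalPort()), 1_000);
            // SO_TIMEOUT > 0: a blocking read gives up after this many milliseconds.
            client.setSoTimeout(soTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // with SO_TIMEOUT = 0 this call would block indefinitely here
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read timed out: " + timedOut + " (after ~" + elapsedMs + " ms)");
    }
}
```

The class name SoTimeoutDemo and the 200 ms value are illustrative only; in the plugin the equivalent setting would have to be passed through to the HTTP client's socket configuration.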

It works if builds are spaced at least 5 minutes apart, because the cache entry then expires:

CLIENT_CACHE = new UsageTrackingCache(5, TimeUnit.MINUTES, expiryHandler);
and
CacheBuilder<Object, Object> cacheBuilder = CacheBuilder.newBuilder();
cacheBuilder = cacheBuilder.expireAfterAccess(duration, unit);
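The reasoning above can be sketched with a hypothetical, JDK-only cache that mimics the expire-after-access semantics of Guava's CacheBuilder.expireAfterAccess. It is not the plugin's UsageTrackingCache and ignores whatever usage tracking that class adds. With a 5-minute TTL, a query every 2 minutes refreshes the entry before it can expire, so the same client (and any stuck connection it holds) is reused indefinitely; with gaps of 5 minutes or more, the entry is evicted, the expiry handler runs, and the next call builds a fresh client. The class names and the simulated clock are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache with expire-after-access semantics (not Guava, not UsageTrackingCache). */
final class ExpireAfterAccessCache<K, V> {
    private static final class Entry<T> {
        final T value;
        long lastAccess;
        Entry(T value, long lastAccess) { this.value = value; this.lastAccess = lastAccess; }
    }

    private final long ttlMillis;
    private final LongSupplier clockMillis;
    private final Consumer<V> expiryHandler;
    private final Map<K, Entry<V>> entries = new HashMap<>();

    ExpireAfterAccessCache(long ttlMillis, LongSupplier clockMillis, Consumer<V> expiryHandler) {
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
        this.expiryHandler = expiryHandler;
    }

    V get(K key, Function<K, V> loader) {
        long now = clockMillis.getAsLong();
        Entry<V> e = entries.get(key);
        if (e != null && now - e.lastAccess >= ttlMillis) {
            // Idle for at least the TTL: evict and let the handler release the value.
            entries.remove(key);
            expiryHandler.accept(e.value);
            e = null;
        }
        if (e == null) {
            e = new Entry<>(loader.apply(key), now);
            entries.put(key, e);
        }
        e.lastAccess = now; // every access pushes the expiry further out
        return e.value;
    }
}

final class CronSimulation {
    /** Counts how many clients get created when the cloud is queried every intervalMinutes. */
    static int clientsCreated(long intervalMinutes, int runs) {
        long[] now = {0L};
        int[] created = {0};
        ExpireAfterAccessCache<String, String> cache = new ExpireAfterAccessCache<>(
                5 * 60_000L, () -> now[0], client -> { /* a real handler would close the client */ });
        for (int i = 0; i < runs; i++) {
            cache.get("docker-cloud", key -> "client-" + (++created[0]));
            now[0] += intervalMinutes * 60_000L;
        }
        return created[0];
    }
}
```

For example, CronSimulation.clientsCreated(2, 75) models 2.5 hours of a 2-minute cron job and creates a single client, while 6-minute gaps create a new client on every run.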

Thread dump:
"jenkins.util.Timer [#5]" #68 [103] daemon prio=5 os_prio=0 cpu=1486.31ms elapsed=7407.21s tid=0x000078fc40004630 nid=103 runnable  [0x000078fce68fb000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.Net.poll(java.base@21.0.9/Native Method)
	at sun.nio.ch.NioSocketImpl.park(java.base@21.0.9/NioSocketImpl.java:191)
	at sun.nio.ch.NioSocketImpl.timedFinishConnect(java.base@21.0.9/NioSocketImpl.java:548)
	at sun.nio.ch.NioSocketImpl.connect(java.base@21.0.9/NioSocketImpl.java:592)
	at java.net.SocksSocketImpl.connect(java.base@21.0.9/SocksSocketImpl.java:327)
	at java.net.Socket.connect(java.base@21.0.9/Socket.java:751)
	at org.apache.hc.client5.http.impl.io.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:205)
	at org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:490)
	at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.connectEndpoint(InternalExecRuntime.java:164)
	at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.connectEndpoint(InternalExecRuntime.java:174)
	at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:144)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
	at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:195)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
	at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:150)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
	at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
	at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55)
	at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183)
	at com.github.dockerjava.httpclient5.ApacheDockerHttpClientImpl.execute(ApacheDockerHttpClientImpl.java:189)
	at com.github.dockerjava.httpclient5.ApacheDockerHttpClient.execute(ApacheDockerHttpClient.java:9)
	at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228)
	at com.github.dockerjava.core.DefaultInvocationBuilder.get(DefaultInvocationBuilder.java:202)
	at com.github.dockerjava.core.DefaultInvocationBuilder.get(DefaultInvocationBuilder.java:74)
	at com.github.dockerjava.core.exec.ListContainersCmdExec.execute(ListContainersCmdExec.java:44)
	at com.github.dockerjava.core.exec.ListContainersCmdExec.execute(ListContainersCmdExec.java:15)
	at com.github.dockerjava.core.exec.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:21)
	at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:33)
	at com.nirima.jenkins.plugins.docker.DockerCloud.countContainersInDocker(DockerCloud.java:638)
	at com.nirima.jenkins.plugins.docker.DockerCloud.canAddProvisionedAgent(DockerCloud.java:656)
	at com.nirima.jenkins.plugins.docker.DockerCloud.provision(DockerCloud.java:394)
	- locked <0x000000069217bb88> (a com.nirima.jenkins.plugins.docker.DockerCloud)
	at io.jenkins.docker.FastNodeProvisionerStrategy.applyToCloud(FastNodeProvisionerStrategy.java:71)
	at io.jenkins.docker.FastNodeProvisionerStrategy.apply(FastNodeProvisionerStrategy.java:41)
	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:327)
	at hudson.slaves.NodeProvisioner.lambda$suggestReviewNow$4(NodeProvisioner.java:199)
	at hudson.slaves.NodeProvisioner$$Lambda/0x000078fcedd2ea28.run(Unknown Source)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@21.0.9/Executors.java:572)
	at java.util.concurrent.FutureTask.run(java.base@21.0.9/FutureTask.java:317)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@21.0.9/ScheduledThreadPoolExecutor.java:304)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21.0.9/ThreadPoolExecutor.java:1144)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21.0.9/ThreadPoolExecutor.java:642)
	at java.lang.Thread.runWith(java.base@21.0.9/Thread.java:1596)
	at java.lang.Thread.run(java.base@21.0.9/Thread.java:1583)

   Locked ownable synchronizers:
	- <0x000000068c284050> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	- <0x000000068ea724c8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
	- <0x00000006ac631450> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

This PR experiments with disabling the cache.

Testing done

Simulated a zombie connection as follows:

  • Trigger a build with node('docker') { sh 'sleep 120' }.
  • While the build is still running, block traffic from the Docker daemon port to the Jenkins controller: sudo iptables -A OUTPUT -p tcp -d 10.112.7.2 --sport 2375 -j DROP (10.112.7.2 is the IP of the Jenkins controller).
  • Additionally, stop the Docker daemon: sudo systemctl stop docker.
  • The thread dump still shows the stuck connection.
  • New builds also get stuck in the same way.

After the patch

  • At least new builds are triggered and complete.
  • The stuck thread may just stay there forever (?); I still need to check this.

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@gbhat618 marked this pull request as ready for review December 31, 2025 13:33
@gbhat618 requested a review from a team as a code owner December 31, 2025 13:33
@gbhat618 marked this pull request as draft December 31, 2025 13:33
