Skip to content

Conversation

@jmckenzie-dev
Copy link
Contributor

Add JDK21 support

Patch by jmckenzie; reviewed by TBD for CASSANDRA-18831

@michaelsembwever
Copy link
Member

ant check is throwing a few errors (related to imports)

@fisk
Copy link

fisk commented Nov 29, 2024

FYI if you want to retain a max tenuring threshold of 1, you can specify -XX:ZTenuringThreshold=1 with generational ZGC. But honestly it will probably figure things out automatically anyway.

Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review but high-level remarks.
Most of all:

  • I am not sure we should change config parameters based on the assumption people will use directly JDK21 on the next major. Check my comments. Also, probably we need to bring to the ML if we want to move to ZGC. Maybe you can share some test results from your practice with it and benefits.
  • I would not add 21 to supported versions in build.xml until we start running JDK21 CI.
    We have an option that allows people to experiment with 21: CASSANDRA_JDK_UNSUPPORTED
  • There is one add-export that is incorrect and can be removed.

Copy link
Contributor

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some general comments for things that I see. No big blockers on my end.
Leaving to @michaelsembwever the full review as my understanding is he already did that and I have full trust on you too trusted PMC members. :-)
I am fine with the new ZGC added as long as we bring it to the mailing list and it is not yet advertised probably as production ready? I believe we already agreed on that but just leaving here for completeness and awareness.

Just a few notable things to be addressed:
some tests' changes were not reverted after we agreed not to change defaults for config - this breaks config tests.
For some unknown to me reason I do not find cassandra listed here - https://ci-builds.apache.org/ and I couldn't verify the rest of the test failures.

I am a bit confused as some tests still have failures related to jamm despite the settings being tweaked to workaround some mentioned parameter. Will those be solved with the jamm PR being merged? They are expected? I am sure I missed something here but I don't know what :-)

-Dnet.bytebuddy.experimental - I still see it in the jvm config file and I thought we figured it is not needed? Maybe I misunderstood something?

I am not familiar with the latest CI scripts, so I definitely defer that to @michaelsembwever .
Also, simulator code is not something I know. The rest I checked for immediate obvious issues. Hope that feedback helps here. Thank you for your work!

NEWS.txt Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great, do we recommend this in prod already? I don't think I am the right person to review the selected settings and say they are good for prod.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker for this ticket and we do not have test coverage etc, neither I think it is a blocker but as a future reminder - I recommend adding simple unit tests to the DatabaseDescriptorTest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question, about the change of default

@jmckenzie-dev
Copy link
Contributor Author

k, signposting intent here @ekaterinadimitrova2 and @michaelsembwever:

  1. I'll get internal CI working w/this branch and JDK21, so then I can...
  2. revert the experimental bytebuddy again and see how it behaves (think I erroneously attributed failures to it on subsequent run that weren't it) w/out having an external "please run CI for me" dep
  3. I'll double-check to make sure I captured all the changed default GC params (that was work that Marcus did I wasn't involved in and it got absorbed into the diff I then upstreamed; I need to more deeply familiarize myself with that changeset to make sure I revert it), and finally
  4. I'll work through any and all test failures, ensuring any erroneous test changes (haven't looked at the code yet; chaotic Monday) are addressed, with a bias toward reverting to baseline if possible and passing.

I'm going to set this PR "on deck" for a week or two to address another priority, then will work through all this in a closed loop and surface if I get blocked on or need a 2nd pair of eyes on anything. That way we can break out of the long-async cycles we've been having on review / integration on this branch and hopefully push it across the line.

@jmckenzie-dev jmckenzie-dev changed the title Add JDK21 support CASSANDRA-18831 Add JDK21 support Apr 14, 2025
@belliottsmith belliottsmith force-pushed the trunk branch 2 times, most recently from df3eb40 to 54e39a9 Compare July 23, 2025 11:19
@mohakgoel1
Copy link

Hello,
Thanks for the work on adding JDK 21 support! I’ve been following this PR with interest as someone outside the project and was wondering if there’s any update or expected timeline for when it might be reviewed or merged.

Appreciate all the work the team is doing on Cassandra!

@jmckenzie-dev
Copy link
Contributor Author

any update or expected timeline for when it might be reviewed or merged

The hold up is CI validation. I have 1 pipeline left (python upgrade dtests) to get working on our internal CI infrastructure then it should only be a couple of weeks to burn down test failures. The code as written here is functionally identical to what we're running internally on a fork on a very large fleet w/all CI passing so if you wanted to create a build from this patch to tinker with, you should be in a very solid place.

I'm going to be traveling next week and we have the community over code conference coming up in a month so it's not an ideal season for keeping things moving along but I certainly haven't forgotten about this work, it just got shuffled a bit in priorities but I'm almost dug out from under things.

@pron
Copy link

pron commented Sep 30, 2025

--add-opens/--add-exports flags are FIXME markers signifying non-portable code that may break with any release. The more there are, the likelier the code is to break. By all accounts, a program with 30 such flags shouldn't work at all unless it's specifically hand-crafted for that release. Even if these changes are necessary yet the software miraculously works across versions, it can't work for long. But this project works on JDK 17, so that this change works on 21 tells me these flags are largely unnecessary or that the fixes they require aren't so complicated.

Without such flags, a piece of software is highly likely to work, unchanged, on future JDK releases for a long time, enjoying performance and observability improvements for free. But every single one of them that is removed can contribute to the project's stability and portability, so I would recommend taking the time to make sure they're really needed.

@driftx
Copy link
Contributor

driftx commented Sep 30, 2025

But this project works on JDK 17, so that this change works on 21 tells me these flags are largely unnecessary or that the fixes they require aren't so complicated.

https://github.com/apache/cassandra/blob/trunk/conf/jvm17-server.options#L69

@pron
Copy link

pron commented Oct 2, 2025

https://github.com/apache/cassandra/blob/trunk/conf/jvm17-server.options#L69

Ah. In that case, the code is living on borrowed time and has just been lucky. Every release increases the chance of things breaking. Remember that the reason for encapsulation of JDK internals (or one of the several reasons) is to give the JDK more freedom to change internals more liberally, and we've been taking advantage of the freedom. The pace of changes to internals is expected to continue growing.

I had a look at a couple of uses of internal access, and they now have standard replacements (albeit in JDK 22 and on).

@ekaterinadimitrova2
Copy link
Contributor

ekaterinadimitrova2 commented Oct 2, 2025

@pron - you are not wrong.
Please check the discussion below:
https://lists.apache.org/list?dev@cassandra.apache.org:2022-9:jdk%20internals

Also, I would like to mention that on JDK17, we reduced the number of add-opens and add-exports, especially with the update of the jamm library and other code changes during the upgrade. We will be happy to continue improving things when there is a chance.

I had a look at a couple of uses of internal access, and they now have standard replacements (albeit in JDK 22 and on).

That is great. Please feel free to open tickets with your findings. It would be even better if you had patches, but even just tickets would be great. Thank you for your input here. Highly appreciated

Patch by Josh McKenzie; reviewed by TBD for CASSANDRA-18831

Co-authored-by: Aleksey Yeschenko
Co-authored-by: Achilles Benetopoulos
Co-authored-by: Mick Semb Wever
- Unified log vs. warn relationship checking under a single method in
  DatabaseDescriptor
ws("workspace/${JOB_NAME}/${BUILD_NUMBER}/${cell.step}/${cell.arch}/jdk-${cell.jdk}/python-${cell.python}") {
fetchSource(cell.step, cell.arch, cell.jdk)
fetchDockerImages(['ubuntu2004_test'])
fetchDockerImages(['ubuntu-test'])
Copy link
Member

@michaelsembwever michaelsembwever Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 346 also needs a fix !! @jmckenzie-dev

fetchDockerImages("redhat" == cell.step ? ['almalinux-build'] : ['debian-build'])

Comment on lines +43 to +45
# Don't let ZGC return unclaimed memory to the OS by setting min and max ==
# -Xms12G </string>
# -Xmx12G </string>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this redundant ? (it being done already in cassandra-env.sh)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. In cassandra-env.sh we have the following:

# We only set -Xms and -Xmx if they were not defined on jvm-server.options file
# If defined, both Xmx and Xms should be defined together.
if [ $DEFINED_XMX -ne 0 ] && [ $DEFINED_XMS -ne 0 ]; then
     JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
     JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
elif [ $DEFINED_XMX -ne 0 ] || [ $DEFINED_XMS -ne 0 ]; then
     echo "Please set or unset -Xmx and -Xms flags in pairs on jvm-server.options file."
     exit 1
fi

This implies to me that we expect to have the values exist in both places and one takes precedent.

Comment on lines +191 to +205
# Young generation size is automatically calculated by cassandra-env
# based on this formula: min(100 * num_cores, 1/4 * heap size)
#
# The main trade-off for the young generation is that the larger it
# is, the longer GC pause times will be. The shorter it is, the more
# expensive GC will be (usually).
#
# It is not recommended to set the young generation size if using the
# G1 GC, since that will override the target pause-time goal.
# More info: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
#
# The example below assumes a modern 8-core+ machine for decent
# times. If in doubt, and if you do not particularly want to tweak, go
# 100 MB per physical CPU core.
#-Xmn800M
Copy link
Member

@michaelsembwever michaelsembwever Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also exists here: https://github.com/jmckenzie-dev/cassandra/blob/1564529f5b7dce3a20f10dfd0add2f146b3991de/conf/jvm11-server.options#L53-L58

where it is better placed, being only relevant to CMS.
GenZGC does not have the young/old distinction, and G1 uses -XX:G1NewSizePercent=50 here: https://github.com/jmckenzie-dev/cassandra/blob/1564529f5b7dce3a20f10dfd0add2f146b3991de/conf/jvm11-server.options#L70C1-L70C24

i think what's highlighted here can simply be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably here to provide a warning that setting it in the G1 case is ill-advised. -Xmn is noop for ZGC but G1 users could trip across this.

throw new IllegalArgumentException("Threshold value for gc_warn_threshold must be greater than or equal to 0");

if (threshold > Integer.MAX_VALUE)
throw new IllegalArgumentException("Threshold must be less than Integer.MAX_VALUE");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the parameter type should be long but still be under Integer.MAX_VALUE it (IMHO) deserves a comment as to why, as it looks odd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int to long conversion predates this patch; I just kind of "When in Rome'ed" it.

            DatabaseDescriptor.setGCLogThreshold((int) DatabaseDescriptor.getGCLogThreshold());
            DatabaseDescriptor.setGCWarnThreshold((int) DatabaseDescriptor.getGCWarnThreshold());

I could dig around a bit to try and figure out why these return a long - comes from JJirsa in 2016. My guess is that we didn't want to change the type but did want to sanity check and constrain the values to permissible ranges, but I'm not sure tbh.

if (threshold < 0)
throw new IllegalArgumentException("Threshold value for gc_warn_zgc_threshold must be greater than or equal to 0");

if (threshold > Integer.MAX_VALUE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member

@michaelsembwever michaelsembwever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

complete review done. a few comments to address.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.