Add AppDomain debugging support, and fix some problems it turned up #644

kring · 2025-12-18T13:12:43Z

This PR includes and requires CesiumGS/cesium-native#1291 so merge that first.

Continuing the quest for AppDomain reload robustness...

First, this PR adds a system for detecting cases where we attempt to use a reference from one AppDomain in a different one. The idea is to capture which AppDomain we're in whenever we construct an ObjectHandle. Then, later, when we use it, make sure we're still in the same AppDomain. It's simple in theory, but of course there are difficulties...

The main problem is that it's hard to tell what AppDomain we're in. AppDomains have an Id property, which uniquely identifies them. Perfect! Except it only uniquely identifies them among AppDomains that are currently active. A new AppDomain can have the same ID as one that has now been unloaded. 🙄 So that doesn't work at all for our purposes.

I thought maybe we could use the AppDomain's GetHashCode, and it would at least be mostly unique for each AppDomain, even if it's not guaranteed to be so. But no, GetHashCode seems to be implemented to basically return a hash of the Id. So that doesn't work for our purposes either.

My next idea was to track the AppDomain manually. In Unity, we know there's only one active at a time, even though this isn't guaranteed in .NET in general. So we can say that a new AppDomain starts when initializeReinterop is called, and it ends when the AppDomain's DomainUnload event is raised. So an ObjectHandle created inside one [initializeReinterop, DomainUnload] interval should never be used in a different such interval. We can work with that.

But what about when a handle is created or destroyed outside of any such interval? Well, handle creation outside the interval should never happen. So we'll assert on that.

Destruction is not so simple. The problem is finalization. Mono calls DomainUnload and then it invokes finalizers for all objects in the AppDomain. In our case, all of our C# classes backed by a C++ Impl need to be finalized so that the Impls can be destroyed. In some cases, the Impls hold references to other managed objects. When the Impls are destroyed, these references will be released. Outside our happy interval. So we can't assert in this case.

This isn't great, though, because actual bugs could look just like this. We create an ObjectHandle inside one AppDomain interval, and then release it too late, after the AppDomain is gone but before the initializeReinterop is called for the next one. Can we detect this scenario somehow?

Yes, because we can ask the current AppDomain if it is currently finalizing objects for unload. So the logic is, a handle is valid (using it does not assert) if:

It is being used in the same AppDomain happy interval in which it was created, or
The current AppDomain is finalizing for unload.

In the latter case, we don't assert, but we also treat the reference as if it's null. Effectively, as soon as the DomainUnload event is raised, all managed references held by our native code immediately become null. I think this is a good idea because the purpose of a finalizer is solely to release native resources. Releasing a managed reference is pointless, because all managed objects are by definition dead at this stage. And using a managed reference is worse than pointless - it's dangerous! - because the object may have already been finalized. The .NET doc on finalizers says:

Don't access managed object members from a finalizer. During finalization, managed objects might already be disposed, making them unavailable or in an invalid state. Only access unmanaged resources directly from finalizers.

So with this debugging system in place, I did all kinds of terrible things to trigger AppDomain unloads at the worst times, and watched for assertions. I saw a couple, and those are fixed in this PR as well. Specifically:

Added code to wait for async CesiumIonSession work to complete before allowing the AppDomain unload. Similar to what we were already doing for tilesets and raster overlays.
Explicitly disposed tileset's _updateInEditorCallback so it doesn't need to be finalized.

Also fixed a case where e in generated catch (Exception e) conflicted with the name of a parameter. I renamed it to reinteropManagedException.

This PR is currently draft because this AppDomain tracking probably adds some overhead. We should at least disable it in non-Editor builds. Maybe it should be optional even in Debug/Editor builds?

Also draft because, after all this, I still (rarely) get this error:

Release of invalid GC handle. The handle is from a previous domain. The release operation is skipped.

I can't really think of how this can possibly happen anymore. With the debugging system in place, any error like this should trigger an assertion failure, not a stackless error message like this. Perhaps it's overly optimistic, but I kind of want to say this isn't us! That some other part of Unity is attempting to use a reference across AppDomains.

But maybe there's some subtle way this can still happen, and I just haven't thought of it yet...

I'm not entirely sure why this has just become a problem now.

kring · 2025-12-19T02:18:20Z

This should be ready now. The AppDomain checking is disabled in non-Editor builds. Editor builds (even release ones) do the AppDomain checking, which makes ObjectHandles 50% bigger than they used to be and requires an extra comparison when dereferencing, but I think we can live with that. I still can't explain why we still sometimes get "invalid GC Handle" errors once in awhile when entering Play mode. As mentioned above, it's possible (but maybe not likely?) that this isn't even a Cesium bug.

j9liu

Thanks @kring ! I have a few comments but most are questions with potentially no action. Although I still see some invalid GC handle errors (as you acknowledged), they are significantly fewer than before, which is a great improvement!

native~/src/Editor/CesiumIonSessionImpl.cpp

native~/src/Runtime/UnityExternals.cpp

Reinterop~/CppObjectHandle.cs

Reinterop~/CSharpObjectHandleUtility.cs

j9liu · 2025-12-22T19:41:56Z

Reinterop~/CppObjectHandle.cs

+                      this->_handle = nullptr;
+                    }
+
+                    this->_appDomainIndex = _runningAppDomainIndex;


I see the copies use runningAppDomainIndex while the moves use rhs._appDomainIndex -- is this in case the copy happens during an AppDomain reload?

It shouldn't actually matter... but I think of it like this:

When moving, we're just moving some existing handle to a new place in memory. There's no particular reason to spend time evaluating the validity of that handle when we do that. We'll do that later when we dereference it or free it or whatever.

When copying, we must have a valid handle. So we want to do all our appdomain validation logic. But if rhs._appDomainIndex != runningAppDomainIndex, then rhs.GetRaw() will return nullptr anyway and then the app domain index doesn't matter. So we could have initialized _appDomainIndex(rhs._appDomainIndex) in the copy constructor, too and it wouldn't change any behavior. I think it would be slightly less clear, though.

j9liu · 2025-12-22T19:50:19Z

native~/src/Editor/CesiumIonSessionImpl.cpp

        pThis->_isLoadingProfile = false;
-        pThis->_profile = std::nullopt;
-        pThis->broadcastProfileUpdate();
+        pThis->_profile.emplace();


I was wondering what the effects of this were, and so I traced a potential issue to the function refreshProfileIfNeeded. But then I discovered that we never call that function in the codebase anyways! Maybe we could delete all those refresh___IfNeeded functions to confirm?

Just FYI, I made this change to avoid a refresh loop:

The profile is loading.

Unity wants to reload the AppDomain, so we cancel all requests, including the profile request.

The canceled request causes this catchInMainThread continuation to be invoked.

The continuation calls BroadcastProfileUpdate, which accesses the profile, which causes it to be requested again.

We block forever waiting for the session to not be busy.

But yeah these refresh-if-needed functions look pretty useless. I'll remove them.

j9liu · 2025-12-22T20:48:56Z

Just realized I was testing the build on Release and not Debug, so I tried it out on the latter. The first time I did an AppDomain reload I got one big batch of errors (45), but afterwards I couldn't see them. Maybe because the ion requests were cached?

But I did run into this when I exited mid-play mode in the first sample level, while tiles were still loading (arguably evil behavior 😅):

kring · 2025-12-22T22:22:47Z

The first time I did an AppDomain reload I got one big batch of errors

What kind of errors? If it was various flavors of "request canceled because the AppDomain is unloading", that's expected. Maybe we could hide those from the user, but that will take some doing. If you saw other errors, like GCHandle errors for example, that's more concerning.

But I did run into this when I exited mid-play mode in the first sample level, while tiles were still loading (arguably evil behavior 😅)

No that's a valid thing to do. Does it happen consistently for you? If you see it again, can you attach the debugger (VS or VS Code will work fine) to the unity.exe process, click Retry, and tell me what your call stack looks like?

j9liu · 2025-12-22T23:31:04Z

What kind of errors? If it was various flavors of "request canceled because the AppDomain is unloading", that's expected. Maybe we could hide those from the user, but that will take some doing. If you saw other errors, like GCHandle errors for example, that's more concerning.

It was definitely a flux of GCHandle errors. I also saw the "request canceled because the AppDomain is unloading" logs but I figured that was fine and intentional!

No that's a valid thing to do. Does it happen consistently for you? If you see it again, can you attach the debugger (VS or VS Code will work fine) to the unity.exe process, click Retry, and tell me what your call stack looks like?

I just tried to reproduce it a couple times but I couldn't 🤔 I don't remember if I did anything specific to trigger it, but maybe it was just a fluke.

I also tried reproducing the 45+ GCHandle errors, but now I'm only getting ~3. So I'll let you know if I ever run into it again, I suppose?

kring · 2025-12-23T00:56:12Z

It was definitely a flux of GCHandle errors.

🤦 And no call stack associated with any of them, I assume?

kring · 2025-12-23T00:57:33Z

I also tried reproducing the 45+ GCHandle errors, but now I'm only getting ~3.

3 is a lot! The most I've ever seen in this branch is one, and only rarely. It happens when entering play mode.

Co-authored-by: Janine Liu <32226860+j9liu@users.noreply.github.com>

kring · 2025-12-23T12:49:48Z

I was able to reproduce the assertion when exiting Unity while in Play mode. The problem was that, while finalizing, it was attempting to dispose a UnityWebRequest that had already been disposed. This caused a managed exception to be thrown, and then we asserted when creating an ObjectHandle for the managed exception. Easy fix: don't dispose a request that has already been disposed.

kring · 2025-12-23T12:50:44Z

Thanks for the review @j9liu! I believe I've addressed everything, except that there may still be a lingering GCHandle issue.

j9liu

Looks good to me, thanks @kring !

kring added 8 commits December 17, 2025 17:56

Further handling of AppDomain unload.

490540b

More robust assertion of proper AppDomain behavior.

ca6976e

Only use AppDomain in the Editor.

787b709

Move more includes to editor-only.

97136d8

Yet more editor-only includes.

7517a45

AppDomain shenanigans only in the Editor.

2a8e6ce

Fix incorrect forward references.

b0a9c4a

I'm not entirely sure why this has just become a problem now.

Update cesium-native.

1598f55

kring mentioned this pull request Dec 19, 2025

Enable warnings-as-errors in our C++ code #645

Open

kring marked this pull request as ready for review December 19, 2025 02:18

j9liu requested changes Dec 22, 2025

View reviewed changes

kring and others added 3 commits December 23, 2025 22:56

Fix typo.

6e8ce59

Co-authored-by: Janine Liu <32226860+j9liu@users.noreply.github.com>

Remove unused "IfNeeded" functions.

146b370

Verify UnityWebRequest is not null before disposing it.

269b003

Merge remote-tracking branch 'origin/main' into more-appdomain-fun

574f191

kring mentioned this pull request Dec 28, 2025

Add support for Cesium ion login token refresh #647

Merged

j9liu approved these changes Jan 1, 2026

View reviewed changes

j9liu merged commit 276f362 into main Jan 1, 2026
8 checks passed

j9liu deleted the more-appdomain-fun branch January 1, 2026 18:34

j9liu added this to the January 2026 Release milestone Jan 1, 2026

Add AppDomain debugging support, and fix some problems it turned up #644

Add AppDomain debugging support, and fix some problems it turned up #644

Uh oh!

Conversation

kring commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kring commented Dec 19, 2025

Uh oh!

j9liu left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

j9liu Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

kring Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

j9liu Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

kring Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

j9liu commented Dec 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kring commented Dec 22, 2025

Uh oh!

j9liu commented Dec 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kring commented Dec 23, 2025

Uh oh!

kring commented Dec 23, 2025

Uh oh!

kring commented Dec 23, 2025

Uh oh!

kring commented Dec 23, 2025

Uh oh!

j9liu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kring commented Dec 18, 2025 •

edited

Loading

j9liu left a comment •

edited

Loading

j9liu commented Dec 22, 2025 •

edited

Loading

j9liu commented Dec 22, 2025 •

edited

Loading