-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix: ScriptSystem performance #2986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /// Either a microthread or an action with priority. | ||
| /// </summary> | ||
| internal struct SchedulerEntry : IComparable<SchedulerEntry> | ||
| internal class SchedulerEntry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested this, but just out of curiosity:
How does this change from struct to class fare from the garbage/GC standpoint? In your extreme sample scene with many scripts, doesn't this result in thousands of SchedulerEntry instances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, yes. Previous implementation wrapped this struct inside of a priority node, so the memory overhead is about the same per instance between both implementation. In this new implementation, those are not pooled, so they will be GCed once the script or microthread are collected. If creating a script/microthread did not allocate otherwise, I would worry, but given that it does, here it's just another allocation on top of the base one which is not nearly as problematic.
Right now it was swapped into a class to mark it as unscheduled once it runs, which is required to skip an unnecessary lookup when unscheduling it outside of the scheduler loop. We could do without and bare one binary search on every syncscript removal, or build additional logic around this to conditionally skip it when ran after a scheduler run, but that's a bit too fragile. I'll check monday if there's anything else I could do on that side.
…e priority to run all synscript of that priority together
|
Looked into making the SchedulerEntry a struct, not sure it's worth it given that we have to pay the cost of an additional hashset lookup whenever we schedule, process or unschedule one. We're kind of limited in what we can setup given that the Scheduler is re-entrant and can schedule actions while processing scheduled actions. |
This reverts commit cac41bb.
|
This PR is ready for review |
|
LGTM! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a significant performance optimization for the ScriptSystem, reducing script scheduling from O(N log N) to O(log N) per unique priority. In a test scene with ~20k SyncScript instances, ScriptSystem.Update execution time improved from ~9.5ms to ~4.7ms (approximately 2x faster).
Key Changes:
- Replaced per-script priority queue scheduling with batched execution grouped by priority
- Introduced pooling for
SchedulerEntryandExecutionQueueobjects to reduce GC pressure - Refactored
SchedulerEntryfrom struct to class to support the new pooling architecture
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
sources/engine/Stride.Engine/Engine/SyncScript.cs |
Added ScriptSystem reference and priority change tracking via PriorityUpdated() override |
sources/engine/Stride.Engine/Engine/StartupScript.cs |
Changed StartSchedulerNode from PriorityQueueNode to nullable SchedulerEntry for pooling |
sources/engine/Stride.Engine/Engine/Processors/ScriptSystem.cs |
Core refactoring: batches sync scripts by priority, implements object pooling, new batch execution method ExecuteBatchOfSyncScripts |
sources/core/Stride.Core/Collections/Dequeue.cs |
Made public, moved to Collections namespace, added power-of-2 capacity requirement, optimized with bitwise mask operations, added BinarySearch method |
sources/core/Stride.Core.Tests/TestDeque.cs |
New comprehensive test suite for Deque functionality including wrapping, range insertion, and binary search |
sources/core/Stride.Core.MicroThreading/SchedulerEntry.cs |
Converted from struct to class, removed priority/counter fields (moved to queue level), added queue tracking fields |
sources/core/Stride.Core.MicroThreading/Scheduler.cs |
Complete scheduler rewrite: priority-based bucketing with Deque per priority, object pooling for entries and buckets, new locking strategy |
sources/core/Stride.Core.MicroThreading/MicroThread.cs |
Simplified priority handling, removed internal Reschedule method, delegates to Scheduler.Reschedule |
sources/core/Stride.Core.MicroThreading/MicroThreadYieldAwaiter.cs |
Updated to use new HasNoEntriesScheduled() method instead of direct queue access |
sources/core/Stride.Core.Design/Threading/Internal/DefaultAsyncWaitQueue.cs |
Added import for Stride.Core.Collections namespace (Deque moved) |
Comments suppressed due to low confidence (1)
sources/core/Stride.Core/Collections/Dequeue.cs:540
- The comment refers to the non-existent method
DequeIndexToBufferIndex. This method was renamed toDequeIndexToBufferRef. Please update the comment to reflect the current method name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. If possible improve the documentation on some types.
|
@Kryptos-FR, are you testing copilot? You would like copilot to create xml documentation? |
I was trying to have it make a review but apparently I can't. Thanks for doing it. My other review comments were directed at the original author but if copilot can do it, why not? |
|
We need to update these instructions https://github.com/stride3d/stride/blob/master/.github/copilot-instructions.md, and include to always suggest xml doc for public APIs if they are missing, or needs to be updated or corrected. |
|
Copilot can't do it on existing PR at the moment.. so I am listing it below. @Eideren, use if any of those are helpful.. |
I don't have that option. Copilot is not listed for me. |
|
@VaclavElias Well, it somehow missed every single cases that should actually be documented aside from the binary search ... :D |
|
I guees, that's all we can get from it today 🤣. |

PR Details
Used a non-synthetic private scene to test these changes out. The scene is comprised of around 20k
SyncScriptof different types and priorities.Out of the box
ScriptSystem.Updatetakes ~9.5ms with min/max of 9.0 and 10.4.With this change, it takes ~4.7ms with min/max of 4.6 and 5.25.
Profile capture: https://share.firefox.dev/49LWPGK
Previously, scheduling any script was
log N, so total cost ofN log N. It now islog Nonly for each unique priority scheduled. Otherwise scheduling isO1The implementation uses a
Dequeto provide ordered insertion at the start, end and removal from the start.Then we have a
PriorityQueuewhich sorts theseDequebased on their priority to consume the most prioritized queue first.And finally a
DictionaryforO1subscription to a particularDeque.Then we have a couple of minor details to further amortize the cost of all of this.
Related Issue
fix: #2942
Types of changes
Checklist