-
Notifications
You must be signed in to change notification settings - Fork 7
Test and fix garbage collector #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Switch over to `DEBUG_PRINTF()` for debugging output - Add MIN() and MAX() macros borrowed from glibc - Rename tp_mark 'max' parameter to 'gc_max' - Fix loop logic to perform full GCs as expected
|
|
||
| #ifndef DEBUG_PRINTF | ||
| #ifdef NDEBUG | ||
| #define DEBUG_PRINTF(...) {0;} | ||
| #else | ||
| #define DEBUG_PRINTF(...) fprintf(stderr, __VA_ARGS__) | ||
| #endif | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to tp_internal to replace the hand-coded logic I was using in tp_gc.c — it should be portable to any C99 compiler, __VA_ARGS__ is part of the standard now. And printing to stderr means the messages can still be suppressed if needed.
| .objs/%.o : %.c | ||
| @mkdir -p $(dir $@) | ||
| $(CC) $(CFLAGS) -g -O0 -I . -c -o $@ $< | ||
| $(CC) $(CFLAGS) -DNDEBUG -g -O0 -I . -c -o $@ $< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this to the non-debug compile commands, since it apparently wasn't being defined. (I think because of the -g.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-g -O0 looks like a debug compile command to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rofl0r In truth, it's identical to the actual debug compile command a few lines down, except that it doesn't define TPVM_DEBUG. I'm guessing turning on optimization in the non-debug builds was something left for after the code was shored up a bit better.
| void tp_mark(TP, int gc_max) { | ||
| int target_len = gc_max > 0 ? MAX(0, tp->grey->len - gc_max) : 0; | ||
| DEBUG_PRINTF( | ||
| "GC: Coloring objects, %d out of %d to mark\n", | ||
| tp->grey->len - target_len, tp->grey->len | ||
| ); | ||
|
|
||
| while (tp->grey->len > target_len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this fixes the garbage collection. The DEBUG_PRINTF()s show that it's now running and doing its thing, instead of doing nothing like before. (Although it's interesting that, at least when running the tests, ONLY full GC runs were ever triggered — which made the issue even worse, because partial runs would've actually collected some objects. But there were never any partial runs.)
Even having fixed it, though, I'm kind of at a loss for how to actually test it. The "tests" I added to tests/test_gc.py really don't test the GC at all, in the sense of confirming that it's doing what it should be doing. They merely create and delete some objects to trigger the DEBUG_PRINTF() output, so that the GC operation can be manually inspected. But there are no tests for it in the code.
I'm really not sure how to write them, either. Without weakref support (which I don't think tinypy has?), it seems like a difficult thing to actually confirm that an unreferenced object has been deleted. Especially by a GC that, unlike the Python one, has no code-accessible API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, now that I've read more of the code I don't actually like this method.
The problem comes up (potentially) when complex nested objects are GC'd. If a container is moved from the grey list to the black list, its children are placed on the grey list. Which means the length of the grey list can actually INCREASE as the loop is running, which would cause it to mark more than gc_max objects each run.
So, making tp->grey->len the loop control variable is a bad idea, and I'll need to add an explicit loop iteration counter back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, indeed, now that I actually have incremental steps working (see https://github.com/rainwoodman/tinypy/pull/45/files#r2384173153), the logs regularly show things like:
GC: Incremental step, marking up to 8 objects
GC: Coloring objects, 8 out of 31 to mark
GC: Marking complete, 105 objects remain on grey list
GC: Incremental step, marking up to 8 objects
GC: Coloring objects, 8 out of 105 to mark
GC: Marking complete, 101 objects remain on grey list
GC: Incremental step, marking up to 8 objects
GC: Coloring objects, 8 out of 101 to mark
GC: Marking complete, 97 objects remain on grey list
My previous algorithm would've screwed up "8 out of 31 to mark => marking complete, 105 remain" pretty spectacularly.
| int re_set_syntax(int); | ||
| char *re_compile_pattern(unsigned char*, int, regexp_t); | ||
| int re_match(regexp_t, unsigned char*, int, int, regexp_registers_t); | ||
| int re_search(regexp_t, unsigned char*, int, int, int, regexp_registers_t); | ||
| void re_compile_fastmap(regexp_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I meant about needing to update the code for modern GCCs. Around GCC 11 or 12, declarations like the ones that were there previously became a compile error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi that's a matter of default standard, GCC 15 defaults to -std=gnu23 (which makes this an error), whereas earlier standards accepted it, so adding e.g. -std=gnu99 would fix it too. (note that using -std=c99 is more complicated since it enables a "strict C without extensions" profile on most platforms, which makes common POSIX or XSI extensions unavailable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know, tho I figure the declarations are better with the argument types provided anyway, so it's not really worth fighting the standard. 😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, having the full prototype here is an improvement. in other cases where one just wants to compile a piece of software that shipped before GCC 15 materialized, it's usually a lot less hassle to just add -std=gnu99 to CFLAGS than trying to patch the software. and with compilers moving as fast and recklessly like in the past few years it's generally a good idea to just hardcode the expected C/C++ standard one wants to use in the build environment, so it will build even in a decade into the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it will build even in a decade into the future.
I admire your optimism! 😁
|
Output of the debug build with these changes in place: $ bash run-tests.sh --backend=tpvm-dbg tests/test_gc.py
tests/test_gc.py
./tpc -o tests/test_gc.tpc tests/test_gc.py
./tpvm-dbg tests/test_gc.tpc
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 74 out of 74 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 11 out of 11 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
[...]
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
[ STARTED ] tests/test_gc.tpc 2 cases.
[ PASS ] 0: test_create
[ PASS ] 1: test_delete
[ HEALTHY ] tests/test_gc.tpc 0 Known Failures.
All tests passed. |
I should fix that message to be more precise. The GC isn't complete at all at the point that's emitted, only the coloring of grey-list candidates to the black list. |
| #ifdef TPVM_DEBUG | ||
| tp->gcmax = 0; | ||
| #else | ||
| tp->gcmax = 4096; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AUGH! Well, now I know why only full GCs were ever run in the debug build. (tp->gcmax == 0 is one of the conditions for triggering a full GC in tp_gc_run(), even if it wasn't explicitly requested.)
|
Hi! I quickly scanned through the discussion thread. Thanks for the
thorough investigation!
Quick re on Testing: If we expose some internal state and actions of the gc
to a python module maybe we build some assertions on top of that?
…On Sat, Sep 27, 2025 at 7:32 AM Frank Dana ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In modules/re/regexpr.h
<#45 (comment)>:
> +int re_set_syntax(int);
+char *re_compile_pattern(unsigned char*, int, regexp_t);
+int re_match(regexp_t, unsigned char*, int, int, regexp_registers_t);
+int re_search(regexp_t, unsigned char*, int, int, int, regexp_registers_t);
+void re_compile_fastmap(regexp_t);
so it will build even in a decade into the future.
I admire your optimism! 😁
—
Reply to this email directly, view it on GitHub
<#45 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABBWTF6Z4FZ5GTLYNRGFSL3U2NZTAVCNFSM6AAAAACHUUUMHOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTENZUHE3DEMZXGE>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
I think that's probably the best/only way. If we take Python's >>> import gc
>>> gc.<tab><tab>
gc.DEBUG_COLLECTABLE gc.freeze() gc.get_threshold()
gc.DEBUG_LEAK gc.garbage gc.is_finalized(
gc.DEBUG_SAVEALL gc.get_count() gc.is_tracked(
gc.DEBUG_STATS gc.get_debug() gc.isenabled()
gc.DEBUG_UNCOLLECTABLE gc.get_freeze_count() gc.set_debug(
gc.callbacks gc.get_objects( gc.set_threshold(
gc.collect( gc.get_referents( gc.unfreeze()
gc.disable() gc.get_referrers(
gc.enable() gc.get_stats()
def test_list(self):
l = []
l.append(l)
gc.collect()
del l
self.assertEqual(gc.collect(), 1)
def test_tuple(self):
# since tuples are immutable we close the loop with a list
l = []
t = (l,)
l.append(t)
gc.collect()
del t
del l
self.assertEqual(gc.collect(), 2)
def test_instance(self):
class A:
pass
a = A()
a.a = a
gc.collect()
del a
self.assertNotEqual(gc.collect(), 0)
>>> from pprint import pp
>>> import gc
>>> pp(gc.get_stats())
[{'collections': 10, 'collected': 782, 'uncollectable': 0},
{'collections': 0, 'collected': 0, 'uncollectable': 0},
{'collections': 0, 'collected': 0, 'uncollectable': 0}]
>>> gc.get_count()
(614, 10, 0)
>>> len(gc.get_objects())
13326
>>> gc.get_count()
(620, 10, 0)
>>> gc.collect()
801
>>> gc.get_count()
(40, 0, 0)
>>> len(gc.get_objects())
12182
>>> pp(gc.get_stats())
[{'collections': 10, 'collected': 782, 'uncollectable': 0},
{'collections': 0, 'collected': 0, 'uncollectable': 0},
{'collections': 1, 'collected': 801, 'uncollectable': 0}]
>>> gc.collect()
0
>>> gc.get_count()
(41, 0, 0)Obviously Python's But having some way to inspect its operation programmatically would make it possible to write tests that, right now, I don't think are possible. |
This PR (marked draft as it's still evolving) is an investigation into the operation of the garbage collector, which I believe is not currently working (at least for full GC runs). See #16 (comment) for analysis.
So far, I've made changes to output state from the GC marking process, and written a simple
tests/test_gc.pywhich exercises it. Results demonstrate what I'd suspected: When a full run is triggered, the GC is not actually doing anything. (Partial) output from running the new test, with the changes in this PR: