Test and fix garbage collector #45

ferdnyc · 2025-09-27T05:25:37Z

This PR (marked draft as it's still evolving) is an investigation into the operation of the garbage collector, which I believe is not currently working (at least for full GC runs). See #16 (comment) for analysis.

So far, I've made changes to output state from the GC marking process, and written a simple tests/test_gc.py which exercises it. Results demonstrate what I'd suspected: When a full run is triggered, the GC is not actually doing anything. (Partial) output from running the new test, with the changes in this PR:

$ bash run-tests.sh --backend=tpvm-dbg tests/test_gc.py
tests/test_gc.py
./tpc -o tests/test_gc.tpc tests/test_gc.py
./tpvm-dbg tests/test_gc.tpc
Running full GC
GC: Coloring objects, -1 out of 9 to mark
GC complete, 9 items remaining
Running full GC
GC: Coloring objects, -1 out of 74 to mark
GC complete, 74 items remaining
Running full GC
GC: Coloring objects, -1 out of 75 to mark
GC complete, 75 items remaining
Running full GC
GC: Coloring objects, -1 out of 76 to mark
GC complete, 76 items remaining
Running full GC
GC: Coloring objects, -1 out of 77 to mark
GC complete, 77 items remaining
Running full GC
GC: Coloring objects, -1 out of 77 to mark
GC complete, 77 items remaining
[...]
Running full GC
GC: Coloring objects, -1 out of 22937 to mark
GC complete, 22937 items remaining
Running full GC
GC: Coloring objects, -1 out of 22938 to mark
GC complete, 22938 items remaining
Running full GC
GC: Coloring objects, -1 out of 22938 to mark
GC complete, 22938 items remaining
Running full GC
GC: Coloring objects, -1 out of 22939 to mark
GC complete, 22939 items remaining
Running full GC
GC: Coloring objects, -1 out of 22940 to mark
GC complete, 22940 items remaining
Running full GC
GC: Coloring objects, -1 out of 22940 to mark
GC complete, 22940 items remaining
All tests passed.

- Switch over to `DEBUG_PRINTF()` for debugging output - Add MIN() and MAX() macros borrowed from glibc - Rename tp_mark 'max' parameter to 'gc_max' - Fix loop logic to perform full GCs as expected

ferdnyc · 2025-09-27T07:11:58Z

tinypy/tp_internal.h

+
+#ifndef DEBUG_PRINTF
+#ifdef NDEBUG
+#define DEBUG_PRINTF(...) {0;}
+#else
+#define DEBUG_PRINTF(...) fprintf(stderr, __VA_ARGS__)
+#endif
+#endif


I added this to tp_internal to replace the hand-coded logic I was using in tp_gc.c — it should be portable to any C99 compiler, __VA_ARGS__ is part of the standard now. And printing to stderr means the messages can still be suppressed if needed.

ferdnyc · 2025-09-27T11:03:22Z

Makefile

 .objs/%.o : %.c
 	@mkdir -p $(dir $@)
-	$(CC) $(CFLAGS) -g -O0 -I . -c -o $@ $<
+	$(CC) $(CFLAGS) -DNDEBUG -g -O0 -I . -c -o $@ $<


Added this to the non-debug compile commands, since it apparently wasn't being defined. (I think because of the -g.)

-g -O0 looks like a debug compile command to me

@rofl0r In truth, it's identical to the actual debug compile command a few lines down, except that it doesn't define TPVM_DEBUG. I'm guessing turning on optimization in the non-debug builds was something left for after the code was shored up a bit better.

ferdnyc · 2025-09-27T11:08:12Z

tinypy/tp_gc.c

+void tp_mark(TP, int gc_max) {
+    int target_len = gc_max > 0 ? MAX(0, tp->grey->len - gc_max) : 0;
+    DEBUG_PRINTF(
+        "GC: Coloring objects, %d out of %d to mark\n",
+        tp->grey->len - target_len, tp->grey->len
+    );
+
+    while (tp->grey->len > target_len) {


So, this fixes the garbage collection. The DEBUG_PRINTF()s show that it's now running and doing its thing, instead of doing nothing like before. (Although it's interesting that, at least when running the tests, ONLY full GC runs were ever triggered — which made the issue even worse, because partial runs would've actually collected some objects. But there were never any partial runs.)

Even having fixed it, though, I'm kind of at a loss for how to actually test it. The "tests" I added to tests/test_gc.py really don't test the GC at all, in the sense of confirming that it's doing what it should be doing. They merely create and delete some objects to trigger the DEBUG_PRINTF() output, so that the GC operation can be manually inspected. But there are no tests for it in the code.

I'm really not sure how to write them, either. Without weakref support (which I don't think tinypy has?), it seems like a difficult thing to actually confirm that an unreferenced object has been deleted. Especially by a GC that, unlike the Python one, has no code-accessible API.

Also, now that I've read more of the code I don't actually like this method.

The problem comes up (potentially) when complex nested objects are GC'd. If a container is moved from the grey list to the black list, its children are placed on the grey list. Which means the length of the grey list can actually INCREASE as the loop is running, which would cause it to mark more than gc_max objects each run.

So, making tp->grey->len the loop control variable is a bad idea, and I'll need to add an explicit loop iteration counter back in.

And, indeed, now that I actually have incremental steps working (see https://github.com/rainwoodman/tinypy/pull/45/files#r2384173153), the logs regularly show things like:

GC: Incremental step, marking up to 8 objects GC: Coloring objects, 8 out of 31 to mark GC: Marking complete, 105 objects remain on grey list GC: Incremental step, marking up to 8 objects GC: Coloring objects, 8 out of 105 to mark GC: Marking complete, 101 objects remain on grey list GC: Incremental step, marking up to 8 objects GC: Coloring objects, 8 out of 101 to mark GC: Marking complete, 97 objects remain on grey list

My previous algorithm would've screwed up "8 out of 31 to mark => marking complete, 105 remain" pretty spectacularly.

ferdnyc · 2025-09-27T11:09:01Z

modules/re/regexpr.h

+int re_set_syntax(int);
+char *re_compile_pattern(unsigned char*, int, regexp_t);
+int re_match(regexp_t, unsigned char*, int, int, regexp_registers_t);
+int re_search(regexp_t, unsigned char*, int, int, int, regexp_registers_t);
+void re_compile_fastmap(regexp_t);


This is what I meant about needing to update the code for modern GCCs. Around GCC 11 or 12, declarations like the ones that were there previously became a compile error.

fyi that's a matter of default standard, GCC 15 defaults to -std=gnu23 (which makes this an error), whereas earlier standards accepted it, so adding e.g. -std=gnu99 would fix it too. (note that using -std=c99 is more complicated since it enables a "strict C without extensions" profile on most platforms, which makes common POSIX or XSI extensions unavailable).

Good to know, tho I figure the declarations are better with the argument types provided anyway, so it's not really worth fighting the standard. 😀

indeed, having the full prototype here is an improvement. in other cases where one just wants to compile a piece of software that shipped before GCC 15 materialized, it's usually a lot less hassle to just add -std=gnu99 to CFLAGS than trying to patch the software. and with compilers moving as fast and recklessly like in the past few years it's generally a good idea to just hardcode the expected C/C++ standard one wants to use in the build environment, so it will build even in a decade into the future.

so it will build even in a decade into the future.

I admire your optimism! 😁

ferdnyc · 2025-09-27T12:15:09Z

Output of the debug build with these changes in place:

$ bash run-tests.sh --backend=tpvm-dbg tests/test_gc.py
tests/test_gc.py
./tpc -o tests/test_gc.tpc tests/test_gc.py
./tpvm-dbg tests/test_gc.tpc
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 74 out of 74 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 11 out of 11 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
[...]
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 10 out of 10 to mark
GC complete, 0 items remaining
Running full GC
GC: Coloring objects, 9 out of 9 to mark
GC complete, 0 items remaining
[ STARTED ] tests/test_gc.tpc 2 cases.
[ PASS ] 0: test_create
[ PASS ] 1: test_delete
[ HEALTHY ] tests/test_gc.tpc 0 Known Failures.
All tests passed.

ferdnyc · 2025-09-27T12:43:05Z

GC complete, 0 items remaining

I should fix that message to be more precise. The GC isn't complete at all at the point that's emitted, only the coloring of grey-list candidates to the black list.

ferdnyc · 2025-09-27T13:43:07Z

tinypy/tp_gc.c

    #ifdef TPVM_DEBUG
    tp->gcmax = 0;
    #else
    tp->gcmax = 4096;
    #endif


AUGH! Well, now I know why only full GCs were ever run in the debug build. (tp->gcmax == 0 is one of the conditions for triggering a full GC in tp_gc_run(), even if it wasn't explicitly requested.)

rainwoodman · 2025-09-29T02:59:01Z

Hi! I quickly scanned through the discussion thread. Thanks for the thorough investigation! Quick re on Testing: If we expose some internal state and actions of the gc to a python module maybe we build some assertions on top of that?

…

On Sat, Sep 27, 2025 at 7:32 AM Frank Dana ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In modules/re/regexpr.h <#45 (comment)>: > +int re_set_syntax(int); +char *re_compile_pattern(unsigned char*, int, regexp_t); +int re_match(regexp_t, unsigned char*, int, int, regexp_registers_t); +int re_search(regexp_t, unsigned char*, int, int, int, regexp_registers_t); +void re_compile_fastmap(regexp_t); so it will build even in a decade into the future. I admire your optimism! 😁 — Reply to this email directly, view it on GitHub <#45 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABBWTF6Z4FZ5GTLYNRGFSL3U2NZTAVCNFSM6AAAAACHUUUMHOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTENZUHE3DEMZXGE> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

ferdnyc · 2025-10-01T03:36:44Z

@rainwoodman

Quick re on Testing: If we expose some internal state and actions of the gc to a python module maybe we build some assertions on top of that?

I think that's probably the best/only way. If we take Python's gc and its own Lib/test/test_gc.py as our model, this is what gc contains:

>>> import gc
>>> gc.<tab><tab>
gc.DEBUG_COLLECTABLE    gc.freeze()             gc.get_threshold()      
gc.DEBUG_LEAK           gc.garbage              gc.is_finalized(        
gc.DEBUG_SAVEALL        gc.get_count()          gc.is_tracked(          
gc.DEBUG_STATS          gc.get_debug()          gc.isenabled()          
gc.DEBUG_UNCOLLECTABLE  gc.get_freeze_count()   gc.set_debug(           
gc.callbacks            gc.get_objects(         gc.set_threshold(       
gc.collect(             gc.get_referents(       gc.unfreeze()           
gc.disable()            gc.get_referrers(                               
gc.enable()             gc.get_stats()

gc.collect() just triggers garbage collection, and returns the number of objects reaped. A lot of the simplest Python tests use that alone, and I think it would already be a step up. (And I could eliminate my debug printfs.) Things like:

    def test_list(self):
        l = []
        l.append(l)
        gc.collect()
        del l
        self.assertEqual(gc.collect(), 1)

    def test_tuple(self):
        # since tuples are immutable we close the loop with a list
        l = []
        t = (l,)
        l.append(t)
        gc.collect()
        del t
        del l
        self.assertEqual(gc.collect(), 2)

    def test_instance(self):
        class A:
            pass
        a = A()
        a.a = a
        gc.collect()
        del a
        self.assertNotEqual(gc.collect(), 0)

gc.disable() and gc.enable() do just what it says on the tin.
gc.get_objects() returns a list of all of the tracked objects (presumably, without incrementing their refcounts).
gc.get_count() and gc.get_stats() return statistical data on the numbers of tracked or collected objects, though exactly how they relate to each other or reality is a bit unclear to me right now:

>>> from pprint import pp
>>> import gc
>>> pp(gc.get_stats())
[{'collections': 10, 'collected': 782, 'uncollectable': 0},
 {'collections': 0, 'collected': 0, 'uncollectable': 0},
 {'collections': 0, 'collected': 0, 'uncollectable': 0}]
>>> gc.get_count()
(614, 10, 0)
>>> len(gc.get_objects())
13326
>>> gc.get_count()
(620, 10, 0)
>>> gc.collect()
801
>>> gc.get_count()
(40, 0, 0)
>>> len(gc.get_objects())
12182
>>> pp(gc.get_stats())
[{'collections': 10, 'collected': 782, 'uncollectable': 0},
 {'collections': 0, 'collected': 0, 'uncollectable': 0},
 {'collections': 1, 'collected': 801, 'uncollectable': 0}]
>>> gc.collect()
0
>>> gc.get_count()
(41, 0, 0)

Obviously Python's test_gc.py is a lot more complex, making use of GC callbacks, specially-crafted objects, and even a few C-library facilities to tweak the object destruction process. I don't think tinypy needs to get quite that crazy.

But having some way to inspect its operation programmatically would make it possible to write tests that, right now, I don't think are possible.

ferdnyc added 3 commits September 27, 2025 01:19

RE fixes for modern compilers

3c57953

Add debugging to GC collection

acf55d2

Add basic GC test

51d70ff

ferdnyc marked this pull request as draft September 27, 2025 05:25

ferdnyc mentioned this pull request Sep 27, 2025

Review Garbage Collection algorithm #16

Open

ferdnyc added 3 commits September 27, 2025 03:04

Define DEBUG_PRINTF macro in debug mode

beb65a8

Garbage collection: Fix full GC runs

f5d69fd

- Switch over to `DEBUG_PRINTF()` for debugging output - Add MIN() and MAX() macros borrowed from glibc - Rename tp_mark 'max' parameter to 'gc_max' - Fix loop logic to perform full GCs as expected

Update GC test, still doesn't really test GC

eeeacf6

ferdnyc commented Sep 27, 2025

View reviewed changes

ferdnyc mentioned this pull request Sep 30, 2025

Deletion issues #46

Open

ferdnyc mentioned this pull request Oct 2, 2025

Can't get GSConnect to connect to my Android phone after reboot GSConnect/gnome-shell-extension-gsconnect#2073

Open

Test and fix garbage collector #45

Are you sure you want to change the base?

Test and fix garbage collector #45

Uh oh!

Conversation

ferdnyc commented Sep 27, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ferdnyc Sep 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rofl0r Sep 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ferdnyc commented Sep 27, 2025

Uh oh!

ferdnyc commented Sep 27, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rainwoodman commented Sep 29, 2025 via email

Uh oh!

ferdnyc commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ferdnyc Sep 27, 2025 •

edited

Loading

rofl0r Sep 27, 2025 •

edited

Loading

ferdnyc commented Oct 1, 2025 •

edited

Loading