* Adding weak references to CHICKEN's GC #+STARTUP: inlineimages [[./chicken-logo.png]] ** Peter Bex Village CHICKENS event Ziegenhagen, Germany June 3rd, 2023 * Why weak references? ** User "kluk" on IRC mentioned not having line numbers in the interpreter ** was a nonstarter. ** Discussion followed. Why don't we have it? *** Interpreter can read/eval an unbounded number of expressions *** So, line number database would keep growing *** Not great for memory usage - programs would get slower/crash over time ** How could this be fixed? *** Have the line number db use weak pairs, with the input form collectable ** We already "kind of" have weak pairs, but not user-visible * Quick refresher on data types All data is represented by a machine word: #+BEGIN_SRC c #ifdef _LP64 # define C_word int64_t #else # define C_word int32_t #endif /* ...more or less */ #+END_SRC A word can be interpreted as a pointer, and we use two bits to decide. Lowest bit (0) set? => fixnum (shift right by one to get value) Next bit (1) set? => other immediate (more decoding needed) Otherwise => pointer ** Pointers are aligned to word boundary! Lowest two bits never set, can just dereference. * Non-immediate object storage ** Typical compound Scheme objects like vectors, pairs, records, ... Stored as a type tag followed by any number of slots. Length is encoded in the header (can be zero, but not likely). : ------------------------------------ : | | | | | : | type+len | slot 1 | slot 2 | .... | : | | | | | : ------------------------------------ #+BEGIN_SRC c typedef struct C_block_struct { C_header header; C_word data[]; } C_SCHEME_BLOCK; #+END_SRC * Collecting garbage ** Heap is always carved in two: *** one "active" area where we allocate (fromspace) *** one "target" area where objects will be copied to (tospace) * Collecting garbage ** Everything starts out in the fromspace, with tospace empty #+BEGIN_SRC scheme (cons 5 '()) ; unreferenced, may be collected (define x (cons (vector 42) (cons 1 '()))) ; sticks around #+END_SRC : fromspace | tospace : ------------------------------------------------------------ : | : -------------- | : |pair |fx 5|nil| | : -------------- | : | : ---------------- | : |pair |#(42)|'(1)| | EMPTY : ---------------- | : | | | : ----- --- | : | | | : v v | : ------------ -------------- | : |vec(1)|fx 42| |pair |fx 1|nil|| : ------------ -------------- | * Collecting garbage ** When collecting garbage, we copy live data fromspace => tospace ** This happens one object at a time : fromspace | tospace : ------------------------------------------------------------ : | : -------------- | : |pair |fx 5|nil| | : -------------- | : | : ---------------- | : |pair |#(42)|'(1)| | : ---------------- | : | | | : ----- ----------------------- : | | | : v | v : ------------ | -------------- : |vec(1)|fx 42| | |pair |fx 1|nil| : ------------ | -------------- * Collecting garbage ** When collecting garbage, we copy live data fromspace => tospace ** This happens one object at a time : fromspace | tospace : ------------------------------------------------------------ : | : -------------- | ---------------- : |pair |fx 5|nil| | |pair |#(42)|'(1)| : -------------- | ---------------- : | | | : | | | : | | | : ---------------------------------------- | : | | | : | | --------- : | | | : v | v : ------------ | -------------- : |vec(1)|fx 42| | |pair |fx 1|nil| : ------------ | -------------- * Collecting garbage ** When collecting garbage, we copy live data fromspace => tospace ** This happens one object at a time : fromspace | tospace : ------------------------------------------------------------ : | : -------------- | ---------------- : |pair |fx 5|nil| | |pair |#(42)|'(1)| : -------------- | ---------------- : | | | : | ----- | : | | | : | v v : | ------------ -------------- : | |vec(1)|fx 42||pair |fx 1|nil| : | ------------ -------------- * Collecting garbage ** After collecting garbage, "clear" fromspace and swap the two : /TO/space | /FROM/space : ------------------------------------------------------------ : | : | ---------------- : | |pair |#(42)|'(1)| : | ---------------- : EMPTY | | | : | ----- | : | | | : | v v : | ------------ -------------- : | |vec(1)|fx 42||pair |fx 1|nil| : | ------------ -------------- * Collecting garbage ** With multiple/shared references #+BEGIN_SRC scheme (define shr (cons 1 '())) (define x (cons 2 shr)) (define y (cons 3 shr)) ; list tail shared with y #+END_SRC : fromspace | tospace : ------------------------------------------------------------ : | : -------------- | : |pair |fx 1|nil| | : -------------- | : ^ | : |------------------------- | : |----- | | : | | | : -------------- -------------- | : |pair |fx 2|shr||pair |fx 3|shr|| : -------------- -------------- | * Collecting garbage ** With multiple/shared references ** We move and leave a forwarding reference at the old location : fromspace | tospace : ------------------------------------------------------------ : | : ------- | -------------- : | FWD |------------------------>|pair |fx 1|nil| : ------- | -------------- : ^ | : |------------------------- | : |----- | | : | | | : -------------- -------------- | : |pair |fx 2|shr||pair |fx 3|shr|| : -------------- -------------- | * Collecting garbage ** With multiple/shared references ** We move and leave a forwarding reference at the old location : fromspace | tospace : ------------------------------------------------------------ : | : ------- | -------------- : | FWD |------------------------>|pair |fx 1|nil| : ------- | -------------- : ^ | ^ : | | |--------- : |----- | | : | | | : -------------- | -------------- : |pair |fx 2|shr| | |pair |fx 3|shr| : -------------- | -------------- * Collecting garbage ** With multiple/shared references ** We move and leave a forwarding reference at the old location : fromspace | tospace : ------------------------------------------------------------ : | : ------ | -------------- : | FWD |------------------------>|pair |fx 1|nil| : ------ | -------------- : | ^ : | |------------------------- : | |--------- | : | | | : | -------------- -------------- : ||pair |fx 2|shr| |pair |fx 3|shr| : | -------------- -------------- * Weak references ** Weak pairs: car is weakly held, cdr strongly ** We don't mark the car upon copying #+BEGIN_SRC scheme (define w1 (weak-cons (vector 42) '())) (define x (vector 123)) ; kept around (define w2 (weak-cons x '())) #+END_SRC : fromspace | tospace : ------------------------------------------------------------ : | : ------------ ------------- | : |vec(1)|fx 42| |vec(1)|fx 123| | : ------------ ------------- | : ^ ^ | : | |--------- | : |----- | | : | | | : -------------- ------------- | : |Wpair| ?? |nil| |Wpair| x |nil|| : -------------- ------------- | * Collecting garbage ** With WEAK references ** Pointers stay into fromspace #+BEGIN_SRC scheme (define gb (vector 42)) ; garbage (define gone (weak-cons gb '())) (define x (cons 2 (cons 1 '())) (define y (weak-cons (cdr x) 3)) ; shares x's tail (set! gb #f) ; no root #+END_SRC : fromspace | tospace : ------------------------------------------------------------ : | : ------------ ------------- | : |vec(1)|fx 42| |vec(1)|fx 123| | : ------------ ------------- | : ^ ^ | : | |-------------------------------------| : |--------------------------------------| | : | | | : | -------------- ------------- : | |Wpair| ?? |nil| |Wpair| x |nil| : | -------------- ------------- * Collecting garbage ** With WEAK references ** Live data gets copied, fwd pointer stays behind (as usual) #+BEGIN_SRC scheme (define gb (vector 42)) ; garbage (define gone (weak-cons gb '())) (define x (cons 2 (cons 1 '())) (define y (weak-cons (cdr x) 3)) ; shares x's tail (set! gb #f) ; no root #+END_SRC : fromspace | tospace : ------------------------------------------------------------ : | : ------------ ----- | ------------- : |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123| : ------------ ----- | ------------- : ^ ^ | : | |-------------------------------------| : |--------------------------------------| | : | | | : | -------------- ------------- : | |Wpair| ?? |nil| |Wpair| x |nil| : | -------------- ------------- * Collecting garbage ** With WEAK references ** After (major) GC, walk all weak pairs and check: *** If car is in tospace, all good 👌 *** If car is FWD to tospace, chase ptr and update 👍 *** If car is in fromspace, clear it out ❌ **** MIT Scheme puts #f in there (a bit awkward - race condition!) **** Chez Scheme has a special object a la #!eof #+BEGIN_SRC scheme ;; (define gb (vector 42)) ; garbage ;; (define gone (weak-cons gb '())) ;; (define x (cons 2 (cons 1 '())) ;; (define y (weak-cons (cdr x) 3)) ; shares x's tail ;; (set! gb #f) ; no root (gc #t) ([weak-]car y) => ((1) . 3) ([weak-]car gone) => #!bwp ; broken-weak-pointer, Chez (weak-car gone) => #f ; MIT Scheme. GC can happen after checking! (weak-pair/car? gone) => #t ; MIT Scheme (#f if we stored literal #f as car) #+END_SRC * Weak references, a deeper dive ** With WEAK references ** After (major) GC, walk all weak pairs... *** Wait, how do we even *do that*? * Weak references, a deeper dive ** With WEAK references ** After (major) GC, walk all weak pairs... *** Wait, how do we even *do that*? **** I guess we have to keep track of them somehow... * Weak references, in CHICKEN (weak pairs) ** Weak pairs are not exposed to Scheme code ** They are an implementation detail of symbol table *** Added by yours truly to ensure symbols can be GCed **** To make us resistant to symbol stuffing attacks ** Because of that, we can cheat: no "stray" weak pairs *** Every weak pair is under a hash bucket in symbol table *** Walk the hash table, then: **** - Clear out weak pair GCed cars (NOTE: not stricly needed here) **** - Drop pairs with cleared cars from the bucket's chain /REMEMBER THIS!/ ***** set-cdr! prev pair's cdr (or bucket head) to empty pair's cdr * Weak references, in CHICKEN (locatives) ** We also have /locatives/ : -------------------------------------------------- : | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f | : -------------------------------------------------- ** The *PTR* is a C pointer /into/ the object, /plus/ the offset *** Offset allows us to refer to a slot in a compound object *** Or to an *index* of a SRFI-4 homogeneous numeric vector ** The *offset* is also stored (in fixnum form) *** To recalculate when the target object got moved #+BEGIN_SRC c ptr = C_block_item(loc, 0); offset = C_unfix(C_block_item(loc, 1)); obj = ptr - offset; #+END_SRC * Weak references, in CHICKEN (locative) ** Weak locatives : -------------------------------------------------- : | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f | : -------------------------------------------------- ** The last slot either holds *OBJ* or #f *** This is to ensure it is seen as a reference by the GC *** When it's #f, it acts as a weak reference ** Locative type tag is marked as a "special block" #+BEGIN_SRC c #define C_SPECIALBLOCK_BIT 0x2000000000000000L /* 1st item is a non-value */ #+END_SRC *** Therefore *PTR* can be a raw C pointer * Weak references, in CHICKEN (locatives) ** So how can we walk all locatives, then? ** Simply keep track of *all of them* in a big table *** Walk this table after GC, updating all *PTR* values *** If object is still in fromspace (to be reclaimed): **** *PTR* is set to /NULL/ (barf when deref'ed) **** *OBJ* is set to /C_SCHEME_UNDEFINED/ (never seen by user) ** This is kind of a hassle, as the table is not in the Scheme heap *** Malloc, realloc on the fly potentially any time you call *** /make-locative/ because the table might be full *** Also, drop locatives that were garbage themselves (ugh) *** easy to detect: table's ptr points into fromspace (still, ugh) * Implementing user-facing weak pairs ** How could we expose weak pairs to Scheme code? ** Naive approach: easy, just stuff all the weak pairs in a table *** Exactly the same approach as locatives *** Same drawbacks, but worse: **** Schemers expect pairs to be cheap **** /"Lisp programmers know the value of everything,/ **** /but the price of nothing"/ - Alan Perlis **** Having all the (weak) pairs in another table would be **** prohibitively expensive if freely used like regular pairs ***** lots of expensive reallocations (memmove), high memory use * Implementing user-facing weak pairs ** These are available in other Schemes, lets's see how they do it *** Chibi **** Only ephemerons *** Chez Scheme (and also Racket) **** Weak pairs, ephemerons *and* guardians *** MIT Scheme **** Weak pairs and ephemerons ** Ephemerons are key/value pairs (much like a weak cons) *** value (cdr) is kept alive when key (car) is kept alive *** So the cdr is not a strong reference, like for weak pairs *** But the cdr is not /exactly/ a weak reference, either! **** Allows for collectable cyclic weak lists ** Guardians are pools that know what objects *would be* released *** Allows you to "act" on garbage collection of objects *** Sort of like an "on-demand" finalizer * Implementing user-facing weak pairs *** Chibi **** Seems to use a linked list of objects as heap **** Not *too* concerned with efficiency *** Chez Scheme (and also Racket) **** Uses noncopying mark and sweep collector, quite different *** MIT Scheme **** Gotcha! Cheney style copying collector w/ semispaces * Implementing user-facing weak pairs ** MIT Scheme uses tagged blocks like CHICKEN *** Source code is also well-documented! (at least this bit) ** Only needs /a single pointer/ to support weak pairs in GC *** Yet, collection is still O(n) on number of *live* weak pairs * Implementing user-facing weak pairs ** Remember the weak pair situation? *** First, let's draw the FWD pointers for the weak pairs, too : fromspace | tospace : ------------------------------------------------------------ : | : ------------ ----- | ------------- : |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123| : ------------ ----- | ------------- : ^ ^ | : | |-------------------------------------| : |--------------------------------------| | : | | | : ----- | -------------- ------------- : | FWD |-------------------------->|Wpair| ?? |nil| |Wpair| x |nil| : ----- | -------------- ------------- : ----- | ^ : | FWD |-----------------------------------------------| : ----- | * Implementing user-facing weak pairs ** Remember the weak pair situation? *** The forwarding pointer just replaces the header *** /but the slots are still there, unused!/ : fromspace | tospace : ------------------------------------------------------------ : ------------ ----- | ------------- : |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123| : ------------ ----- | ------------- : ^ ^ | : | |-------------------------------------| : |--------------------------------------| | : :...... : | | | : -----...:........ : | -------------- ------------- : | FWD | ?? : nil :-:------------>|Wpair| ?? |nil| |Wpair| x |nil| : -----............ : | -------------- ------------- : -----............ : | ^ : | FWD | x : nil :-:---------------------------------| : -----............ : | : : : | : :..........: | * Implementing user-facing weak pairs ** We can chain up the (recovered) weak pairs in a linked list *** Every single one of them will have at least one forwarding ptr *** Link to the first, then use the old car as ptr to next, etc : fromspace | tospace : ------------------------------------------------------------ : | : -----............. | -------------- ------------- : | FWD : NULL : w/e :-------------->|Wpair| ?? |nil| |Wpair| x |nil| : -----............. | -------------- ------------- : ^ | ^ : |------ | | : | | | : -----............ | | : | FWD : $$$ : w/e :-----------------------------------| : -----............ | : ^ : | ----------- : |--| START $$$ | : ------------ * Implementing user-facing weak pairs ** We can chain up the (recovered) weak pairs in a linked list *** This is done during GC, when we move the pair *** We're constructing the FWD pointer there already anyway *** Note that this means that pair *must* be live ** After GC is done, traverse the linked list chain from START and: *** - Follow the FWD pointer to tospace, where the new pair lives *** - Update new pair's slot 1 (the car field), as before: **** If car is in tospace, all good 👌 **** If car is FWD to tospace, chase ptr and update 👍 **** If car is in fromspace, clear it out ❌ *** - Read the "old car" pointer and use it to find the next pair to update ** This is O(n) on the number of live weak pairs ** We already have O(n) on the number of objects for copying in GC ** So, it just adds a constant factor * Symbol table ** Remember when we traversed the symbol table? *** We'd drop unref'ed weak pairs, essentially "compressing" the table's chains *** Unfortunately, we still need to do this at some point *** Can still be done after GC **** Adds another O(n) on the number of symbols to major GC times *** We can also opportunistically do this when traversing the bucket chain **** Adds minor overhead to symbol table lookups/interning * Locatives ** Remember those? : -------------------------------------------------- : | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f | : -------------------------------------------------- ** We have a luxurious 4 extra slots to use! *** Apply the same "hack" : ----------....................................... : | FWD | $$$ : whatever : whatever : whatever : : ----------....................................... *** Could *even* chain these up in the same chain as weak pairs! **** Probably not useful, though - still need updating code for tospace locative ** I think we can get away with dropping the locative table (which contains /all/)! *** The only reason it exists is for updating locative /PTR/ fields *** More efficient - only FWD pointers for *live* locatives in the chain * Future work/questions ** Two weak pair APIs; Chez and MIT Scheme's. Choose which one. *** MIT Scheme has a fully distinct type, weak-car/weak-cdr **** Awkward to distinguish #f from broken pointer; no bwp-object? predicate *** In Chez, car/cdr work on weak pairs and ephemerons **** Expedient(?), but less Schemely. Can distinguish with {weak,ephemeron}-pair? ** Add native support for weak hash tables? *** Idea: finalizer-like gc callback hook, only invoked for *live* objects *** Allows us to do this fully in pure Scheme; drop broken-weak-pointer pairs ** Add ephemerons? Again, multiple APIs: *** - Chibi (uses SRFI-124; make-ephemeron, ephemeron-{key,datum,broken?}) *** - MIT Scheme (SRFI-124 was lifted from there, with one addition) *** - Chez (ephemeron-{cons,car,cdr,pair?}, bwp-object?, works with car/cdr) *** Not sure if this is useful. Chibi, MIT and Chez Scheme seem to think so ** Add guardians? Kinda cool, but also kinda odd. Practically useful? *** Finalizers could be reimplemented in terms of these *** Worth the effort? * Thank you ** Questions?