From 9457d8ed820d36f1492481413e454c4b20b61b42 Mon Sep 17 00:00:00 2001 From: Peter Bex Date: Thu, 1 Jun 2023 21:47:55 +0200 Subject: Initial story about GC --- chicken-logo.png | Bin 0 -> 27542 bytes weak-refs.org | 709 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 709 insertions(+) create mode 100644 chicken-logo.png create mode 100644 weak-refs.org diff --git a/chicken-logo.png b/chicken-logo.png new file mode 100644 index 0000000..f936457 Binary files /dev/null and b/chicken-logo.png differ diff --git a/weak-refs.org b/weak-refs.org new file mode 100644 index 0000000..fb97694 --- /dev/null +++ b/weak-refs.org @@ -0,0 +1,709 @@ +* Adding weak references to CHICKEN's GC + +#+STARTUP: inlineimages + + [[./chicken-logo.png]] + + +** Peter Bex + + + Village CHICKENS event + Ziegenhagen, Germany + June 3rd, 2023 + + +* Quick refresher on data types + +All data is represented by a machine word: +#+BEGIN_SRC c + #ifdef _LP64 + # define C_word int64_t + #else + # define C_word int32_t + #endif + /* ...more or less */ +#+END_SRC +A word can be interpreted as a pointer, and we use two bits to decide. + +Lowest bit (0) set? => fixnum (shift right by one to get value) +Next bit (1) set? => other immediate (more decoding needed) +Otherwise => pointer + +** Pointers are aligned to word boundary! + +Lowest two bits never set, can just dereference. + +* Non-immediate object storage + +** Typical compound Scheme objects like vectors, pairs, records, ... + +Stored as a type tag followed by any number of slots. Length is +encoded in the header (can be zero, but not likely). + +: ------------------------------------ +: | | | | | +: | type+len | slot 1 | slot 2 | .... | +: | | | | | +: ------------------------------------ + +#+BEGIN_SRC c +typedef struct C_block_struct +{ + C_header header; + C_word data[]; +} C_SCHEME_BLOCK; +#+END_SRC + +* Collecting garbage + +** Heap is always carved in two: +*** one "active" area where we allocate (fromspace) +*** one "target" area where objects will be copied to (tospace) + +* Collecting garbage + +** Everything starts out in the fromspace, with tospace empty +#+BEGIN_SRC scheme + (cons 5 '()) ; unreferenced, may be collected + (define x (cons (vector 42) (cons 1 '()))) ; sticks around +#+END_SRC + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: -------------- | +: |pair |fx 5|nil| | +: -------------- | +: | +: ---------------- | +: |pair |#(42)|'(1)| | EMPTY +: ---------------- | +: | | | +: ----- --- | +: | | | +: v v | +: ------------ -------------- | +: |vec(1)|fx 42| |pair |fx 1|nil|| +: ------------ -------------- | + +* Collecting garbage + +** When collecting garbage, we copy live data fromspace => tospace + +** This happens one object at a time + + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: -------------- | +: |pair |fx 5|nil| | +: -------------- | +: | +: ---------------- | +: |pair |#(42)|'(1)| | +: ---------------- | +: | | | +: ----- ----------------------- +: | | | +: v | v +: ------------ | -------------- +: |vec(1)|fx 42| | |pair |fx 1|nil| +: ------------ | -------------- + +* Collecting garbage + +** When collecting garbage, we copy live data fromspace => tospace + +** This happens one object at a time + + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: -------------- | ---------------- +: |pair |fx 5|nil| | |pair |#(42)|'(1)| +: -------------- | ---------------- +: | | | +: | | | +: | | | +: ---------------------------------------- | +: | | | +: | | --------- +: | | | +: v | v +: ------------ | -------------- +: |vec(1)|fx 42| | |pair |fx 1|nil| +: ------------ | -------------- + + +* Collecting garbage + +** When collecting garbage, we copy live data fromspace => tospace + +** This happens one object at a time + + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: -------------- | ---------------- +: |pair |fx 5|nil| | |pair |#(42)|'(1)| +: -------------- | ---------------- +: | | | +: | ----- | +: | | | +: | v v +: | ------------ -------------- +: | |vec(1)|fx 42||pair |fx 1|nil| +: | ------------ -------------- + + +* Collecting garbage + +** After collecting garbage, "clear" fromspace and swap the two + + + + +: /TO/space | /FROM/space +: ------------------------------------------------------------ +: | +: | ---------------- +: | |pair |#(42)|'(1)| +: | ---------------- +: EMPTY | | | +: | ----- | +: | | | +: | v v +: | ------------ -------------- +: | |vec(1)|fx 42||pair |fx 1|nil| +: | ------------ -------------- + +* Collecting garbage + +** With multiple/shared references +#+BEGIN_SRC scheme + (define shr (cons 1 '())) + (define x (cons 2 shr)) + (define y (cons 3 shr)) ; list tail shared with y +#+END_SRC +: fromspace | tospace +: ------------------------------------------------------------ +: | +: -------------- | +: |pair |fx 1|nil| | +: -------------- | +: ^ | +: |------------------------- | +: |----- | | +: | | | +: -------------- -------------- | +: |pair |fx 2|shr||pair |fx 3|shr|| +: -------------- -------------- | + + +* Collecting garbage + +** With multiple/shared references +** We move and leave a forwarding reference at the old location + + + + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: ------- | -------------- +: | FWD |------------------------>|pair |fx 1|nil| +: ------- | -------------- +: ^ | +: |------------------------- | +: |----- | | +: | | | +: -------------- -------------- | +: |pair |fx 2|shr||pair |fx 3|shr|| +: -------------- -------------- | + +* Collecting garbage + +** With multiple/shared references +** We move and leave a forwarding reference at the old location + + + + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: ------- | -------------- +: | FWD |------------------------>|pair |fx 1|nil| +: ------- | -------------- +: ^ | ^ +: | | |--------- +: |----- | | +: | | | +: -------------- | -------------- +: |pair |fx 2|shr| | |pair |fx 3|shr| +: -------------- | -------------- + +* Collecting garbage + +** With multiple/shared references +** We move and leave a forwarding reference at the old location + + + + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: ------ | -------------- +: | FWD |------------------------>|pair |fx 1|nil| +: ------ | -------------- +: | ^ +: | |------------------------- +: | |--------- | +: | | | +: | -------------- -------------- +: ||pair |fx 2|shr| |pair |fx 3|shr| +: | -------------- -------------- + +* Weak references + +** Weak pairs: car is weakly held, cdr strongly +** We don't mark the car upon copying +#+BEGIN_SRC scheme + (define w1 (weak-cons (vector 42) '())) + (define x (vector 123)) ; kept around + (define w2 (weak-cons x '())) +#+END_SRC + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: ------------ ------------- | +: |vec(1)|fx 42| |vec(1)|fx 123| | +: ------------ ------------- | +: ^ ^ | +: | |--------- | +: |----- | | +: | | | +: -------------- ------------- | +: |Wpair| ?? |nil| |Wpair| x |nil|| +: -------------- ------------- | + +* Collecting garbage + +** With WEAK references +** Pointers stay into fromspace +#+BEGIN_SRC scheme + (define gb (vector 42)) ; garbage + (define gone (weak-cons gb '())) + (define x (cons 2 (cons 1 '())) + (define y (weak-cons (cdr x) 3)) ; shares x's tail + (set! gb #f) ; no root +#+END_SRC +: fromspace | tospace +: ------------------------------------------------------------ +: | +: ------------ ------------- | +: |vec(1)|fx 42| |vec(1)|fx 123| | +: ------------ ------------- | +: ^ ^ | +: | |-------------------------------------| +: |--------------------------------------| | +: | | | +: | -------------- ------------- +: | |Wpair| ?? |nil| |Wpair| x |nil| +: | -------------- ------------- + +* Collecting garbage + +** With WEAK references +** Live data gets copied, fwd pointer stays behind (as usual) +#+BEGIN_SRC scheme + (define gb (vector 42)) ; garbage + (define gone (weak-cons gb '())) + (define x (cons 2 (cons 1 '())) + (define y (weak-cons (cdr x) 3)) ; shares x's tail + (set! gb #f) ; no root +#+END_SRC +: fromspace | tospace +: ------------------------------------------------------------ +: | +: ------------ ----- | ------------- +: |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123| +: ------------ ----- | ------------- +: ^ ^ | +: | |-------------------------------------| +: |--------------------------------------| | +: | | | +: | -------------- ------------- +: | |Wpair| ?? |nil| |Wpair| x |nil| +: | -------------- ------------- + +* Collecting garbage + +** With WEAK references +** After (major) GC, walk all weak pairs and check: + +*** If car is in tospace, all good 👌 +*** If car is FWD to tospace, chase ptr and update 👍 +*** If car is in fromspace, clear it out ❌ + +**** MIT Scheme puts #f in there (a bit awkward - race condition!) +**** Chez Scheme has a special object a la #!eof +#+BEGIN_SRC scheme + ;; (define gb (vector 42)) ; garbage + ;; (define gone (weak-cons gb '())) + ;; (define x (cons 2 (cons 1 '())) + ;; (define y (weak-cons (cdr x) 3)) ; shares x's tail + ;; (set! gb #f) ; no root + (gc #t) + ([weak-]car y) => ((1) . 3) + ([weak-]car gone) => #!bwp ; broken-weak-pointer, Chez + (weak-car gone) => #f ; MIT Scheme. GC can happen after checking! + (weak-pair/car? gone) => #t ; MIT Scheme (#f if we stored literal #f as car) +#+END_SRC + +* Weak references, a deeper dive + +** With WEAK references + +** After (major) GC, walk all weak pairs... + +*** Wait, how do we even *do that*? + + +* Weak references, a deeper dive + +** With WEAK references + +** After (major) GC, walk all weak pairs... + +*** Wait, how do we even *do that*? + +**** I guess we have to keep track of them somehow... + + +* Weak references, in CHICKEN (weak pairs) + +** Weak pairs are not exposed to Scheme code + +** They are an implementation detail of symbol table + +*** Added by yours truly to ensure symbols can be GCed + +**** To make us resistant to symbol stuffing attacks + +** Because of that, we can cheat: no "stray" weak pairs + +*** Every weak pair is under a hash bucket in symbol table + +*** Walk the hash table, then: + +**** - Clear out weak pair GCed cars (NOTE: not stricly needed here) + +**** - Drop pairs with cleared cars from the bucket's chain /REMEMBER THIS!/ + +***** set-cdr! prev pair's cdr (or bucket head) to empty pair's cdr + +* Weak references, in CHICKEN (locatives) + +** We also have /locatives/ + +: -------------------------------------------------- +: | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f | +: -------------------------------------------------- + +** The *PTR* is a C pointer /into/ the object, /plus/ the offset + +*** Offset allows us to refer to a slot in a compound object + +*** Or to an *index* of a SRFI-4 homogeneous numeric vector + +** The *offset* is also stored (in fixnum form) + +*** To recalculate when the target object got moved + +#+BEGIN_SRC c + ptr = C_block_item(loc, 0); + offset = C_unfix(C_block_item(loc, 1)); + obj = ptr - offset; +#+END_SRC + +* Weak references, in CHICKEN (locative) + +** Weak locatives + +: -------------------------------------------------- +: | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f | +: -------------------------------------------------- + +** The last slot either holds *OBJ* or #f + +*** This is to ensure it is seen as a reference by the GC + +*** When it's #f, it acts as a weak reference + +** Locative type tag is marked as a "special block" + +#+BEGIN_SRC c + #define C_SPECIALBLOCK_BIT 0x2000000000000000L /* 1st item is a non-value */ +#+END_SRC + +*** Therefore *PTR* can be a raw C pointer + +* Weak references, in CHICKEN (locatives) + +** So how can we walk all locatives, then? + +** Simply keep track of *all of them* in a big table + +*** Walk this table after GC, updating all *PTR* values + +*** If object is still in fromspace (to be reclaimed): + +**** *PTR* is set to /NULL/ (barf when deref'ed) + +**** *OBJ* is set to /C_SCHEME_UNDEFINED/ (never seen by user) + +** This is kind of a hassle, as the table is not in the Scheme heap + +*** Malloc, realloc on the fly potentially any time you call +*** /make-locative/ because the table might be full + + +*** Also, drop locatives that were garbage themselves (ugh) +*** easy to detect: table's ptr points into fromspace (still, ugh) + +* Implementing user-facing weak pairs + +** How could we expose weak pairs to Scheme code? + +** Naive approach: easy, just stuff all the weak pairs in a table + +*** Exactly the same approach as locatives + +*** Same drawbacks, but worse: + +**** Schemers expect pairs to be cheap +**** /"Lisp programmers know the value of everything,/ +**** /but the price of nothing"/ - Alan Perlis + +**** Having all the (weak) pairs in another table would be +**** prohibitively expensive if freely used like regular pairs + +***** lots of expensive reallocations (memmove), high memory use + + +* Implementing user-facing weak pairs + +** These are available in other Schemes, lets's see how they do it + +*** Chibi +**** Only ephemerons + +*** Chez Scheme (and also Racket) +**** Weak pairs, ephemerons *and* guardians + +*** MIT Scheme +**** Weak pairs and ephemerons + +** Ephemerons are key/value pairs (much like a weak cons) +*** value (cdr) is kept alive when key (car) is kept alive +*** So the cdr is not a strong reference, like for weak pairs +*** But the cdr is not /exactly/ a weak reference, either! +**** Allows for collectable cyclic weak lists + +** Guardians are pools that know what objects *would be* released +*** Allows you to "act" on garbage collection of objects +*** Sort of like an "on-demand" finalizer + +* Implementing user-facing weak pairs + +*** Chibi + +**** Seems to use a linked list of objects as heap +**** Not *too* concerned with efficiency + +*** Chez Scheme (and also Racket) + +**** Uses noncopying mark and sweep collector, quite different + +*** MIT Scheme + +**** Gotcha! Cheney style copying collector w/ semispaces + +* Implementing user-facing weak pairs + +** MIT Scheme uses tagged blocks like CHICKEN + +*** Source code is also well-documented! (at least this bit) + +** Only needs /a single pointer/ to support weak pairs in GC + +*** Yet, collection is still O(n) on number of *live* weak pairs + +* Implementing user-facing weak pairs + +** Remember the weak pair situation? +*** First, let's draw the FWD pointers for the weak pairs, too + + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: ------------ ----- | ------------- +: |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123| +: ------------ ----- | ------------- +: ^ ^ | +: | |-------------------------------------| +: |--------------------------------------| | +: | | | +: ----- | -------------- ------------- +: | FWD |-------------------------->|Wpair| ?? |nil| |Wpair| x |nil| +: ----- | -------------- ------------- +: ----- | ^ +: | FWD |-----------------------------------------------| +: ----- | + +* Implementing user-facing weak pairs + +** Remember the weak pair situation? +*** The forwarding pointer just replaces the header +*** /but the slots are still there, unused!/ + +: fromspace | tospace +: ------------------------------------------------------------ +: ------------ ----- | ------------- +: |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123| +: ------------ ----- | ------------- +: ^ ^ | +: | |-------------------------------------| +: |--------------------------------------| | +: :...... : | | | +: -----...:........ : | -------------- ------------- +: | FWD | ?? : nil :-:------------>|Wpair| ?? |nil| |Wpair| x |nil| +: -----............ : | -------------- ------------- +: -----............ : | ^ +: | FWD | x : nil :-:---------------------------------| +: -----............ : | +: : : | +: :..........: | + +* Implementing user-facing weak pairs + +** We can chain up the (recovered) weak pairs in a linked list +*** Every single one of them will have at least one forwarding ptr +*** Link to the first, then use the old car as ptr to next, etc + +: fromspace | tospace +: ------------------------------------------------------------ +: | +: -----............. | -------------- ------------- +: | FWD : NULL : w/e :-------------->|Wpair| ?? |nil| |Wpair| x |nil| +: -----............. | -------------- ------------- +: ^ | ^ +: |------ | | +: | | | +: -----............ | | +: | FWD : $$$ : w/e :-----------------------------------| +: -----............ | +: ^ +: | ----------- +: |--| START $$$ | +: ------------ + +* Implementing user-facing weak pairs + +** We can chain up the (recovered) weak pairs in a linked list +*** This is done during GC, when we move the pair +*** We're constructing the FWD pointer there already anyway +*** Note that this means that pair *must* be live + +** After GC is done, traverse the linked list chain from START and: +*** - Follow the FWD pointer to tospace, where the new pair lives +*** - Update new pair's slot 1 (the car field), as before: +**** If car is in tospace, all good 👌 +**** If car is FWD to tospace, chase ptr and update 👍 +**** If car is in fromspace, clear it out ❌ +*** - Read the "old car" pointer and use it to find the next pair to update + +** This is O(n) on the number of live weak pairs + +** We already have O(n) on the number of objects for copying in GC + +** So, it just adds a constant factor + +* Symbol table + +** Remember when we traversed the symbol table? + +*** We'd drop unref'ed weak pairs, essentially "compressing" the table's chains + +*** Unfortunately, we still need to do this at some point + +*** Can still be done after GC + +**** Adds another O(n) on the number of symbols to major GC times + +*** We can also opportunistically do this when traversing the bucket chain + +**** Adds minor overhead to symbol table lookups/interning + +* Locatives + +** Remember those? + +: -------------------------------------------------- +: | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f | +: -------------------------------------------------- + +** We have a luxurious 4 extra slots to use! + +*** Apply the same "hack" + +: ----------....................................... +: | FWD | $$$ : whatever : whatever : whatever : +: ----------....................................... + +*** Could *even* chain these up in the same chain as weak pairs! +**** Probably not useful, though - still need updating code for tospace locative + +** I think we can get away with dropping the locative table (which contains /all/)! + +*** The only reason it exists is for updating locative /PTR/ fields +*** More efficient - only FWD pointers for *live* locatives in the chain + +* Future work/questions + +** Two weak pair APIs; Chez and MIT Scheme's. Choose which one. +*** MIT Scheme has a fully distinct type, weak-car/weak-cdr +**** Awkward to distinguish #f from broken pointer; no bwp-object? predicate +*** In Chez, car/cdr work on weak pairs and ephemerons +**** Expedient(?), but less Schemely. Can distinguish with {weak,ephemeron}-pair? + +** Add native support for weak hash tables? +*** Idea: finalizer-like gc callback hook, only invoked for *live* objects +*** Allows us to do this fully in pure Scheme; drop broken-weak-pointer pairs + +** Add ephemerons? Again, multiple APIs: +*** - Chibi (uses SRFI-124; make-ephemeron, ephemeron-{key,datum,broken?}) +*** - MIT Scheme (SRFI-124 was lifted from there, with one addition) +*** - Chez (ephemeron-{cons,car,cdr,pair?}, bwp-object?, works with car/cdr) + +*** Not sure if this is useful. Chibi, MIT and Chez Scheme seem to think so + +** Add guardians? Kinda cool, but also kinda odd. Practically useful? +*** Finalizers could be reimplemented in terms of these +*** Worth the effort? + +* Thank you + + + + +** Questions? -- cgit v1.2.3