summaryrefslogtreecommitdiff
path: root/weak-refs.org
diff options
context:
space:
mode:
authorPeter Bex <peter@more-magic.net>2023-06-01 21:47:55 +0200
committerPeter Bex <peter@more-magic.net>2023-06-02 17:03:27 +0200
commit9457d8ed820d36f1492481413e454c4b20b61b42 (patch)
tree3f71e8e2ee910e28dc185da9ca1b47ab95e83faf /weak-refs.org
downloadweak-refs-for-chicken-9457d8ed820d36f1492481413e454c4b20b61b42.tar.gz
Initial story about GC
Diffstat (limited to 'weak-refs.org')
-rw-r--r--weak-refs.org709
1 files changed, 709 insertions, 0 deletions
diff --git a/weak-refs.org b/weak-refs.org
new file mode 100644
index 0000000..fb97694
--- /dev/null
+++ b/weak-refs.org
@@ -0,0 +1,709 @@
+* Adding weak references to CHICKEN's GC
+
+#+STARTUP: inlineimages
+
+ [[./chicken-logo.png]]
+
+
+** Peter Bex
+
+
+ Village CHICKENS event
+ Ziegenhagen, Germany
+ June 3rd, 2023
+
+
+* Quick refresher on data types
+
+All data is represented by a machine word:
+#+BEGIN_SRC c
+ #ifdef _LP64
+ # define C_word int64_t
+ #else
+ # define C_word int32_t
+ #endif
+ /* ...more or less */
+#+END_SRC
+A word can be interpreted as a pointer, and we use two bits to decide.
+
+Lowest bit (0) set? => fixnum (shift right by one to get value)
+Next bit (1) set? => other immediate (more decoding needed)
+Otherwise => pointer
+
+** Pointers are aligned to word boundary!
+
+Lowest two bits never set, can just dereference.
+
+* Non-immediate object storage
+
+** Typical compound Scheme objects like vectors, pairs, records, ...
+
+Stored as a type tag followed by any number of slots. Length is
+encoded in the header (can be zero, but not likely).
+
+: ------------------------------------
+: | | | | |
+: | type+len | slot 1 | slot 2 | .... |
+: | | | | |
+: ------------------------------------
+
+#+BEGIN_SRC c
+typedef struct C_block_struct
+{
+ C_header header;
+ C_word data[];
+} C_SCHEME_BLOCK;
+#+END_SRC
+
+* Collecting garbage
+
+** Heap is always carved in two:
+*** one "active" area where we allocate (fromspace)
+*** one "target" area where objects will be copied to (tospace)
+
+* Collecting garbage
+
+** Everything starts out in the fromspace, with tospace empty
+#+BEGIN_SRC scheme
+ (cons 5 '()) ; unreferenced, may be collected
+ (define x (cons (vector 42) (cons 1 '()))) ; sticks around
+#+END_SRC
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: -------------- |
+: |pair |fx 5|nil| |
+: -------------- |
+: |
+: ---------------- |
+: |pair |#(42)|'(1)| | EMPTY
+: ---------------- |
+: | | |
+: ----- --- |
+: | | |
+: v v |
+: ------------ -------------- |
+: |vec(1)|fx 42| |pair |fx 1|nil||
+: ------------ -------------- |
+
+* Collecting garbage
+
+** When collecting garbage, we copy live data fromspace => tospace
+
+** This happens one object at a time
+
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: -------------- |
+: |pair |fx 5|nil| |
+: -------------- |
+: |
+: ---------------- |
+: |pair |#(42)|'(1)| |
+: ---------------- |
+: | | |
+: ----- -----------------------
+: | | |
+: v | v
+: ------------ | --------------
+: |vec(1)|fx 42| | |pair |fx 1|nil|
+: ------------ | --------------
+
+* Collecting garbage
+
+** When collecting garbage, we copy live data fromspace => tospace
+
+** This happens one object at a time
+
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: -------------- | ----------------
+: |pair |fx 5|nil| | |pair |#(42)|'(1)|
+: -------------- | ----------------
+: | | |
+: | | |
+: | | |
+: ---------------------------------------- |
+: | | |
+: | | ---------
+: | | |
+: v | v
+: ------------ | --------------
+: |vec(1)|fx 42| | |pair |fx 1|nil|
+: ------------ | --------------
+
+
+* Collecting garbage
+
+** When collecting garbage, we copy live data fromspace => tospace
+
+** This happens one object at a time
+
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: -------------- | ----------------
+: |pair |fx 5|nil| | |pair |#(42)|'(1)|
+: -------------- | ----------------
+: | | |
+: | ----- |
+: | | |
+: | v v
+: | ------------ --------------
+: | |vec(1)|fx 42||pair |fx 1|nil|
+: | ------------ --------------
+
+
+* Collecting garbage
+
+** After collecting garbage, "clear" fromspace and swap the two
+
+
+
+
+: /TO/space | /FROM/space
+: ------------------------------------------------------------
+: |
+: | ----------------
+: | |pair |#(42)|'(1)|
+: | ----------------
+: EMPTY | | |
+: | ----- |
+: | | |
+: | v v
+: | ------------ --------------
+: | |vec(1)|fx 42||pair |fx 1|nil|
+: | ------------ --------------
+
+* Collecting garbage
+
+** With multiple/shared references
+#+BEGIN_SRC scheme
+ (define shr (cons 1 '()))
+ (define x (cons 2 shr))
+ (define y (cons 3 shr)) ; list tail shared with y
+#+END_SRC
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: -------------- |
+: |pair |fx 1|nil| |
+: -------------- |
+: ^ |
+: |------------------------- |
+: |----- | |
+: | | |
+: -------------- -------------- |
+: |pair |fx 2|shr||pair |fx 3|shr||
+: -------------- -------------- |
+
+
+* Collecting garbage
+
+** With multiple/shared references
+** We move and leave a forwarding reference at the old location
+
+
+
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: ------- | --------------
+: | FWD |------------------------>|pair |fx 1|nil|
+: ------- | --------------
+: ^ |
+: |------------------------- |
+: |----- | |
+: | | |
+: -------------- -------------- |
+: |pair |fx 2|shr||pair |fx 3|shr||
+: -------------- -------------- |
+
+* Collecting garbage
+
+** With multiple/shared references
+** We move and leave a forwarding reference at the old location
+
+
+
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: ------- | --------------
+: | FWD |------------------------>|pair |fx 1|nil|
+: ------- | --------------
+: ^ | ^
+: | | |---------
+: |----- | |
+: | | |
+: -------------- | --------------
+: |pair |fx 2|shr| | |pair |fx 3|shr|
+: -------------- | --------------
+
+* Collecting garbage
+
+** With multiple/shared references
+** We move and leave a forwarding reference at the old location
+
+
+
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: ------ | --------------
+: | FWD |------------------------>|pair |fx 1|nil|
+: ------ | --------------
+: | ^
+: | |-------------------------
+: | |--------- |
+: | | |
+: | -------------- --------------
+: ||pair |fx 2|shr| |pair |fx 3|shr|
+: | -------------- --------------
+
+* Weak references
+
+** Weak pairs: car is weakly held, cdr strongly
+** We don't mark the car upon copying
+#+BEGIN_SRC scheme
+ (define w1 (weak-cons (vector 42) '()))
+ (define x (vector 123)) ; kept around
+ (define w2 (weak-cons x '()))
+#+END_SRC
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: ------------ ------------- |
+: |vec(1)|fx 42| |vec(1)|fx 123| |
+: ------------ ------------- |
+: ^ ^ |
+: | |--------- |
+: |----- | |
+: | | |
+: -------------- ------------- |
+: |Wpair| ?? |nil| |Wpair| x |nil||
+: -------------- ------------- |
+
+* Collecting garbage
+
+** With WEAK references
+** Pointers stay into fromspace
+#+BEGIN_SRC scheme
+ (define gb (vector 42)) ; garbage
+ (define gone (weak-cons gb '()))
+ (define x (cons 2 (cons 1 '()))
+ (define y (weak-cons (cdr x) 3)) ; shares x's tail
+ (set! gb #f) ; no root
+#+END_SRC
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: ------------ ------------- |
+: |vec(1)|fx 42| |vec(1)|fx 123| |
+: ------------ ------------- |
+: ^ ^ |
+: | |-------------------------------------|
+: |--------------------------------------| |
+: | | |
+: | -------------- -------------
+: | |Wpair| ?? |nil| |Wpair| x |nil|
+: | -------------- -------------
+
+* Collecting garbage
+
+** With WEAK references
+** Live data gets copied, fwd pointer stays behind (as usual)
+#+BEGIN_SRC scheme
+ (define gb (vector 42)) ; garbage
+ (define gone (weak-cons gb '()))
+ (define x (cons 2 (cons 1 '()))
+ (define y (weak-cons (cdr x) 3)) ; shares x's tail
+ (set! gb #f) ; no root
+#+END_SRC
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: ------------ ----- | -------------
+: |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123|
+: ------------ ----- | -------------
+: ^ ^ |
+: | |-------------------------------------|
+: |--------------------------------------| |
+: | | |
+: | -------------- -------------
+: | |Wpair| ?? |nil| |Wpair| x |nil|
+: | -------------- -------------
+
+* Collecting garbage
+
+** With WEAK references
+** After (major) GC, walk all weak pairs and check:
+
+*** If car is in tospace, all good 👌
+*** If car is FWD to tospace, chase ptr and update 👍
+*** If car is in fromspace, clear it out ❌
+
+**** MIT Scheme puts #f in there (a bit awkward - race condition!)
+**** Chez Scheme has a special object a la #!eof
+#+BEGIN_SRC scheme
+ ;; (define gb (vector 42)) ; garbage
+ ;; (define gone (weak-cons gb '()))
+ ;; (define x (cons 2 (cons 1 '()))
+ ;; (define y (weak-cons (cdr x) 3)) ; shares x's tail
+ ;; (set! gb #f) ; no root
+ (gc #t)
+ ([weak-]car y) => ((1) . 3)
+ ([weak-]car gone) => #!bwp ; broken-weak-pointer, Chez
+ (weak-car gone) => #f ; MIT Scheme. GC can happen after checking!
+ (weak-pair/car? gone) => #t ; MIT Scheme (#f if we stored literal #f as car)
+#+END_SRC
+
+* Weak references, a deeper dive
+
+** With WEAK references
+
+** After (major) GC, walk all weak pairs...
+
+*** Wait, how do we even *do that*?
+
+
+* Weak references, a deeper dive
+
+** With WEAK references
+
+** After (major) GC, walk all weak pairs...
+
+*** Wait, how do we even *do that*?
+
+**** I guess we have to keep track of them somehow...
+
+
+* Weak references, in CHICKEN (weak pairs)
+
+** Weak pairs are not exposed to Scheme code
+
+** They are an implementation detail of symbol table
+
+*** Added by yours truly to ensure symbols can be GCed
+
+**** To make us resistant to symbol stuffing attacks
+
+** Because of that, we can cheat: no "stray" weak pairs
+
+*** Every weak pair is under a hash bucket in symbol table
+
+*** Walk the hash table, then:
+
+**** - Clear out weak pair GCed cars (NOTE: not stricly needed here)
+
+**** - Drop pairs with cleared cars from the bucket's chain /REMEMBER THIS!/
+
+***** set-cdr! prev pair's cdr (or bucket head) to empty pair's cdr
+
+* Weak references, in CHICKEN (locatives)
+
+** We also have /locatives/
+
+: --------------------------------------------------
+: | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f |
+: --------------------------------------------------
+
+** The *PTR* is a C pointer /into/ the object, /plus/ the offset
+
+*** Offset allows us to refer to a slot in a compound object
+
+*** Or to an *index* of a SRFI-4 homogeneous numeric vector
+
+** The *offset* is also stored (in fixnum form)
+
+*** To recalculate when the target object got moved
+
+#+BEGIN_SRC c
+ ptr = C_block_item(loc, 0);
+ offset = C_unfix(C_block_item(loc, 1));
+ obj = ptr - offset;
+#+END_SRC
+
+* Weak references, in CHICKEN (locative)
+
+** Weak locatives
+
+: --------------------------------------------------
+: | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f |
+: --------------------------------------------------
+
+** The last slot either holds *OBJ* or #f
+
+*** This is to ensure it is seen as a reference by the GC
+
+*** When it's #f, it acts as a weak reference
+
+** Locative type tag is marked as a "special block"
+
+#+BEGIN_SRC c
+ #define C_SPECIALBLOCK_BIT 0x2000000000000000L /* 1st item is a non-value */
+#+END_SRC
+
+*** Therefore *PTR* can be a raw C pointer
+
+* Weak references, in CHICKEN (locatives)
+
+** So how can we walk all locatives, then?
+
+** Simply keep track of *all of them* in a big table
+
+*** Walk this table after GC, updating all *PTR* values
+
+*** If object is still in fromspace (to be reclaimed):
+
+**** *PTR* is set to /NULL/ (barf when deref'ed)
+
+**** *OBJ* is set to /C_SCHEME_UNDEFINED/ (never seen by user)
+
+** This is kind of a hassle, as the table is not in the Scheme heap
+
+*** Malloc, realloc on the fly potentially any time you call
+*** /make-locative/ because the table might be full
+
+
+*** Also, drop locatives that were garbage themselves (ugh)
+*** easy to detect: table's ptr points into fromspace (still, ugh)
+
+* Implementing user-facing weak pairs
+
+** How could we expose weak pairs to Scheme code?
+
+** Naive approach: easy, just stuff all the weak pairs in a table
+
+*** Exactly the same approach as locatives
+
+*** Same drawbacks, but worse:
+
+**** Schemers expect pairs to be cheap
+**** /"Lisp programmers know the value of everything,/
+**** /but the price of nothing"/ - Alan Perlis
+
+**** Having all the (weak) pairs in another table would be
+**** prohibitively expensive if freely used like regular pairs
+
+***** lots of expensive reallocations (memmove), high memory use
+
+
+* Implementing user-facing weak pairs
+
+** These are available in other Schemes, lets's see how they do it
+
+*** Chibi
+**** Only ephemerons
+
+*** Chez Scheme (and also Racket)
+**** Weak pairs, ephemerons *and* guardians
+
+*** MIT Scheme
+**** Weak pairs and ephemerons
+
+** Ephemerons are key/value pairs (much like a weak cons)
+*** value (cdr) is kept alive when key (car) is kept alive
+*** So the cdr is not a strong reference, like for weak pairs
+*** But the cdr is not /exactly/ a weak reference, either!
+**** Allows for collectable cyclic weak lists
+
+** Guardians are pools that know what objects *would be* released
+*** Allows you to "act" on garbage collection of objects
+*** Sort of like an "on-demand" finalizer
+
+* Implementing user-facing weak pairs
+
+*** Chibi
+
+**** Seems to use a linked list of objects as heap
+**** Not *too* concerned with efficiency
+
+*** Chez Scheme (and also Racket)
+
+**** Uses noncopying mark and sweep collector, quite different
+
+*** MIT Scheme
+
+**** Gotcha! Cheney style copying collector w/ semispaces
+
+* Implementing user-facing weak pairs
+
+** MIT Scheme uses tagged blocks like CHICKEN
+
+*** Source code is also well-documented! (at least this bit)
+
+** Only needs /a single pointer/ to support weak pairs in GC
+
+*** Yet, collection is still O(n) on number of *live* weak pairs
+
+* Implementing user-facing weak pairs
+
+** Remember the weak pair situation?
+*** First, let's draw the FWD pointers for the weak pairs, too
+
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: ------------ ----- | -------------
+: |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123|
+: ------------ ----- | -------------
+: ^ ^ |
+: | |-------------------------------------|
+: |--------------------------------------| |
+: | | |
+: ----- | -------------- -------------
+: | FWD |-------------------------->|Wpair| ?? |nil| |Wpair| x |nil|
+: ----- | -------------- -------------
+: ----- | ^
+: | FWD |-----------------------------------------------|
+: ----- |
+
+* Implementing user-facing weak pairs
+
+** Remember the weak pair situation?
+*** The forwarding pointer just replaces the header
+*** /but the slots are still there, unused!/
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: ------------ ----- | -------------
+: |vec(1)|fx 42| | FWD |---------->|vec(1)|fx 123|
+: ------------ ----- | -------------
+: ^ ^ |
+: | |-------------------------------------|
+: |--------------------------------------| |
+: :...... : | | |
+: -----...:........ : | -------------- -------------
+: | FWD | ?? : nil :-:------------>|Wpair| ?? |nil| |Wpair| x |nil|
+: -----............ : | -------------- -------------
+: -----............ : | ^
+: | FWD | x : nil :-:---------------------------------|
+: -----............ : |
+: : : |
+: :..........: |
+
+* Implementing user-facing weak pairs
+
+** We can chain up the (recovered) weak pairs in a linked list
+*** Every single one of them will have at least one forwarding ptr
+*** Link to the first, then use the old car as ptr to next, etc
+
+: fromspace | tospace
+: ------------------------------------------------------------
+: |
+: -----............. | -------------- -------------
+: | FWD : NULL : w/e :-------------->|Wpair| ?? |nil| |Wpair| x |nil|
+: -----............. | -------------- -------------
+: ^ | ^
+: |------ | |
+: | | |
+: -----............ | |
+: | FWD : $$$ : w/e :-----------------------------------|
+: -----............ |
+: ^
+: | -----------
+: |--| START $$$ |
+: ------------
+
+* Implementing user-facing weak pairs
+
+** We can chain up the (recovered) weak pairs in a linked list
+*** This is done during GC, when we move the pair
+*** We're constructing the FWD pointer there already anyway
+*** Note that this means that pair *must* be live
+
+** After GC is done, traverse the linked list chain from START and:
+*** - Follow the FWD pointer to tospace, where the new pair lives
+*** - Update new pair's slot 1 (the car field), as before:
+**** If car is in tospace, all good 👌
+**** If car is FWD to tospace, chase ptr and update 👍
+**** If car is in fromspace, clear it out ❌
+*** - Read the "old car" pointer and use it to find the next pair to update
+
+** This is O(n) on the number of live weak pairs
+
+** We already have O(n) on the number of objects for copying in GC
+
+** So, it just adds a constant factor
+
+* Symbol table
+
+** Remember when we traversed the symbol table?
+
+*** We'd drop unref'ed weak pairs, essentially "compressing" the table's chains
+
+*** Unfortunately, we still need to do this at some point
+
+*** Can still be done after GC
+
+**** Adds another O(n) on the number of symbols to major GC times
+
+*** We can also opportunistically do this when traversing the bucket chain
+
+**** Adds minor overhead to symbol table lookups/interning
+
+* Locatives
+
+** Remember those?
+
+: --------------------------------------------------
+: | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f |
+: --------------------------------------------------
+
+** We have a luxurious 4 extra slots to use!
+
+*** Apply the same "hack"
+
+: ----------.......................................
+: | FWD | $$$ : whatever : whatever : whatever :
+: ----------.......................................
+
+*** Could *even* chain these up in the same chain as weak pairs!
+**** Probably not useful, though - still need updating code for tospace locative
+
+** I think we can get away with dropping the locative table (which contains /all/)!
+
+*** The only reason it exists is for updating locative /PTR/ fields
+*** More efficient - only FWD pointers for *live* locatives in the chain
+
+* Future work/questions
+
+** Two weak pair APIs; Chez and MIT Scheme's. Choose which one.
+*** MIT Scheme has a fully distinct type, weak-car/weak-cdr
+**** Awkward to distinguish #f from broken pointer; no bwp-object? predicate
+*** In Chez, car/cdr work on weak pairs and ephemerons
+**** Expedient(?), but less Schemely. Can distinguish with {weak,ephemeron}-pair?
+
+** Add native support for weak hash tables?
+*** Idea: finalizer-like gc callback hook, only invoked for *live* objects
+*** Allows us to do this fully in pure Scheme; drop broken-weak-pointer pairs
+
+** Add ephemerons? Again, multiple APIs:
+*** - Chibi (uses SRFI-124; make-ephemeron, ephemeron-{key,datum,broken?})
+*** - MIT Scheme (SRFI-124 was lifted from there, with one addition)
+*** - Chez (ephemeron-{cons,car,cdr,pair?}, bwp-object?, works with car/cdr)
+
+*** Not sure if this is useful. Chibi, MIT and Chez Scheme seem to think so
+
+** Add guardians? Kinda cool, but also kinda odd. Practically useful?
+*** Finalizers could be reimplemented in terms of these
+*** Worth the effort?
+
+* Thank you
+
+
+
+
+** Questions?