summaryrefslogtreecommitdiff
path: root/weak-refs.org
blob: 98cf5a78e778fe8f9dee7c2d1f47361fdfc48c75 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
* Adding weak references to CHICKEN's GC

#+STARTUP: inlineimages

                [[./chicken-logo.png]]


**                Peter Bex


           Village CHICKENS event
            Ziegenhagen, Germany
              June 3rd, 2023


* Why weak references?

** User "kluk" on IRC mentioned not having line numbers in the interpreter
** was a nonstarter.

** Discussion followed.  Why don't we have it?

*** Interpreter can read/eval an unbounded number of expressions
*** So, line number database would keep growing
*** Not great for memory usage - programs would get slower/crash over time

** How could this be fixed?

*** Have the line number db use weak pairs, with the input form collectable

** We already "kind of" have weak pairs, but not user-visible

* Quick refresher on data types

All data is represented by a machine word:
#+BEGIN_SRC c
  #ifdef _LP64
  # define C_word int64_t
  #else
  # define C_word int32_t
  #endif
  /* ...more or less */
#+END_SRC
A word can be interpreted as a pointer, and we use two bits to decide.

Lowest bit (0) set?  => fixnum (shift right by one to get value)
Next bit (1) set? => other immediate (more decoding needed)
Otherwise => pointer

** Pointers are aligned to word boundary!

Lowest two bits never set, can just dereference.

* Non-immediate object storage

** Typical compound Scheme objects like vectors, pairs, records, ...

Stored as a type tag followed by any number of slots.  Length is
encoded in the header (can be zero, but not likely).

:     ------------------------------------
:    |          |        |        |       |
:    | type+len | slot 1 | slot 2 | ....  |
:    |          |        |        |       |
:     ------------------------------------

#+BEGIN_SRC c
typedef struct C_block_struct
{
  C_header header;
  C_word data[];
} C_SCHEME_BLOCK;
#+END_SRC

* Collecting garbage

** Heap is always carved in two:
*** one "active" area where we allocate (fromspace)
*** one "target" area where objects will be copied to (tospace)

* Collecting garbage

** Everything starts out in the fromspace, with tospace empty
#+BEGIN_SRC scheme
  (cons 5 '())  ; unreferenced, may be collected
  (define x (cons (vector 42) (cons 1 '()))) ; sticks around
#+END_SRC

:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  --------------                 |
: |pair |fx 5|nil|                |
:  --------------                 |
:                                 |
:  ----------------               |
: |pair |#(42)|'(1)|              |              EMPTY
:  ----------------               |
:          |     |                |
:     -----       ---             |
:    |               |            |
:    v               v            |
:  ------------    -------------- |
: |vec(1)|fx 42|  |pair |fx 1|nil||
:  ------------    -------------- |

* Collecting garbage

** When collecting garbage, we copy live data fromspace => tospace

** This happens one object at a time


:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  --------------                 |
: |pair |fx 5|nil|                |
:  --------------                 |
:                                 |
:  ----------------               |
: |pair |#(42)|'(1)|              |
:  ----------------               |
:          |     |                |
:     -----       -----------------------  
:    |                            |     |
:    v                            |     v
:  ------------                   |   --------------
: |vec(1)|fx 42|                  |  |pair |fx 1|nil|
:  ------------                   |   --------------

* Collecting garbage

** When collecting garbage, we copy live data fromspace => tospace

** This happens one object at a time


:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  --------------                 |   ----------------
: |pair |fx 5|nil|                |  |pair |#(42)|'(1)|
:  --------------                 |   ----------------
:                                 |          |    |
:                                 |          |    |
:                                 |          |    |
:     ----------------------------------------    |
:    |                            |               |
:    |                            |      ---------
:    |                            |     |            
:    v                            |     v
:  ------------                   |   --------------
: |vec(1)|fx 42|                  |  |pair |fx 1|nil|
:  ------------                   |   --------------


* Collecting garbage

** When collecting garbage, we copy live data fromspace => tospace

** This happens one object at a time


:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  --------------                 |   ----------------
: |pair |fx 5|nil|                |  |pair |#(42)|'(1)|
:  --------------                 |   ----------------
:                                 |           |     |
:                                 |      -----      |
:                                 |     |           |
:                                 |     v           v
:                                 |  ------------  --------------
:                                 | |vec(1)|fx 42||pair |fx 1|nil|
:                                 |  ------------  --------------


* Collecting garbage

** After collecting garbage, "clear" fromspace and swap the two




:     /TO/space                   |         /FROM/space
: ------------------------------------------------------------
:                                 |
:                                 |   ----------------
:                                 |  |pair |#(42)|'(1)|
:                                 |   ----------------
:           EMPTY                 |           |     |
:                                 |      -----      |
:                                 |     |           |
:                                 |     v           v
:                                 |  ------------  --------------
:                                 | |vec(1)|fx 42||pair |fx 1|nil|
:                                 |  ------------  --------------

* Collecting garbage

** With multiple/shared references
#+BEGIN_SRC scheme
  (define shr (cons 1 '()))
  (define x (cons 2 shr))
  (define y (cons 3 shr)) ; list tail shared with y
#+END_SRC
:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  --------------                 |
: |pair |fx 1|nil|                |
:  --------------                 |
:    ^                            |
:    |-------------------------   |
:    |-----                    |  |
:         |                    |  |
:  --------------  -------------- |
: |pair |fx 2|shr||pair |fx 3|shr||
:  --------------  -------------- |


* Collecting garbage

** With multiple/shared references
** We move and leave a forwarding reference at the old location




:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  -------                        |  --------------
:  | FWD  |------------------------>|pair |fx 1|nil|
:  -------                        |  --------------
:    ^                            |
:    |-------------------------   |
:    |-----                    |  |
:         |                    |  |
:  --------------  -------------- |
: |pair |fx 2|shr||pair |fx 3|shr||
:  --------------  -------------- |

* Collecting garbage

** With multiple/shared references
** We move and leave a forwarding reference at the old location




:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  -------                        |  --------------
:  | FWD  |------------------------>|pair |fx 1|nil|
:  -------                        |  --------------
:    ^                            |    ^
:    |                            |    |---------       
:    |-----                       |              |
:         |                       |              |
:  --------------                 |  --------------
: |pair |fx 2|shr|                | |pair |fx 3|shr|
:  --------------                 |  --------------

* Collecting garbage

** With multiple/shared references
** We move and leave a forwarding reference at the old location




:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:   ------                        |  --------------
:  | FWD  |------------------------>|pair |fx 1|nil|
:   ------                        |  --------------
:                                 |    ^
:                                 |    |------------------------- 
:                                 |    |---------              |
:                                 |             |              |
:                                 | --------------   --------------
:                                 ||pair |fx 2|shr| |pair |fx 3|shr|
:                                 | --------------   --------------

* Weak references

** Weak pairs: car is weakly held, cdr strongly
** We don't mark the car upon copying
#+BEGIN_SRC scheme
  (define w1 (weak-cons (vector 42) '()))
  (define x  (vector 123)) ; kept around
  (define w2 (weak-cons x '()))
#+END_SRC

:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  ------------    -------------  |
: |vec(1)|fx 42|  |vec(1)|fx 123| |
:  ------------    -------------  |
:    ^                ^           |
:    |                |---------  |
:    |-----                    |  |
:         |                    |  |
:  --------------   ------------- |
: |Wpair| ?? |nil| |Wpair| x |nil||
:  --------------   ------------- |

* Collecting garbage

** With WEAK references
** Pointers stay into fromspace
#+BEGIN_SRC scheme
  (define gb (vector 42)) ; garbage
  (define gone (weak-cons gb '()))
  (define x (cons 2 (cons 1 '()))
  (define y (weak-cons (cdr x) 3)) ; shares x's tail
  (set! gb #f) ; no root
#+END_SRC
:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  ------------    -------------  |
: |vec(1)|fx 42|  |vec(1)|fx 123| |
:  ------------    -------------  |
:    ^                ^           |
:    |                |-------------------------------------|
:    |--------------------------------------|               |
:                                 |         |               |
:                                 |  --------------   -------------
:                                 | |Wpair| ?? |nil| |Wpair| x |nil|
:                                 |  --------------   -------------

* Collecting garbage

** With WEAK references
** Live data gets copied, fwd pointer stays behind (as usual)
#+BEGIN_SRC scheme
  (define gb (vector 42)) ; garbage
  (define gone (weak-cons gb '()))
  (define x (cons 2 (cons 1 '()))
  (define y (weak-cons (cdr x) 3)) ; shares x's tail
  (set! gb #f) ; no root
#+END_SRC
:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  ------------    -----          |  ------------- 
: |vec(1)|fx 42|  | FWD |---------->|vec(1)|fx 123|
:  ------------    -----          |  ------------- 
:    ^                ^           |
:    |                |-------------------------------------|
:    |--------------------------------------|               |
:                                 |         |               |
:                                 |  --------------   -------------
:                                 | |Wpair| ?? |nil| |Wpair| x |nil|
:                                 |  --------------   -------------

* Collecting garbage

** With WEAK references
** After (major) GC, walk all weak pairs and check:

*** If car is in tospace, all good 👌
*** If car is FWD to tospace, chase ptr and update 👍
*** If car is in fromspace, clear it out ❌

**** MIT Scheme puts #f in there (a bit awkward - race condition!)
**** Chez Scheme has a special object a la #!eof
#+BEGIN_SRC scheme
  ;; (define gb (vector 42)) ; garbage
  ;; (define gone (weak-cons gb '()))
  ;; (define x (cons 2 (cons 1 '()))
  ;; (define y (weak-cons (cdr x) 3)) ; shares x's tail
  ;; (set! gb #f) ; no root
  (gc #t)
  ([weak-]car y) => ((1) . 3)
  ([weak-]car gone) => #!bwp   ; broken-weak-pointer, Chez
  (weak-car gone) => #f        ; MIT Scheme.  GC can happen after checking!
  (weak-pair/car? gone) => #t  ; MIT Scheme (#f if we stored literal #f as car)
#+END_SRC

* Weak references, a deeper dive

** With WEAK references

** After (major) GC, walk all weak pairs...

*** Wait, how do we even *do that*?


* Weak references, a deeper dive

** With WEAK references

** After (major) GC, walk all weak pairs...

*** Wait, how do we even *do that*?

**** I guess we have to keep track of them somehow...


* Weak references, in CHICKEN (weak pairs)

** Weak pairs are not exposed to Scheme code

** They are an implementation detail of symbol table

*** Added by yours truly to ensure symbols can be GCed

**** To make us resistant to symbol stuffing attacks

** Because of that, we can cheat: no "stray" weak pairs

*** Every weak pair is under a hash bucket in symbol table

*** Walk the hash table, then:

**** - Clear out weak pair GCed cars (NOTE: not stricly needed here)

**** - Drop pairs with cleared cars from the bucket's chain /REMEMBER THIS!/

***** set-cdr! prev pair's cdr (or bucket head) to empty pair's cdr

* Weak references, in CHICKEN (locatives)

** We also have /locatives/

:   --------------------------------------------------
:  | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f |
:   --------------------------------------------------

** The *PTR* is a C pointer /into/ the object, /plus/ the offset

*** Offset allows us to refer to a slot in a compound object

*** Or to an *index* of a SRFI-4 homogeneous numeric vector

** The *offset* is also stored (in fixnum form)

*** To recalculate when the target object got moved

#+BEGIN_SRC c
  ptr = C_block_item(loc, 0);
  offset = C_unfix(C_block_item(loc, 1));
  obj = ptr - offset;
#+END_SRC

* Weak references, in CHICKEN (locative)

** Weak locatives

:   --------------------------------------------------
:  | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f |
:   --------------------------------------------------

** The last slot either holds *OBJ* or #f

*** This is to ensure it is seen as a reference by the GC

*** When it's #f, it acts as a weak reference

** Locative type tag is marked as a "special block"

#+BEGIN_SRC c
  #define C_SPECIALBLOCK_BIT 0x2000000000000000L   /* 1st item is a non-value */
#+END_SRC

*** Therefore *PTR* can be a raw C pointer

* Weak references, in CHICKEN (locatives)

** So how can we walk all locatives, then?

** Simply keep track of *all of them* in a big table

*** Walk this table after GC, updating all *PTR* values

*** If object is still in fromspace (to be reclaimed):

**** *PTR* is set to /NULL/ (barf when deref'ed)

**** *OBJ* is set to /C_SCHEME_UNDEFINED/ (never seen by user)

** This is kind of a hassle, as the table is not in the Scheme heap

*** Malloc, realloc on the fly potentially any time you call
*** /make-locative/ because the table might be full


*** Also, drop locatives that were garbage themselves (ugh)
*** easy to detect: table's ptr points into fromspace (still, ugh)

* Implementing user-facing weak pairs

** How could we expose weak pairs to Scheme code?

** Naive approach: easy, just stuff all the weak pairs in a table

*** Exactly the same approach as locatives

*** Same drawbacks, but worse:

**** Schemers expect pairs to be cheap
**** /"Lisp programmers know the value of everything,/
**** /but the price of nothing"/ - Alan Perlis

**** Having all the (weak) pairs in another table would be
**** prohibitively expensive if freely used like regular pairs

***** lots of expensive reallocations (memmove), high memory use


* Implementing user-facing weak pairs

** These are available in other Schemes, lets's see how they do it

*** Chibi
**** Only ephemerons

*** Chez Scheme (and also Racket)
**** Weak pairs, ephemerons *and* guardians

*** MIT Scheme
**** Weak pairs and ephemerons

** Ephemerons are key/value pairs (much like a weak cons)
*** value (cdr) is kept alive when key (car) is kept alive
*** So the cdr is not a strong reference, like for weak pairs
*** But the cdr is not /exactly/ a weak reference, either!
**** Allows for collectable cyclic weak lists

** Guardians are pools that know what objects *would be* released
*** Allows you to "act" on garbage collection of objects
*** Sort of like an "on-demand" finalizer

* Implementing user-facing weak pairs

*** Chibi

**** Seems to use a linked list of objects as heap
**** Not *too* concerned with efficiency

*** Chez Scheme (and also Racket)

**** Uses noncopying mark and sweep collector, quite different

*** MIT Scheme

**** Gotcha!  Cheney style copying collector w/ semispaces

* Implementing user-facing weak pairs

** MIT Scheme uses tagged blocks like CHICKEN

*** Source code is also well-documented! (at least this bit)

** Only needs /a single pointer/ to support weak pairs in GC

*** Yet, collection is still O(n) on number of *live* weak pairs

* Implementing user-facing weak pairs

** Remember the weak pair situation?
*** First, let's draw the FWD pointers for the weak pairs, too


:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  ------------    -----          |  ------------- 
: |vec(1)|fx 42|  | FWD |---------->|vec(1)|fx 123|
:  ------------    -----          |  ------------- 
:    ^                ^           |
:    |                |-------------------------------------|
:    |--------------------------------------|               |
:                                 |         |               |            
:  -----                          |  --------------   -------------
: | FWD |-------------------------->|Wpair| ?? |nil| |Wpair| x |nil|
:  -----                          |  --------------   -------------
:  -----                          |                     ^       
: | FWD |-----------------------------------------------|
:  -----                          |

* Implementing user-facing weak pairs

** Remember the weak pair situation?
*** The forwarding pointer just replaces the header
*** /but the slots are still there, unused!/

:     fromspace                   |             tospace
: ------------------------------------------------------------
:  ------------    -----          |  ------------- 
: |vec(1)|fx 42|  | FWD |---------->|vec(1)|fx 123|
:  ------------    -----          |  ------------- 
:    ^                ^           |
:    |                |-------------------------------------|
:    |--------------------------------------|               |
:    :......          :           |         |               |
:  -----...:........  :           |  --------------   -------------
: | FWD |  ?? : nil :-:------------>|Wpair| ?? |nil| |Wpair| x |nil|
:  -----............  :           |  --------------   -------------
:  -----............  :           |                     ^       
: | FWD |  x  : nil :-:---------------------------------|
:  -----............  :           |
:          :          :           |
:          :..........:           |

* Implementing user-facing weak pairs

** We can chain up the (recovered) weak pairs in a linked list
*** Every single one of them will have at least one forwarding ptr
*** Link to the first, then use the old car as ptr to next, etc

:     fromspace                   |             tospace
: ------------------------------------------------------------
:                                 |
:  -----.............             |   --------------   -------------
: | FWD : NULL : w/e :-------------->|Wpair| ?? |nil| |Wpair| x |nil|
:  -----.............             |   --------------   -------------
:    ^                            |                     ^
:    |------                      |                     |
:          |                      |                     |
:  -----............              |                     |       
: | FWD : $$$ : w/e :-----------------------------------|
:  -----............              |
:    ^
:    |   ----------- 
:    |--| START $$$ |
:       ------------

* Implementing user-facing weak pairs

** We can chain up the (recovered) weak pairs in a linked list
*** This is done during GC, when we move the pair
*** We're constructing the FWD pointer there already anyway
*** Note that this means that pair *must* be live

** After GC is done, traverse the linked list chain from START and:
*** - Follow the FWD pointer to tospace, where the new pair lives
*** - Update new pair's slot 1 (the car field), as before:
**** If car is in tospace, all good 👌
**** If car is FWD to tospace, chase ptr and update 👍
**** If car is in fromspace, clear it out ❌
*** - Read the "old car" pointer and use it to find the next pair to update

** This is O(n) on the number of live weak pairs

** We already have O(n) on the number of objects for copying in GC

** So, it just adds a constant factor

* Symbol table

** Remember when we traversed the symbol table?

*** We'd drop unref'ed weak pairs, essentially "compressing" the table's chains

*** Unfortunately, we still need to do this at some point

*** Can still be done after GC

**** Adds another O(n) on the number of symbols to major GC times

*** We can also opportunistically do this when traversing the bucket chain

**** Adds minor overhead to symbol table lookups/interning

* Locatives

** Remember those?

:  --------------------------------------------------
: | LOCATIVE | PTR | FX offset | FX subtype | OBJ/#f |
:  --------------------------------------------------

** We have a luxurious 4 extra slots to use!

*** Apply the same "hack"

:  ----------.......................................
: |   FWD    | $$$ : whatever : whatever : whatever :
:  ----------.......................................

*** Could *even* chain these up in the same chain as weak pairs!
**** Probably not useful, though - still need updating code for tospace locative

** I think we can get away with dropping the locative table (which contains /all/)!

*** The only reason it exists is for updating locative /PTR/ fields
*** More efficient - only FWD pointers for *live* locatives in the chain

* Future work/questions

** Two weak pair APIs; Chez and MIT Scheme's.  Choose which one.
*** MIT Scheme has a fully distinct type, weak-car/weak-cdr
**** Awkward to distinguish #f from broken pointer; no bwp-object? predicate
*** In Chez, car/cdr work on weak pairs and ephemerons
**** Expedient(?), but less Schemely.  Can distinguish with {weak,ephemeron}-pair?

** Add native support for weak hash tables?
*** Idea: finalizer-like gc callback hook, only invoked for *live* objects
*** Allows us to do this fully in pure Scheme; drop broken-weak-pointer pairs

** Add ephemerons?  Again, multiple APIs:
*** - Chibi (uses SRFI-124; make-ephemeron, ephemeron-{key,datum,broken?})
*** - MIT Scheme (SRFI-124 was lifted from there, with one addition)
*** - Chez (ephemeron-{cons,car,cdr,pair?}, bwp-object?, works with car/cdr)

*** Not sure if this is useful.  Chibi, MIT and Chez Scheme seem to think so

** Add guardians?  Kinda cool, but also kinda odd.  Practically useful?
*** Finalizers could be reimplemented in terms of these
*** Worth the effort?

* Thank you




**                         Questions?