This is a PoC of making the cycle collector generational.
In some applications, there is an object that transitively references everything (e.g. the DI container or the UnitOfWork), and many objects transitively reference this object back. As a result, most GC runs in these applications will end up scanning the entire application. This defeats the premise of the cycle collector.
Based on that, a generational cycle collector should improve throughput to some degree.
Unlike a tracing GC, we don't need to keep track of created inter-generational references, so the implementation turns out to be relatively simple.
This is enabled by setting the environment variable `FULL_GC_FREQ` to a non-zero value (e.g. 10 will run a full GC every 10 runs).

I've used the following script to test this PoC: https://gist.github.com/arnaud-lb/6bfb493f361e056979571941f1b85990. This simulates a typical Doctrine or Symfony application with a very large DI container or UnitOfWork; a worst-case scenario for the GC.
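To make the "full GC every N runs" behaviour concrete, here is a minimal sketch of how a `FULL_GC_FREQ` value might be turned into a full/partial decision. The helper names `parse_full_gc_freq()` and `is_full_run()`, the run counter starting at 1, and the modulo policy are all assumptions made for illustration; the PoC may schedule full runs differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a FULL_GC_FREQ-style value.
 * Returns 0 (generational mode disabled) for NULL, empty,
 * non-numeric, or non-positive input. */
static unsigned long parse_full_gc_freq(const char *value)
{
    if (value == NULL || *value == '\0') {
        return 0;
    }
    char *end = NULL;
    long parsed = strtol(value, &end, 10);
    if (*end != '\0' || parsed <= 0) {
        return 0;
    }
    return (unsigned long)parsed;
}

/* Assumed policy: with frequency N, every Nth run is a full run.
 * run_number counts GC runs starting at 1. With freq == 0 every
 * run is full, i.e. the collector behaves non-generationally. */
static bool is_full_run(unsigned long run_number, unsigned long freq)
{
    assert(run_number >= 1);
    if (freq == 0) {
        return true;
    }
    return run_number % freq == 0;
}
```

A caller would typically pass `getenv("FULL_GC_FREQ")` to the parser once at startup and then consult `is_full_run()` at the beginning of each collection.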
With `FULL_GC_FREQ=10`, the benchmark runs 10-20% faster when the root object (the Tree class) is large, or when it creates garbage often (keeping the threshold low). The benchmark runs slower when the root object is relatively small and it does not create garbage often, because the non-gen GC keeps the threshold high in this case. This could be mitigated by improving how the threshold is adjusted in the gen GC. In the first case the non-gen GC is slow because of the low threshold, so we may achieve similar results in the non-gen GC with better adjustment of the threshold, at the cost of higher peak memory usage than the gen GC.

Of course this is only one benchmark. This needs to be tested on real applications.
This should improve throughput and average pause time, but not maximum pause time.
Related: #17131
Implementation details:
- `GC_OLD`. This flag is not part of `GC_INFO_MASK`, so it's ignored by `GC_INFO_MASK()` / `GC_REMOVE_FROM_BUFFER()` / `GC_REF_SET_INFO()` and persists across GC runs. Caveat: Had to reclaim a bit from `GC_ADDRESS`.
- `gc_mark_roots()` either ignores `OLD` nodes (in partial runs), or removes the `OLD` flag (in full runs)
- `gc_scan_roots()` always ignores `OLD` nodes
- `gc_collect_roots()` moves `OLD` roots to the old roots buffer (unless this is a full run, as in this case we have proven these roots are not part of a garbage cycle).
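
The flag handling described above can be sketched in isolation. This is an illustration under stated assumptions, not the php-src code: the `SKETCH_*` constants, the `sketch_node` struct, and both functions are hypothetical, the bit values do not match the real `GC_INFO_MASK` layout, and the sketch assumes that a node whose `OLD` flag is cleared in a full run is then processed normally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; these are NOT the php-src values. */
#define SKETCH_INFO_MASK 0x3u   /* per-run info bits, rewritten every run */
#define SKETCH_OLD       0x4u   /* outside the mask, so it survives rewrites */

typedef struct sketch_node {
    uint32_t flags;
} sketch_node;

/* Overwrite only the masked info bits; the OLD bit is left untouched,
 * which is why the flag persists across GC runs. */
static void sketch_set_info(sketch_node *n, uint32_t info)
{
    assert((info & ~SKETCH_INFO_MASK) == 0);
    n->flags = (n->flags & ~SKETCH_INFO_MASK) | info;
}

/* Mark phase: returns true if the node should be processed in this run. */
static bool sketch_mark_visit(sketch_node *n, bool full_run)
{
    if (n->flags & SKETCH_OLD) {
        if (!full_run) {
            return false;            /* partial run: ignore OLD nodes */
        }
        n->flags &= ~SKETCH_OLD;     /* full run: drop the flag first */
    }
    return true;
}
```

Keeping the generation bit outside the info mask means the existing info-resetting code paths need no changes to preserve it; only the phases that care about generations have to test the bit.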