<p><span class="h-card" translate="no"><a href="https://mastodon.social/@samth" class="u-url mention">@<span>samth</span></a></span> <span class="h-card" translate="no"><a href="https://mastodon.social/@wingo" class="u-url mention">@<span>wingo</span></a></span> <span class="h-card" translate="no"><a href="https://mastodon.social/@sayrer" class="u-url mention">@<span>sayrer</span></a></span></p><p>I think you’re asking why the industry built such inefficient collectors, given that its engineers are not fools.</p><p>In short, there was little incentive to do better. There are two high-level issues: 1. methodology, and 2. lack of innovation.</p><p>The costs we expose simply were not being measured. There were two broad failings: not understanding overhead (addressed by our LBO methodology), and not measuring user-experienced latency. (I sketch both points in code at the end of this post.)</p><p>You can “solve” the first one by discounting resources (pretending they’re free). Gil Tene has often pointed out that most cloud VMs are priced so that they come vastly over-provisioned with memory, so you may as well use it! The fallacy is that ultimately someone pays for that memory and for the power to keep it on. This is why those collectors are not used by data-center-scale applications, where the cost of those overheads runs to tens of millions of dollars. Second, most GC vendors focus on “GC pause time,” a proxy metric that does not translate to the user-experienced pauses, which are where the rubber hits the road.</p><p>Given that backdrop and those incentives, the lack of innovation is unsurprising. G1, C4, Shenandoah, and ZGC all share the same basic design, which is fundamentally limited by its complete dependence on tracing (slow reclamation, high drag) and on strict copying, which is brutally expensive when made concurrent. (See our LXR paper for a completely different approach.)</p><p>So it should be no surprise that companies like Google are so dependent on C++: those collector costs are untenable for many, if not most, data-center-scale applications.</p>
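<p>To illustrate the overhead point: the idea behind LBO (lower-bound overhead) analysis is, roughly, to distill an empirical lower bound on cost from the cheapest collector/heap-size configuration observed, then express every configuration’s cost as overhead over that bound, so costs that a generous heap would otherwise hide stay visible. The sketch below is a toy with made-up numbers, not the formulation from our paper:</p><pre><code>import java.util.Map;
import java.util.stream.DoubleStream;

public class LboSketch {
    public static void main(String[] args) {
        // Hypothetical, made-up costs (normalized CPU time) for three
        // collectors, each measured at three heap sizes.
        var cost = Map.of(
            "G1",  new double[] {1.95, 1.52, 1.31},
            "ZGC", new double[] {2.40, 1.80, 1.45},
            "LXR", new double[] {1.30, 1.15, 1.08});

        // Distill an empirical lower bound: the cheapest cost observed
        // across every collector / heap-size pair.
        double bound = cost.values().stream()
                           .flatMapToDouble(DoubleStream::of)
                           .min().orElseThrow();

        // Express each configuration as overhead over that bound.
        cost.forEach((gc, costs) -> {
            for (int i = 0; i < costs.length; i++) {
                double overheadPct = 100 * (costs[i] / bound - 1);
                System.out.printf("%-3s heap %d: %+.0f%% over bound%n",
                                  gc, i, overheadPct);
            }
        });
    }
}</code></pre>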
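<p>And to illustrate the latency point, here is a minimal sketch in the spirit of Gil Tene’s jHiccup (not its actual code): a background thread repeatedly asks to sleep for 1 ms and records how late it wakes up. Any stall the process suffers (GC or otherwise) shows up as overshoot, approximating what a user actually experiences, regardless of what the GC log reports:</p><pre><code>import java.util.concurrent.atomic.AtomicLong;

public class HiccupMeter {
    public static void main(String[] args) throws InterruptedException {
        final AtomicLong worstNanos = new AtomicLong();

        // Observer thread: request a 1 ms sleep, then measure the overshoot.
        // Any pause the process experiences (GC, scheduler, paging) shows up
        // here, whether or not the collector counts it as "pause time".
        Thread meter = new Thread(() -> {
            while (true) {
                long start = System.nanoTime();
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return;
                }
                long hiccup = System.nanoTime() - start - 1_000_000L;
                worstNanos.accumulateAndGet(Math.max(hiccup, 0L), Math::max);
            }
        });
        meter.setDaemon(true);
        meter.start();

        // Meanwhile, allocate heavily so the collector has work to do.
        for (int i = 0; i < 2_000; i++) {
            byte[][] junk = new byte[1_000][];
            for (int j = 0; j < junk.length; j++) {
                junk[j] = new byte[8_192];
            }
        }

        System.out.printf("worst observed hiccup: %.2f ms%n",
                          worstNanos.get() / 1e6);
    }
}</code></pre>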