Actually, even on CPU, if you go over the limit and end up swapping to disk, your performance can fall off a cliff (or your process can even be killed by the OOM killer). Memory capacities have grown, though, so this is less of an issue for typical programs these days, to the point that we often don’t even consider it.
I believe that ML systems will end up in the same place, though of course power users will know how to push their systems to the limit and understand the consequences; we should assume those power users are only a relatively small subset of all users (otherwise, I argue, we haven’t built a very usable system).
Mehdi’s example of live ranges shows that providing true guarantees can be NP-hard even in the simplest case. Throw in a little bit of dynamism, control flow, etc., and it rapidly becomes undecidable what can trigger memory exhaustion, to the point that even the simplest form of CSE can offer no real guarantees about peak memory. Therefore, designing our systems around some semantically load-bearing notion of guaranteed memory usage seems like it will be a) futile and b) overconstraining.
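For what it’s worth, here’s a minimal, made-up sketch of the CSE point in plain Python (the helper names and sizes are mine, just for illustration): deduplicating a recomputed value extends its live range, so its buffer now overlaps with whatever the intervening work allocates, and peak memory goes up even though the program does strictly less work.

```python
# Sketch only: how even simple CSE can raise peak memory by extending a live range.
import numpy as np

N = 10_000_000  # ~80 MB per float64 buffer (illustrative size)

def big(x):
    return x * 2.0  # stand-in for a large, cheap-to-recompute intermediate

def other_work(x):
    return float((x + 1.0).sum())  # allocates its own large temporary

def without_cse(x):
    a = big(x)
    first = float(a.sum())
    del a                      # intermediate dies here; peak ~ 1 big buffer
    mid = other_work(x)        # its temporary does not overlap with `a`
    b = big(x)                 # recomputed
    return first + mid + float(b.sum())

def with_cse(x):
    a = big(x)                 # CSE keeps a single copy alive...
    first = float(a.sum())
    mid = other_work(x)        # ...so `a` overlaps other_work's temporary:
                               # peak ~ 2 big buffers instead of 1
    return first + mid + float(a.sum())
```

Nothing deep here, but it shows why a guarantee like “optimizations never increase peak memory” would rule out even the most basic rewrites.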
So yeah, this isn’t something that falls only on us in the compiler to solve. It’s up to the hardware, runtime, etc. folks, in collaboration with us, to make a usable, practical system.