Not a formal proof, but there is a fun theorem called "period three implies chaos" (Li and Yorke, 1975) that my gut instinct says applies here.
Basically, if you have a continuous map f: [a,b] -> [a,b] and it has a 3-cycle, then it also has a cycle of every other length.
In this case, that would suggest that if you are bouncing between three values on the y axis (and the bouncing is a continuous function of y, which admittedly the gradient of a ReLU is not), you are probably in a chaotic system.
That does require assuming that the behaviour of y is largely a function of y alone, but their derivation seems to imply that this is the case.
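
To make the theorem a bit more concrete, here is a small sketch using the tent map on [0,1] as a stand-in. The tent map is my choice of example and has nothing to do with their actual update rule. It is continuous, maps [0,1] into itself, and has the 3-cycle 2/7 -> 4/7 -> 6/7, so the theorem says every other cycle length should show up too. The helper names (`tent`, `minimal_period`) are made up for illustration, and exact fractions are used so float rounding does not blur the cycles.

```python
from fractions import Fraction

def tent(x):
    """Tent map on [0, 1]: continuous, piecewise linear, maps [0, 1] to itself."""
    return 2 * x if x <= Fraction(1, 2) else 2 * (1 - x)

def minimal_period(f, x0, max_steps=64):
    """Return the smallest n >= 1 with f^n(x0) == x0, or None if not found."""
    x = x0
    for n in range(1, max_steps + 1):
        x = f(x)
        if x == x0:
            return n
    return None

# The 3-cycle that triggers the theorem: 2/7 -> 4/7 -> 6/7 -> 2/7.
print(minimal_period(tent, Fraction(2, 7)))

# For this particular map, x = 2 / (2**n + 1) sits on a cycle of length n,
# so we can exhibit one cycle of each length directly.
periods = {n: minimal_period(tent, Fraction(2, 2**n + 1)) for n in range(1, 11)}
print(periods)
```

This only checks lengths 1 through 10 for one hand-picked map, so it illustrates the statement rather than proving anything about the system being discussed.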