This tweet caught my attention today:

*(screenshot of the tweet)*

It’s an interesting idea… what if we apply the same formalism as the classic rocket equation?

Ok, let’s construct an “intelligence rocket equation” by mirroring Tsiolkovsky’s derivation, replacing Newton’s momentum balance with Kaplan-style power-law scaling. The original rocket equation emerges from integrating an incremental law for velocity gain per unit fractional mass shed, $dv = -v_e \frac{dm}{m}$ (the minus sign reflects that the rocket’s mass $m$ decreases), which, integrated from $m_0$ to $m_f$, gives

$$ \Delta v = v_e \ln\frac{m_0}{m_f} $$
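Before mirroring it, here is a minimal numerical sketch of the classical result; the exhaust velocity and mass ratio below are illustrative figures, not data from the tweet:

```python
import math

def delta_v(v_e, m0, mf):
    """Tsiolkovsky rocket equation: total velocity gain for exhaust
    velocity v_e as the rocket's mass drops from m0 to mf."""
    return v_e * math.log(m0 / mf)

# Example: v_e = 3 km/s and 90% of launch mass is propellant (m0/mf = 10).
print(delta_v(3.0, 10.0, 1.0))   # ≈ 3 * ln(10) ≈ 6.91 km/s
```

Note the logarithm: doubling the mass ratio adds a fixed increment of $\Delta v$, which is exactly the structure we want to transplant.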

The essential structure is that the cumulative change in a state variable is linear in the logarithm of a resource ratio. Kaplan’s empirical scaling laws for AI training provide a directly analogous form.

**Compute scaling**

Let a scalar performance target monotonically improve as training resources increase. Empirically, for large models, the cross-entropy loss follows

$$ \mathcal{L}(C) = \mathcal{L}_\infty + A\, C^{-\alpha}, $$

where $C$ is the compute budget, $\alpha > 0$ is the scaling exponent, $A > 0$ is a constant, and $\mathcal{L}_\infty$ is the irreducible loss floor for the current data and optimization. Define the capability potential as
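As a quick numerical sketch of this curve, the constants below ($A$, $\alpha$, $\mathcal{L}_\infty$) are hypothetical placeholders chosen for illustration, not fitted values:

```python
import math

# Hypothetical constants, for illustration only -- not fitted values.
L_INF = 1.7    # irreducible loss floor, L_infinity
A     = 10.0   # amplitude of the reducible loss
ALPHA = 0.05   # scaling exponent

def loss(C):
    """Kaplan-style power-law loss: L(C) = L_inf + A * C**(-alpha)."""
    return L_INF + A * C ** (-ALPHA)

# Loss falls monotonically toward the floor as compute grows.
for C in (1e18, 1e20, 1e22):
    print(f"C = {C:.0e}  ->  loss = {loss(C):.4f}")
```

The reducible part $A\,C^{-\alpha}$ shrinks by a constant factor for each multiplicative step in compute, which is what makes the logarithmic change of variable below natural.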

$$ V(C) \equiv -\ln\!\big(\mathcal{L}(C) - \mathcal{L}_\infty\big), $$

which then gives

$$ V(C) = -\ln A + \alpha \ln C. $$

Taking the difference between two compute budgets $C_0 < C_1$, we thus obtain the compute-scaling “rocket equation”

$$ \Delta V = \alpha \ln\!\frac{C_1}{C_0} $$
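As a sanity check on the algebra, the difference $V(C_1) - V(C_0)$ computed directly from the definition matches $\alpha \ln(C_1/C_0)$; the constants below are hypothetical placeholders:

```python
import math

# Hypothetical scaling-law constants, for illustration only.
A, ALPHA = 10.0, 0.05

def V(C):
    """Capability potential V(C) = -ln(L(C) - L_inf) = -ln(A * C**-alpha)."""
    return -math.log(A * C ** (-ALPHA))

C0, C1 = 1e20, 1e24              # a 10,000x increase in compute
dV_direct  = V(C1) - V(C0)       # straight from the definition of V
dV_formula = ALPHA * math.log(C1 / C0)
print(dV_direct, dV_formula)     # both ≈ 0.05 * ln(1e4) ≈ 0.4605
```

Just as in the rocket case, each doubling of the resource (here compute rather than mass ratio) buys a fixed increment of the state variable, scaled by the exponent $\alpha$.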