This tweet caught my attention today:

*(screenshot of the tweet)*

It’s an interesting idea… what if we apply the same formalism as the classic rocket equation?

Ok, let’s construct an “intelligence rocket equation” by mirroring Tsiolkovsky’s derivation, replacing Newton’s momentum balance with Kaplan-style power-law scaling. The original rocket equation emerges from integrating an incremental law for velocity gain per unit fractional mass shed, $dv = -v_e \frac{dm}{m}$ (the minus sign reflects that the rocket’s mass $m$ decreases), which, integrated from $m_0$ to $m_f$, gives

$$ \Delta v = v_e \ln\frac{m_0}{m_f} $$
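Before mirroring it, here is a minimal numerical sketch of the classical result; the exhaust velocity and mass ratio below are illustrative figures, not data from the tweet:

```python
import math

def delta_v(v_e, m0, mf):
    """Tsiolkovsky rocket equation: total velocity gain for exhaust
    velocity v_e as the rocket's mass drops from m0 to mf."""
    return v_e * math.log(m0 / mf)

# Example: v_e = 3 km/s and 90% of launch mass is propellant (m0/mf = 10).
print(delta_v(3.0, 10.0, 1.0))   # ≈ 3 * ln(10) ≈ 6.91 km/s
```

Note the logarithm: doubling the mass ratio adds a fixed increment of $\Delta v$, which is exactly the structure we want to transplant.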

The essential structure is that the cumulative change in a state variable is linear in the logarithm of a resource ratio. Kaplan’s empirical scaling laws for AI training provide a directly analogous form.

**Compute scaling**

Let a scalar performance target monotonically improve as training resources increase. Empirically, for large models, the cross-entropy loss follows

$$ \mathcal{L}(C) = \mathcal{L}_\infty + A\, C^{-\alpha}, $$

where $C$ is the compute budget, $\alpha > 0$ is the scaling exponent, $A > 0$ is a constant, and $\mathcal{L}_\infty$ is the irreducible loss floor for the current data and optimization. Define the capability potential as
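As a quick numerical sketch of this curve, the constants below ($A$, $\alpha$, $\mathcal{L}_\infty$) are hypothetical placeholders chosen for illustration, not fitted values:

```python
import math

# Hypothetical constants, for illustration only -- not fitted values.
L_INF = 1.7    # irreducible loss floor, L_infinity
A     = 10.0   # amplitude of the reducible loss
ALPHA = 0.05   # scaling exponent

def loss(C):
    """Kaplan-style power-law loss: L(C) = L_inf + A * C**(-alpha)."""
    return L_INF + A * C ** (-ALPHA)

# Loss falls monotonically toward the floor as compute grows.
for C in (1e18, 1e20, 1e22):
    print(f"C = {C:.0e}  ->  loss = {loss(C):.4f}")
```

The reducible part $A\,C^{-\alpha}$ shrinks by a constant factor for each multiplicative step in compute, which is what makes the logarithmic change of variable below natural.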

$$ V(C) \equiv -\ln\!\big(\mathcal{L}(C) - \mathcal{L}_\infty\big), $$

which then gives

$$ V(C) = -\ln A + \alpha \ln C. $$

Taking the difference between two compute budgets $C_0 < C_1$, we thus obtain the compute-scaling “rocket equation”

$$ \Delta V = \alpha \ln\!\frac{C_1}{C_0} $$
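As a sanity check on the algebra, the difference $V(C_1) - V(C_0)$ computed directly from the definition matches $\alpha \ln(C_1/C_0)$; the constants below are hypothetical placeholders:

```python
import math

# Hypothetical scaling-law constants, for illustration only.
A, ALPHA = 10.0, 0.05

def V(C):
    """Capability potential V(C) = -ln(L(C) - L_inf) = -ln(A * C**-alpha)."""
    return -math.log(A * C ** (-ALPHA))

C0, C1 = 1e20, 1e24              # a 10,000x increase in compute
dV_direct  = V(C1) - V(C0)       # straight from the definition of V
dV_formula = ALPHA * math.log(C1 / C0)
print(dV_direct, dV_formula)     # both ≈ 0.05 * ln(1e4) ≈ 0.4605
```

Just as in the rocket case, each doubling of the resource (here compute rather than mass ratio) buys a fixed increment of the state variable, scaled by the exponent $\alpha$.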