Speed Kills
I don’t know if this is a full thought, but I feel like the adandonment of execution speed in favor of shipping speed has lead people to not realize what they are missing out on with speed.
There is a quote that’s been making the rounds recently along the lines of “Quantity has a quality all its own”. If you have an airplane that doesn’t go fast enough, it doesn’t fly. If you don’t add enough force to the crankshaft of an engine to overcome static friction, it won’t drive. If you don’t put in the hours to become a world expert on something, it doesn’t matter how naturally gifted you are, you will not become that expert. If you don’t have enough data for an AI model to generalize, it won’t.
So why does that mentality stop with speed? Why is it that, when we KNOW that smaller models can wipe the floor with bigger models as long as they go through iterative reasoning steps first before providing an answer, we are just completely fucking ignoring it?
If we make the models fast enough, we can get away with making smaller, dumber models. I don’t think we can get away with using smaller datasets, since it seems like the dataset continually plays a bigger part in determining the “intelligence” of a model. Post-training compute will also become more and more important.
LLM APIs of the future should include multi-step reasoning behind the scenes to provide for the best answer possible. Agents as an API. This can only be reasonably achieved if we focus on speed, otherwise the calls will be too slow.