I recently made a claim on Threads that AI tools don’t just add value linearly - they compound, like interest. Each new AI tool multiplies the effectiveness of your existing ones. I was asked (quite reasonably) if I could clearly quantify this claim. I’m not going to promise perfection, but this is going to be a non-exhaustive list of the ways AI has sped up my work.
To start off, I’d like to provide some context. I am the technical cofounder of a two-person startup called Blitz AI, with a background in web, infra, and AI/ML. It’s a social media analytics tool for marketers and founders. I’ve had the privilege of working in this industry for a few years and have seen firsthand how companies with many employees can atrophy. More people means more communication overhead. It means more coordination is required to move the company in the same direction. It means plausible deniability in the same vein as the Bystander Effect. This is very, very bad, to an extent that I think few people realize.
As businesses, we’ve put up with these inefficiencies because we needed the pools of intelligence that those people have to offer. The bet is that any given person will bring more value to the company than they take away by simply being another person to coordinate with. With software especially, this paradigm is showing cracks thanks to AI.
This is, I think, the core reason why I’m so obsessed with AI. It allows me to punch above my weight class. I can get the pools of intelligence without needing nearly the same amount of communication overhead (if any at all). As such, every single action I take with regards to building is done through the lens of “How can I turn this into something AI can do for me?” Once I started viewing every part of my job through that lens, it became clear that the more I made my workflow “AI compatible”, the more capable I became.
Let’s start with the basics.
Time saved: 5 hours/month
Switching to Sonnet 3.5 from GPT-4o might sound minor, but I would guess that it saves me 5ish hours per month by getting me to the right code faster, and it saves quite a lot of my focus, too. I don’t need to fight Claude. I don’t need to re-specify that it should give me the entire code. Claude follows my system prompt with much better adherence than any GPT model ever has. I’ve tried out o1-preview. It’s fine. It comes pretty close to Sonnet 3.5 in terms of performance, but the rate limits and the time delay make it not worth it as a main driver.
Time saved: 15 minutes for every conversation. I have thousands at this point across providers, so probably 250 hours conservatively
Every modern LLM has the concept of a system prompt. It’s the thing you don’t see before every single conversation that tells the LLM how to behave. This can affect the output in subtle or significant ways, depending on how detailed you are.
And that’s another thing! Putting detailed examples in your system prompt is extremely helpful and underpins every conversation you have with the LLM. For example, if you’re writing Python and you want all of your code to have correct type hints and properly formatted docstrings, putting detailed examples (and maybe a little extra emphasis using stern language) into your system prompt nudges the AI to act in that specific way. If you do a lot of repetitive work in a domain (such as lots of programming), modify the system prompt.
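For illustration, a stripped-down excerpt of that kind of system prompt might look something like this (not my actual prompt, just the shape of it):

    You are assisting with a Python codebase. Always include type hints and
    properly formatted docstrings. Functions should look like this:

    def normalize_scores(scores: list[float], max_value: float = 1.0) -> list[float]:
        """Scale a list of scores so the largest equals max_value.

        Args:
            scores: Raw scores to normalize.
            max_value: The value the largest score should map to.

        Returns:
            The rescaled scores, in the same order.
        """
        peak = max(scores)
        return [s / peak * max_value for s in scores]

    Do NOT elide code with placeholder comments. Output complete, runnable code.

The example does double duty: it shows the exact formatting I want, and it sets the tone for how strict I expect the output to be.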
Time saved: 100s of hours, or maybe dozens since I would have given up on so many one-off side quests way sooner
It might shock you to know this, but not everyone is aware of those hotkeys. Found a tutorial that does something cool and you want it? Just go to the web page, highlight the entire thing with ctrl+a, copy it with ctrl+c, and paste it into the LLM with ctrl+v. Any online text tutorial is now the perfect context an LLM needs to generate the specific tool that you need or want.
I’ve done this dozens of times at this point. One of the most recent is my mov-http-server project. It’s simple – it’s an HTTP server that compiles to only mov instructions. I created it by combining this repository with this article. That’s right, you remember that old flash game Doodle God? Well now we can do that in real life for software.
I’ve used this for so many parts of Blitz. Setting up infra, but it’s on a brand new serverless provider that the best LLMs have no knowledge of? Well here ya go, take their entire documentation as context and build out my infra for me. Need to implement a specific set of techniques mentioned in some unknown blog to optimize the inference pipeline? Here are all their techniques, go for it!
Oh, and because of my system prompt it’s giving me the infra code exactly how I like it, so that it looks and functions the same as the rest of my code base.
Time saved: 15-30ish minutes for every message, so maybe 10ish hours?
I would be remiss if I didn’t give a shoutout to https://uithub.com. No, that is not a typo. Change any “github” URL to “uithub” to get a plaintext version of the repository, with a tree of all the files and the code with line numbers. Now you can just click “copy” and paste an entire code base into an LLM. Very useful for focused changes that have bits and pieces throughout the entire repository. This lets me one-shot features in many cases, as long as the thing I’m requesting is focused enough in scope. I only discovered this recently, so the absolute scale of time saved isn’t as large as with the others.
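The URL swap is trivial enough to script into a hotkey, too. A tiny sketch (the repo URL here is made up) that just opens the uithub page ready to copy:

    # Open the uithub version of a GitHub repo so its plaintext dump can be copied
    import webbrowser

    repo_url = "https://github.com/someuser/somerepo"  # hypothetical repo
    webbrowser.open(repo_url.replace("github.com", "uithub.com", 1))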
Time saved: 30 minutes before I prompt, plus 1.5 hours of re-prompting and fixing code because I didn’t give the LLM all the relevant context in the first place, so about 2 hours total per prompt; ~1000 hours saved since implementation.
I use Neovim (btw), and one day I found that it was becoming a real PITA to collect all of the accurate context I needed for the LLM to generate an accurate solution, so I wrote a script that lets me highlight a section and add it to a temporary buffer. In just a few short keystrokes, I can give an LLM all the context it needs and nothing it doesn’t. This is like a much more surgical version of uithub that is locally hosted.
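The script itself is Neovim-specific, so I won’t reproduce it here, but the idea translates to basically any editor. A rough Python analogue (the paths and names are made up, and this is not my actual script):

    # context.py - append a hand-picked line range from a file to a scratch
    # "context" file, which later gets pasted into the LLM in one go.
    import sys
    from pathlib import Path

    CONTEXT_FILE = Path("/tmp/llm_context.txt")  # hypothetical scratch location

    def add_snippet(path: str, start: int, end: int) -> None:
        """Append lines start..end (1-indexed, inclusive) of path to the scratch file."""
        lines = Path(path).read_text().splitlines()
        snippet = "\n".join(lines[start - 1:end])
        with CONTEXT_FILE.open("a") as f:
            f.write(f"\n# {path} (lines {start}-{end})\n{snippet}\n")

    if __name__ == "__main__":
        add_snippet(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))

Run something like python context.py app/views.py 40 80 a few times, paste the scratch file once, and the LLM has exactly the pieces it needs.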
Time saved: 10 minutes every time I commit (if only for the sheer level of detail of the commit message that the model provides), been doing it for probably 200 commits, so I’m probably at about 30 hours saved
I want high quality commit messages, so when I’m ready to commit, I send the diff off to Claude and have him write a commit message. I have a hotkey set up to do this for me, and in just a couple of seconds I get a commit message, formatted almost exactly how I want it, covering everything that was updated. I say almost because I think my prompt needs some tweaking, but it’s still pretty darn good.
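If you want to wire up something similar, the core of it is only a few lines. Here’s a stripped-down sketch using the Anthropic Python SDK (the model name and prompt are placeholders, not my exact setup):

    # commit_msg.py - draft a commit message from the staged diff.
    import subprocess
    import anthropic

    diff = subprocess.run(
        ["git", "diff", "--staged"], capture_output=True, text=True, check=True
    ).stdout

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder; use whatever model you like
        max_tokens=500,
        system=(
            "Write a detailed commit message for the given diff. "
            "Summary line under 72 characters, then a bulleted body."
        ),
        messages=[{"role": "user", "content": diff}],
    )
    print(response.content[0].text)

Bind that to a hotkey in your editor and you get the same couple-of-seconds turnaround.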
Time saved: Considering the debugging I’d have to do, dozens of hours easy
AI is supposed to make things faster. If it’s pulling crap like:
    def main():
        # ...your other code here...
        print("hello")
        # ...some other code...
        print("world")
Now I need to spend time sifting through my code to figure out exactly where it intended for me to place that code. Uh uh. Not gonna happen. At one point I would often tell the AI to write out the entire file so I could just copy-paste it. I even emphasized that requirement heavily in my system prompt.
Time saved: TBD
“At one point I would often tell the AI to write out the entire file so I could just copy-paste it.”
There’s a reason why this is in the past tense. It doesn’t scale to extremely long files, it makes each subsequent context larger, iteration cycles are very long, and (if you’re using the API) it’s extremely expensive. So now I’m trying to get the AI to write out valid diffs. It’s been…challenging (1, 2). But if I can get this right, I will be able to create something that’s a cross between Cursor and Devin. Imagine running a command in your terminal and just going through this loop:
"make the changes" -> API call to Claude with a really specific system prompt -> parse out structured diff -> fix diff -> apply diff
Time saved: -20 hours, or 20, depending on your POV
We have customer calls for Blitz. We want to be present in those meetings, but we also want to have highly detailed notes afterwards. We haven’t even achieved ramen profitability yet, so paying hundreds of dollars per month for an auto notetaker does not sound appealing to us at all.
Our solution? Set up a locally hosted transcriber and diarizer (i.e., something that can identify speakers), give the output to Claude with a template and a high quality system prompt, and tell him to fill it out. I haven’t made back the time I invested into it yet (~3 days, most of which was fighting CUDA installations), but if I had tried to set this up by hand I simply wouldn’t have. Without AI, this would have been such an in-depth side quest that it wouldn’t have been worth doing, and the result would be less customer attentiveness in our meetings and worse notes.
On the off chance that I would have seen this as a worthy endeavor without LLMs, it would have taken me probably a week to figure out all the CUDA BS.
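For the curious, the shape of the pipeline looks roughly like this. I’m sketching it with Whisper for transcription and pyannote.audio for diarization as stand-ins; the exact tools matter less than the overall flow:

    # notes_pipeline.py - transcribe, label speakers, and hand the result to Claude.
    import whisper
    from pyannote.audio import Pipeline

    AUDIO = "meeting.wav"  # hypothetical recording

    transcript = whisper.load_model("medium").transcribe(AUDIO)
    diarization = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
    )(AUDIO)

    def speaker_at(t: float) -> str:
        """Return whichever speaker's turn contains timestamp t (crude, but workable)."""
        for turn, _, speaker in diarization.itertracks(yield_label=True):
            if turn.start <= t <= turn.end:
                return speaker
        return "UNKNOWN"

    labeled = "\n".join(
        f"[{speaker_at(seg['start'])}] {seg['text'].strip()}"
        for seg in transcript["segments"]
    )
    print(labeled)  # this labeled transcript plus the template goes to Claude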
Time saved: dozens of hours
In this, I’m including our serverless AI architecture, GitHub workflows, and setting up the server on DigitalOcean. We’re using a provider for the serverless AI that’s pretty new and wouldn’t likely be included in any training data (shoutout to RunPod, it’s a good service). Optimizing GitHub workflows was really helpful in the early days and continues to pay dividends. I do not have to worry about pushing buggy code to prod, nor do I need to worry about SSHing into the prod server, pulling the update, and restarting. All of it is handled by CICD.
Time saved: Months
I place a strong emphasis on speed throughout Blitz. The AI inference that backs Blitz is no exception. The research and development alone would take hundreds of hours, never mind the actual implementation. By combining several of the strategies already mentioned, and many that the AI was able to find by reading the documentation, I am able to squeeze every bit of performance out of it.
Time saved: Months
Needless to say, I use AI to build Blitz. Like, a lot. Using these strategies I can one-shot so many issues. My normal workflow at the moment (which will change) is to find some focused thing to improve, use uithub to get the files into an AI-ready format, have Claude solve it, then use Avante to implement it.
I am also not using any frontend framework for Blitz. It’s maybe a little too early to say, but because I’m not reliant on abstractions meant for humans (React, Next, etc.), I can code directly in Django Templates/HTML/CSS/JS. My brain is the code and Claude is the compiler. This lets me create interactive graphs, or create mobile-friendly versions of the site, or rearrange layouts in their entirety with almost no effort.
Time saved: 1 month
There is a company called Antithesis. The short version is that they are taking bug hunting to a whole new level by performing a chaos-based tree search of all the possible ways your system can fail. I took some inspiration from their enthusiasm for proper testing and have been using AI to generate my tests. In 2.5 days I wrote nearly 100 tests with 100% test coverage for Blitz.
What this means in practice is that I can be as recklessly aggressive with generating code for Blitz as possible, without worrying about backsliding on functionality. When a test fails, I have Claude take a look at the full test output, the tests that failed, the diff since the last commit, and the actual function being tested, and the model is able to determine whether there was an issue with the code or whether a test needs to be updated for some new functionality.
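That “gather everything and hand it to Claude” step is itself just a small script. A rough sketch of the triage call (the paths, module name, and model are placeholders):

    # triage_failure.py - bundle the failing-test context and ask Claude whether
    # the code regressed or the test is stale.
    import subprocess
    import anthropic

    def run(cmd: list[str]) -> str:
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    test_output = run(["pytest", "-x", "--tb=long"])   # full output of the failing run
    diff = run(["git", "diff", "HEAD~1"])              # changes since the last commit
    source = open("blitz/analytics.py").read()         # hypothetical module under test

    client = anthropic.Anthropic()
    verdict = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder
        max_tokens=1500,
        system=(
            "Decide whether this failure is a regression in the code or a stale test. "
            "Say which, and propose the fix."
        ),
        messages=[{
            "role": "user",
            "content": f"TEST OUTPUT:\n{test_output}\n\nDIFF:\n{diff}\n\nSOURCE:\n{source}",
        }],
    )
    print(verdict.content[0].text)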
I plan to take this two steps further, incorporating both chaotic data input for my tests and mutations to the code being tested. At least at the unit test scale, I expect this to make my code much more resistant to the AI misplacing a line here or there.
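For the chaotic-input half, something in the spirit of property-based testing is probably the shape it will take. A toy example with Hypothesis (a library choice I haven’t actually committed to, and the function under test is made up):

    # Feed the parser whatever garbage Hypothesis can dream up and assert it
    # never crashes or returns something malformed.
    from hypothesis import given, strategies as st

    def parse_handle(raw: str) -> str:
        """Hypothetical helper: normalize a social media handle."""
        return raw.strip().lstrip("@").lower()

    @given(st.text())
    def test_parse_handle_never_raises(raw: str) -> None:
        result = parse_handle(raw)
        assert isinstance(result, str)
        assert not result.startswith("@")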
Well if you take all of these approaches at once and combine them…
You get a person who can code consistently faster, ship more efficiently, and compete with larger teams. The foundation is a strong AI base configured to match your workflow through system prompts. Knowledge flows in from complete sources rather than fragments - full tutorials and entire codebases. Tools multiply each other’s power - system prompts enhance tutorial code, precise context improves AI outputs, tests verify everything automatically. Previously impractical side projects become feasible, like building a custom transcription system in days instead of weeks. From quick prototypes to production systems, each piece of the toolchain makes every other piece more effective. The result isn’t just automation - it’s compound growth where each new AI tool multiplies your development speed, solution quality, and capabilities.
Remove any one of these and the effects on productivity are palpable. I could do most of these tasks by hand, but they’d be “just good enough”. I’m not an expert in everything, and startups always have urgent work waiting. AI helps me do better. When I spot errors in AI’s work, I can guide it to improve the output beyond what I could do alone. In some cases, that means building deterministic solutions to verify consistency (unit tests, CICD).
At a high level, these are my takeaways from the last few months of steadily learning how to use LLMs better: