Using Agents as Retrofit Solutions to Established Codebases

July 11, 2024

I think what a lot of people have intuitively figured out, but haven’t articulated explicitly, is that using AI for greenfield projects feels much more useful than using it in an established codebase. From what I’ve seen, there are two main reasons for this:

  1. Experienced engineers often work on tasks that touch many different parts of a system. Current AI tools just aren’t built for this kind of work.
  2. AI models are trained on a broad range of data, which doesn’t always match up with the specific, deep knowledge that experienced devs have built up over years. New devs are brought up while experienced devs are weighed down.

I’m going to focus on that first point in this article, because I think it’s in part what’s allowing less experienced devs to see things that more experienced devs aren’t. AI models are getting pretty damn good, to the point where using Claude 3.5 rarely leaves me wanting more. AI tooling is the exact opposite.

Working on greenfield projects as they’ve grown, I’ve started to run into a problem: it’s becoming increasingly hard to give the AI enough context to get a good response. The changes I’m requesting touch more parts of the codebase, and it’s tough to include all the relevant bits. For any given change to one of my web projects (Django, for example), if I want a solution quickly I need:

  1. The relevant HTML
  2. Any blocks of other content I’m including
  3. Relevant CSS
  4. Relevant JS
  5. Sometimes an example of a similar feature implemented in another HTML, CSS, or JS file, to maintain consistency
  6. The view
  7. Any relevant imports
  8. Similar views that may have implemented similar patterns to what I need to happen
  9. Any other functions that the view calls
  10. The URL structure
  11. Any schemas that might be relevant
  12. Database models
  13. Any library code that I’m using that has been updated after the AI’s training

And that’s not even counting things like repo structure, ownership, git diffs, or (for more complicated scenarios) call graphs. More relevant context means better AI output, but getting that context is a pain, and for best results it should all be in a single message.
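
Bundling all of that by hand amounts to concatenating a request with every relevant file into one message. A minimal sketch of what that looks like (the file paths and the `build_prompt` helper are hypothetical, purely for illustration):

```python
import os

# Hypothetical context files for one Django change; paths are illustrative.
CONTEXT_FILES = [
    "templates/orders/detail.html",
    "static/css/orders.css",
    "orders/views.py",
    "orders/urls.py",
    "orders/models.py",
]

def build_prompt(request: str, files: list[str]) -> str:
    """Concatenate a change request with every relevant file into one message."""
    parts = [f"Request: {request}", ""]
    for path in files:
        if not os.path.exists(path):
            continue  # skip anything missing rather than failing the whole prompt
        with open(path, encoding="utf-8") as f:
            parts.append(f"--- {path} ---")
            parts.append(f.read())
    return "\n".join(parts)
```

The tedious part is not the concatenation, of course; it’s knowing which paths belong on that list in the first place.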

I got fed up with this and made a Neovim shortcut to collect these snippets in a haphazard kind of way:

local snippet_buffer = {}

-- Helper function to get selected text
local function get_visual_selection()
  local start_pos = vim.fn.getpos "'<"
  local end_pos = vim.fn.getpos "'>"
  local start_line, end_line = start_pos[2], end_pos[2]
  local lines = vim.api.nvim_buf_get_lines(0, start_line - 1, end_line, false)
  return table.concat(lines, '\n')
end

-- Build a snippet record for the current file and add it to the buffer
local function add_snippet(code)
  local current_file = vim.fn.expand '%:p'
  table.insert(snippet_buffer, {
    directory = current_file,
    stat = vim.fn.system('stat ' .. vim.fn.shellescape(current_file)):gsub('\n', ' '),
    code = code,
  })
end

-- Add snippet functions (normal and visual mode)
_G.add_snippet_normal = function()
  add_snippet(vim.api.nvim_get_current_line())
end

_G.add_snippet_visual = function()
  add_snippet(get_visual_selection())
end

-- Function to generate file tree
local function generate_file_tree()
  local tree = {}

  for _, snippet in ipairs(snippet_buffer) do
    local path = vim.fn.fnamemodify(snippet.directory, ':~:.')
    local parts = vim.split(path, '/')
    local current = tree
    for i, part in ipairs(parts) do
      if i == #parts then
        current[part] = true -- Mark as file
      else
        current[part] = current[part] or {}
        current = current[part]
      end
    end
  end

  local function render_tree(node, prefix)
    local lines = {}
    local keys = vim.tbl_keys(node)
    table.sort(keys)

    for i, key in ipairs(keys) do
      local is_last_item = (i == #keys)
      -- Continue the parent's guide line only when more siblings follow
      local new_prefix = prefix .. (is_last_item and '    ' or '│   ')
      local line = prefix .. (is_last_item and '└── ' or '├── ') .. key

      table.insert(lines, line)

      if type(node[key]) == 'table' then
        local subtree = render_tree(node[key], new_prefix)
        vim.list_extend(lines, subtree)
      end
    end

    return lines
  end

  return render_tree(tree, '')
end

-- View snippets function
_G.view_snippets = function()
  if #snippet_buffer == 0 then
    print 'No snippets in buffer.'
    return
  end

  vim.cmd 'vnew'
  local buf = vim.api.nvim_get_current_buf()
  vim.api.nvim_buf_set_option(buf, 'buftype', 'nofile')
  vim.api.nvim_buf_set_option(buf, 'swapfile', false)
  vim.api.nvim_buf_set_option(buf, 'bufhidden', 'wipe')
  vim.api.nvim_buf_set_option(buf, 'modifiable', true)

  local content = {}
  local file_tree = generate_file_tree()

  -- Add file tree to content
  table.insert(content, 'File Tree:')
  table.insert(content, '../')
  vim.list_extend(content, file_tree)
  table.insert(content, string.rep('-', 40))
  table.insert(content, '')

  -- Add snippets to content
  for i, snippet in ipairs(snippet_buffer) do
    table.insert(content, 'Snippet ' .. i .. ':')
    table.insert(content, 'Directory: ' .. snippet.directory)
    table.insert(content, 'Stat: ' .. snippet.stat)
    table.insert(content, 'Code:')
    for _, line in ipairs(vim.split(snippet.code, '\n')) do
      table.insert(content, line)
    end
    table.insert(content, '')
    table.insert(content, string.rep('-', 40))
    table.insert(content, '')
  end

  vim.api.nvim_buf_set_lines(buf, 0, -1, false, content)
  print('Displaying ' .. #snippet_buffer .. ' snippets.')
end

-- Clear snippets function
_G.clear_snippets = function()
  local count = #snippet_buffer
  snippet_buffer = {}
  print('Snippet buffer cleared. Removed ' .. count .. ' snippets.')
end

-- Set up keymaps (<C-u> clears the '<,'> range that visual mode prepends)
vim.api.nvim_set_keymap('n', 'we', ':lua add_snippet_normal()<CR>', { noremap = true, silent = true })
vim.api.nvim_set_keymap('v', 'we', ':<C-u>lua add_snippet_visual()<CR>', { noremap = true, silent = true })
vim.api.nvim_set_keymap('n', 'wr', ':lua view_snippets()<CR>', { noremap = true, silent = true })
vim.api.nvim_set_keymap('n', 'wq', ':lua clear_snippets()<CR>', { noremap = true, silent = true })

It grabs code snippets and file info, and generates a file structure at the top of a temporary buffer based on the files the snippets were grabbed from. It’s not perfect, but it helps get more context to the AI without spending ages adding all the metadata. Just by using this, there has been a noticeable improvement in how often I’m able to get zero-shot solutions out of Claude 3.5. At this point I’m just doing manual, informed RAG. I’d like to automate this process, so to that end I ask: “How can I automatically find all of the snippets that are relevant to the feature I am trying to implement?”

This is where things go more into the theoretical than actually implemented. In my head, the ideal workflow looks something like this:

  1. You type a prompt in some kind of popup input.
  2. The tooling takes your prompt and uses something like tree-sitter to parse your repo, holding the query up to each function/class/etc. and asking an LLM a simple question: “Is this relevant to the question? 1 for Yes, 0 for No.” Everything it finds relevant goes into the context.
  3. A powerful model like Claude 3.5 comes up with solutions based on all that context.
  4. You and the AI go back and forth until the feature’s done or the bug’s squashed.

Step 2 is obviously where the meat of this occurs. I would not use straight RAG here, because there is a difference between intention and action. If I request “change the function that does bogosort to use Stalin sort” and the file that implements it is written in x86 assembly with no comments, RAG would not find it, but there is a good chance that a direct comparison of the request to the code would.

This still feels a bit flimsy, but I bet that if the comparison also included the contents of all of the functions/classes that the function being assessed interacts with (again, using something like tree-sitter or an LSP), you could get a better result. After all, if more relevant context is what’s going to solve the query, it should also help along the way while building up the final context. It might also help to make a second pass over the final generated context to re-ask whether everything in it is relevant, in case anything was overlooked.

If it’s a large codebase, maybe you could do an LLM pass over it to describe each function in 10 different ways, then save and vectorize those descriptions for use in traditional RAG. When a change is detected, it could re-run the description generator/vectorizer to keep the vector store up to date.

I think that this type of tooling/workflow would go a long way to improve the dev experience in established repositories and would help to change the minds of a lot of experienced devs who have not found AI to be personally useful yet.