Let's automate our jobs

Thanks to coding agents like Claude Code, programming is now over. It’s more efficient to prompt an AI model to write code than it is to write it by hand. However, programming is just one of the many tasks a software engineer has to take care of, albeit a central one. What about the rest? Can they be handled by an AI agent?

The software delivery loop

When you work in a company that produces software, the software delivery works something like this:

  1. You start with some business problem to solve. You refine that into technical requirements for building the software that can provide the solution.
  2. You build some code that matches the requirements and you test it.
  3. You deliver the software to your users, often by deploying to production or by pushing an update to an app store.
  4. You gather data on user behavior, customer feedback, business results and technical issues like bugs, production incidents, and performance problems. This gives you some new business requirements. You go back to step 1.

Claude Code can do a lot for step 2. It cannot yet take care of it completely – right now, there’s a lot of writing prompts and reviewing results involved. As the models improve, I expect them to be able to handle more and more complex coding tasks. Let’s look at what else can be automated.

Getting the agents to complete tasks autonomously. You shouldn’t be prompting Claude manually. It should look at your GitHub issues or Jira backlog and start the work independently. Maybe you will review the PR it creates, once you’ve had your agents review it first, of course.

You can already do something like this by assigning tasks to GitHub Copilot or by mentioning @claude on GitHub. But why isn’t Copilot or Claude deciding by itself what to work on?

OpenClaw is an attempt to make the agents decide by themselves. It has already led to interesting results – like an agent publishing a hit piece on a human open-source author.
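One way to sketch this kind of autonomy is a small scheduler that picks the next backlog item for an agent. Everything below is an assumption for illustration – the issue fields, the `agent-ok` label, and the scoring heuristic are made up, not a real GitHub or Jira API.

```python
# Minimal sketch: pick the next backlog item for a coding agent.
# Issue shape, labels, and scoring are illustrative assumptions.

def pick_next_task(issues):
    """Return the open, unassigned issue the agent should work on next."""
    candidates = [i for i in issues
                  if i["state"] == "open" and not i.get("assignee")]
    if not candidates:
        return None

    def score(issue):
        # Prefer issues a human has explicitly marked safe for agents,
        # then smaller estimates first.
        labeled = "agent-ok" in issue.get("labels", [])
        return (labeled, -issue.get("estimate", 0))

    return max(candidates, key=score)

backlog = [
    {"id": 1, "state": "open", "labels": ["agent-ok"], "estimate": 3},
    {"id": 2, "state": "open", "labels": [], "estimate": 1},
    {"id": 3, "state": "closed", "labels": ["agent-ok"], "estimate": 1},
]
print(pick_next_task(backlog)["id"])  # 1 – open, unassigned, labeled agent-ok
```

In practice the candidate list would come from a real issue tracker, and the selected issue would be handed to the agent as its starting prompt.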

Structuring big programs, architecture, and projects. The agents are not that great yet at designing big programs. Right now, fairly fine-grained tasks work best. Maybe better models can solve this – or maybe the answer is a hierarchy of models, where your Software Architect model does the high-level design and creates tasks for the Coder models?

Steve Yegge’s Gas Town was a briefly-infamous attempt at building a hierarchy of agents that can complete tasks autonomously. It was silly, but I think Yegge was on the right track, and we’re going to see other takes on the same ideas. Claude Code already comes with subagents.

Running agents safely. If you’re running Claude on your laptop, you’re giving it a lot of ambient authority. Are you logged into your company’s AWS production account? Great, so is Claude.

On one hand, this enables the agents to do things and get the job done. On the other hand, the agents make mistakes all the time – just yesterday someone used OpenClaw to accidentally delete their inbox.

There are all kinds of attempts to build a sandbox (including the one in Claude Code itself), but the best practices have yet to emerge.
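One small piece of a sandbox is cheap to sketch: strip credential-looking variables out of the environment before launching the agent process. The variable prefixes below are my assumptions; a real sandbox (containers, network policy, scoped credentials) goes much further than this.

```python
# Sketch: build a scrubbed environment for an agent subprocess so it
# doesn't inherit ambient cloud credentials. The prefix list is an
# illustrative assumption, not a complete denylist.

SENSITIVE_PREFIXES = ("AWS_", "GOOGLE_", "AZURE_", "GITHUB_TOKEN", "SSH_")

def scrubbed_env(env):
    """Return a copy of env without credential-looking variables."""
    return {k: v for k, v in env.items()
            if not k.startswith(SENSITIVE_PREFIXES)}

# In practice you'd pass os.environ here; a fixed dict keeps the example
# deterministic.
env = {"PATH": "/usr/bin", "HOME": "/home/me",
       "AWS_SECRET_ACCESS_KEY": "..."}
print(sorted(scrubbed_env(env)))  # ['HOME', 'PATH']
```

You would then start the agent with something like `subprocess.run(cmd, env=scrubbed_env(os.environ))` – a denylist is a weak control on its own, but it removes the most obvious ambient authority.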

Setting up verification. The coding agents work best when they can programmatically verify that what they produced is what you want. They’re good at fixing compiler errors and test failures. A big question for software engineers in the near future is how best to provide the agents with these guardrails. How do you go from business requirements to verifiable technical requirements?

And, given that the agents are lazy in interesting ways, how do you check that they’re not cheating on the verification?

Operations and monitoring. There’s a whole bunch of work in keeping the lights on for software that runs in production. There are alerts, performance degradations, and runtime exceptions. Traditionally a software engineer or a site reliability engineer triages the issues and devises solutions. Could an agent do the triage instead? Could it debug the issues or connect them back into the software delivery loop?

For example, if you have a data pipeline and something changes in the input data and now your ETL Python script complains about NoneTypes, can you make an agent fix it without being involved yourself?

Refining the business requirements into technical requirements. If you start from scratch, the coding agents are actually not too bad at this. They can do a lot with a simple prompt.

However, if you work in a big organization, there’s a lot of context that the agents are missing. They don’t know how to fit the new software into your architecture and tech roadmap. How are you going to give them this information?

Closing the loop for user and customer feedback. Your users and customers have things to say about your software. You probably also have metrics that show what they’re doing. You may want to drive those metrics. How could agents help with that?
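A toy sketch of one step in that loop: turning low-rated feedback into candidate backlog items that an agent could then pick up. The feedback shape, the rating scale, and the threshold are all assumptions for illustration.

```python
# Sketch: convert low-rated user feedback into candidate backlog items.
# Feedback shape and the rating threshold are illustrative assumptions.

def feedback_to_issues(feedback, threshold=2):
    """Turn feedback rated at or below threshold into draft issues."""
    return [
        {"title": f"Investigate: {item['comment']}",
         "labels": ["from-feedback"]}
        for item in feedback
        if item["rating"] <= threshold
    ]

feedback = [
    {"rating": 1, "comment": "export to CSV is broken"},
    {"rating": 5, "comment": "love the new dashboard"},
]
issues = feedback_to_issues(feedback)
print(issues[0]["title"])  # Investigate: export to CSV is broken
```

The interesting part is everything this sketch leaves out: deduplicating complaints, weighing them against the metrics, and deciding which ones are worth an agent’s time.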


I’ve focused on things that could be attempted with the current AI models. Even if the models were frozen today, they’re so good that we could probably automate much more software engineering work than we already have.

The models are getting better all the time, though. For now, it’s an interesting opportunity to invent new ways of working. What it means for software engineering in the long run, that I don’t know.


About the author: My name is Miikka Koskinen. I'm an experienced software engineer and consultant focused on solving problems in storing data in the cloud: ingesting the data, storing it efficiently, scaling the processing, and optimizing the costs.

Could you use help with that? Get in touch at miikka@jacksnipe.fi.

Want to get these articles to your inbox? Subscribe to the newsletter: