Mistakes in AI Mistake Management
That regex was so stupid - my cousin's email was misclassified!!
No one ever says that. Regular expressions are yucky to many - people hate to write them, hate to read them even more - but you would never call a regex stupid. Yet I hear it all the time: Claude was so stupid. It couldn't even do that! Doesn't it know? How could it?
We anthropomorphize our coding agents so much that it's convenient to blame them, to be upset with them, to complain about them.
You won't get good at managing agents until you can manage their mistakes. Here's the structured approach I take. It helps my agents get smarter faster, and keeps me less annoyed, though not in complete equanimity.
Step 1: Manage Ownership
The agents/models are smart and capable. Any mistake means you failed the agent, not the other way around. The prompt, the context, the code - something under your control failed. Your job is to own the failure, find out what it was, and fix it.
Example: I was having a hard time - I gave a simple git checkout master prompt, and Claude assumed I was already on master and didn't do it. It was very frustrating, and unclear how I could get it to stop assuming I was already on master. I complained and complained. Finally I realized: I wanted a lot more than just git checkout master, which is why I didn't type !git checkout master directly. I wanted it to try master/main, to pull after checking out, to see if there was any trash lying around, and so on.
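Once I owned that, the fix was to spell out what I actually wanted. A sketch of what that could look like as a slash command - the name /checkout-main and the exact steps are my illustration, not a file that ships anywhere:

```markdown
# /checkout-main

Switch this repository to its mainline branch, robustly:

1. Detect the mainline branch: try `master`, then `main`.
2. Check `git status` first; warn me about uncommitted changes or stray
   untracked files before switching anything.
3. Check out the mainline branch, even if you believe we are already on it.
4. Run `git pull` to bring it up to date.
5. Report what you found and what you did, in one short summary.
```

The point isn't this exact file - it's that "checkout master" was never the real request, and the agent can't own a failure I never articulated.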
Step 2: Manage Misinformation
Agents don't only hallucinate; they also confidently relay bad data. Documentation gets stale - things that used to not work now work - skills reference expired information - misspellings, wrong acronyms, and so on.
Asking a simple "Where did you get that information from?" can sometimes really help.
With misinformation, there is also figuring out which source is true - that can be hard for an agent. The Work Record says one thing, Slack says another, and the code says a third. Being explicit about a truth hierarchy can help as well.
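Such a hierarchy can live right in CLAUDE.md. A minimal sketch - the ordering below is illustrative, not a copy of my actual setup:

```markdown
## Truth hierarchy

When sources disagree, trust them in this order:

1. The code on the default branch (what actually runs)
2. The most recently merged design doc
3. Slack threads (newest first)
4. Older docs and work records

If two sources at the same level conflict, stop and ask me.
```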
Example: An agent read from an old branch, ran the wrong code, and got the wrong answers. When I asked why it checked out that branch, it pointed to the skill file - apparently I had told it to use that branch until it was merged to master. It had already been merged, but the skill was stale and never updated.
Step 3: Manage Misalignment
Agents attempt to do their best (possibly with a secondary goal of minimizing cost). But "best" is unique to every project, and even within a project things change (e.g. you need speed at the start, more rigor near the end).
Ask the following: Why was that decision made? What hinted to you that this was the best way? In what ways is X optimal?
This gives you an opportunity to see what it optimizes for (other than user annoyance). It could be trying to make the code readable instead of fast. It could be trying to be resilient instead of managing cost.
Update CLAUDE.md when there are disagreements on the goal.
Example: Claude asked me if I was ready to run a SQL query. What does ready even mean here? Am I wearing pants? Do I have enough battery? Anyway, after multiple attempts at asking why, it confessed: it wanted to be polite, provide conversational filler, and do check-ins. Having learned that, those three items are now banned.
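The ban went straight into CLAUDE.md. Roughly like this - the wording below is a reconstruction for illustration, not a verbatim copy of my file:

```markdown
## Banned behaviors

- No politeness rituals ("Great question!", "Happy to help!").
- No conversational filler before or after tool calls.
- No check-ins ("Ready to run this query?"). If the task is clear, run it.
```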
Step 4: Manage Expectations
What is good code? A good dashboard? A good design doc? A good Slack post?
Without setting expectations, mistakes are bound to happen because the agent doesn't know what good looks like.
Create validation tasks. I have /validate-pr, /validate-notebook, /validate-dashboard, and more. Get the agent to run them. Get the agent to fix issues, until validation is clean.
Whenever something is missed, add it to the validate task. It grows and learns. Every time I generate code and a human reviewer leaves a comment, I save that comment and ask agents to assess their code against it. The learning speed is incredible here. You could also try that with bugs, incidents, bad customer feedback, and more - I'm not there yet. Build a repository of mistakes to learn from.
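Mechanically, a growing validate task can be as simple as a list of checks that only ever gets appended to. A minimal sketch in Python - the check names and rules here are made up for illustration, not my real validation suite:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    name: str                     # short label for the report
    rule: Callable[[str], bool]   # returns True when the text passes

# Each past mistake becomes a permanent check. These two are examples.
CHECKS = [
    Check("no TODOs left behind", lambda text: "TODO" not in text),
    Check("no print-debugging", lambda text: "print(" not in text),
]

def validate(text: str) -> list[str]:
    """Return the names of failed checks; an empty list means clean."""
    return [c.name for c in CHECKS if not c.rule(text)]

def learn(name: str, rule: Callable[[str], bool]) -> None:
    """When a reviewer catches something new, append it forever."""
    CHECKS.append(Check(name, rule))
```

The agent runs validate(), fixes issues, and reruns until the list is empty; every saved reviewer comment becomes one more learn() call.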
Step 5: Manage Guardrails
By the time you get here, the AI has actually failed you despite everything above. This step is saying: I no longer trust CLAUDE.md, skills, context, or prompts.
I'm personally resisting this.
I have had countless issues with "open the production10 notebook" - it hallucinates a URL. It's in TOOLS.md. Read TOOLS.md. It's in CLAUDE.md to read TOOLS.md. It's in the global CLAUDE.md to read TOOLS.md. It's in memory to read TOOLS.md.
I think the task is "too simple," so the instruction-following never gets triggered.
There are hooks and such I could add - Claude itself recommended that I do. But I think I would try making a skill first, /open-tool or something, before resorting to that.
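A hypothetical /open-tool skill might be nothing more than a forced lookup - the name and contents below are my guess at what I'd write, not something that exists yet:

```markdown
# /open-tool

Open a named internal tool (notebook, dashboard, etc.):

1. Read TOOLS.md first. Do not answer from memory.
2. Find the entry matching the requested tool name.
3. Use exactly the URL listed there. If the tool is not listed, say so -
   never invent a URL.
```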
Conclusion
Every time a session ends, everything disappears with it. Unless a feedback loop is established, this basically stateless agent will make the same mistakes again and again. In short:
- Take full responsibility for agent mistakes
- Figure out what an agent knows, and where it got it from
- Figure out the agent's decision-making process
- Make clear what great looks like
- Take autonomy away when it's no longer worth it
The more structured and proactive you are about addressing mistakes, the faster the agent learns, and the saner you stay.
Stay sane out there,
Ibrahim