Make the AI Show Its Receipts

One of the easiest mistakes to make with AI-assisted programming is also one of the most expensive.

You ask the AI to fix something.

It sounds confident.

It names plausible files.

It explains a reasonable approach.

It writes code that looks like it might belong.

And then the fix doesn’t work.

Or worse, it works in the one place you were looking at, while quietly damaging the architecture around it.

That doesn’t always happen because the AI can’t do the work.

Sometimes the problem is simpler than that.

The AI stopped reading and started guessing.

That’s a dangerous failure mode because it doesn’t look like ignorance. It looks like confidence.

The AI usually won’t say, “I haven’t inspected the actual implementation.” It won’t say, “I’m reasoning from file names and previous summaries.” It won’t say, “I’m about to patch this like it’s a small isolated problem, even though it’s part of a larger application.”

It just starts coding.

That’s where you need to stop it.

Plausible code isn’t the same as correct code

In a small throwaway project, plausible code may be good enough.

If the whole application is one screen, one component, and one data flow, the AI can often infer enough from the immediate context to produce something useful.

Real projects are different.

A real project has history.

It has boundaries.

It has patterns that aren’t obvious from a file name.

It has places where code is supposed to live and places where it should never go.

It may have a shared shell, reusable surfaces, theme rules, layout conventions, storage rules, import and export decisions, and carefully separated product layers.

In that kind of project, the question isn’t just this:

Can the AI write code that performs this action?

The better question is this:

Can the AI make this change in the way this application is supposed to be changed?

That’s a much harder question.

And the AI can’t answer it honestly unless it has inspected the actual code.

Make inspection a separate step

A good way to avoid this problem is to separate the work into two phases.

First, make the AI inspect the project.

Then, and only then, let it implement.

Don’t start with this:

Fix this bug.

Start with this:

Inspect the relevant files and show me the existing pattern this fix needs to follow.

That small change matters.

It forces the AI to stop treating the codebase like a blank canvas.

It has to look at what’s already there.

That doesn’t mean you need a long ceremony for every tiny change. If you’re changing a button label, you probably don’t need a research project.

But when the task touches architecture, shared components, data flow, layout, file storage, editor behavior, or anything reusable, make the AI show its receipts first.

What receipts look like

The AI should be able to tell you a few specific things before it changes code.

It should be able to say which files it inspected.

It should be able to explain what those files control.

It should be able to name the existing pattern it’s going to follow.

It should be able to say which boundaries it must not cross.

And it should be able to describe the smallest safe change.

That isn’t busywork.

That’s the programming equivalent of asking, “Did you actually open the code before you changed it?”

A useful prompt looks like this:

Before changing code, inspect the actual repository files involved in this feature.

Don’t rely on previous summaries, file names, assumptions, or remembered architecture.

First report back with:

  1. The exact files you inspected.
  2. What each file controls.
  3. The existing pattern this change must follow.
  4. Any shell, framework, storage, editor, or product boundaries that must not be changed.
  5. The smallest safe implementation plan.

Don’t write or patch code until that inspection is complete.

The point isn’t to make the AI write an essay.

The point is to make it prove it’s working from the actual system instead of from its own assumptions about the system.
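If you find yourself retyping that prompt, it can be kept as a small template. The sketch below is only one way to do that. The function name, the constant, and the example task are all made up for illustration; the wording of the prompt is the same as above.

```python
# Hypothetical helper: the names here are illustrative, not from any tool.
REPORT_ITEMS = (
    "The exact files you inspected.",
    "What each file controls.",
    "The existing pattern this change must follow.",
    "Any shell, framework, storage, editor, or product boundaries "
    "that must not be changed.",
    "The smallest safe implementation plan.",
)


def build_inspection_prompt(task: str) -> str:
    """Wrap a task description in the inspect-first instructions."""
    lines = [
        f"Task: {task}",
        "",
        "Before changing code, inspect the actual repository files "
        "involved in this feature.",
        "",
        "Don't rely on previous summaries, file names, assumptions, "
        "or remembered architecture.",
        "",
        "First report back with:",
        "",
    ]
    # Number the report items the same way the handwritten prompt does.
    for number, item in enumerate(REPORT_ITEMS, start=1):
        lines.append(f"  {number}. {item}")
    lines += ["", "Don't write or patch code until that inspection is complete."]
    return "\n".join(lines)


if __name__ == "__main__":
    # The task text is an invented example.
    print(build_inspection_prompt("Fix the export button in the notes panel."))
```

The only thing that changes from task to task is the first line. Everything else stays fixed, which is the point: the inspection step shouldn’t depend on remembering to ask for it.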

Tell it what not to do

In a real codebase, sometimes the most important instruction isn’t what to do.

It’s what not to do.

For example:

Architectural constraints for this task:

  • Don’t redesign the shared shell.
  • Don’t create one-off layout surfaces.
  • Don’t bypass the existing rail or panel system.
  • Don’t introduce custom CSS unless there’s no existing theme mechanism.
  • Don’t move product-specific behavior into reusable framework code.
  • Don’t change shared framework behavior unless that’s explicitly part of the task.
  • Follow the existing application patterns already present in the repo.

Those constraints matter because AI is often very good at solving the local problem in the wrong place.

It may make the broken screen work by creating an exception.

It may bypass the theme because custom CSS is faster.

It may fix one product by contaminating the shared layer.

It may make a new component instead of using the existing surface.

From the AI’s point of view, it solved the problem.

From the project’s point of view, it introduced drift.

That’s one of the sneaky problems with AI-generated code. It can be locally useful and globally wrong.
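Some of those constraints can be checked mechanically after the fact. Here’s a minimal sketch that takes a list of changed file paths (for example, the output of git diff --name-only) and reports any that fall inside directories you’ve declared off limits. The directory names and the function name are assumptions for the example; your repo will have its own.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def find_boundary_violations(
    changed_files: Iterable[str], protected_dirs: Iterable[str]
) -> List[str]:
    """Return the changed files that live under a protected directory."""
    protected = [PurePosixPath(d) for d in protected_dirs]
    violations = []
    for name in changed_files:
        # parents holds every ancestor directory of the file path.
        ancestors = PurePosixPath(name).parents
        if any(directory in ancestors for directory in protected):
            violations.append(name)
    return violations


if __name__ == "__main__":
    # Hypothetical layout: shared shell and framework code are off limits.
    changed = [
        "src/shell/Rail.tsx",
        "src/products/notes/NotePanel.tsx",
    ]
    for path in find_boundary_violations(changed, ["src/shell", "src/framework"]):
        print(f"Touched a protected area: {path}")
```

A check like this won’t tell you whether a change is architecturally right. It only tells you when a patch wandered into a layer the task never mentioned, which is usually the moment to stop and ask why.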

Watch for guessing language

You can often tell when the AI is slipping into guesswork.

The warning signs usually show up in its wording.

It says things like:

This is probably handled in...

Or:

The likely structure is...

Or:

We can simply add...

Or:

It appears that...

Those phrases aren’t always a problem. Sometimes they’re harmless.

But in a real codebase, they should make you pause.

If the AI is about to change code based on “probably,” stop it and make it inspect the file.

A useful interruption is:

Stop. You’re guessing from summaries.

Open the actual files involved in this behavior.

Identify the existing pattern.

Explain the smallest safe change before editing anything else.

That kind of interruption can save hours.

It can also keep the AI from stacking bad patches on top of bad assumptions.
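If you want a second pair of eyes on this, the phrase check is easy to automate. The sketch below scans a response for the phrases listed above. It’s a rough heuristic, not a detector of actual guessing: the phrase list is just the examples from this article, and the function name is made up.

```python
import re
from typing import List

# Phrases taken from the examples above; extend the list to taste.
GUESSING_PHRASES = (
    "probably",
    "the likely structure",
    "we can simply",
    "it appears that",
)


def find_guessing_language(response: str) -> List[str]:
    """Return the guessing phrases that occur in an AI response."""
    lowered = response.lower()
    found = []
    for phrase in GUESSING_PHRASES:
        # Word boundaries keep a phrase from matching inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.append(phrase)
    return found


if __name__ == "__main__":
    reply = "This is probably handled in the panel store. We can simply add a flag."
    hits = find_guessing_language(reply)
    if hits:
        print("Pause and ask for inspection. Found:", ", ".join(hits))
```

A hit doesn’t prove the AI is guessing, and a clean result doesn’t prove it read the code. Treat it as a prompt to look closer, nothing more.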

Don’t reward looping

Another warning sign is when the AI keeps changing the solution without becoming more grounded.

The first patch doesn’t work.

Then the second patch doesn’t work.

Then the third patch changes a different part of the system.

Then the fourth patch starts touching files that shouldn’t be involved at all.

That usually isn’t progress.

That’s thrashing.

When that happens, don’t ask for another patch.

Ask for an audit.

Stop implementing.

Review the last changes and compare them against the actual architecture.

Identify which assumptions were wrong.

Identify which files were changed unnecessarily.

Identify what existing pattern should have been followed.

Don’t patch again until the cause of the failure is clear.

The AI will often keep trying to solve the problem by doing more of the same thing.

Your job is to change the mode of work.

When implementation isn’t working, go back to inspection.
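The thrashing pattern described above can be written down as a simple rule of thumb: several failed patches in a row, with the latest one touching files the earlier ones didn’t. The sketch below encodes that. The window of three attempts and all the names are assumptions chosen for illustration, not a tested threshold.

```python
from typing import List, Sequence, Tuple

# One attempt: (files the patch touched, whether the fix worked).
Attempt = Tuple[Sequence[str], bool]


def looks_like_thrashing(attempts: List[Attempt], window: int = 3) -> bool:
    """Heuristic: the last `window` patches all failed and the scope widened."""
    if len(attempts) < window:
        return False
    recent = attempts[-window:]
    # Any success inside the window means the session is still making progress.
    if any(worked for _, worked in recent):
        return False
    first_files = set(recent[0][0])
    latest_files = set(recent[-1][0])
    # Scope widened if the latest patch touches files the first one did not.
    return bool(latest_files - first_files)


if __name__ == "__main__":
    # Invented history: three failed patches, the last one wandering further.
    history = [
        (["NotePanel.tsx"], False),
        (["NotePanel.tsx"], False),
        (["NotePanel.tsx", "shell/Rail.tsx"], False),
    ]
    if looks_like_thrashing(history):
        print("Stop implementing. Ask for an audit.")
```

You don’t need to run code to notice this, of course. The value of writing the rule down is that it gives you a fixed point at which you stop asking for patches.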

The bigger the architecture, the stronger the rule

The more architectural the task, the more evidence the AI should provide before it touches code.

A small visual tweak may only need a quick look.

A change to a shared surface, storage engine, editor, import pipeline, project shell, or reusable framework needs more discipline.

That’s where the AI’s confidence can become dangerous.

The model may know React.

It may know Tauri.

It may know Mantine.

It may know SQLite.

It may know Markdown editors.

But it doesn’t automatically know how your application uses those things.

That knowledge lives in your repo.

It lives in the patterns, boundaries, and decisions already made.

The AI has to read those before it can safely change them.

A better workflow

For serious code work, a safer AI workflow looks something like this.

State the task narrowly.

State the architectural boundaries.

Require file inspection.

Require the AI to name the existing pattern it will follow.

Review the proposed change before implementation.

Let it make the smallest safe patch.

Test it.

And if the test fails, go back to inspection before patching again.

That may sound slower than saying, “Fix it.”

In practice, it’s usually faster.

The slowest AI session isn’t the one where you spend five minutes making sure the AI understands the code.

The slowest session is the one where the AI spends two hours confidently patching the wrong layer.
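The workflow above is really a small state machine: implementation is locked until inspection and review have happened, and a failed test sends you back to inspection. Here’s a minimal sketch of that idea. The class, its method names, and the phase names are hypothetical; it isn’t wired to any real AI tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Phase(Enum):
    INSPECT = "inspect"
    REVIEW = "review"
    IMPLEMENT = "implement"
    TEST = "test"
    DONE = "done"


@dataclass
class ChangeSession:
    task: str
    boundaries: List[str]
    phase: Phase = Phase.INSPECT
    inspected_files: List[str] = field(default_factory=list)
    pattern: str = ""

    def record_inspection(self, files: List[str], pattern: str) -> None:
        """The receipts: which files were read and which pattern applies."""
        if not files or not pattern.strip():
            raise ValueError("inspection needs files and a named pattern")
        self.inspected_files = list(files)
        self.pattern = pattern
        self.phase = Phase.REVIEW

    def approve_plan(self) -> None:
        """A human reviews the proposed change before any code is written."""
        if self.phase is not Phase.REVIEW:
            raise RuntimeError("nothing to approve until inspection is recorded")
        self.phase = Phase.IMPLEMENT

    def record_patch(self) -> None:
        if self.phase is not Phase.IMPLEMENT:
            raise RuntimeError("no patching before inspection and review")
        self.phase = Phase.TEST

    def record_test(self, passed: bool) -> None:
        if self.phase is not Phase.TEST:
            raise RuntimeError("no patch to test")
        if passed:
            self.phase = Phase.DONE
        else:
            # A failed test sends the session back to inspection, not to patching.
            self.inspected_files = []
            self.pattern = ""
            self.phase = Phase.INSPECT
```

Calling record_patch() on a fresh session raises an error, which is the whole rule in one line: no receipts, no code.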

The rule I use

Here it is:

The AI doesn’t get to change architecture it hasn’t inspected.

That one sentence catches a lot of trouble.

It doesn’t mean the AI is useless.

It means the AI needs to be managed like a very fast assistant that sometimes confuses confidence with understanding.

When it reads the code, follows the existing pattern, and stays inside the architectural boundaries, it can be extremely useful.

When it guesses, it can get expensive.

So make it show its receipts.

Then let it code.

-- Charles

Related field note

For the lived version of this lesson, see “The Day the AI Had to Read the Code.”