Same Person, Same Ideas, Worse Results
I have a swing trading analysis system. It detects stock contraction patterns, scores them through a neural network, and helps me decide what to trade. I originally built it with Gemini for discussion and GitHub Copilot for implementation.
Then I lost the project files. So I rebuilt it with Claude Code. Same person. Same trading philosophy. Same detection logic. Same neural network architecture. The only variable was the tool.
The new version was fundamentally wrong at every layer.

Claude Code did everything I asked. I used its planning mode. I told it to research the domain. I described my philosophy in detail. It planned, built, tested, and told me everything looked good. Every layer was wrong.
Not wrong in a way that crashes. Wrong in a way that looks right. The contraction detection found patterns — just not the ones I meant. The labels categorised trades — just not the way I think about them. The backtest produced numbers — just not against the scenarios that matter.
The git history tells the story: 29 commits in 4 days, 12 of them rewrites or fixes. I spent 95% of my time not building, but verifying — layer by layer, checking whether the output matched my intent.
When I found fragments of my old code and asked Claude Code to compare, the response was damning. Feature by feature, layer by layer, it walked through both versions. For every single component, the old design was either more complete or more considered, or it accounted for something the new version hadn’t even thought about.
The Missing Collision
The obvious reaction is “you should have discussed more.” I did. Claude Code has planning mode. I used it. I asked it to research the domain. None of that fixed the problem. Because the problem isn’t whether the tool plans. It’s what the tool thinks planning means.

With Gemini, the conversation was a collision. I’d propose a detection method. Gemini would analyse it. I’d push back — “what happens when a stock gaps down mid-contraction? What about low-float names that compress differently? What if the sector is rotating out?” — and propose solutions for each. Gemini would stress-test my solutions, find holes I hadn’t seen. I’d refine, re-propose, argue back. Sometimes for hours on a single component, until neither of us could break it.
That collision forced me to externalise intuition I didn’t even know was implicit. Each argument transferred another piece of the puzzle from my head into the shared understanding. By the time I started coding, the AI had enough of my pieces to build my picture.
Claude Code’s planning is different. It listens, organises, confirms, executes. Excellent at turning clear requirements into working code. But it doesn’t challenge the assumptions behind the requirements. It trusts what you said, plans around it, and builds. The gap: understanding what you said versus understanding what you meant.
For a login page, what you say and what you mean are the same thing. For a trading system — where “correct” means it reflects 15 years of intuition about market behaviour — the gap is everything.
The Confidence Trap
There’s a deeper problem. The tool validates its own work. It tests, analyses the output, and tells you it’s correct. When the backtest shows a 65% win rate, it says “results look reasonable.”
That confidence costs you. You trust the output, move forward, invest time — and discover the problem only when you verify manually, layer by layer. With my old workflow, verification was minimal. The design collision was so thorough that the implementation naturally reflected my intent. I didn’t need to check because the thinking had already been done.
I don’t care if the trading system makes money — that depends on markets and execution. I care about fidelity: does the output match what I envisioned? A tool that builds the wrong thing perfectly is worse than a tool that builds the right thing roughly.
The Romance of Creation
People love saying “think outside the box.” But they forget something: you need a box first.
The box is your understanding. Your domain knowledge. Your years of watching what works and what doesn’t. Without the box, there’s no “outside” — there’s just randomness. Some randomness is good — unexpected connections and happy accidents are gifts of working with AI. But when everything is random, when the output has no anchor to your understanding, it stops being creation. It becomes noise that happens to compile.

Nikola Tesla understood this. In 1919 he wrote:
“When I get an idea, I start at once building it up in my imagination. I change the construction, make improvements and operate the device in my mind. It is absolutely immaterial to me whether I run my turbine in thought or test it in my shop.”
He would construct, test, and perfect an invention entirely in his mind, only building the physical machine once the mental blueprint was flawless. In twenty years, by his own account, there was not a single exception.
That’s the part AI coding tools skip. They jump straight to the machine. And the machine runs — but it’s not your machine. It’s built from the pieces the AI had, not the pieces you carry in your head.
The blueprint isn’t the description you give the AI. It’s the complete mental model, pressure-tested and refined through collision, that lives in your head before you describe anything. And building that model takes the slow, messy, uncomfortable work of deep thinking that no planning mode can automate.
The romance of creation isn’t in watching the machine run. It’s in the journey of thought that made the machine inevitable.
Sometimes the most productive thing you can do with an AI coding tool is refuse to write code — and take that journey first.
Read next: How to Choose a Vibe Coding Tool | Why Your Next Website Should Be AI-Native