Harness Engineering vs Prompt Engineering
Writing good prompts and building good execution environments are different things
In 2023-2024, prompt craft was the main lever for getting value out of AI models. In 2025-2026, that is shifting toward the execution environment.
Limits of Prompt Engineering
No matter how good your prompt is, if the model can't read files, it can't do code review. If it can't run tests, it can't verify correctness. If it can't see error messages and retry, it has to get it right on the first try.
Prompts are "instructions." The harness is "capability." Perfect instructions mean nothing without capability.
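The instruction/capability split can be made concrete. A prompt is a string the model reads; a tool is a function the harness lets it call. A minimal sketch, with illustrative names (not from any real framework):

```python
# Instruction: text the model is asked to follow. On its own, it grants
# no ability to act on the world.
SYSTEM_PROMPT = "You are a careful code reviewer. Check the diff against the tests."

# Capability: a function the harness exposes as a tool. Without something
# like this, no wording of SYSTEM_PROMPT lets the model actually read a file.
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

# The capability surface is defined here, independent of prompt wording.
TOOLS = {"read_file": read_file}
```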
What Harness Engineering Determines
Tool access: Can the model access file systems, browsers, APIs, databases?
Feedback loops: Can it see execution results and retry? Do error messages reach the model?
Context management: During long tasks, what information is kept vs discarded?
Guardrails: Are there mechanisms to preemptively block dangerous actions?
Orchestration: In what order and under what conditions are multiple model calls executed?
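The pieces above can be sketched as a single loop. This is a toy, with the model call stubbed out by a plain function and all names illustrative: the point is that tool dispatch, the guardrail check, error feedback, and the bounded retry loop all live in the harness, not in the prompt.

```python
MAX_STEPS = 5
BLOCKED = {"delete_repo"}  # guardrail: actions refused before execution

def tool_divide(a, b):
    return a / b  # may raise ZeroDivisionError, which becomes feedback

TOOLS = {"divide": tool_divide}  # tool access: what the model can invoke

def fake_model(history):
    """Stand-in for a model API call. It retries with b=2 only after
    the harness has fed an error message back into its context."""
    if any("ZeroDivisionError" in h for h in history):
        return {"tool": "divide", "args": (10, 2)}
    return {"tool": "divide", "args": (10, 0)}  # first attempt is wrong

def harness(prompt):
    history = [prompt]                        # context management (naive: keep all)
    for _ in range(MAX_STEPS):                # orchestration: bounded loop
        action = fake_model(history)
        if action["tool"] in BLOCKED:         # guardrail check
            history.append(f"blocked: {action['tool']}")
            continue
        try:
            return TOOLS[action["tool"]](*action["args"])  # success ends the loop
        except Exception as e:                # feedback loop: error reaches the model
            history.append(f"{type(e).__name__}: {e}")
    return None

print(harness("divide 10 by something"))  # → 5.0, but only on the second attempt
```

Without the `except` branch, the same stub model fails permanently on its first wrong call: identical "model," different harness, different outcome.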
What SWE-bench Showed
The same model scoring 2-3x differently depending on the harness is well-documented on SWE-bench. Devin, Claude Code, Cursor, and SWE-agent all build on the same base models yet post different results. The difference is the harness.
This isn't saying prompt engineering is useless. Prompts are one component of the harness, not the whole thing.
How It Works
Prompt Engineering: optimizing the instructions sent to the model (system prompt, few-shot examples, chain-of-thought)
Harness Engineering: designing the entire execution environment (tools, feedback loops, context management, guardrails)
Prompts are one layer of the harness: prompts alone cannot read files, run tests, or retry on errors
SWE-bench shows it: same model, different harness, 2-3x performance difference