gstack Anatomy โ Harness Engineering in Practice
The structure that let Garry Tan ship 600K lines with Claude Code in 60 days
gstack is a popular tool with 60K GitHub stars. But here we're not introducing the tool โ we're dissecting which harness engineering patterns gstack implements.
1. Role-Based Routing
gstack's core is splitting one model into 23 roles. Type /review and a Staff Engineer activates. Type /qa and a QA Lead activates.
This is the "routing" orchestration pattern. The developer acts as router, manually invoking the right specialist for the situation. Each role has different system prompts, allowed tools, and output formats.
Same Claude model, but /ceo discusses product strategy while /cso scans OWASP vulnerabilities. The model didn't change โ the harness did.
2. Pipeline Orchestration
/office-hours โ /plan-ceo-review โ /plan-eng-review โ coding โ /review โ /qa โ /ship
This sequence is gstack's sprint pipeline. Each stage's output feeds the next stage's input. The design doc from /office-hours is read by /plan-ceo-review. The test plan from /plan-eng-review is executed by /qa.
Textbook chaining pattern implementation. Different specialists (prompts) at each stage while information accumulates.
3. Guardrail Layer
/careful โ enhanced confirmation before risky actions. /freeze โ completely blocks modification of specific files. /guard โ full safety mode.
These are execution-stage guardrails. Preventing the model from touching production code, deploying without tests, or creating security vulnerabilities.
4. Feedback Loops
/review auto-fixes bugs it finds. /qa opens a real browser, clicks through flows, finds errors, fixes code, and re-verifies.
"Execute โ check results โ fix โ re-execute" loops are built into each skill. Not just generating code and stopping โ agent loop pattern of seeing results and iterating.
5. Context Accumulation
/learn persists learnings across sessions. As project patterns, pitfalls, and preferences accumulate, the harness becomes project-specialized. /retro quantitatively measures weekly performance.
This is overcoming context window limits with external storage. Same principle as CLAUDE.md.
What's Actually Different
Someone tells Claude Code "fix this repo's bug." A gstack user verifies design with /plan-eng-review, catches production bugs with /review, runs browser tests with /qa, opens a PR with /ship.
Same model. The difference is the harness. 600K lines in 60 days proves that difference.
How It Works
Role-based routing โ 23 slash commands to manually invoke the right specialist (same model, different prompts)
Pipeline chaining โ office-hours โ plan โ build โ review โ qa โ ship, each stage output feeds the next
Guardrail skills โ /careful (enhanced confirmation), /freeze (file lock), /guard (safety mode) for execution protection
Feedback loops โ /review and /qa have built-in bug detection โ auto-fix โ re-verify loops
Context accumulation โ /learn persists cross-session learnings, /retro quantifies weekly performance