🏭

gstack Anatomy — Harness Engineering in Practice

The structure that let Garry Tan ship 600K lines with Claude Code in 60 days

gstack is a popular tool with 60K GitHub stars. But here we're not introducing the tool — we're dissecting which harness engineering patterns gstack implements.

1. Role-Based Routing

gstack's core is splitting one model into 23 roles. Type /review and a Staff Engineer activates. Type /qa and a QA Lead activates.

This is the "routing" orchestration pattern. The developer acts as router, manually invoking the right specialist for the situation. Each role has different system prompts, allowed tools, and output formats.

Same Claude model, but /ceo discusses product strategy while /cso scans OWASP vulnerabilities. The model didn't change — the harness did.

2. Pipeline Orchestration

/office-hours → /plan-ceo-review → /plan-eng-review → coding → /review → /qa → /ship

This sequence is gstack's sprint pipeline. Each stage's output feeds the next stage's input. The design doc from /office-hours is read by /plan-ceo-review. The test plan from /plan-eng-review is executed by /qa.

Textbook chaining pattern implementation. Different specialists (prompts) at each stage while information accumulates.

3. Guardrail Layer

/careful — enhanced confirmation before risky actions. /freeze — completely blocks modification of specific files. /guard — full safety mode.

These are execution-stage guardrails. Preventing the model from touching production code, deploying without tests, or creating security vulnerabilities.

4. Feedback Loops

/review auto-fixes bugs it finds. /qa opens a real browser, clicks through flows, finds errors, fixes code, and re-verifies.

"Execute → check results → fix → re-execute" loops are built into each skill. Not just generating code and stopping — agent loop pattern of seeing results and iterating.

5. Context Accumulation

/learn persists learnings across sessions. As project patterns, pitfalls, and preferences accumulate, the harness becomes project-specialized. /retro quantitatively measures weekly performance.

This is overcoming context window limits with external storage. Same principle as CLAUDE.md.

What's Actually Different

Someone tells Claude Code "fix this repo's bug." A gstack user verifies design with /plan-eng-review, catches production bugs with /review, runs browser tests with /qa, opens a PR with /ship.

Same model. The difference is the harness. 600K lines in 60 days proves that difference.

How It Works

1

Role-based routing — 23 slash commands to manually invoke the right specialist (same model, different prompts)

2

Pipeline chaining — office-hours → plan → build → review → qa → ship, each stage output feeds the next

3

Guardrail skills — /careful (enhanced confirmation), /freeze (file lock), /guard (safety mode) for execution protection

4

Feedback loops — /review and /qa have built-in bug detection → auto-fix → re-verify loops

5

Context accumulation — /learn persists cross-session learnings, /retro quantifies weekly performance

Use Cases

Solo founder development — Garry Tan produced 600K lines in 60 days alongside YC CEO duties (10-20K lines/day, 35% tests) Team standardization — commit to repo and entire team uses same harness, unified review/QA quality