gstack vs Compound Engineering: two Claude Code plugins, opposite philosophies
A new Claude Code plugin drops every day. Most of them I ignore. But two have changed how I actually work: gstack by Garry Tan (40K+ GitHub stars) and Compound Engineering by Every.to (11K+ stars).
We build an AI agent platform, so we spend a lot of time watching how developers structure their work with AI. We dug through both GitHub repos, read the creator interviews, followed the community debates, and tried both plugins on real projects. Here is what we found.
Why these two are different
Most Claude Code plugins do one thing well – a git shortcut, a smarter linter, a testing helper. gstack and Compound Engineering are not that. They are complete methodologies for working with AI coding agents. They both try to answer the same question: “Claude Code is powerful, but how do I use it without things going sideways?”
They answer it in opposite ways.
gstack treats AI like a virtual engineering team with specialized roles. Compound Engineering builds a self-improving system where every feature teaches the AI something new about your codebase.
Both work. The question is which fits your situation, and whether running them together makes sense.

gstack: the virtual engineering team
Who built it
Garry Tan is the President and CEO of Y Combinator. Stanford CS grad, started programming at 14, was the 10th employee at Palantir, co-founded Posterous (sold to Twitter for $20M) and Initialized Capital ($700M+ AUM). The relevant part: he is shipping production code daily while running YC. This is not a side project from someone who merely talks about AI development. He uses it every day.
His GitHub data tells the story. In 2013, when he was a full-time engineer building YC’s internal Bookface tool, he made 772 contributions for the year. In 2026, coding part-time with gstack while running YC, he has already hit 1,237+ contributions. He claims 600,000+ production lines over 60 days, with 35% of that being tests.
Yes, lines of code is a terrible metric. But the direction of the comparison is worth noting: he is producing more output as a part-time CEO-coder than he did as a full-time engineer, and the code is going into production systems.
How it works
gstack installs as 28 slash commands that turn Claude Code into a role-playing engineering org. The workflow follows a sprint model: Think, Plan, Build, Review, Test, Ship, Reflect.
Each command puts the AI into a specific cognitive mode.
On the product side, /office-hours runs a YC-style product consultation with six forcing questions. /plan-ceo-review challenges your scope using four modes: Expansion, Selective Expansion, Hold Scope, and Reduction. The prompts ask things like “what’s the 10-star product hiding inside your request?” – that is how YC partners actually evaluate startups.
For architecture, /plan-eng-review locks your technical design with ASCII sequence diagrams, data flow analysis, and edge-case enumeration before you write a single line of code.
For quality, /qa launches a real Chromium browser, tests your flows, and generates regression tests. Actual Playwright-driven Chromium with ~100ms per command, not simulated. /cso runs OWASP Top 10 plus STRIDE threat modeling with 17 false-positive exclusions. Community members have reported it catching real XSS vulnerabilities that manual review missed.
For shipping, /ship syncs your main branch, runs tests, audits coverage, and opens a PR. /land-and-deploy takes it from merged PR through CI verification to production health monitoring.
For safety, /careful warns before destructive commands, /freeze restricts edits to a single directory, and /guard combines both. These matter because gstack supports running 10-15 parallel sprint agents via Conductor, and you need guardrails when that many AI instances touch your codebase at once.
Who it is for
Solo founders who need to ship fast. Technical CEOs who want to stay in the code. Engineers tired of ad-hoc prompting who want a structured sprint workflow. If you think in terms of sprints, roles, and deployment pipelines, gstack will feel natural.
The criticism
Critics point out that gstack is “just prompts in text files.” This is technically true and misses the point entirely, like saying a recipe is “just words on paper.” The value is in the selection, sequencing, and opinionated structure of those prompts. Others argue Garry Tan’s celebrity status as YC CEO inflated adoption. There is probably some truth to that – it hit 10,000 stars in 48 hours, which is unusual for any developer tool. But the continued adoption and community workflows suggest it is solving a real problem beyond the initial buzz.
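For context on what "prompts in text files" means: a Claude Code custom command is a markdown file in .claude/commands/ whose filename becomes the slash command. A minimal sketch in the spirit of gstack's /careful (the wording here is invented for illustration, not the actual gstack prompt):

```markdown
---
description: Warn before destructive commands
---
Before running any command that deletes files, rewrites git history, or
touches production configuration, stop and list exactly what would be
affected. Ask me to confirm before proceeding.

Task: $ARGUMENTS
```

Saved as .claude/commands/careful.md, this becomes /careful, and $ARGUMENTS is replaced with whatever follows the command. The value of a plugin like gstack is not the file format; it is which prompts exist, how they are worded, and in what order they run.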
The LOC metric also draws fire. When AI can generate thousands of lines in minutes, raw output numbers mean less than they used to. The real question – how many of those lines survive contact with production – is harder to measure and nobody has a great answer yet.

Compound Engineering: the learning flywheel
Who built it
Every.to is a media company and AI product studio founded by Dan Shipper. Shipper studied philosophy at UPenn (he chose it because it was “the only major that would tell me how to live well”), but he has been writing software since he was a teenager making BlackBerry apps. He co-founded Firefly (collaborative browsing, acquired by Pegasystems in 2014) and now runs Every with about 20 employees, $1M+ ARR, and funding from Reid Hoffman.
The technical co-creator is Kieran Klaassen, who went from film composer to startup CTO. Klaassen runs Cora (Every’s AI email assistant) as a single-person team and coined the term “compound engineering.”
Every runs 4-5 software products, each maintained by one developer. Naveen Naidu’s Monologue app handles 30,000 daily transcriptions across a 143,000-line codebase, built almost entirely with AI assistance. These are production apps with thousands of real users.
How it works
Compound Engineering follows a four-step loop: Plan, Work, Review, Compound. The first three look like normal development. The fourth is what makes it different.
Plan (~30 minutes): /ce:plan spawns 3-4 parallel research agents that analyze your codebase, check framework docs via a Context7 MCP server, survey best practices, and analyze spec flows. You get a detailed implementation plan with checkbox-tracked tasks. The optional /deepen-plan command spawns 20-40 sub-agents for more depth.
Work (~2 hours): /ce:work executes the plan using git worktree isolation, step-by-step implementation, and continuous test/lint/type-check validation.
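The worktree isolation in /ce:work is plain git underneath. A minimal sketch of the pattern using a throwaway repo (paths and branch names are illustrative, not what the plugin actually uses):

```shell
set -e
# Scratch repo to demonstrate the pattern; paths and names are illustrative
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=agent@example.com -c user.name=agent \
  commit -q --allow-empty -m "init"
# Give the agent its own checkout on a dedicated branch, so its edits
# never touch the main working copy
git worktree add -q "$repo-feature" -b feature/demo
git worktree list   # shows the main checkout plus the isolated one
# Tear down once the branch is merged or abandoned
git worktree remove "$repo-feature"
git branch -q -d feature/demo
```

Because the agent edits files only inside its worktree, a runaway run cannot corrupt your main checkout; you review and merge the branch like any other.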
Review (~20 minutes): /ce:review is the technical standout. 14 specialist review agents run in parallel: security-sentinel (OWASP vulnerabilities), performance-oracle (N+1 queries, missing indexes), architecture-strategist (design decisions), data-integrity-guardian (migrations and transactions), kieran-rails-reviewer (literally Kieran Klaassen’s personal coding style), dhh-rails-reviewer (DHH’s 37signals philosophy), and eight more. Findings get triaged as P1, P2, or P3.
Compound (~5 minutes): /ce:compound is the part most people skip, and it is the most valuable. Six parallel sub-agents analyze the solution you just built and create structured docs in docs/solutions/ with YAML frontmatter – tags, category, module, symptom, root cause. On the next feature cycle, the planning agents automatically consult these docs. Every solved problem teaches the system something new about your codebase.
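Here is what such a solution doc could look like. This is a hypothetical entry using the frontmatter fields named above; the exact schema belongs to the plugin, and the values are invented:

```markdown
---
tags: [n-plus-one, activerecord]
category: performance
module: billing
symptom: invoice list page slowed to ~4s under load
root_cause: per-row customer query in the invoice index view
---
# Eager-load customers on the invoice index

Replaced the per-row lookup with `Invoice.includes(:customer)`.
When planning any future listing view, check for missing eager loads first.
```

Because the frontmatter is structured, future planning agents can match on tags and module instead of grepping prose.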
The methodology follows an 80/20 split: 80% of effort goes to planning and review, 20% to writing code. You move slower on any individual feature, but each feature makes the next one faster.
Who it is for
People working on long-lived codebases. Teams (even one-person teams) who want their AI tooling to get smarter over time. If you think in terms of systems and knowledge management rather than sprints, and you plan to maintain what you build for years, the compound step pays for itself.
The criticism
The full workflow is heavy for small changes. Running 14 parallel review agents and a compound step for a one-line CSS fix is overkill. Token consumption adds up when you are spawning that many agents per feature. And the methodology imposes Every’s specific philosophy rather than letting you customize freely – Klaassen has acknowledged this, saying “ideally, we delete the whole thing someday because it’s all built in.”
Will Larson implemented Compound Engineering across Imprint’s repositories in about an hour and called it “a cheap, useful experiment.” He predicts Claude Code and Cursor will absorb these patterns natively within months. The methodology may outlive the plugin.

Head-to-head comparison
| Dimension | gstack | Compound Engineering |
|---|---|---|
| Core idea | Virtual engineering team with roles | Self-improving learning flywheel |
| Slash commands | 28 | 6 core (22 total with utilities) |
| Browser automation | Yes (persistent Chromium daemon) | No |
| Security scanning | Dedicated /cso (OWASP + STRIDE) | Via review agents |
| Deploy pipeline | Yes (/ship, /land-and-deploy, /canary) | No |
| Knowledge accumulation | /retro captures retrospectives | Core feature – docs/solutions/ feeds future planning |
| Review agents | Single senior-engineer style | 14 parallel specialists |
| Parallel execution | 10-15 sprint agents via Conductor | Sub-agents for research + review |
| Effort split | Spread across sprint lifecycle | 80% plan+review, 20% execution |
| Cross-platform | Claude Code, Codex, Gemini CLI, Cursor | 11 platforms via Bun/TypeScript converter |
| Installation | Git clone + setup script (30 seconds) | Plugin marketplace one-liner |
| GitHub stars | 40K+ | 11K+ |
| Time horizon | One-week sprint | Six-month codebase |
| Best for | Ship fast, solo founders | Build lasting systems, long-term quality |

Using both together
Developers in the community have already started combining these tools. Here is one workflow that works:
- gstack /office-hours to pressure-test your idea, then /plan-ceo-review to challenge scope
- gstack /plan-eng-review for technical design with ASCII diagrams and edge cases
- Compound Engineering /ce:plan for detailed, research-backed implementation plans
- Compound Engineering /ce:work for worktree-isolated, plan-tracked implementation
- Compound Engineering /ce:review for 14-agent parallel specialist review
- gstack /qa for real browser testing with Chromium
- gstack /cso for OWASP + STRIDE threat modeling
- gstack /ship for PR creation and test coverage auditing
- Compound Engineering /ce:compound to document what you learned
gstack handles the outer loop: what gets built, how it ships. Compound Engineering handles the inner loop: making each cycle through the outer loop better than the last.

The bigger picture
Both of these tools exist because raw AI capability is no longer the bottleneck. Claude Code is powerful out of the box. The bottleneck is structure – how you organize the AI’s work, what constraints you give it, and how you capture what it learns.
Strip away the AI layer and here is what you find: gstack is what good engineering teams already do. Compound Engineering is what good learning organizations already do. AI did not invent new practices. It made existing best practices accessible to individuals who could never afford them before.
A solo founder cannot hire a security auditor, a QA engineer, a technical writer, and a release manager. But they can run all of those functions for $400/month in Claude subscriptions. The shift is not in how software gets built. It is in who can afford to build it well.
Both tools also point toward something you could call AI management as a discipline – workflow design, role definition, quality control, knowledge management. Not prompt engineering. Something closer to engineering management, except your reports are AI agents.
One thing worth calling out: neither tool has fully solved safety. A Hacker News commenter reported a Claude Code agent stuck in a 70-minute loop injecting staging URLs into production configs. Klaassen acknowledges that Claude sometimes “disables test conditions just to make them pass.” Both tools offer guardrails (gstack’s /careful, /freeze, and /guard; Compound Engineering’s git worktree isolation and review gates), but they are all opt-in. If you are running autonomous agents on production codebases, treat agent permissions like database permissions: least privilege by default.
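One concrete way to make least privilege the default is Claude Code's permission rules in settings. A sketch of a deny-by-default posture for a project (the rule patterns here are illustrative; check the current Claude Code settings documentation for the exact matcher syntax):

```json
{
  "permissions": {
    "allow": [
      "Bash(npm test:*)",
      "Edit(src/**)"
    ],
    "deny": [
      "Bash(rm:*)",
      "Bash(git push:*)",
      "Edit(config/production/**)"
    ]
  }
}
```

Deny rules take precedence over allow rules, so with a config like this an agent can run tests and edit source but cannot touch production config or push, even if a prompt tells it to.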
Getting started
gstack – 30-second install:
git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack
cd ~/.claude/skills/gstack && ./setup
Start with /office-hours to describe what you are building, then follow the sprint flow.
Compound Engineering – one command:
/plugin marketplace add EveryInc/compound-engineering-plugin
/plugin install compound-engineering
Start with /ce:brainstorm to explore your next feature, then follow the plan-work-review-compound loop.
Using both: Install both, then follow the combined workflow above. Use gstack for the strategic bookends (scoping, architecture, QA, security, shipping) and Compound Engineering for the implementation core (planning, execution, review, knowledge capture).
Both repos are actively maintained and the plugin ecosystem is moving fast. The specific tools will change. The practices they encode – structured planning, review gates, knowledge capture – are durable. Learn the practices. The tools are just today’s way to apply them.
At Augmi, we build infrastructure for deploying and managing AI agents. We watch the Claude Code plugin space closely because the patterns showing up there – structured workflows, multi-agent review, knowledge accumulation – are the same patterns production AI agents need to work reliably. If you are building with AI agents, we would like to hear what is working for you.
