gstack vs Compound Engineering: two Claude Code plugins, opposite philosophies
A new Claude Code plugin drops every day. Most of them I ignore. But two have changed how I actually work: gstack by Garry Tan (40K+ GitHub stars) and Compound Engineering by Every.to (11K+ stars).
We build an AI agent platform, so we spend a lot of time watching how developers structure their work with AI. We dug through both GitHub repos, read the creator interviews, followed the community debates, and tried both plugins on real projects. Here is what we found.
Why these two are different
Most Claude Code plugins do one thing well – a git shortcut, a smarter linter, a testing helper. gstack and Compound Engineering are not that. They are complete methodologies for working with AI coding agents. They both try to answer the same question: “Claude Code is powerful, but how do I use it without things going sideways?”
They answer it in opposite ways.
gstack treats AI like a virtual engineering team with specialized roles. Compound Engineering builds a self-improving system where every feature teaches the AI something new about your codebase.
Both work. The question is which fits your situation, and whether running them together makes sense.

gstack: the virtual engineering team
Who built it
Garry Tan is the President and CEO of Y Combinator. Stanford CS grad, started programming at 14, was the 10th employee at Palantir, co-founded Posterous (sold to Twitter for $20M) and Initialized Capital ($700M+ AUM). The relevant part: he is shipping production code daily while running YC. This is not a side project from someone who merely talks about AI development. He uses it every day.
His GitHub data tells the story. In 2013, when he was a full-time engineer building YC’s internal Bookface tool, he made 772 contributions for the year. In 2026, coding part-time with gstack while running YC, he has already hit 1,237+ contributions. He claims 600,000+ production lines over 60 days, with 35% of that being tests.
Yes, lines of code is a terrible metric. But the direction of the comparison is worth noting: he is producing more output as a part-time CEO-coder than he did as a full-time engineer, and the code is going into production systems.
How it works
gstack installs as 28 slash commands that turn Claude Code into a role-playing engineering org. The workflow follows a sprint model: Think, Plan, Build, Review, Test, Ship, Reflect.
Each command puts the AI into a specific cognitive mode.
On the product side, /office-hours runs a YC-style product consultation with six forcing questions. /plan-ceo-review challenges your scope using four modes: Expansion, Selective Expansion, Hold Scope, and Reduction. The prompts ask things like “what’s the 10-star product hiding inside your request?” – that is how YC partners actually evaluate startups.
For architecture, /plan-eng-review locks your technical design with ASCII sequence diagrams, data flow analysis, and edge-case enumeration before you write a single line of code.
For quality, /qa launches a real Chromium browser, tests your flows, and generates regression tests. Actual Playwright-driven Chromium with ~100ms per command, not simulated. /cso runs OWASP Top 10 plus STRIDE threat modeling with 17 false-positive exclusions. Community members have reported it catching real XSS vulnerabilities that manual review missed.
For shipping, /ship syncs your main branch, runs tests, audits coverage, and opens a PR. /land-and-deploy takes it from merged PR through CI verification to production health monitoring.
For safety, /careful warns before destructive commands, /freeze restricts edits to a single directory, and /guard combines both. These matter because gstack supports running 10-15 parallel sprint agents via Conductor, and you need guardrails when that many AI instances touch your codebase at once.
Who it is for
Solo founders who need to ship fast. Technical CEOs who want to stay in the code. Engineers tired of ad-hoc prompting who want a structured sprint workflow. If you think in terms of sprints, roles, and deployment pipelines, gstack will feel natural.
The criticism
Critics point out that gstack is “just prompts in text files.” This is technically true and misses the point entirely, like saying a recipe is “just words on paper.” The value is in the selection, sequencing, and opinionated structure of those prompts. Others argue Garry Tan’s celebrity status as YC CEO inflated adoption. There is probably some truth to that – it hit 10,000 stars in 48 hours, which is unusual for any developer tool. But the continued adoption and community workflows suggest it is solving a real problem beyond the initial buzz.
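For context on what "prompts in text files" means: a Claude Code custom command is a markdown file in .claude/commands/ whose filename becomes the slash command. A minimal sketch in the spirit of gstack's /careful (the wording here is invented for illustration, not the actual gstack prompt):

```markdown
---
description: Warn before destructive commands
---
Before running any command that deletes files, rewrites git history, or
touches production configuration, stop and list exactly what would be
affected. Ask me to confirm before proceeding.

Task: $ARGUMENTS
```

Saved as .claude/commands/careful.md, this becomes /careful, and $ARGUMENTS is replaced with whatever follows the command. The value of a plugin like gstack is not the file format; it is which prompts exist, how they are worded, and in what order they run.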
The LOC metric also draws fire. When AI can generate thousands of lines in minutes, raw output numbers mean less than they used to. The real question – how many of those lines survive contact with production – is harder to measure and nobody has a great answer yet.

Compound Engineering: the learning flywheel
Who built it
Every.to is a media company and AI product studio founded by Dan Shipper. Shipper studied philosophy at UPenn (he chose it because it was “the only major that would tell me how to live well”), but he has been writing software since he was a teenager making BlackBerry apps. He co-founded Firefly (collaborative browsing, acquired by Pegasystems in 2014) and now runs Every with about 20 employees, $1M+ ARR, and funding from Reid Hoffman.
The technical co-creator is Kieran Klaassen, who went from film composer to startup CTO. Klaassen runs Cora (Every’s AI email assistant) as a single-person team and coined the term “compound engineering.”
Every runs 4-5 software products, each maintained by one developer. Naveen Naidu’s Monologue app handles 30,000 daily transcriptions across a 143,000-line codebase, built almost entirely with AI assistance. These are production apps with thousands of real users.
How it works
Compound Engineering follows a four-step loop: Plan, Work, Review, Compound. The first three look like normal development. The fourth is what makes it different.
Plan (~30 minutes): /ce:plan spawns 3-4 parallel research agents that analyze your codebase, check framework docs via a Context7 MCP server, survey best practices, and analyze spec flows. You get a detailed implementation plan with checkbox-tracked tasks. The optional /deepen-plan command spawns 20-40 sub-agents for more depth.
Work (~2 hours): /ce:work executes the plan using git worktree isolation, step-by-step implementation, and continuous test/lint/type-check validation.
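The worktree isolation in /ce:work is plain git underneath. A minimal sketch of the pattern using a throwaway repo (paths and branch names are illustrative, not what the plugin actually uses):

```shell
set -e
# Scratch repo to demonstrate the pattern; paths and names are illustrative
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=agent@example.com -c user.name=agent \
  commit -q --allow-empty -m "init"
# Give the agent its own checkout on a dedicated branch, so its edits
# never touch the main working copy
git worktree add -q "$repo-feature" -b feature/demo
git worktree list   # shows the main checkout plus the isolated one
# Tear down once the branch is merged or abandoned
git worktree remove "$repo-feature"
git branch -q -d feature/demo
```

Because the agent edits files only inside its worktree, a runaway run cannot corrupt your main checkout; you review and merge the branch like any other.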
Review (~20 minutes): /ce:review is the technical standout. 14 specialist review agents run in parallel: security-sentinel (OWASP vulnerabilities), performance-oracle (N+1 queries, missing indexes), architecture-strategist (design decisions), data-integrity-guardian (migrations and transactions), kieran-rails-reviewer (literally Kieran Klaassen’s personal coding style), dhh-rails-reviewer (DHH’s 37signals philosophy), and eight more. Findings get triaged as P1, P2, or P3.
Compound (~5 minutes): /ce:compound is the part most people skip, and it is the most valuable. Six parallel sub-agents analyze the solution you just built and create structured docs in docs/solutions/ with YAML frontmatter – tags, category, module, symptom, root cause. On the next feature cycle, the planning agents automatically consult these docs. Every solved problem teaches the system something new about your codebase.
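Here is what such a solution doc could look like. This is a hypothetical entry using the frontmatter fields named above; the exact schema belongs to the plugin, and the values are invented:

```markdown
---
tags: [n-plus-one, activerecord]
category: performance
module: billing
symptom: invoice list page slowed to ~4s under load
root_cause: per-row customer query in the invoice index view
---
# Eager-load customers on the invoice index

Replaced the per-row lookup with `Invoice.includes(:customer)`.
When planning any future listing view, check for missing eager loads first.
```

Because the frontmatter is structured, future planning agents can match on tags and module instead of grepping prose.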
The methodology follows an 80/20 split: 80% of effort goes to planning and review, 20% to writing code. You move slower on any individual feature, but each feature makes the next one faster.
Who it is for
People working on long-lived codebases. Teams (even one-person teams) who want their AI tooling to get smarter over time. If you think in terms of systems and knowledge management rather than sprints, and you plan to maintain what you build for years, the compound step pays for itself.
The criticism
The full workflow is heavy for small changes. Running 14 parallel review agents and a compound step for a one-line CSS fix is overkill. Token consumption adds up when you are spawning that many agents per feature. And the methodology imposes Every’s specific philosophy rather than letting you customize freely – Klaassen has acknowledged this, saying “ideally, we delete the whole thing someday because it’s all built in.”
Will Larson implemented Compound Engineering across Imprint’s repositories in about an hour and called it “a cheap, useful experiment.” He predicts Claude Code and Cursor will absorb these patterns natively within months. The methodology may outlive the plugin.

Head-to-head comparison
| Dimension | gstack | Compound Engineering |
|---|---|---|
| Core idea | Virtual engineering team with roles | Self-improving learning flywheel |
| Slash commands | 28 | 6 core (22 total with utilities) |
| Browser automation | Yes (persistent Chromium daemon) | No |
| Security scanning | Dedicated /cso (OWASP + STRIDE) | Via review agents |
| Deploy pipeline | Yes (/ship, /land-and-deploy, /canary) | No |
| Knowledge accumulation | /retro captures retrospectives | Core feature – docs/solutions/ feeds future planning |
| Review agents | Single senior-engineer style | 14 parallel specialists |
| Parallel execution | 10-15 sprint agents via Conductor | Sub-agents for research + review |
| Effort split | Spread across sprint lifecycle | 80% plan+review, 20% execution |
| Cross-platform | Claude Code, Codex, Gemini CLI, Cursor | 11 platforms via Bun/TypeScript converter |
| Installation | Git clone + setup script (30 seconds) | Plugin marketplace one-liner |
| GitHub stars | 40K+ | 11K+ |
| Time horizon | One-week sprint | Six-month codebase |
| Best for | Ship fast, solo founders | Build lasting systems, long-term quality |

Using both together
Developers in the community have already started combining these tools. Here is one workflow that works:
- gstack /office-hours to pressure-test your idea, then /plan-ceo-review to challenge scope
- gstack /plan-eng-review for technical design with ASCII diagrams and edge cases
- Compound Engineering /ce:plan for detailed, research-backed implementation plans
- Compound Engineering /ce:work for worktree-isolated, plan-tracked implementation
- Compound Engineering /ce:review for 14-agent parallel specialist review
- gstack /qa for real browser testing with Chromium
- gstack /cso for OWASP + STRIDE threat modeling
- gstack /ship for PR creation and test coverage auditing
- Compound Engineering /ce:compound to document what you learned
gstack handles the outer loop: what gets built, how it ships. Compound Engineering handles the inner loop: making each cycle through the outer loop better than the last.

The bigger picture
Both of these tools exist because raw AI capability is no longer the bottleneck. Claude Code is powerful out of the box. The bottleneck is structure – how you organize the AI’s work, what constraints you give it, and how you capture what it learns.
Strip away the AI layer and here is what you find: gstack is what good engineering teams already do. Compound Engineering is what good learning organizations already do. AI did not invent new practices. It made existing best practices accessible to individuals who could never afford them before.
A solo founder cannot hire a security auditor, a QA engineer, a technical writer, and a release manager. But they can run all of those functions for $400/month in Claude subscriptions. The shift is not in how software gets built. It is in who can afford to build it well.
Both tools also point toward something you could call AI management as a discipline – workflow design, role definition, quality control, knowledge management. Not prompt engineering. Something closer to engineering management, except your reports are AI agents.
One thing worth calling out: neither tool has fully solved safety. A Hacker News commenter reported a Claude Code agent stuck in a 70-minute loop injecting staging URLs into production configs. Klaassen acknowledges that Claude sometimes “disables test conditions just to make them pass.” Both tools offer guardrails (gstack’s /careful, /freeze, and /guard; Compound Engineering’s git worktree isolation and review gates), but they are all opt-in. If you are running autonomous agents on production codebases, treat agent permissions like database permissions: least privilege by default.
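One concrete way to make least privilege the default is Claude Code's permission rules in settings. A sketch of a deny-by-default posture for a project (the rule patterns here are illustrative; check the current Claude Code settings documentation for the exact matcher syntax):

```json
{
  "permissions": {
    "allow": [
      "Bash(npm test:*)",
      "Edit(src/**)"
    ],
    "deny": [
      "Bash(rm:*)",
      "Bash(git push:*)",
      "Edit(config/production/**)"
    ]
  }
}
```

Deny rules take precedence over allow rules, so with a config like this an agent can run tests and edit source but cannot touch production config or push, even if a prompt tells it to.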
Getting started
gstack – 30-second install:
git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack
cd ~/.claude/skills/gstack && ./setup
Start with /office-hours to describe what you are building, then follow the sprint flow.
Compound Engineering – one command:
/plugin marketplace add EveryInc/compound-engineering-plugin
/plugin install compound-engineering
Start with /ce:brainstorm to explore your next feature, then follow the plan-work-review-compound loop.
Using both: Install both, then follow the combined workflow above. Use gstack for the strategic bookends (scoping, architecture, QA, security, shipping) and Compound Engineering for the implementation core (planning, execution, review, knowledge capture).
Both repos are actively maintained and the plugin ecosystem is moving fast. The specific tools will change. The practices they encode – structured planning, review gates, knowledge capture – are durable. Learn the practices. The tools are just today’s way to apply them.
At Augmi, we build infrastructure for deploying and managing AI agents. We watch the Claude Code plugin space closely because the patterns showing up there – structured workflows, multi-agent review, knowledge accumulation – are the same patterns production AI agents need to work reliably. If you are building with AI agents, we would like to hear what is working for you.
