- Positioning: Not Another Agent, but a Consensus Gate Layer
- Eight Tentacles: Multi-Provider Collaboration
- Double Diamond: Four Phases
- Eight Primary Commands (Core Tentacles)
- 32 Personas: Context-Aware Agents
- Dark Factory: Give It a Spec and Walk Away
- Reaction Engine: Auto-Responding to Lifecycle Events
- Installation
- Comparison with Other Claude Code Enhancement Layers
- When to Use It
- Overall
- References
🌏 中文版
Every single model has blind spots — anyone who has used agentic coding agrees. Claude Octopus is a Claude Code plugin by nyldn, and the core idea is straightforward: hook up to 8 models on the same task simultaneously, have them cross-review each other, and block anything they disagree on. This article covers its architecture, commands, persona system, and how it differs from other Claude Code enhancement layers (such as oh-my-claudecode).
Positioning: Not Another Agent, but a Consensus Gate Layer
Octopus is a plugin inside Claude Code — not a fork and not a wrapper. It only occupies the /octo:* namespace and the natural language prefix octo, leaving all other Claude Code behavior untouched. This design matters — you can install it to try it out, and if you don’t like it, a single command cleanly uninstalls everything with no leftover configuration.
The problem it solves isn’t “automatically write more code,” but rather “before you ship, let 7 other models roast it first.” Octopus treats the consensus gate as its first principle: by default, 75% of providers must agree before work can pass the gate; otherwise the entire pipeline stops for human review.
Eight Tentacles: Multi-Provider Collaboration
Eight provider types are supported by default, each with a different role:
| Provider | Role | Authentication | Cost |
|---|---|---|---|
| Claude | Orchestration, synthesis | Built into Claude Code | Per Claude subscription |
| Codex (OpenAI) | Implementation, deep code pattern analysis | codex login or OPENAI_API_KEY | OAuth free or per-token |
| Gemini (Google) | Ecosystem breadth, security review | Google OAuth or GEMINI_API_KEY | OAuth free or per-token |
| Copilot (GitHub) | Zero-cost research legwork | Uses existing GitHub subscription | Free |
| Qwen (Alibaba) | Research tier | Qwen OAuth | 1,000–2,000 free calls per day |
| Ollama | Local, offline, privacy-sensitive tasks | Local installation | Free |
| Perplexity | Live web search, CVE lookups | API key | Per API usage |
| OpenRouter | 100+ model routing | API key | Per model |
The key point is that it runs with zero providers too — with only Claude available, Octopus degrades to a single-model harness with personas and slash commands. Each additional provider is auto-detected and enabled without manual pipeline configuration.
Double Diamond: Four Phases
Octopus brings the UK Design Council’s Double Diamond methodology directly to coding agents, split into four phases:
| Phase | Command | What It Does |
|---|---|---|
| Discover | /octo:discover | Multi-AI research, explore the problem space |
| Define | /octo:define | Use consensus to clarify requirements, write specs |
| Develop | /octo:develop | Begin implementation with quality gates |
| Deliver | /octo:deliver | Adversarial review + go/no-go scoring |
You can call any phase individually, or run /octo:embrace to execute the entire pipeline end to end. Compared to “just toss a prompt at Claude and have it write code,” the key difference is that the definition phase lets multiple models argue first — get the spec right before moving to develop, eliminating 80% of “built the wrong thing because we misunderstood requirements.”
Eight Primary Commands (Core Tentacles)
/octo:embrace # Full lifecycle: research → define → develop → deliver
/octo:factory # Autonomous spec-to-software (Dark Factory)
/octo:debate # 4-AI structured debate + consensus
/octo:research # 3-provider multi-source synthesis
/octo:design # UI/UX design (with BM25 retrieval)
/octo:tdd # Red-green-refactor discipline
/octo:security # OWASP vulnerability scan + remediation
/octo:prd # AI-optimized product requirements document
Together with extended commands like review / debug / extract / docs / schedule / parallel / sentinel / optimize / brainstorm / doctor / quick, the plugin offers 48 slash commands in total. For the lazy, there’s a smart router:
/octo:auto <description>
It parses your natural language intent and selects the appropriate workflow automatically.
32 Personas: Context-Aware Agents
Octopus comes with 32 pre-defined specialized roles, automatically applied based on the request:
- Software Engineering (11): backend-architect, frontend-architect, fullstack-engineer, devops-engineer, security-auditor, performance-optimizer, testing-strategist, database-specialist, api-designer, integration-engineer, systems-engineer
- Specialized Development (6): mobile-engineer, ml-engineer, data-engineer, blockchain-engineer, iot-engineer, game-developer
- Documentation / Communication (5): technical-writer, product-manager, business-analyst, ux-writer, content-strategist
- Research / Strategy (3): researcher, strategist, analyst
- Business / Compliance (3): compliance-officer, financial-analyst, legal-advisor
- Creative / Design (4): ui-ux-designer, graphic-designer, creative-director, brand-strategist
A request like “review this API” might simultaneously trigger api-designer + security-auditor + performance-optimizer, each persona using a different provider to produce a review, with the final results converging at the consensus gate.
Dark Factory: Give It a Spec and Walk Away
Dark Factory is the most aggressive mode — feed it a spec, and Octopus fully autonomously runs through Discover, Define, Develop, and Deliver without asking a human:
/octo:factory "build a CLI that converts CSV to JSON"
There are three autonomy levels:
- Supervised: Every phase requires human approval
- Semi-autonomous: Human intervention only on failure
- Autonomous: Runs to completion before reporting back
Combined with git worktree for workstream isolation, multiple parallel tasks can run in separate branches, with automatic merging and conflict resolution at the end.
Reaction Engine: Auto-Responding to Lifecycle Events
This is what I consider the most “team-aware” design. The Reaction Engine monitors CI, review, and PR state changes and responds automatically:
| Event | Action | Max Retries | Escalation |
|---|---|---|---|
| CI failure | Collect logs, send to agent inbox | 3 | Escalate to human after 30 min |
| Changes requested | Collect review comments, send to inbox | 2 | Escalate to human after 60 min |
| Agent stuck | Escalate to human | — | After 15 min |
| PR approved + CI green | Notify ready-to-merge | — | — |
Configuration lives in .octo/reactions.conf and can be customized per project. The philosophy is clear: let the agent handle mechanical fix-retry loops, and only bother humans when it’s genuinely stuck.
Installation
Claude Code (recommended):
/plugin marketplace add https://github.com/nyldn/claude-octopus.git
/plugin install octo@nyldn-plugins
/octo:setup
To uninstall cleanly:
claude plugin uninstall octo
# Or with scope
claude plugin uninstall octo --scope project
It also supports Codex CLI, Cursor (via MCP server), and OpenCode. For Cursor, add to ~/.cursor/mcp.json:
{
"mcpServers": {
"claude-octopus": {
"command": "npx",
"args": ["tsx", "${userHome}/.cursor/claude-octopus/mcp-server/src/index.ts"],
"env": {
"OCTO_CLAW_ENABLED": "true",
"OPENAI_API_KEY": "${env:OPENAI_API_KEY}",
"GEMINI_API_KEY": "${env:GEMINI_API_KEY}"
}
}
}
}
Comparison with Other Claude Code Enhancement Layers
| Aspect | Claude Octopus | oh-my-claudecode | Vanilla Claude Code |
|---|---|---|---|
| Core abstraction | Consensus gate + multi-provider | Multi-agent collaboration + magic keywords | Single agent CLI |
| Providers | Up to 8 (incl. Copilot, Qwen, Ollama) | Claude + Codex + Gemini | Claude only |
| Main selling point | Consensus catches blind spots, Dark Factory full automation | Cross-model token savings, auto rate-limit recovery | Simple, direct |
| Personas | 32 | 19 | None |
| Installation impact | Plugin namespace isolation | Enhancement layer | — |
In short: Octopus cares about quality control, OMC cares about collaboration efficiency, and vanilla Claude Code cares about controllability. If your pain point is “the model’s output looks correct but blows up in production,” Octopus’s consensus gate is the most direct solution.
When to Use It
- Security-sensitive PRs:
/octo:securitylets Gemini + Perplexity check CVEs while the security-auditor persona runs the OWASP checklist - New features with undefined requirements:
/octo:embracestarts from discover, letting multiple providers argue out the spec before anyone writes code - High-volume repetitive scaffolding:
/octo:factoryin semi-autonomous mode — hand over a spec, come back to see the results - Hard to schedule multi-person reviews: Reaction Engine + multi-provider review effectively gives you a built-in async review team
When it’s not a good fit:
- Simple one-shot small changes: Consensus overhead isn’t worth it — just use Claude Code directly
- Fully offline: Unless you’re only using the Ollama provider, the consensus value is diminished
- Extremely budget-sensitive: Among the eight providers, Perplexity and OpenRouter charge per usage, and Codex/Gemini OAuth quotas have limits
Overall
Claude Octopus turns “multi-model consensus” from a research topic into a Claude Code plugin you can install with a single command. 32 personas + 48 slash commands + 51 skills is not a small footprint, but the /octo:* namespace isolation makes it easier to install and remove than you’d expect. The most valuable design isn’t the number of commands — it’s making the consensus threshold (75%) and reaction automation (CI/review closed loops) the defaults — two things vanilla Claude Code currently doesn’t have built in.
If you want to upgrade Claude Code from “single model writing code” to “multi-model cross-review with delivery gates,” Octopus is currently the most complete turnkey solution.
References
Loading...