Skip to content

Claude Octopus: The Consensus Plugin That Hooks 8 Models Into Claude Code Simultaneously

Apr 14, 2026 1 min
TL;DR Claude Octopus is a Claude Code plugin that simultaneously calls Codex, Gemini, Copilot, Qwen, Ollama, Perplexity, OpenRouter, and Claude to review the same code, using a 75% consensus threshold to catch single-model blind spots. It ships with 32 personas, 48 /octo:* slash commands, 51 skills, and a Dark Factory fully autonomous spec-to-code pipeline.

🌏 中文版

Every single model has blind spots — anyone who has used agentic coding agrees. Claude Octopus is a Claude Code plugin by nyldn, and the core idea is straightforward: hook up to 8 models on the same task simultaneously, have them cross-review each other, and block anything they disagree on. This article covers its architecture, commands, persona system, and how it differs from other Claude Code enhancement layers (such as oh-my-claudecode).

Positioning: Not Another Agent, but a Consensus Gate Layer

Octopus is a plugin inside Claude Code — not a fork and not a wrapper. It only occupies the /octo:* namespace and the natural language prefix octo, leaving all other Claude Code behavior untouched. This design matters — you can install it to try it out, and if you don’t like it, a single command cleanly uninstalls everything with no leftover configuration.

The problem it solves isn’t “automatically write more code,” but rather “before you ship, let 7 other models roast it first.” Octopus treats the consensus gate as its first principle: by default, 75% of providers must agree before work can pass the gate; otherwise the entire pipeline stops for human review.

Eight Tentacles: Multi-Provider Collaboration

Eight provider types are supported by default, each with a different role:

ProviderRoleAuthenticationCost
ClaudeOrchestration, synthesisBuilt into Claude CodePer Claude subscription
Codex (OpenAI)Implementation, deep code pattern analysiscodex login or OPENAI_API_KEYOAuth free or per-token
Gemini (Google)Ecosystem breadth, security reviewGoogle OAuth or GEMINI_API_KEYOAuth free or per-token
Copilot (GitHub)Zero-cost research legworkUses existing GitHub subscriptionFree
Qwen (Alibaba)Research tierQwen OAuth1,000–2,000 free calls per day
OllamaLocal, offline, privacy-sensitive tasksLocal installationFree
PerplexityLive web search, CVE lookupsAPI keyPer API usage
OpenRouter100+ model routingAPI keyPer model

The key point is that it runs with zero providers too — with only Claude available, Octopus degrades to a single-model harness with personas and slash commands. Each additional provider is auto-detected and enabled without manual pipeline configuration.

Double Diamond: Four Phases

Octopus brings the UK Design Council’s Double Diamond methodology directly to coding agents, split into four phases:

PhaseCommandWhat It Does
Discover/octo:discoverMulti-AI research, explore the problem space
Define/octo:defineUse consensus to clarify requirements, write specs
Develop/octo:developBegin implementation with quality gates
Deliver/octo:deliverAdversarial review + go/no-go scoring

You can call any phase individually, or run /octo:embrace to execute the entire pipeline end to end. Compared to “just toss a prompt at Claude and have it write code,” the key difference is that the definition phase lets multiple models argue first — get the spec right before moving to develop, eliminating 80% of “built the wrong thing because we misunderstood requirements.”

Eight Primary Commands (Core Tentacles)

/octo:embrace       # Full lifecycle: research → define → develop → deliver
/octo:factory       # Autonomous spec-to-software (Dark Factory)
/octo:debate        # 4-AI structured debate + consensus
/octo:research      # 3-provider multi-source synthesis
/octo:design        # UI/UX design (with BM25 retrieval)
/octo:tdd           # Red-green-refactor discipline
/octo:security      # OWASP vulnerability scan + remediation
/octo:prd           # AI-optimized product requirements document

Together with extended commands like review / debug / extract / docs / schedule / parallel / sentinel / optimize / brainstorm / doctor / quick, the plugin offers 48 slash commands in total. For the lazy, there’s a smart router:

/octo:auto <description>

It parses your natural language intent and selects the appropriate workflow automatically.

32 Personas: Context-Aware Agents

Octopus comes with 32 pre-defined specialized roles, automatically applied based on the request:

  • Software Engineering (11): backend-architect, frontend-architect, fullstack-engineer, devops-engineer, security-auditor, performance-optimizer, testing-strategist, database-specialist, api-designer, integration-engineer, systems-engineer
  • Specialized Development (6): mobile-engineer, ml-engineer, data-engineer, blockchain-engineer, iot-engineer, game-developer
  • Documentation / Communication (5): technical-writer, product-manager, business-analyst, ux-writer, content-strategist
  • Research / Strategy (3): researcher, strategist, analyst
  • Business / Compliance (3): compliance-officer, financial-analyst, legal-advisor
  • Creative / Design (4): ui-ux-designer, graphic-designer, creative-director, brand-strategist

A request like “review this API” might simultaneously trigger api-designer + security-auditor + performance-optimizer, each persona using a different provider to produce a review, with the final results converging at the consensus gate.

Dark Factory: Give It a Spec and Walk Away

Dark Factory is the most aggressive mode — feed it a spec, and Octopus fully autonomously runs through Discover, Define, Develop, and Deliver without asking a human:

/octo:factory "build a CLI that converts CSV to JSON"

There are three autonomy levels:

  • Supervised: Every phase requires human approval
  • Semi-autonomous: Human intervention only on failure
  • Autonomous: Runs to completion before reporting back

Combined with git worktree for workstream isolation, multiple parallel tasks can run in separate branches, with automatic merging and conflict resolution at the end.

Reaction Engine: Auto-Responding to Lifecycle Events

This is what I consider the most “team-aware” design. The Reaction Engine monitors CI, review, and PR state changes and responds automatically:

EventActionMax RetriesEscalation
CI failureCollect logs, send to agent inbox3Escalate to human after 30 min
Changes requestedCollect review comments, send to inbox2Escalate to human after 60 min
Agent stuckEscalate to humanAfter 15 min
PR approved + CI greenNotify ready-to-merge

Configuration lives in .octo/reactions.conf and can be customized per project. The philosophy is clear: let the agent handle mechanical fix-retry loops, and only bother humans when it’s genuinely stuck.

Installation

Claude Code (recommended):

/plugin marketplace add https://github.com/nyldn/claude-octopus.git
/plugin install octo@nyldn-plugins
/octo:setup

To uninstall cleanly:

claude plugin uninstall octo
# Or with scope
claude plugin uninstall octo --scope project

It also supports Codex CLI, Cursor (via MCP server), and OpenCode. For Cursor, add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "claude-octopus": {
      "command": "npx",
      "args": ["tsx", "${userHome}/.cursor/claude-octopus/mcp-server/src/index.ts"],
      "env": {
        "OCTO_CLAW_ENABLED": "true",
        "OPENAI_API_KEY": "${env:OPENAI_API_KEY}",
        "GEMINI_API_KEY": "${env:GEMINI_API_KEY}"
      }
    }
  }
}

Comparison with Other Claude Code Enhancement Layers

AspectClaude Octopusoh-my-claudecodeVanilla Claude Code
Core abstractionConsensus gate + multi-providerMulti-agent collaboration + magic keywordsSingle agent CLI
ProvidersUp to 8 (incl. Copilot, Qwen, Ollama)Claude + Codex + GeminiClaude only
Main selling pointConsensus catches blind spots, Dark Factory full automationCross-model token savings, auto rate-limit recoverySimple, direct
Personas3219None
Installation impactPlugin namespace isolationEnhancement layer

In short: Octopus cares about quality control, OMC cares about collaboration efficiency, and vanilla Claude Code cares about controllability. If your pain point is “the model’s output looks correct but blows up in production,” Octopus’s consensus gate is the most direct solution.

When to Use It

  • Security-sensitive PRs: /octo:security lets Gemini + Perplexity check CVEs while the security-auditor persona runs the OWASP checklist
  • New features with undefined requirements: /octo:embrace starts from discover, letting multiple providers argue out the spec before anyone writes code
  • High-volume repetitive scaffolding: /octo:factory in semi-autonomous mode — hand over a spec, come back to see the results
  • Hard to schedule multi-person reviews: Reaction Engine + multi-provider review effectively gives you a built-in async review team

When it’s not a good fit:

  • Simple one-shot small changes: Consensus overhead isn’t worth it — just use Claude Code directly
  • Fully offline: Unless you’re only using the Ollama provider, the consensus value is diminished
  • Extremely budget-sensitive: Among the eight providers, Perplexity and OpenRouter charge per usage, and Codex/Gemini OAuth quotas have limits

Overall

Claude Octopus turns “multi-model consensus” from a research topic into a Claude Code plugin you can install with a single command. 32 personas + 48 slash commands + 51 skills is not a small footprint, but the /octo:* namespace isolation makes it easier to install and remove than you’d expect. The most valuable design isn’t the number of commands — it’s making the consensus threshold (75%) and reaction automation (CI/review closed loops) the defaults — two things vanilla Claude Code currently doesn’t have built in.

If you want to upgrade Claude Code from “single model writing code” to “multi-model cross-review with delivery gates,” Octopus is currently the most complete turnkey solution.

References