
OpenClaw Tools (Part 4): TTS, PDF, Lobster, and MCP

Mar 28, 2026 1 min
TL;DR: TTS supports three providers: ElevenLabs, Microsoft, and OpenAI. The PDF tool has a native mode and an extraction fallback. Lobster is a deterministic workflow runtime. MCP connects external tools to the agent.


This post covers OpenClaw’s auxiliary tools: voice synthesis, document analysis, deterministic workflows, and external tool integration.

Text-to-Speech (TTS)

Converts agent responses into speech. Disabled by default.

Three Providers

| Provider | Requires API Key | Description |
| --- | --- | --- |
| ElevenLabs | ✅ (ELEVENLABS_API_KEY) | High-quality voices |
| OpenAI | ✅ (OPENAI_API_KEY) | OpenAI TTS API |
| Microsoft | ❌ | Uses Edge’s neural TTS, free |

Auto-TTS Modes

| Mode | Behavior |
| --- | --- |
| off (default) | Disabled |
| always | Convert all responses to speech |
| inbound | Convert only after receiving a voice message |
| tagged | Convert only when the response contains a [[tts]] tag |

Configuration

{
  messages: {
    tts: {
      auto: "always",
      provider: "elevenlabs"
    }
  }
}

Skip Conditions

  • Response already contains media
  • Text is fewer than 10 characters
  • Text is too long (auto-summarization before conversion can be enabled)
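The auto-TTS modes and skip conditions above combine into a single yes/no decision per reply. Here is a minimal sketch of that decision in Python. The `Reply` class, the `should_speak` function, and the `inbound_was_voice` flag are hypothetical names for illustration, not OpenClaw's API, and the 2000-character limit is borrowed from the `/tts limit 2000` example rather than a documented default.

```python
from dataclasses import dataclass

MIN_CHARS = 10  # "Text is fewer than 10 characters" skip condition
VALID_MODES = ("off", "always", "inbound", "tagged")


@dataclass
class Reply:
    # Hypothetical container for an agent reply; not an OpenClaw type.
    text: str
    has_media: bool = False
    inbound_was_voice: bool = False  # was the triggering message a voice message?


def should_speak(mode: str, reply: Reply, limit: int = 2000,
                 summarize: bool = False) -> bool:
    """Return True if the reply should be converted to speech."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown auto-TTS mode: {mode!r}")

    # Mode gate: decide whether TTS applies to this reply at all.
    if mode == "off":
        return False
    if mode == "inbound" and not reply.inbound_was_voice:
        return False
    if mode == "tagged" and "[[tts]]" not in reply.text:
        return False

    # Skip conditions: apply regardless of which mode let the reply through.
    if reply.has_media:
        return False
    if len(reply.text) < MIN_CHARS:
        return False
    if len(reply.text) > limit and not summarize:
        # Too long, and auto-summarization is not enabled (assumed behavior).
        return False
    return True
```

The mode gate runs before the skip conditions so that a disabled or non-matching mode never inspects the reply body.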

Slash Commands

/tts status               # Check status
/tts provider openai      # Switch provider
/tts limit 2000           # Set character limit

Settings are stored locally (per-session), not globally.

PDF Tool

Analyzes PDF documents and returns text content.

Two Modes

| Mode | Behavior | Supported By |
| --- | --- | --- |
| Native | Sends raw PDF bytes directly to the provider API | Anthropic, Google |
| Extraction fallback | Extracts text first; renders page images when text is insufficient | Other providers |

Input Methods

  • Local file path (supports ~ expansion)
  • File URL
  • HTTP/HTTPS URL (remote URLs are blocked in sandbox mode)
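To make the three input methods concrete, the sketch below classifies a source string and applies the sandbox rule. The function name `classify_pdf_input` and its return shape are hypothetical, and it assumes POSIX-style paths; the real tool's resolution logic is not shown in this post.

```python
import os
from urllib.parse import urlparse, unquote


def classify_pdf_input(source: str, sandboxed: bool) -> tuple[str, str]:
    """Return ("local", path) or ("remote", url) for a PDF source string."""
    parsed = urlparse(source)

    if parsed.scheme in ("http", "https"):
        if sandboxed:
            # Remote URLs are blocked in sandbox mode.
            raise PermissionError("remote URLs are blocked in sandbox mode")
        return ("remote", source)

    if parsed.scheme == "file":
        # file:///tmp/a.pdf -> /tmp/a.pdf (percent-escapes decoded)
        return ("local", unquote(parsed.path))

    # Plain path: expand a leading ~ to the user's home directory.
    return ("local", os.path.expanduser(source))
```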

Parameters

| Parameter | Description |
| --- | --- |
| pdf | Single PDF |
| pdfs | Multiple PDFs (up to 10) |
| prompt | Analysis instruction (default: "Analyze this PDF document") |

Limitations

  • Default file size limit: 10 MB
  • Extraction fallback: max 20 pages
  • Native mode does not support page filtering
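The mode table and the limits above can be read as a small pre-flight check before a PDF request is sent. The following is a rough sketch under stated assumptions: `plan_pdf_request` is a hypothetical helper, 10 MB is taken as 10 × 1024 × 1024 bytes, and rejecting a page filter in native mode is one possible reading of "does not support"; the real tool may handle that case differently.

```python
NATIVE_PROVIDERS = {"anthropic", "google"}  # providers listed for native mode
MAX_PDFS = 10                               # "pdfs: up to 10"
MAX_BYTES = 10 * 1024 * 1024                # default size limit (MiB assumed)
MAX_FALLBACK_PAGES = 20                     # extraction fallback page cap


def plan_pdf_request(provider: str, sizes: list[int], pages=None) -> dict:
    """Validate a request and pick a mode.

    sizes: byte size of each PDF; pages: optional page filter (hypothetical).
    """
    if not sizes:
        raise ValueError("at least one PDF is required")
    if len(sizes) > MAX_PDFS:
        raise ValueError(f"too many PDFs: {len(sizes)} > {MAX_PDFS}")
    if any(size > MAX_BYTES for size in sizes):
        raise ValueError("a PDF exceeds the default 10 MB limit")

    if provider.lower() in NATIVE_PROVIDERS:
        if pages is not None:
            # Assumption: reject rather than silently ignore the filter.
            raise ValueError("native mode does not support page filtering")
        return {"mode": "native"}

    # Every other provider goes through text extraction, capped at 20 pages.
    return {"mode": "extraction", "max_pages": MAX_FALLBACK_PAGES}
```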

Lobster: Deterministic Workflow Runtime

Lobster lets OpenClaw execute multi-step tool sequences as deterministic operations.

The Problem It Solves

LLM-driven workflows are expensive: every tool call in a multi-step sequence adds token cost and coordination overhead. Lobster merges multiple tool calls into a single structured operation.

Three Core Advantages

| Advantage | Description |
| --- | --- |
| Merged execution | One Lobster call replaces multiple tool calls |
| Built-in approval | Pauses before side effects, waits for human authorization |
| Resumable state | Paused workflows return a token that allows resumption without re-execution |

Design Philosophy

Lobster uses a DSL rather than arbitrary code, which keeps workflows deterministic and auditable. Pipelines are data, making them easy to log, diff, replay, and review.

Implementation Pattern

inbox list --json | inbox categorize --json | inbox apply --json

Chain small CLI commands with approval steps for control.
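The approval-and-resume idea is easier to see in a toy model. The sketch below is not Lobster's DSL or token format (neither is described in this post); it is a hypothetical Python illustration of a pipeline expressed as data, a pause before a side-effecting step, and a resume token that carries the intermediate state so earlier steps are not re-run.

```python
import base64
import json


def run_pipeline(steps, handlers, state=None, start=0, approved=False):
    """Run steps in order; pause before any step marked approval=True.

    steps:    list of dicts such as {"run": "list"} (the pipeline is plain data)
    handlers: maps step names to functions taking and returning JSON-safe data
    """
    data = state
    for i in range(start, len(steps)):
        step = steps[i]
        already_approved = approved and i == start
        if step.get("approval") and not already_approved:
            # Pause: pack the next step index and current state into a token.
            payload = json.dumps({"next": i, "state": data}).encode()
            token = base64.urlsafe_b64encode(payload).decode()
            return {"status": "needs_approval", "token": token}
        data = handlers[step["run"]](data)
    return {"status": "ok", "output": data}


def resume(steps, handlers, token, approve):
    """Continue a paused pipeline from its token without re-running steps."""
    if not approve:
        return {"status": "cancelled"}
    payload = json.loads(base64.urlsafe_b64decode(token))
    return run_pipeline(steps, handlers, state=payload["state"],
                        start=payload["next"], approved=True)


# A data-only description mirroring the inbox example above.
STEPS = [
    {"run": "list"},
    {"run": "categorize"},
    {"run": "apply", "approval": True},  # side effect: needs a human OK
]
```

Because `STEPS` is plain data, it can be logged, diffed, or replayed as-is, which is the property the design philosophy above is after.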

Security Mechanisms

  • Enforced timeouts
  • Output size limits
  • Fixed executable naming
  • Sandbox-aware
  • Does not directly handle secrets or network calls
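Two of these mechanisms, enforced timeouts and output size limits, are straightforward to illustrate with a subprocess wrapper. This is a generic sketch, not Lobster's implementation: `run_step` and its default limits are assumptions chosen for the example.

```python
import subprocess
import sys


def run_step(argv, timeout_s=10.0, max_output_bytes=64 * 1024):
    """Run one CLI step with a hard timeout and an output size cap.

    argv is a list, so no shell is involved and arguments are not
    re-interpreted. Note: this buffers all output before checking its
    size; a stricter version would stream and stop reading at the cap.
    """
    try:
        proc = subprocess.run(argv, capture_output=True,
                              timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising.
        return {"ok": False, "error": "timeout"}

    if len(proc.stdout) > max_output_bytes:
        return {"ok": False, "error": "output_too_large"}
    if proc.returncode != 0:
        return {"ok": False, "error": f"exit {proc.returncode}"}
    return {"ok": True, "stdout": proc.stdout.decode("utf-8", "replace")}


if __name__ == "__main__":
    # Example: run a trivial child process using the current interpreter.
    print(run_step([sys.executable, "-c", "print('hi')"]))
```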

MCP Server Integration

OpenClaw supports MCP (Model Context Protocol) Servers to extend the agent’s toolset.

Configuration

{
  mcp: {
    servers: {
      "my-server": {
        command: "npx",
        args: ["-y", "@my-mcp/server"],
        env: { API_KEY: "..." }
      }
    }
  }
}
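Each server entry above is essentially a recipe for starting a child process: a command, its arguments, and extra environment variables. The sketch below shows how such an entry could be turned into a launch spec. `build_launch` is a hypothetical helper, and OpenClaw's actual launcher is not shown in this post.

```python
import os


def build_launch(name: str, entry: dict, base_env=None):
    """Turn one mcp.servers entry into (argv, env) for a child process."""
    command = entry.get("command")
    if not command:
        raise ValueError(f"MCP server {name!r} has no command")

    # argv: the command followed by its configured arguments.
    argv = [command, *entry.get("args", [])]

    # env: start from the parent environment, then overlay the entry's
    # variables so values like API_KEY reach only this server process.
    env = dict(os.environ if base_env is None else base_env)
    env.update(entry.get("env", {}))
    return argv, env
```

The resulting pair could be passed to something like `subprocess.Popen(argv, env=env, stdin=PIPE, stdout=PIPE)`, since MCP servers configured this way typically communicate over stdio.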

Management Commands

/mcp list                  # List MCP servers
/mcp status                # Check status

MCP allows OpenClaw to connect to the external tool ecosystem: databases, APIs, custom services, and more.

Media Processing

Images

  • image tool: Image analysis (requires imageModel)
  • image_generate tool: Image generation/editing (requires imageGenerationModel)
  • Supports OpenAI, Google, fal, and other providers

Media Attachments

Inbound media is automatically copied to the sandbox workspace (media/inbound/*). Supported formats depend on the channel.

Other Tools

| Tool | Description |
| --- | --- |
| message | Send a message to the current channel |
| memory_search | Semantic search over memory |
| memory_get | Read a specific memory file |
| cron | Scheduled tasks |
| gateway | Gateway management |
| nodes | Node device control |
| canvas | Canvas tool |

Summary

OpenClaw’s toolset covers everything from voice to documents, from deterministic workflows to external MCP extensions. TTS gives the agent a “voice,” Lobster makes complex workflows predictable and auditable, and MCP lets the agent pick up new tools from external servers.

References

This post is compiled from the following OpenClaw source documents: