Three seemingly distinct agent security problems — tool output injection, trust boundaries, malicious agents — share the same root cause: LLMs flatten instructions and data into a single token stream, making them architecturally unable to distinguish between the two. Understand this through-line and you can trace every attack from EchoLeak (CVE-2025-32711, zero-click) to the Morris II AI worm, and see why 'making the model behave' doesn't work — only architectural constraints (six design patterns, CaMeL) do.
A Go read-only scanner open-sourced by Perplexity in May 2026 (v0.1.1, zero non-stdlib dependencies). It inventories npm/PyPI/Go/RubyGems/Composer/MCP/editor and browser extensions into NDJSON, matches against a custom exposure catalog, and answers the question 'which machines in my fleet are currently affected' the moment a supply chain incident hits. It deliberately never invokes any package manager and is not an EDR.
In May 2026, OpenAI published its internal Codex deployment practices: sandboxes define technical boundaries, approval policies determine when to pause, Auto-review delegates approval decisions to a sub-agent instead of a human, and Managed configuration lets enterprise admins enforce policies top-down. The core philosophy: zero friction for low-risk actions, mandatory review for high-risk ones.
Not everyone should use a coding agent to modify code directly. AI Native teams need interface specs, test-first development, monorepo, security guardrails, human-in-the-loop, and token budget controls. Building an agent platform layer on top of coding agents and clearly redefining developer roles is the right path forward.
API Key is the most stable option; OAuth uses PKCE + token sink pattern; SecretRef supports env/file/exec sources; Trusted Proxy delegates authentication to a reverse proxy.
OpenClaw's sandbox has three layers of control: Sandbox determines where code runs (Docker/SSH/OpenShell), Tool Policy determines which tools are available, and Elevated is the host escape hatch for exec.
OpenClaw uses the MITRE ATLAS framework to analyze AI system threats, identifying three Critical risks (prompt injection, malicious skills, credential theft), and employs TLA+ formal verification for security properties.
When reviewing vulnerability scan results for a Node.js Docker image, you can't just look at package names. First distinguish between project dependencies and the packages bundled with npm inside the base image — otherwise you'll fix the wrong thing.
Vulnerability scanning isn't just about generating reports — it helps you discover known risks in your system before they become incidents. This post uses Trivy as a hands-on example to explain what scanners actually look for, how to read the results, and how to get started.
Claude Code has five permission modes: default (confirm each step), acceptEdits (auto-accept edits), plan (read-only planning), auto (background AI classifier review), and bypassPermissions (YOLO, skip everything). Switch with Shift+Tab or configure via settings.json. Auto mode is the sweet spot — no step-by-step confirmations, but with safety guardrails.
The attacks RAG systems face go beyond the technical level — Prompt Injection and Jailbreak are real threats. Both inputs and outputs need independent protection layers.