@playwright/mcp: Microsoft's Official Browser Automation MCP Server

TL;DR @playwright/mcp defaults to an accessibility tree (browser_snapshot) instead of screenshots, cutting token consumption by 90%+. Combined with Playwright's native auto-wait, it's the best starting point for AI agents doing web automation.

#playwright #mcp #browser-automation #ai-agent #e2e-testing #developer-tools

Table of Contents

Installation and Configuration
Tool List
Accessibility Tree Mode vs Screenshot Mode
What Auto-wait Actually Means
Multi-tab Management
Limitations
In Summary
References

🌏 中文版

@playwright/mcp is the official Playwright MCP server maintained by Microsoft, letting AI agents control a browser through the Model Context Protocol. Its defining design choice: no screenshots by default. Instead it returns an ARIA accessibility tree to describe page state, dramatically cutting token consumption.

Installation and Configuration

Start it directly with npx — no global install required:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

Launches headless Chromium by default. For headed mode (visible browser window):

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headed"]
    }
  }
}

Attach to an existing Chrome instance:

{
  "args": ["@playwright/mcp@latest", "--cdp-endpoint", "ws://localhost:9222"]
}

Tool List

@playwright/mcp organises its tools into several categories:

Navigation

browser_navigate — go to URL
browser_go_back / browser_go_forward — history navigation
browser_reload — refresh the page

Page State

browser_snapshot — get ARIA accessibility tree (default mode, no image)
browser_screenshot — screenshot (base64 PNG, requires a vision model)

Interaction

browser_click — click element (by ARIA label / role / text)
browser_type — type text into an input
browser_press_key — press key (Enter, Tab, Escape, etc.)
browser_hover — mouse hover
browser_drag — drag and drop

Forms

browser_select_option — pick a dropdown value
browser_file_upload — upload a file
browser_handle_dialog — handle alert / confirm / prompt

Network and Dev

browser_network_requests — list page network requests
browser_console_messages — retrieve console output
browser_evaluate — execute JS in the page context

Tab Management

browser_tab_list — list all open tabs
browser_tab_new — open a new tab
browser_tab_select — switch to a tab
browser_tab_close — close a tab

Export

browser_pdf_save — save page as PDF

Accessibility Tree Mode vs Screenshot Mode

browser_snapshot is @playwright/mcp’s most important differentiator. It returns the ARIA tree as structured text, something like this:

- heading "Product List" [level=1]
- list
  - listitem
    - link "MacBook Pro 16-inch" [href="/products/macbook-pro"]
    - text "$2,499"
    - button "Add to Cart"
  - listitem
    - link "iPad Pro" [href="/products/ipad-pro"]
    - text "$1,099"
    - button "Add to Cart"

A 1920×1080 screenshot base64-encoded is roughly 100–300 KB, translating to tens of thousands of tokens; the accessibility tree for the same page is typically 2–10 KB and can be processed by any text model without vision capability.

When to switch to screenshot mode (browser_screenshot):

The page is image-heavy (galleries, maps, Canvas-rendered content)
You need to verify visual styling (colours, layout correctness)
The accessibility tree carries insufficient information to determine page state

What Auto-wait Actually Means

Playwright’s auto-wait applies to every interaction: click waits for the element to be visible + enabled + stable (not mid-animation); browser_type waits for the input to be focused.

For AI agents this means: no need to sprinkle “wait for the page to load” or “wait for the button to appear” into your prompts, and no sleep calls between tool invocations. Playwright handles the timing in the background, so the agent can issue “click Submit” without knowing the current page state.

Multi-tab Management

@playwright/mcp supports a full multi-tab workflow:

browser_tab_new → (work in new tab) → browser_tab_select(original tab) → browser_tab_close

Each tab has its own page context. browser_snapshot and browser_screenshot target the currently active tab. Cross-tab data transfer requires browser_evaluate or the agent tracking the state itself.

Limitations

No access to raw CDP Domains: HeapProfiler, Profiler, Security, and other Domains not wrapped by Playwright are unavailable in @playwright/mcp.

Firefox / WebKit require extra config: Chromium is the default. Switching browsers requires a startup flag, and some tools (such as browser_cdp_send) only work with Chromium.

Accessibility tree coverage: Pages with poor ARIA attributes may produce incomplete snapshots. In those cases, switch to screenshot mode or use browser_evaluate to query the DOM directly.

Sessions are not persistent: Restarting the MCP server clears the session — cookies and localStorage are lost. For persistent sessions, manage a browser profile via --user-data-dir.

In Summary

@playwright/mcp is currently the most AI-agent-friendly browser MCP option available. Accessibility tree mode cuts token costs and removes the dependency on vision-capable models; auto-wait brings interaction reliability close to a full E2E test framework. It’s the sensible default starting point unless you have a specific reason to need screenshot feedback or low-level CDP control.