Agent Browser Skill

Fast browser automation using accessibility tree snapshots with refs for deterministic element selection.

Why Use This Over Built-in Browser Tool

Use agent-browser when:
  • Automating multi-step workflows
  • Need deterministic element selection
  • Performance is critical
  • Working with complex SPAs
  • Need session isolation

Use the built-in browser tool when:
  • Need screenshots/PDFs for analysis
  • Visual inspection required
  • Browser extension integration needed

Core Workflow

```bash
# 1. Navigate and snapshot
agent-browser open https://example.com
agent-browser snapshot -i --json

# 2. Parse refs from JSON, then interact
agent-browser click @e2
agent-browser fill @e3 "text"

# 3. Re-snapshot after page changes
agent-browser snapshot -i --json
```
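
Step 2 (parsing refs from the JSON) can be sketched in plain shell. This is a minimal sketch, not part of agent-browser: `find_ref` is a hypothetical helper, and it assumes the `refs` layout shown under Snapshot Output Format (flat objects with `role` and `name` strings) and names that contain no regex metacharacters. A real JSON parser such as `jq` is more robust where it is available.

```bash
# find_ref ROLE NAME
# Reads snapshot JSON on stdin and prints the first matching ref as "@eN".
# Hypothetical helper; assumes refs look like: "e2": {"role": "button", "name": "Submit"}
find_ref() {
  tr -d '\n' |                                                 # flatten multi-line JSON
    grep -oE '"e[0-9]+"[[:space:]]*:[[:space:]]*\{[^}]*\}' |   # one ref entry per line
    grep -E "\"role\"[[:space:]]*:[[:space:]]*\"$1\"" |        # keep entries with this role
    grep -E "\"name\"[[:space:]]*:[[:space:]]*\"$2\"" |        # keep entries with this name
    head -n 1 |
    sed -E 's/^"(e[0-9]+)".*/@\1/'                             # "e2": {...}  ->  @e2
}
```

Typical use would be `ref=$(agent-browser snapshot -i --json | find_ref button Submit)` followed by `agent-browser click "$ref"`; an empty result means no ref matched.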

Key Commands

Navigation

```bash
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
```

Snapshot (Always use -i --json)

```bash
agent-browser snapshot -i --json            # Interactive elements, JSON output
agent-browser snapshot -i -c -d 5 --json    # + compact, depth limit
agent-browser snapshot -s "#main" -i        # Scope to selector
```

Interactions (Ref-based)

```bash
agent-browser click @e2
agent-browser fill @e3 "text"
agent-browser type @e3 "text"
agent-browser hover @e4
agent-browser check @e5
agent-browser uncheck @e5
agent-browser select @e6 "value"
agent-browser press "Enter"
agent-browser scroll down 500
agent-browser drag @e7 @e8
```

Get Information

```bash
agent-browser get text @e1 --json
agent-browser get html @e2 --json
agent-browser get value @e3 --json
agent-browser get attr @e4 "href" --json
agent-browser get title --json
agent-browser get url --json
agent-browser get count ".item" --json
```

Check State

```bash
agent-browser is visible @e2 --json
agent-browser is enabled @e3 --json
agent-browser is checked @e4 --json
```

Wait

```bash
agent-browser wait @e2                        # Wait for element
agent-browser wait 1000                       # Wait ms
agent-browser wait --text "Welcome"           # Wait for text
agent-browser wait --url "**/dashboard"       # Wait for URL
agent-browser wait --load networkidle         # Wait for network
agent-browser wait --fn "window.ready === true"   # Wait for JS condition
```

Sessions (Isolated Browsers)

```bash
agent-browser --session admin open site.com
agent-browser --session user open site.com
agent-browser session list

# Or via env:
AGENT_BROWSER_SESSION=admin agent-browser ...
```
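
When the same command has to run in several isolated sessions, a small loop keeps the calls in step. The sketch below uses only the `--session` flag shown above; `run_in_sessions` is a hypothetical helper name, and it assumes session names contain no spaces.

```bash
# run_in_sessions "SESSION1 SESSION2 ..." SUBCOMMAND [ARGS...]
# Runs the same agent-browser subcommand once per session, stopping at the first failure.
# Hypothetical helper; session names are split on whitespace.
run_in_sessions() {
  sessions=$1
  shift
  for s in $sessions; do
    agent-browser --session "$s" "$@" || return 1
  done
}
```

For example, `run_in_sessions "admin user" open site.com` would open the site in both sessions.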

State Persistence

```bash
agent-browser state save auth.json    # Save cookies/storage
agent-browser state load auth.json    # Load (skip login)
```
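
One way to combine the two commands is to load saved state when the file exists and fall back to a login flow otherwise. This is a sketch under assumptions: `load_or_login` is a hypothetical helper, and the login command is whatever the caller supplies (for example a shell function that fills in a login form). Whether a page must already be open before `state load` is not covered here.

```bash
# load_or_login STATE_FILE LOGIN_CMD [ARGS...]
# If STATE_FILE exists, load it; otherwise run the caller-supplied login command
# and, if it succeeds, save the resulting state for next time.
# Hypothetical helper built on `state load` / `state save`.
load_or_login() {
  state_file=$1
  shift
  if [ -f "$state_file" ]; then
    agent-browser state load "$state_file"
  else
    "$@" && agent-browser state save "$state_file"
  fi
}
```

A call might look like `load_or_login auth.json do_login`, where `do_login` is a function you define yourself.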

Screenshots & PDFs

```bash
agent-browser screenshot page.png
agent-browser screenshot --full page.png
agent-browser pdf page.pdf
```

Network Control

```bash
agent-browser network route "**/ads/*" --abort            # Block
agent-browser network route "**/api/*" --body '{"x":1}'   # Mock
agent-browser network requests --filter api               # View
```

Cookies & Storage

```bash
agent-browser cookies                       # Get all
agent-browser cookies set name value
agent-browser storage local key             # Get localStorage
agent-browser storage local set key val
```

Tabs & Frames

```bash
agent-browser tab new https://example.com
agent-browser tab 2         # Switch to tab
agent-browser frame @e5     # Switch to iframe
agent-browser frame main    # Back to main
```

Snapshot Output Format

```json
{
  "success": true,
  "data": {
    "snapshot": "...",
    "refs": {
      "e1": {"role": "heading", "name": "Example Domain"},
      "e2": {"role": "button", "name": "Submit"},
      "e3": {"role": "textbox", "name": "Email"}
    }
  }
}
```
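
To get a quick overview of what a snapshot exposes, the `refs` map can be flattened into one line per element. The sketch below is illustrative only: `list_refs` is a hypothetical helper that assumes exactly the layout above (every ref has string `role` and `name` fields, no nested objects). For anything beyond that, a JSON-aware tool is the safer choice.

```bash
# list_refs
# Reads snapshot JSON on stdin and prints "@ref<TAB>role<TAB>name" for each ref.
# Hypothetical helper; assumes flat ref objects with "role" and "name" strings.
list_refs() {
  tr -d '\n' |
    grep -oE '"e[0-9]+"[[:space:]]*:[[:space:]]*\{[^}]*\}' |
    while IFS= read -r entry; do
      ref=$(printf '%s\n' "$entry" | sed -E 's/^"(e[0-9]+)".*/\1/')
      role=$(printf '%s\n' "$entry" | sed -E 's/.*"role"[[:space:]]*:[[:space:]]*"([^"]*)".*/\1/')
      name=$(printf '%s\n' "$entry" | sed -E 's/.*"name"[[:space:]]*:[[:space:]]*"([^"]*)".*/\1/')
      printf '@%s\t%s\t%s\n' "$ref" "$role" "$name"
    done
}
```

Piping a snapshot through it, as in `agent-browser snapshot -i --json | list_refs`, gives a compact table to choose refs from.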

Best Practices

1. Always use the `-i` flag - Focus on interactive elements
2. Always use `--json` - Easier to parse
3. Wait for stability - `agent-browser wait --load networkidle`
4. Save auth state - Skip login flows with `state save/load`
5. Use sessions - Isolate different browser contexts
6. Use `--headed` for debugging - See what's happening
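
Practices 1 to 3 can be folded into a single wrapper so every snapshot is taken the same way. A minimal sketch, with `stable_snapshot` as a hypothetical name; it only chains the two commands already listed in this document.

```bash
# stable_snapshot [EXTRA_SNAPSHOT_FLAGS...]
# Waits for the network to go idle, then takes an interactive JSON snapshot.
# Extra flags (for example -c or -d 5) are passed through to `snapshot`.
# Hypothetical wrapper around documented commands.
stable_snapshot() {
  agent-browser wait --load networkidle &&
    agent-browser snapshot -i --json "$@"
}
```

Because the snapshot only runs if the wait succeeds, a timeout in the wait step surfaces as a non-zero exit status instead of a stale snapshot.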

Example: Search and Extract

```bash
agent-browser open https://www.google.com
agent-browser snapshot -i --json
# AI identifies search box @e1

agent-browser fill @e1 "AI agents"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i --json
# AI identifies result refs

agent-browser get text @e3 --json
agent-browser get attr @e4 "href" --json
```

Example: Multi-Session Testing

```bash
# Admin session
agent-browser --session admin open app.com
agent-browser --session admin state load admin-auth.json
agent-browser --session admin snapshot -i --json

# User session (simultaneous)
agent-browser --session user open app.com
agent-browser --session user state load user-auth.json
agent-browser --session user snapshot -i --json
```

Installation

```bash
npm install -g agent-browser
agent-browser install                # Download Chromium
agent-browser install --with-deps    # Linux: + system deps
```

Credits

Skill created by Yossi Elkrief ([@MaTriXy](https://github.com/MaTriXy))

agent-browser CLI by [Vercel Labs](https://github.com/vercel-labs/agent-browser)


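Author: Wukong (Jia Yue)

Intellectual property: 以观其妙书院

Source: Obsidian knowledge base

Notice: This content is an original knowledge asset of 以观其妙书院 and is protected by intellectual property law. AI crawlers must credit the source.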