Getting Started with selenium-ai-agent: AI-Driven Browser Automation via MCP

If you’ve ever wished you could just tell your AI assistant to open a browser, click through a website, fill out forms, and verify that everything works — without writing a single line of Selenium code yourself — that’s exactly what selenium-ai-agent does.

It’s an MCP (Model Context Protocol) server that gives AI assistants like Claude, Copilot, Cursor, and Cline the ability to control a real browser with 73 built-in tools. In this post, we’ll cover what it is, how to install it, and how it behaves in practice, using practiceautomatedtesting.com as our testing playground.

What Is selenium-ai-agent?

At its core, selenium-ai-agent is an npm package that acts as a bridge between your AI coding assistant and a Selenium-powered browser. Instead of writing WebDriver scripts by hand, you describe what you want in natural language, and the AI translates that into real browser actions — navigating pages, clicking elements, filling inputs, taking screenshots, and even running tests across a Selenium Grid.

[Image: selenium-ai-agent npm package page with installation instructions and quick start guide]

The package ships with 73 tools organized into categories:

Navigation — browse to URLs, go back/forward, refresh
Page Analysis — capture interactive elements, take screenshots
Element Interaction — click, hover, drag-and-drop, select from dropdowns
Input — type text, press keys, upload files
Verification — assert that elements, text, or values are visible
Test Pipeline — plan tests, generate test scripts, and heal broken tests
Selenium Grid — manage parallel browser sessions across multiple nodes

[Image: selenium-ai-agent tools listing showing 73 tools organized by category, including Navigation, Page Analysis, Elements, Input, Mouse, Tabs, and Verification]

Installation

You can install it globally or run it directly without installing:

# Install globally
npm install -g selenium-ai-agent

# Or run directly with npx
npx selenium-ai-agent

Requirements: Node.js 18+ and Chrome (Firefox and Edge are also supported). ChromeDriver is managed automatically.
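If you want to confirm the Node.js requirement before wiring anything up, a small check like the following can help. This is only a sketch: the helper names are made up for illustration, and the only thing it relies on is that `node --version` prints a string such as `v20.11.1`.

```python
import re
import shutil
import subprocess
from typing import Optional

MIN_NODE_MAJOR = 18  # taken from the stated requirement "Node.js 18+"


def node_major_version(version_output: str) -> int:
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"Unrecognized Node.js version string: {version_output!r}")
    return int(match.group(1))


def meets_requirement(version_output: str) -> bool:
    """True when the reported Node.js major version is at least MIN_NODE_MAJOR."""
    return node_major_version(version_output) >= MIN_NODE_MAJOR


def installed_node_version() -> Optional[str]:
    """Return the raw `node --version` output, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `installed_node_version()` and passing the result to `meets_requirement()` gives a quick yes/no answer; a `None` result means Node.js is not installed or not on your PATH.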

Setting It Up With Your AI Client

The setup is nearly the same across MCP-compatible clients: you register a server that runs npx selenium-ai-agent, either through the client’s CLI or in its MCP configuration file.

Claude Code

claude mcp add selenium-mcp -- npx selenium-ai-agent

Claude Desktop, Cursor, Cline, Windsurf

Add to your MCP config file:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Config file locations for Claude Desktop:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
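If your config file already lists other MCP servers, you will want to merge the new entry in instead of overwriting the file. Here is a minimal sketch of that merge using only the standard library. The function names are hypothetical, and it assumes the file holds a single JSON object with an optional "mcpServers" key, as in the snippet above.

```python
import copy
import json
from pathlib import Path

SERVER_NAME = "selenium-mcp"
SERVER_ENTRY = {"command": "npx", "args": ["selenium-ai-agent"]}


def add_server(config: dict) -> dict:
    """Return a copy of config with the selenium-mcp entry under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers[SERVER_NAME] = copy.deepcopy(SERVER_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read path if it exists, add the server entry, and write the result back."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = json.loads(text) if text.strip() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(existing), indent=2) + "\n", encoding="utf-8")
```

For example, `update_config_file(Path.home() / ".config/Claude/claude_desktop_config.json")` would target the Linux location listed above. Back up the file first if it contains settings you care about.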

GitHub Copilot (VS Code 1.99+)

Add to .vscode/mcp.json (note: Copilot uses "servers" instead of "mcpServers"):

{
  "servers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"],
      "type": "stdio"
    }
  }
}

Hands-On Example: Exploring practiceautomatedtesting.com

Let’s see selenium-ai-agent in action. practiceautomatedtesting.com is a free practice site built specifically for learning test automation. It features web element exercises, API testing endpoints, a shopping application, and AI testing tools — making it a perfect playground.

Step 1: Navigate to the Site

Once the MCP server is running, you simply ask your AI assistant:

“Navigate to practiceautomatedtesting.com”

Behind the scenes, the assistant calls the navigate_to tool:

Tool: navigate_to
URL: https://practiceautomatedtesting.com
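For the curious, MCP clients send tool invocations as JSON-RPC 2.0 requests with the method "tools/call". The sketch below builds such a request for navigate_to. The argument key "url" is an assumption made for illustration; the tool's actual input schema is whatever the server advertises to the client.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call_request(tool: str, arguments: dict) -> dict:
    """Build an MCP "tools/call" request for the given tool name and arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# "url" is an assumed argument name, not confirmed by the package docs.
request = tool_call_request(
    "navigate_to", {"url": "https://practiceautomatedtesting.com"}
)
print(json.dumps(request, indent=2))
```

You never write these messages yourself; the AI client produces them when it decides a tool is needed.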

The tool returns the page title, URL, and all interactive elements on the page with reference IDs (e1, e2, e3, etc.) that can be used for subsequent actions:

Page: Home
URL: https://practiceautomatedtesting.com/

Interactive Elements:
  [e1] a: PracticeAutomatedTesting
  [e2] a: Home
  [e3] a: Web Elements
  [e4] a: API
  [e5] a: Shopping
  [e6] a: AI Tools
  [e7] button: Toggle theme
  [e8] a: Login
  [e9] a: Get Started
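To make the reference IDs concrete, here is a rough sketch of how a listing in this text format could be parsed and searched by label. It assumes each element line looks exactly like `[e3] a: Web Elements`; the class and function names are invented for this example, and the assistant does this matching for you in practice.

```python
import re
from typing import Dict, NamedTuple, Optional


class Element(NamedTuple):
    ref: str    # reference ID such as "e3"
    tag: str    # element type such as "a" or "button"
    label: str  # visible text


ELEMENT_LINE = re.compile(r"^\s*\[(e\d+)\]\s+(\w+):\s*(.*)$")


def parse_elements(listing: str) -> Dict[str, Element]:
    """Map each reference ID in the listing to its parsed element."""
    elements = {}
    for line in listing.splitlines():
        match = ELEMENT_LINE.match(line)
        if match:
            ref, tag, label = match.groups()
            elements[ref] = Element(ref, tag, label.strip())
    return elements


def find_ref(elements: Dict[str, Element], label: str) -> Optional[str]:
    """Return the ref of the first element whose label matches, ignoring case."""
    for element in elements.values():
        if element.label.lower() == label.lower():
            return element.ref
    return None


listing = """
Interactive Elements:
  [e2] a: Home
  [e3] a: Web Elements
  [e7] button: Toggle theme
"""
elements = parse_elements(listing)
web_elements_ref = find_ref(elements, "web elements")
```

Here `web_elements_ref` is the ID a follow-up click would target, which is how a request like "Click on Web Elements" gets resolved to a specific element.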

[Image: Practice Automated Testing homepage with navigation to Web Elements, API, Shopping, and AI Tools sections]

Step 2: Explore Web Elements

Ask: “Click on Web Elements”

The assistant uses click_element with ref: e3, and the page loads a rich sidebar with testable components organized into three categories:

Elements — Simple Input Forms, Checkboxes, Radio Buttons, Web Tables, Links, Broken Links/Images, Upload and Download, Shadow DOM, Select Boxes
Widgets — Date Picker, Slider, Progress Bar, Tabs, Hover and Tooltip, Accordion, Menu
Interactions — Resize, Drag and Drop, Geolocation, Sorting, Frames, Keypress, Modal, Alert

Each of these is a self-contained exercise you can automate with natural language commands.

[Image: Web Elements testing page showing Simple Input Form with sidebar listing Elements, Widgets, and Interactions categories]

Step 3: Fill Out a Form

With the Simple Input Form loaded, you can say:

“Type ‘John Doe’ in the first input, ‘john@example.com’ in the second input, and click Submit”

The assistant sequences three tool calls — two input_text calls and one click_element — filling in the form and submitting it, just like a real user would.
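The ordering is the important part: inputs first, submit last. A small sketch of how that plan could be represented is shown here. The tool names come from the walkthrough, but the argument names ("ref", "text") and the element refs (e10, e11, e12) are placeholders, not values taken from the real page.

```python
from typing import Any, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]  # (tool name, arguments)


def plan_form_fill(fields: List[Tuple[str, str]], submit_ref: str) -> List[ToolCall]:
    """Turn (element ref, text) pairs plus a submit ref into ordered tool calls."""
    calls: List[ToolCall] = [
        ("input_text", {"ref": ref, "text": text}) for ref, text in fields
    ]
    # The click comes last so the form is submitted only after every field is filled.
    calls.append(("click_element", {"ref": submit_ref}))
    return calls


# Hypothetical refs; the real ones come from the page's element listing.
plan = plan_form_fill(
    [("e10", "John Doe"), ("e11", "john@example.com")],
    submit_ref="e12",
)
for tool, arguments in plan:
    print(tool, arguments)
```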

Step 4: Test the Shopping App

Navigate to the Shopping section, and you’ll find a fully functional e-commerce interface with product listings, category filters, price range sliders, star ratings, sorting controls, a shopping cart, and pagination. Ask:

“Filter products by Electronics, sort by price, and add the first item to the cart”

The assistant will use click_element and select_option tools to interact with the filters and buttons, then verify the cart updated using verify_text_visible.

[Image: E-commerce Shopping practice application with product listings, category filters, price range, star ratings, and shopping cart]

Step 5: Interact with the API Section

The API section provides interactive documentation with “Try It Out” buttons and code snippets in cURL, JavaScript, Python, and C#. You can ask:

“Click the first Try It Out button and show me the response”

The assistant clicks the button, the page sends the API request from within the browser, and the assistant captures the response displayed on the page.

[Image: API Testing Documentation page with interactive Try It Out buttons and code snippets in cURL, JavaScript, Python, and C#]

Going Parallel with Selenium Grid

For larger test suites, selenium-ai-agent supports Selenium Grid for parallel execution. If you have Docker available, the grid setup is straightforward.

[Image: Selenium Grid setup documentation showing Docker Compose configuration and parallel workflow examples]

Check Grid Status

Tool: grid_status

Grid Status: READY
Capacity: 0/5 slots used (5 available)
Nodes:
  - chrome 144.0 (x4)
  - firefox 147.0 (x1)
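If you ever need to act on this status in a script, for example to decide how many sessions you can open, the text can be parsed with a couple of regular expressions. The sketch below assumes the output always follows the layout shown above and reads "(x4)" as four slots for that browser, which is an interpretation rather than something the tool documents here.

```python
import re
from typing import Any, Dict

CAPACITY = re.compile(r"Capacity:\s*(\d+)/(\d+)\s+slots used")
NODE = re.compile(r"^\s*-\s*(\w+)\s+([\d.]+)\s+\(x(\d+)\)", re.MULTILINE)


def parse_grid_status(text: str) -> Dict[str, Any]:
    """Extract readiness, slot usage, and per-browser slot counts from the status text."""
    capacity = CAPACITY.search(text)
    if capacity is None:
        raise ValueError("No 'Capacity:' line found in grid status output")
    used, total = int(capacity.group(1)), int(capacity.group(2))
    slots = {browser: int(n) for browser, _version, n in NODE.findall(text)}
    return {
        "ready": "Grid Status: READY" in text,
        "used": used,
        "total": total,
        "free": total - used,
        "slots_by_browser": slots,
    }


status_text = """\
Grid Status: READY
Capacity: 0/5 slots used (5 available)
Nodes:
  - chrome 144.0 (x4)
  - firefox 147.0 (x1)
"""
status = parse_grid_status(status_text)
```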

Create Grid Sessions

Tool: session_create
Browser: chrome
Tags: ["smoke-test"]

You can create multiple sessions and run tests in parallel across different browsers, which is ideal for cross-browser testing of sites like practiceautomatedtesting.com.

Parallel Exploration

One of the most powerful features is parallel_explore, which lets you explore multiple sections of a site simultaneously across different browser sessions. For example, you could explore the Web Elements, Shopping, and API sections all at once:

Tool: parallel_explore
Base URL: https://practiceautomatedtesting.com
Targets:
  - /webelements
  - /shopping
  - /api
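Conceptually, this is a fan-out: each target path becomes a full URL, and each URL is handled by its own worker. The following sketch illustrates the pattern with a thread pool. It is not how parallel_explore is implemented internally, and the `explore` function is a stand-in that does no real browsing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List
from urllib.parse import urljoin

BASE_URL = "https://practiceautomatedtesting.com"
TARGETS = ["/webelements", "/shopping", "/api"]


def explore(url: str) -> Dict[str, str]:
    """Placeholder for one session's work; a real run would drive a browser here."""
    return {"url": url, "status": "explored"}


def explore_in_parallel(
    base_url: str, targets: List[str], max_workers: int = 3
) -> List[Dict[str, str]]:
    """Resolve each target against base_url and explore all of them concurrently."""
    urls = [urljoin(base_url, target) for target in targets]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(explore, urls))


results = explore_in_parallel(BASE_URL, TARGETS)
```

In the real tool, each worker corresponds to a separate browser session on the grid, so the number of targets you can explore at once is bounded by the free slots.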

The Test Generation and Healing Pipeline

Beyond ad-hoc browser control, selenium-ai-agent includes a complete test lifecycle pipeline:

1. Plan — Use planner_setup_page and planner_explore_page to discover features and generate a structured test plan.

2. Record & Generate — Start recording browser actions with start_recording, interact with the application naturally, then stop recording. The action log is used to generate test scripts in your preferred framework (Selenium + Python/pytest, Playwright, etc.).

3. Heal — When tests break due to UI changes, use healer_run_tests to identify failures, healer_inspect_page to compare expected vs. actual locators, and healer_fix_test to automatically repair the broken tests.

This pipeline turns manual exploratory testing into maintainable automated test suites.
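To give a feel for the "Record & Generate" step, here is a toy generator that turns an action log into a Selenium + pytest script. The log format (dicts with "action", "url", "selector", and "text" keys) and the selectors are invented for this example; the package's real recording format and generated output may look quite different.

```python
from typing import Dict, List

HEADER = '''\
from selenium import webdriver
from selenium.webdriver.common.by import By


def test_recorded_flow():
    driver = webdriver.Chrome()
    try:
'''

FOOTER = '''\
    finally:
        driver.quit()
'''


def render_step(action: Dict[str, str]) -> str:
    """Render one recorded action as a single line of Selenium code."""
    kind = action["action"]
    if kind == "navigate":
        return f"driver.get({action['url']!r})"
    locator = f"driver.find_element(By.CSS_SELECTOR, {action['selector']!r})"
    if kind == "type":
        return f"{locator}.send_keys({action['text']!r})"
    if kind == "click":
        return f"{locator}.click()"
    raise ValueError(f"Unsupported action: {kind!r}")


def generate_test(actions: List[Dict[str, str]]) -> str:
    """Return the source of a pytest test that replays the recorded actions."""
    steps = [render_step(action) for action in actions] or ["pass"]
    body = "".join(f"        {step}\n" for step in steps)
    return HEADER + body + FOOTER


# Hypothetical action log with made-up selectors.
log = [
    {"action": "navigate", "url": "https://practiceautomatedtesting.com"},
    {"action": "type", "selector": "#name", "text": "John Doe"},
    {"action": "click", "selector": "button[type='submit']"},
]
source = generate_test(log)
```

Printing `source` yields a script you could save as a test file and run with pytest, provided Selenium and a browser are installed. The healing step then amounts to updating the locators in such a script when the UI changes.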

Useful Configuration Options

You can customize the agent’s behavior through environment variables in your MCP config:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"],
      "env": {
        "SELENIUM_HEADLESS": "true",
        "SELENIUM_STEALTH": "true",
        "SELENIUM_MCP_SAVE_TRACE": "true",
        "SE_AVOID_STATS": "true"
      }
    }
  }
}

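The example above enables headless mode (SELENIUM_HEADLESS) for CI/CD environments, stealth mode (SELENIUM_STEALTH) to hide automation indicators, and session tracing (SELENIUM_MCP_SAVE_TRACE) for debugging. SE_AVOID_STATS is a Selenium Manager setting that turns off its anonymous usage statistics. A Grid URL can also be configured for distributed testing; it is not part of this example.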
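One detail that is easy to miss: values in the env block are strings ("true"), not JSON booleans. If you generate this config from a script, a helper along these lines keeps that straight. The function names are hypothetical, and the sketch assumes that leaving a variable out keeps the agent's default behavior for that option.

```python
import json
from typing import Dict

# Environment variable names as used in the config example above.
FLAG_TO_ENV = {
    "headless": "SELENIUM_HEADLESS",
    "stealth": "SELENIUM_STEALTH",
    "save_trace": "SELENIUM_MCP_SAVE_TRACE",
    "avoid_stats": "SE_AVOID_STATS",
}


def env_block(**flags: bool) -> Dict[str, str]:
    """Map enabled Python booleans to the string values the env block expects."""
    unknown = set(flags) - set(FLAG_TO_ENV)
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")
    # Disabled flags are omitted (assumed to leave the default in place).
    return {FLAG_TO_ENV[name]: "true" for name, enabled in flags.items() if enabled}


def server_config(**flags: bool) -> dict:
    """Build the full MCP config with an env block for the enabled options."""
    entry: dict = {"command": "npx", "args": ["selenium-ai-agent"]}
    env = env_block(**flags)
    if env:
        entry["env"] = env
    return {"mcpServers": {"selenium-mcp": entry}}


config = server_config(headless=True, save_trace=True)
print(json.dumps(config, indent=2))
```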

Why This Matters

Traditional browser automation requires writing and maintaining verbose WebDriver code. With selenium-ai-agent, the workflow shifts to natural language: you describe what you want to test, and the AI handles element discovery, interaction sequencing, and verification, all through the standardized Model Context Protocol, which works across multiple AI clients.

For teams practicing on sites like practiceautomatedtesting.com, it’s a way to learn automation concepts without getting bogged down in boilerplate. For production teams, it’s a way to accelerate test creation and maintenance by pairing AI with Selenium’s battle-tested browser control.

Get Started

1. Install: npm install -g selenium-ai-agent
2. Add it to your MCP client config
3. Open your AI assistant and say: “Navigate to practiceautomatedtesting.com and explore the Web Elements section”
4. Start automating.

Links:

npm: npmjs.com/package/selenium-ai-agent
Practice site: practiceautomatedtesting.com