adk-browser

Browser automation tools for Rust Agent Development Kit (ADK-Rust) agents using WebDriver.

Overview

adk-browser provides 46 comprehensive browser automation tools that enable AI agents to interact with web pages, extract information, fill forms, take screenshots, and more. Built on the WebDriver protocol (Selenium), it works with any WebDriver-compatible browser.

Features

46 Browser Tools: Complete web automation toolkit for AI agents
WebDriver Compatible: Works with Selenium, ChromeDriver, GeckoDriver, etc.
ADK Integration: Tools implement adk_core::Tool for seamless agent integration
Configurable: Filter tools by category based on agent needs
Async: Built on Tokio for efficient async operations

Quick Start

Add to your Cargo.toml:

[dependencies]
adk-browser = "0.1"
adk-agent = "0.1"
adk-model = "0.1"

Prerequisites

Start a WebDriver server (e.g., Selenium):

docker run -d -p 4444:4444 -p 7900:7900 --shm-size=2g selenium/standalone-chrome:latest

Basic Usage

use adk_browser::{BrowserSession, BrowserToolset, BrowserConfig};
use adk_agent::LlmAgentBuilder;
use adk_model::GeminiModel;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create browser session
    let config = BrowserConfig::new("http://localhost:4444");
    let session = BrowserSession::new(config).await?;

    // Create toolset with all 46 tools
    let toolset = BrowserToolset::new(session);
    let tools = toolset.all_tools();

    // Create AI agent with browser tools
    let model = Arc::new(GeminiModel::from_env("gemini-2.0-flash")?);

    let mut builder = LlmAgentBuilder::new("web_agent")
        .model(model)
        .instruction("You are a web automation assistant. Use browser tools to help users.");

    for tool in tools {
        builder = builder.tool(tool);
    }

    let agent = builder.build()?;

    Ok(())
}

Filtered Tools

Select only the tools your agent needs:

let toolset = BrowserToolset::new(session)
    .with_navigation(true)   // Navigate, back, forward, refresh
    .with_extraction(true)   // Extract text, links, HTML
    .with_interaction(true)  // Click, type, select
    .with_forms(false)       // Disable form tools
    .with_screenshots(true)  // Screenshots
    .with_javascript(false)  // Disable JS execution
    .with_cookies(false)     // Disable cookie tools
    .with_frames(false)      // Disable frame tools
    .with_windows(false)     // Disable window tools
    .with_actions(false);    // Disable advanced actions

let tools = toolset.selected_tools();

Available Tools (46 Total)

Navigation (6 tools)

Tool	Description
`browser_navigate`	Navigate to a URL
`browser_back`	Go back in history
`browser_forward`	Go forward in history
`browser_refresh`	Refresh current page
`browser_page_info`	Get current URL and title
`browser_close`	Close the browser session

Extraction (6 tools)

Tool	Description
`browser_extract_text`	Extract visible text from page or element
`browser_extract_html`	Get HTML source
`browser_extract_links`	Extract all links from page
`browser_extract_images`	Extract all image sources
`browser_extract_tables`	Extract table data as JSON
`browser_extract_metadata`	Get page metadata (title, description, etc.)

Interaction (6 tools)

Tool	Description
`browser_click`	Click on an element
`browser_type`	Type text into an element
`browser_clear`	Clear an input field
`browser_select`	Select option from dropdown
`browser_submit`	Submit a form
`browser_hover`	Hover over an element

Forms (5 tools)

Tool	Description
`browser_fill_form`	Fill multiple form fields at once
`browser_get_form_fields`	List all form fields
`browser_get_field_value`	Get value of a form field
`browser_set_checkbox`	Set checkbox state
`browser_upload_file`	Upload file to input

Screenshots (3 tools)

Tool	Description
`browser_screenshot`	Take full page screenshot
`browser_screenshot_element`	Screenshot specific element
`browser_print_pdf`	Generate PDF of page

JavaScript (3 tools)

Tool	Description
`browser_evaluate`	Execute JavaScript synchronously
`browser_evaluate_async`	Execute async JavaScript
`browser_scroll`	Scroll page or element

Wait (4 tools)

Tool	Description
`browser_wait_element`	Wait for element to appear
`browser_wait_text`	Wait for text to appear
`browser_wait_url`	Wait for URL to match
`browser_wait_load`	Wait for page load complete

Cookies (4 tools)

Tool	Description
`browser_get_cookies`	Get all cookies
`browser_get_cookie`	Get specific cookie
`browser_set_cookie`	Set a cookie
`browser_delete_cookies`	Delete cookies

Frames (3 tools)

Tool	Description
`browser_switch_frame`	Switch to iframe
`browser_switch_parent`	Switch to parent frame
`browser_switch_default`	Switch to main document

Windows (4 tools)

Tool	Description
`browser_get_windows`	List all windows/tabs
`browser_switch_window`	Switch to window
`browser_new_tab`	Open new tab
`browser_close_window`	Close current window

Actions (2 tools)

Tool	Description
`browser_drag_drop`	Drag and drop elements
`browser_double_click`	Double-click element

Element Selectors

Tools that target elements accept CSS selectors:

// By ID
"#login-button"

// By class
".submit-btn"

// By tag
"input[type='email']"

// By attribute
"[data-testid='search']"

// Complex selectors
"form.login input[name='password']"

Example: Web Research Agent

use adk_browser::{BrowserSession, BrowserToolset, BrowserConfig};
use adk_agent::LlmAgentBuilder;

let session = BrowserSession::new(BrowserConfig::new("http://localhost:4444")).await?;

let toolset = BrowserToolset::new(session)
    .with_navigation(true)
    .with_extraction(true)
    .with_screenshots(true);

let agent = LlmAgentBuilder::new("researcher")
    .model(model)
    .instruction(r#"
        You are a web research assistant. When asked about a topic:
        1. Navigate to relevant websites
        2. Extract key information using browser_extract_text
        3. Take screenshots of important content
        4. Summarize your findings
    "#)
    .tools(toolset.selected_tools())
    .build()?;

Configuration

let config = BrowserConfig::new("http://localhost:4444")
    .with_headless(true)          // Run headless (if supported)
    .with_timeout(Duration::from_secs(30))
    .with_implicit_wait(Duration::from_secs(10));

let session = BrowserSession::new(config).await?;

WebDriver Options

Works with any WebDriver-compatible server:

Server	Command
Selenium (Chrome)	`docker run -d -p 4444:4444 selenium/standalone-chrome`
Selenium (Firefox)	`docker run -d -p 4444:4444 selenium/standalone-firefox`
ChromeDriver	`chromedriver --port=4444`
GeckoDriver	`geckodriver --port=4444`

Examples

Run the included examples:

# Basic browser session
cargo run --example browser_basic

# AI agent with browser tools
cargo run --example browser_agent

# Full interactive example with all 46 tools
cargo run --example browser_interactive

# OpenAI-powered browser agent
cargo run --example browser_openai --features openai

License

Apache-2.0

adk-browser 0.1.4