adk-browser
Browser automation tools for Rust Agent Development Kit (ADK-Rust) agents using WebDriver.
Overview
adk-browser provides 46 comprehensive browser automation tools that enable AI agents to interact with web pages, extract information, fill forms, take screenshots, and more. Built on the WebDriver protocol (Selenium), it works with any WebDriver-compatible browser.
Features
- 46 Browser Tools: Complete web automation toolkit for AI agents
- WebDriver Compatible: Works with Selenium, ChromeDriver, GeckoDriver, etc.
- ADK Integration: Tools implement
adk_core::Tool for seamless agent integration
- Configurable: Filter tools by category based on agent needs
- Async: Built on Tokio for efficient async operations
Quick Start
Add to your Cargo.toml:
[dependencies]
adk-browser = "0.1"
adk-agent = "0.1"
adk-model = "0.1"
Prerequisites
Start a WebDriver server (e.g., Selenium):
docker run -d -p 4444:4444 -p 7900:7900 --shm-size=2g selenium/standalone-chrome:latest
Basic Usage
use adk_browser::{BrowserSession, BrowserToolset, BrowserConfig};
use adk_agent::LlmAgentBuilder;
use adk_model::GeminiModel;
use std::sync::Arc;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = BrowserConfig::new("http://localhost:4444");
let session = BrowserSession::new(config).await?;
let toolset = BrowserToolset::new(session);
let tools = toolset.all_tools();
let model = Arc::new(GeminiModel::from_env("gemini-2.0-flash")?);
let mut builder = LlmAgentBuilder::new("web_agent")
.model(model)
.instruction("You are a web automation assistant. Use browser tools to help users.");
for tool in tools {
builder = builder.tool(tool);
}
let agent = builder.build()?;
Ok(())
}
Filtered Tools
Select only the tools your agent needs:
let toolset = BrowserToolset::new(session)
.with_navigation(true) .with_extraction(true) .with_interaction(true) .with_forms(false) .with_screenshots(true) .with_javascript(false) .with_cookies(false) .with_frames(false) .with_windows(false) .with_actions(false);
let tools = toolset.selected_tools();
Available Tools (46 Total)
Navigation (6 tools)
| Tool |
Description |
browser_navigate |
Navigate to a URL |
browser_back |
Go back in history |
browser_forward |
Go forward in history |
browser_refresh |
Refresh current page |
browser_page_info |
Get current URL and title |
browser_close |
Close the browser session |
Extraction (6 tools)
| Tool |
Description |
browser_extract_text |
Extract visible text from page or element |
browser_extract_html |
Get HTML source |
browser_extract_links |
Extract all links from page |
browser_extract_images |
Extract all image sources |
browser_extract_tables |
Extract table data as JSON |
browser_extract_metadata |
Get page metadata (title, description, etc.) |
Interaction (6 tools)
| Tool |
Description |
browser_click |
Click on an element |
browser_type |
Type text into an element |
browser_clear |
Clear an input field |
browser_select |
Select option from dropdown |
browser_submit |
Submit a form |
browser_hover |
Hover over an element |
Forms (5 tools)
| Tool |
Description |
browser_fill_form |
Fill multiple form fields at once |
browser_get_form_fields |
List all form fields |
browser_get_field_value |
Get value of a form field |
browser_set_checkbox |
Set checkbox state |
browser_upload_file |
Upload file to input |
Screenshots (3 tools)
| Tool |
Description |
browser_screenshot |
Take full page screenshot |
browser_screenshot_element |
Screenshot specific element |
browser_print_pdf |
Generate PDF of page |
JavaScript (3 tools)
| Tool |
Description |
browser_evaluate |
Execute JavaScript synchronously |
browser_evaluate_async |
Execute async JavaScript |
browser_scroll |
Scroll page or element |
Wait (4 tools)
| Tool |
Description |
browser_wait_element |
Wait for element to appear |
browser_wait_text |
Wait for text to appear |
browser_wait_url |
Wait for URL to match |
browser_wait_load |
Wait for page load complete |
Cookies (4 tools)
| Tool |
Description |
browser_get_cookies |
Get all cookies |
browser_get_cookie |
Get specific cookie |
browser_set_cookie |
Set a cookie |
browser_delete_cookies |
Delete cookies |
Frames (3 tools)
| Tool |
Description |
browser_switch_frame |
Switch to iframe |
browser_switch_parent |
Switch to parent frame |
browser_switch_default |
Switch to main document |
Windows (4 tools)
| Tool |
Description |
browser_get_windows |
List all windows/tabs |
browser_switch_window |
Switch to window |
browser_new_tab |
Open new tab |
browser_close_window |
Close current window |
Actions (2 tools)
| Tool |
Description |
browser_drag_drop |
Drag and drop elements |
browser_double_click |
Double-click element |
Element Selectors
Tools that target elements accept CSS selectors:
"#login-button"
".submit-btn"
"input[type='email']"
"[data-testid='search']"
"form.login input[name='password']"
Example: Web Research Agent
use adk_browser::{BrowserSession, BrowserToolset, BrowserConfig};
use adk_agent::LlmAgentBuilder;
let session = BrowserSession::new(BrowserConfig::new("http://localhost:4444")).await?;
let toolset = BrowserToolset::new(session)
.with_navigation(true)
.with_extraction(true)
.with_screenshots(true);
let agent = LlmAgentBuilder::new("researcher")
.model(model)
.instruction(r#"
You are a web research assistant. When asked about a topic:
1. Navigate to relevant websites
2. Extract key information using browser_extract_text
3. Take screenshots of important content
4. Summarize your findings
"#)
.tools(toolset.selected_tools())
.build()?;
Configuration
let config = BrowserConfig::new("http://localhost:4444")
.with_headless(true) .with_timeout(Duration::from_secs(30))
.with_implicit_wait(Duration::from_secs(10));
let session = BrowserSession::new(config).await?;
WebDriver Options
Works with any WebDriver-compatible server:
| Server |
Command |
| Selenium (Chrome) |
docker run -d -p 4444:4444 selenium/standalone-chrome |
| Selenium (Firefox) |
docker run -d -p 4444:4444 selenium/standalone-firefox |
| ChromeDriver |
chromedriver --port=4444 |
| GeckoDriver |
geckodriver --port=4444 |
Examples
Run the included examples:
cargo run --example browser_basic
cargo run --example browser_agent
cargo run --example browser_interactive
cargo run --example browser_openai --features openai
License
Apache-2.0