§adk-browser
Browser automation tools for ADK agents using WebDriver (via thirtyfour).
§Overview
This crate provides browser automation capabilities as ADK tools, allowing LLM agents to interact with web pages. Tools are designed to work with any LlmAgent and inherit all ADK benefits (callbacks, session management, etc.).
§Quick Start
use adk_browser::{BrowserSession, BrowserConfig, BrowserToolset};
use adk_agent::LlmAgentBuilder;
use std::sync::Arc;

async fn example() -> anyhow::Result<()> {
    // Create browser session
    let config = BrowserConfig::new()
        .headless(true)
        .viewport(1920, 1080);
    let browser = Arc::new(BrowserSession::new(config));
    browser.start().await?;

    // Create toolset
    let toolset = BrowserToolset::new(browser.clone());

    // Add tools to agent (example - requires model)
    // let agent = LlmAgentBuilder::new("browser_agent")
    //     .model(model)
    //     .instruction("You are a web automation assistant.")
    //     .tools(toolset.all_tools())
    //     .build()?;

    // Clean up
    browser.stop().await?;
    Ok(())
}
§Available Tools
§Navigation
- browser_navigate - Navigate to a URL
- browser_back - Go back in history
- browser_forward - Go forward in history
- browser_refresh - Refresh the page
§Interaction
- browser_click - Click on an element
- browser_double_click - Double-click an element
- browser_type - Type text into an input
- browser_clear - Clear an input field
- browser_select - Select from a dropdown
§Extraction
- browser_extract_text - Get text from elements
- browser_extract_attribute - Get attribute values
- browser_extract_links - Get all links on page
- browser_page_info - Get current URL and title
- browser_page_source - Get HTML source
§Screenshots
- browser_screenshot - Capture page or element screenshot
§Waiting
- browser_wait_for_element - Wait for element to appear
- browser_wait - Wait for a duration
- browser_wait_for_page_load - Wait for page to load
- browser_wait_for_text - Wait for text to appear
§JavaScript
- browser_evaluate_js - Execute JavaScript code
- browser_scroll - Scroll the page
- browser_hover - Hover over an element
- browser_handle_alert - Handle JavaScript alerts
§Cookies
- browser_get_cookies - Get all cookies
- browser_get_cookie - Get a specific cookie
- browser_add_cookie - Add a cookie
- browser_delete_cookie - Delete a cookie
- browser_delete_all_cookies - Delete all cookies
§Windows/Tabs
- browser_list_windows - List all windows/tabs
- browser_new_tab - Open a new tab
- browser_new_window - Open a new window
- browser_switch_window - Switch to a window
- browser_close_window - Close current window
- browser_maximize_window - Maximize window
- browser_minimize_window - Minimize window
- browser_set_window_size - Set window size
§Frames
- browser_switch_to_frame - Switch to an iframe
- browser_switch_to_parent_frame - Exit current iframe
- browser_switch_to_default_content - Exit all iframes
§Advanced Actions
- browser_drag_and_drop - Drag and drop elements
- browser_right_click - Right-click (context menu)
- browser_focus - Focus on an element
- browser_element_state - Check element state
- browser_press_key - Press keyboard keys
- browser_file_upload - Upload files
- browser_print_to_pdf - Print page to PDF
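Because the tool names are grouped by category, one possible way to reason about which tools an agent should see is to build an allow-list of names per category. The sketch below is standard-library Rust only and does not use this crate's API: the `CATALOG` table and `allowed_names` function are hypothetical, and the table copies just a few of the categories listed above.

```rust
/// A few of the tool names listed above, keyed by their section heading.
/// Hypothetical table for illustration; it is not part of adk-browser.
const CATALOG: &[(&str, &[&str])] = &[
    (
        "Navigation",
        &["browser_navigate", "browser_back", "browser_forward", "browser_refresh"],
    ),
    (
        "Interaction",
        &["browser_click", "browser_double_click", "browser_type", "browser_clear", "browser_select"],
    ),
    (
        "Extraction",
        &[
            "browser_extract_text",
            "browser_extract_attribute",
            "browser_extract_links",
            "browser_page_info",
            "browser_page_source",
        ],
    ),
    ("Screenshots", &["browser_screenshot"]),
];

/// Collects the tool names that belong to any of the requested categories.
fn allowed_names(categories: &[&str]) -> Vec<&'static str> {
    let mut out = Vec::new();
    for (category, names) in CATALOG {
        if categories.contains(category) {
            out.extend_from_slice(names);
        }
    }
    out
}

fn main() {
    // An observation-only selection: no clicks, typing, or navigation.
    for name in allowed_names(&["Extraction", "Screenshots"]) {
        println!("{name}");
    }
}
```

An allow-list like this could then be compared against the names of the tools actually handed to an agent. Which categories count as safe for a given agent is a judgment call that this sketch does not settle.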
§Requirements
A WebDriver server (like ChromeDriver, geckodriver, or Selenium) must be
running and accessible. By default, tools connect to http://localhost:4444.
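Before starting a session, it can help to confirm that something is listening on the WebDriver port. The following is a minimal sketch using only the Rust standard library. The helper name `webdriver_port_open` is hypothetical, and the check only proves that the TCP port accepts connections, not that the listener speaks the WebDriver protocol.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
/// Hypothetical helper; it is not part of adk-browser.
fn webdriver_port_open(addr: &str, timeout: Duration) -> bool {
    match addr.to_socket_addrs() {
        // A host name may resolve to several addresses; try each one.
        Ok(mut addrs) => addrs.any(|a| TcpStream::connect_timeout(&a, timeout).is_ok()),
        // Unparseable or unresolvable addresses count as "not reachable".
        Err(_) => false,
    }
}

fn main() {
    // Host and port taken from the default URL mentioned above.
    let addr = "localhost:4444";
    if webdriver_port_open(addr, Duration::from_secs(1)) {
        println!("something is listening on {addr}");
    } else {
        println!("nothing is listening on {addr}; start a WebDriver server first");
    }
}
```

A check like this gives a clearer error message than a failed session start, but it is optional.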
§Starting ChromeDriver
# Install ChromeDriver (macOS)
brew install chromedriver
# Start ChromeDriver
chromedriver --port=4444
§Using Docker
docker run -d -p 4444:4444 selenium/standalone-chrome
§Architecture
Tools are implemented using the ADK Tool trait, allowing them to:
- Work with any LLM (for example Gemini, OpenAI, or Anthropic)
- Use callbacks for monitoring and control
- Access session state and artifacts
- Compose with other tools and agents
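The trait-based layering can be pictured with a small, self-contained sketch. Everything in it is hypothetical and simplified: `SketchTool` is not the actual ADK Tool trait, `FakeSession` only records a URL instead of driving a browser, and the `url` argument name is an assumption. It shows only the shape of the design: several tools share one session, and a caller dispatches to them by name.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a browser session. It only records the last
/// URL instead of talking to a WebDriver server.
#[derive(Default)]
struct FakeSession {
    current_url: Mutex<Option<String>>,
}

/// Hypothetical, simplified tool interface (NOT the actual ADK `Tool` trait).
trait SketchTool {
    fn name(&self) -> &'static str;
    fn call(&self, args: &HashMap<String, String>) -> Result<String, String>;
}

struct NavigateSketch {
    session: Arc<FakeSession>,
}

struct PageInfoSketch {
    session: Arc<FakeSession>,
}

impl SketchTool for NavigateSketch {
    fn name(&self) -> &'static str {
        "browser_navigate"
    }

    fn call(&self, args: &HashMap<String, String>) -> Result<String, String> {
        // The `url` argument name is an assumption made for this sketch.
        let url = args
            .get("url")
            .ok_or_else(|| "missing `url` argument".to_string())?;
        *self.session.current_url.lock().unwrap() = Some(url.clone());
        Ok(format!("navigated to {url}"))
    }
}

impl SketchTool for PageInfoSketch {
    fn name(&self) -> &'static str {
        "browser_page_info"
    }

    fn call(&self, _args: &HashMap<String, String>) -> Result<String, String> {
        let guard = self.session.current_url.lock().unwrap();
        match guard.as_ref() {
            Some(url) => Ok(format!("url: {url}")),
            None => Err("no page loaded yet".to_string()),
        }
    }
}

/// Every tool shares the same session, mirroring the toolset layer.
fn sketch_toolset(session: Arc<FakeSession>) -> Vec<Box<dyn SketchTool>> {
    vec![
        Box::new(NavigateSketch { session: session.clone() }),
        Box::new(PageInfoSketch { session }),
    ]
}

/// Looks a tool up by name, as an agent might after the model requests a call.
fn dispatch(
    tools: &[Box<dyn SketchTool>],
    name: &str,
    args: &HashMap<String, String>,
) -> Result<String, String> {
    tools
        .iter()
        .find(|tool| tool.name() == name)
        .ok_or_else(|| format!("unknown tool: {name}"))?
        .call(args)
}

fn main() {
    let session = Arc::new(FakeSession::default());
    let tools = sketch_toolset(session);

    let mut args = HashMap::new();
    args.insert("url".to_string(), "https://example.com".to_string());

    println!("{:?}", dispatch(&tools, "browser_navigate", &args));
    println!("{:?}", dispatch(&tools, "browser_page_info", &HashMap::new()));
}
```

Sharing the session through `Arc` is what lets independent tools observe each other's effects, such as a page-info call seeing the URL set by an earlier navigation.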
┌─────────────────────────────────────────────────┐
│                    LlmAgent                     │
│  (with callbacks, session, artifacts, memory)   │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                 BrowserToolset                  │
│     NavigateTool, ClickTool, TypeTool, ...      │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                 BrowserSession                  │
│          (wraps thirtyfour WebDriver)           │
└─────────────────────────────────────────────────┘
                         │
                         ▼
                 WebDriver Server
               (ChromeDriver, etc.)
Re-exports§
pub use tools::AddCookieTool;
pub use tools::AlertTool;
pub use tools::BackTool;
pub use tools::ClearTool;
pub use tools::ClickTool;
pub use tools::CloseWindowTool;
pub use tools::DeleteAllCookiesTool;
pub use tools::DeleteCookieTool;
pub use tools::DoubleClickTool;
pub use tools::DragAndDropTool;
pub use tools::ElementStateTool;
pub use tools::EvaluateJsTool;
pub use tools::ExtractAttributeTool;
pub use tools::ExtractLinksTool;
pub use tools::ExtractTextTool;
pub use tools::FileUploadTool;
pub use tools::FocusTool;
pub use tools::ForwardTool;
pub use tools::GetCookieTool;
pub use tools::GetCookiesTool;
pub use tools::HoverTool;
pub use tools::ListWindowsTool;
pub use tools::MaximizeWindowTool;
pub use tools::MinimizeWindowTool;
pub use tools::NewTabTool;
pub use tools::NewWindowTool;
pub use tools::PageInfoTool;
pub use tools::PageSourceTool;
pub use tools::PressKeyTool;
pub use tools::PrintToPdfTool;
pub use tools::RefreshTool;
pub use tools::RightClickTool;
pub use tools::ScreenshotTool;
pub use tools::ScrollTool;
pub use tools::SelectTool;
pub use tools::SetWindowSizeTool;
pub use tools::SwitchToDefaultContentTool;
pub use tools::SwitchToFrameTool;
pub use tools::SwitchToParentFrameTool;
pub use tools::SwitchWindowTool;
pub use tools::TypeTool;
pub use tools::WaitForElementTool;
pub use tools::WaitForPageLoadTool;
pub use tools::WaitForTextTool;
pub use tools::WaitTool;
Modules§
Structs§
- BrowserConfig - Configuration for browser sessions.
- BrowserSession - A browser session that wraps thirtyfour's WebDriver.
- BrowserToolset - A toolset that provides all browser automation tools.
- ElementState - State information about an element.
Enums§
- BrowserType - Supported browser types.
Functions§
- minimal_browser_tools - Helper function to create a minimal browser toolset with only essential tools.
- readonly_browser_tools - Helper function to create a read-only browser toolset (no interaction).
- shared_session - Create a shared browser session.