The AI Browser Agent node executes browser automation tasks through natural language instructions. It can navigate web pages, interact with DOM elements, fill forms, and extract data.

Navigation

Navigate to URLs and traverse page structures

Form Interaction

Fill input fields, select dropdowns, and submit forms

Data Extraction

Extract text, attributes, and structured data from elements

Vision Support

Process visual page elements when enabled

Configuration Options

Basic Settings

Prompt Configuration

The prompt field contains the task instructions for the browser agent. Include:

  • Task Objective: Specify the primary action (navigate, extract, fill form, click element)
  • Target Elements: Describe specific elements, selectors, or page sections to interact with
  • Transitions: Specify when the node should transition and which node it should transition to

You can add variables to the AI Browser Node prompt by wrapping the variable name in double curly braces with a leading dot, for example {{.USER_INFO}}.

Example:

Navigate to /form, fill the form with {{.USER_INFO}} and submit the form.
When you have saved the answers, transition to the "Book a Meeting" node.
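To make the placeholder syntax concrete, the following is a minimal sketch of how a {{.USER_INFO}} variable could be resolved into the final prompt text. It is not the platform's implementation: the function name render_prompt, the accepted placeholder grammar, and the sample user data are all assumptions chosen for illustration.

```python
import re

# Matches placeholders of the form {{.NAME}}. The exact grammar the
# platform accepts is an assumption made for this sketch.
PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{.NAME}} placeholder with its value from `variables`."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than pass a half-filled prompt to the agent.
            raise KeyError(f"no value supplied for variable {name!r}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical sample data, used only to show the substitution.
prompt = render_prompt(
    "Navigate to /form, fill the form with {{.USER_INFO}} and submit the form.",
    {"USER_INFO": "name: Ada Lovelace, email: ada@example.com"},
)
print(prompt)
```

Raising an error on a missing variable is a design choice for this sketch; it keeps an unresolved placeholder from reaching the agent as literal text.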

Agent Tools

Base Browser Interaction

Enabled by default. Provides the following functions:

  • URL navigation
  • Element selection and clicking
  • Form field population
  • Text and attribute extraction
  • Page screenshot capture

Vision Capabilities

Optional. Adds visual processing capabilities:

  • Image-based element recognition
  • Layout understanding
  • Visual CAPTCHA handling
  • Screenshot analysis

Vision capabilities increase resource usage and execution time. Enable only when visual analysis is required.

Additional Tools

Custom tools such as Send User Message, Scratchpad or Upload/Download Files can be added to extend browser agent functionality.
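The tool options in this section can be summarized in a small sketch. The AgentTools class, its field names, and the tools_for_task helper are hypothetical and do not correspond to a documented configuration format. They only model the behavior described here: base browser interaction is on by default, vision is optional (assumed off unless requested), and additional tools are added by name.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTools:
    """Hypothetical tool settings; field names are illustrative only."""

    base_browser: bool = True   # documented default: enabled
    vision: bool = False        # optional; assumed off unless requested
    additional: list = field(default_factory=list)


def tools_for_task(needs_visual_analysis: bool, extras: tuple = ()) -> AgentTools:
    """Enable vision only when the task requires visual analysis."""
    return AgentTools(vision=needs_visual_analysis, additional=list(extras))


# A plain form-filling task: no vision, one additional tool.
form_task = tools_for_task(needs_visual_analysis=False, extras=("Scratchpad",))

# A task that depends on screenshot analysis: vision enabled.
visual_task = tools_for_task(needs_visual_analysis=True)

print(form_task)
print(visual_task)
```

Keeping vision off unless a task needs it follows the guidance above, since vision increases resource usage and execution time.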

Use Cases