Evaluators are quality assurance tools that determine how well your AI Browser Agent performed during a run. They use LLM-based scoring to assess agent executions against custom criteria, providing objective performance metrics.

Judge Your Runs: Assess and score how well your agent performed
LLM Configuration: Define custom prompts for performance evaluation
Score Configuration: Set up scoring methods and criteria
Agent Binding: Automatically bind evaluators to specific agents

What are Evaluators?

Evaluators analyze AI Browser Agent executions to provide objective performance assessments. They examine the agent’s actions, decisions, and outcomes to generate scores based on your specific criteria.

Each evaluator uses a custom LLM prompt to assess different aspects of agent performance, such as task completion accuracy, efficiency, or adherence to best practices.
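Conceptually, an evaluator combines the run transcript with your evaluation prompt and asks an LLM to return a score. The sketch below is illustrative only, not the platform's actual API: the names RunRecord, build_evaluator_prompt, evaluate_run, and the 0 to 100 scale are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Minimal stand-in for an agent run: the task plus what the agent did."""
    task: str
    actions: list[str]
    final_output: str


def build_evaluator_prompt(run: RunRecord, criteria: str) -> str:
    """Combine the run transcript with custom criteria into a judging prompt."""
    transcript = "\n".join(f"{i + 1}. {a}" for i, a in enumerate(run.actions))
    return (
        "You are grading an AI Browser Agent run.\n"
        f"Task: {run.task}\n"
        f"Actions taken:\n{transcript}\n"
        f"Final output: {run.final_output}\n\n"
        f"Evaluation criteria: {criteria}\n"
        "Respond with a single integer score from 0 to 100."
    )


def evaluate_run(run: RunRecord, criteria: str, llm) -> int:
    """Send the judging prompt to any LLM callable and parse the numeric reply.

    A production evaluator would parse the reply more defensively; this sketch
    assumes the model returns a bare integer.
    """
    reply = llm(build_evaluator_prompt(run, criteria))
    return max(0, min(100, int(reply.strip())))
```

Passing the LLM as a plain callable keeps the sketch independent of any specific provider SDK; in practice the platform supplies the model and run data for you.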

Creating an Evaluator

Basic Information

LLM Configuration

Configure how the large language model (LLM) will evaluate agent performance.
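For example, a custom prompt might ask the model to weigh task completion and efficiency. The template below is a sketch: the placeholder names ({task}, {actions}, {result}) are assumptions, so substitute whatever variables your evaluator actually exposes.

```python
# Illustrative evaluation prompt template; fill placeholders with run data,
# e.g. EVALUATION_PROMPT.format(task=..., actions=..., result=...).
EVALUATION_PROMPT = """\
You are evaluating an AI Browser Agent run.

Task given to the agent: {task}
Actions the agent took: {actions}
Final result: {result}

Judge the run on two aspects:
1. Task completion: did the agent achieve the stated goal?
2. Efficiency: did it avoid unnecessary or repeated steps?

Return a score from 0 (complete failure) to 100 (perfect execution),
followed by a one-sentence justification.
"""
```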

Score Configuration

Define how the evaluator should score the results.
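A score configuration typically specifies the scale the LLM should use and how raw scores map to an outcome. The field names below (scale_min, scale_max, pass_threshold) are assumptions for this sketch, not the platform's actual settings.

```python
from dataclasses import dataclass


@dataclass
class ScoreConfig:
    """Illustrative scoring setup: a numeric scale plus a pass/fail cutoff."""
    scale_min: int = 0
    scale_max: int = 100
    pass_threshold: int = 70  # runs at or above this score count as passing

    def outcome(self, raw_score: int) -> str:
        """Clamp the LLM's raw score to the scale and map it to pass/fail."""
        clamped = max(self.scale_min, min(self.scale_max, raw_score))
        return "pass" if clamped >= self.pass_threshold else "fail"


# Example: a raw score of 82 against the default config is a "pass".
print(ScoreConfig().outcome(82))
```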

Best Practices

Examples