Getting Started with Inspect AI

Inspect AI is a powerful open-source framework for evaluating large language models. This guide will help you run the examples, edit configurations, and understand how to use Asteroid's approvers together with Inspect.

Note ⚠️: We are in the process of consolidating our functionality under the asteroid-sdk package, but the Inspect AI features are currently still available in the entropy-labs package. We are actively working on migrating all features to asteroid-sdk for a more unified experience.

Running the Inspect AI Example

You can run the Inspect AI example to see how it works and understand how to use it in your own projects. Before running the example, make sure that you have cloned the Asteroid repository and that your frontend and backend servers are running. You can start them using Docker Compose as described in the Quickstart guide.
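
If you have not started them yet, you can typically bring them up from the repository root (assuming the Compose file lives there, per the Quickstart guide):

docker compose up -d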

1. Install Dependencies

To get started, ensure that inspect-ai and asteroid-sdk are installed in your Python environment:

pip install inspect-ai asteroid-sdk --upgrade

2. Navigate to the Example Directory

Change to the example directory:

cd examples/inspect_example

3. Run the Example with Multiple Approval Levels

Run the example that uses approval_escalation.yaml for approvals configuration:

inspect eval run.py --approval approval_escalation.yaml --model openai/gpt-4o --trace

This command runs the example and triggers approvals. You can view the approvals at http://localhost:3000.

4. Change the Approval Configuration

You can modify the examples/inspect_example/run.py file to try different models, more samples, or different approval configurations. For example, we provide two more approval configuration files: approval_human.yaml and approval_llm.yaml. Change approval_file_name in run.py to use a different approval configuration file:

...
if __name__ == "__main__":
    # Choose which approval configuration file the example uses
    approval_file_name = "approval_human.yaml"
    ...
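
For context, here is a minimal sketch of how run.py might hand the selected file to Inspect's eval() entry point. The actual run.py in the repository may be structured differently, and eval() accepting an approval argument is an assumption here:

from inspect_ai import eval

if __name__ == "__main__":
    approval_file_name = "approval_human.yaml"
    # Hypothetical wiring: pass the approval policy file to eval()
    eval(
        approval_demo,  # the @task defined earlier in run.py
        model="openai/gpt-4o",
        approval=approval_file_name,
    )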

Registering Inspect AI Evaluations with Asteroid

To integrate Inspect AI evaluations with Asteroid, you need to register projects, tasks, and runs. This registration ensures that your evaluation is correctly tracked and managed within the Asteroid platform. You can also optionally use the Asteroid scorer to score the finished evaluation through the Asteroid UI.

Here are the relevant additions to the Inspect Task definition:

from asteroid_sdk.api import register_inspect_samples_with_asteroid_sdk_solver, asteroid_sdk_web_ui_scorer

...

@task
def approval_demo() -> Task:
    ...
    return Task(
        dataset=...,
        solver=[
            register_inspect_samples_with_asteroid_sdk_solver(
                project_name="inspect-example",
                asteroid_sdk_backend_url="http://localhost:8080"
            ),
            ...
        ],
        scorer=asteroid_sdk_web_ui_scorer(),
        sandbox="docker",
    )

Explanation of the Additions:

  • Registering Samples with Asteroid: The register_inspect_samples_with_asteroid_sdk_solver function is required to register your evaluation with Asteroid correctly. Add it to the solver list in your Task definition.

    register_inspect_samples_with_asteroid_sdk_solver(
        project_name="inspect-example",
        asteroid_sdk_backend_url="http://localhost:8080"
    )
    
  • Using the Asteroid Scorer (Optional): The asteroid_sdk_web_ui_scorer function can be used to score the finished evaluation using the Asteroid UI. By setting the scorer parameter to asteroid_sdk_web_ui_scorer(), you enable scoring and feedback collection through the web interface.

    scorer=asteroid_sdk_web_ui_scorer()
    

Note: Ensure that you have the Asteroid backend server running and accessible at the URL specified in asteroid_sdk_backend_url. This is typically http://localhost:8080 when running locally.

By incorporating these functions into your code, you enable seamless integration with Asteroid, enhancing your ability to monitor, manage, and assess your Inspect AI evaluations.

Approval Configuration

Approvals in Inspect AI are configured using YAML files. Examples of approval configurations are provided in the approval_human.yaml, approval_llm.yaml, and approval_escalation.yaml files.

Using Approvers

We follow the Inspect AI format for approval configuration. Our approvers ship in the asteroid-sdk package, which you can install with:

pip install asteroid-sdk

Then, you can reference the approvers in the approval configuration using names like asteroid_sdk/human_approver.
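
In general, each entry in the approvers list names an approver, the tool patterns it applies to, and any approver-specific options. A schematic outline (the placeholder values are illustrative; the full examples below fill in real options):

approvers:
  - name: asteroid_sdk/<approver_name>   # e.g. human_approver, llm_approver
    tools: "*"                           # glob pattern matching tool names
    # ...approver-specific options...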

Approval Configuration Examples

Human Approval

The approval_human.yaml file configures human approval for all tools:

approvers:
  - name: asteroid_sdk/human_approver
    tools: "*"
    approval_api_endpoint: "http://localhost:8080"
    agent_id: "my_sample_1_agent"
    n: 1
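
To run the example with this configuration, mirror the command from step 3, pointing --approval at the human approval file:

inspect eval run.py --approval approval_human.yaml --model openai/gpt-4o --trace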

LLM Approval

The approval_llm.yaml file configures approval using a language model. The LLM approver provides explanations for its decisions to help the agent understand its reasoning.

approvers:
  - name: asteroid_sdk/llm_approver
    tools: "*"
    instructions: |
      As an overseeing agent, please follow these guidelines for each LLM tool action:

      - **Safe Action**: If the action is obviously safe, then approve it.
      - **Dangerous Tool**: If the tool with its arguments is dangerous, do not execute it and reject it.
      - **Agent Stuck or Looping**: If the agent is stuck, in a loop, or is not following a clear trajectory, reject the tool call with an explanation so the agent can potentially recover and continue executing.
      - **Unable to Recover**: If the agent can't recover, then terminate the process.
      
      Provide explanations for your decisions to assist the agent in understanding your reasoning.

    openai_model: gpt-4o
    include_context: true

Escalation with Allowlists

In the approval_escalation.yaml example, bash and Python tool calls are approved by checking whether the commands, modules, and functions they use appear in an allowlist. If a call fails these checks, it is escalated to human approval.

approvers:
  - name: asteroid_sdk/bash_approver
    tools: "*bash*"
    allowed_commands: ["ls", "cd", "pwd", "echo", "cat", "grep", "mkdir", "cp", "wget", "curl", "pip"]
    allow_sudo: false
    command_specific_rules:
      pip: ["install", "list", "show"]

  - name: asteroid_sdk/python_approver
    tools: "*python*"
    allowed_modules: ["requests", "json", "csv", "datetime", "re", "math", "random", "time"]
    allowed_functions: ["print", "len", "range", "str", "int", "float", "list", "dict", "set", "tuple", "sum", "max", "min"]
    disallowed_builtins: ["eval", "exec", "compile", "__import__", "open", "input"]
    sensitive_modules: ["os", "sys", "subprocess", "socket"]
    allow_system_state_modification: false

  - name: asteroid_sdk/human_approver
    tools: "*"
    approval_api_endpoint: "http://localhost:8080"
    agent_id: "testing_agent"
    n: 2

This configuration ensures that only approved Python modules and functions, as well as specific bash commands, can be used without triggering human approval. Any attempts to use disallowed or sensitive operations will require human intervention through the specified API endpoint.

Understanding the Approvers

Human Approver

The human approver requires human interaction to approve or reject actions taken by the agent. Approvals are managed through the approval API endpoint specified in the configuration. The human approver can also modify the suggested tool action to correct or improve it before the agent proceeds. To help the human approver make an informed decision, we also provide an explanation and a risk score for each suggested tool action.

Understanding the n Parameter

In the human_approver, the n parameter specifies the number of action options that the model should generate for each tool call. When n is greater than 1, the model will generate multiple options for the action, and the human approver can select the most appropriate one. This provides flexibility in overseeing the agent’s actions.

In the approver view with n=2, multiple options are presented to the human approver, along with the explanation and risk score for each option.

These capabilities provide greater control over the agent’s behavior, ensuring that the human approver can guide the agent effectively.

Bash Approver

The bash approver checks bash commands against an allowlist of permitted commands. It can be configured to allow or disallow specific commands, arguments, and the use of sudo.
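
To illustrate the kind of check involved, here is a simplified sketch in Python; it is not the actual asteroid-sdk implementation, and the function name is hypothetical:

import shlex

def is_command_allowed(command: str, allowed_commands: list[str],
                       allow_sudo: bool = False) -> bool:
    """Illustrative allowlist check for a single bash command."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    # Reject sudo outright unless explicitly allowed
    if tokens[0] == "sudo":
        if not allow_sudo:
            return False
        tokens = tokens[1:]
    # The base command must appear in the allowlist
    return bool(tokens) and tokens[0] in allowed_commands

# Example, matching the escalation config above:
print(is_command_allowed("ls -la", ["ls", "cd", "pwd"]))         # True
print(is_command_allowed("sudo rm -rf /", ["ls", "cd", "pwd"]))  # False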

Python Approver

The python approver checks Python code for the use of allowed modules and functions. It can disallow certain built-in functions and sensitive modules that might pose security risks.
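
Conceptually, such a check can be done by parsing the code and inspecting its imports and calls. Again, this is a simplified sketch with hypothetical names, not the real implementation:

import ast

def uses_only_allowed(code: str, allowed_modules: set[str],
                      disallowed_builtins: set[str]) -> bool:
    """Illustrative static check of imports and builtin calls."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # Check import statements against the module allowlist
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] not in allowed_modules
                   for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] not in allowed_modules:
                return False
        # Reject calls to disallowed builtins such as eval or exec
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in disallowed_builtins:
                return False
    return True

# Example, mirroring the escalation config above:
print(uses_only_allowed("import json\nprint(json.dumps({}))",
                        {"json"}, {"eval", "exec"}))                 # True
print(uses_only_allowed("eval('1+1')", {"json"}, {"eval", "exec"}))  # False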

Editing Configurations

You can customize the behavior of your agents by editing the approval YAML files and modifying the example code.

Modifying the Agent Script

You can edit run.py to change the tasks, the tools used, and other aspects of the agent’s behavior.
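
For example, here is a hedged sketch of a variant task that swaps in a different tool, using Inspect AI's built-in bash tool; the task name and sample are hypothetical, and the actual example task may differ:

from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.solver import generate, use_tools
from inspect_ai.tool import bash

@task
def my_demo() -> Task:
    # Give the agent a bash tool so approvers matching *bash* are exercised
    return Task(
        dataset=[Sample(input="List the files in the current directory.")],
        solver=[use_tools([bash()]), generate()],
        sandbox="docker",
    )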

Additional Resources

For more information on Inspect AI and approval configurations, refer to the Inspect AI documentation and the Asteroid package.