Tuning and Beyond QA

The Waterwheel agent is tuned out of the box for web testing. However, individual situations may require further tuning. More broadly, the agent is a general-purpose web browser agent capable of tasks beyond QA. Whatever your goal, we expose the necessary tools to tailor its behaviour and efficiency.

Prompt Tuning

Prompt Types

The agent uses three types of prompts to fulfil its objectives: the system prompt, the task prompt, and the compressed task prompt.

System Prompt

The system prompt contains up to three pieces of information:

  • Default system prompt (required): content in /agent/config/system.prompt.md. Defines the agent's role and core behaviour.
  • Global variables (optional): read from /agent/instructions/global-context.json by default. A JSON file containing values shared across all tasks in a run.
  • Extra instructions (optional): read from /agent/instructions/extra-instructions.md by default. Provides any additional instructions needed for a specific run.
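
The required and optional pieces above can be pictured with a minimal sketch. The agent's real assembly logic is not documented here, so the ordering, the "Global variables:" label, and the function name build_system_prompt are assumptions made for illustration only.

```python
import json
from pathlib import Path


def build_system_prompt(agent_root: str = "/agent") -> str:
    """Illustrative only: join the default prompt with the optional pieces."""
    root = Path(agent_root)

    # Required: read_text raises FileNotFoundError if the default prompt is missing.
    default_prompt = root / "config" / "system.prompt.md"
    parts = [default_prompt.read_text(encoding="utf-8").strip()]

    # Optional: global variables shared across all tasks in a run.
    global_context = root / "instructions" / "global-context.json"
    if global_context.exists():
        variables = json.loads(global_context.read_text(encoding="utf-8"))
        # The "Global variables:" label is an assumed format.
        parts.append("Global variables:\n" + json.dumps(variables, indent=2))

    # Optional: extra instructions for this specific run.
    extra = root / "instructions" / "extra-instructions.md"
    if extra.exists():
        parts.append(extra.read_text(encoding="utf-8").strip())

    return "\n\n".join(parts)
```

Running this against a copy of /agent is a quick way to preview roughly what text the three sources contribute before a run.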

Task Prompt

A task prompt describes the objective and steps of a single task. Task prompts are stored as individual Markdown files under /agent/tasks. Refer to the Manage Test Tasks section for file format requirements.

Compressed Task Prompt

A compressed task prompt is generated automatically and replaces the task prompt whenever compression occurs. It summarises all verifications completed and all context values known up to that point. See Token Efficiency Recommendations for details on how compression is triggered.

Prompt Analysis

By default, the agent prints the prompts used for all tasks to /agent/outputs/agent.log after each run. The following environment variables expose additional detail for prompt tuning:

Variable | Description
COMPRESSION_DEBUG | When true, logs a compact compression summary for every compression lifecycle to agent.log
COMPRESSION_DEBUG_INCLUDE_REQUEST | When COMPRESSION_DEBUG is true, also logs the full compressed prompt to agent.log
ENABLE_API_LOGGING | When true, writes /agent/outputs/api-log.json detailing every API call made during the run
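
As a rough sketch, the flags above could be set from a small Python launcher. The helper names debug_env and run_agent_with_debug are hypothetical, passing the literal string "true" is an assumption about how the agent parses these variables, and starting the agent this way assumes the MCPs it needs are already running.

```python
import os
import subprocess


def debug_env(include_request: bool = False, api_logging: bool = False) -> dict:
    """Return a copy of the current environment with prompt-debug flags set."""
    env = dict(os.environ)
    env["COMPRESSION_DEBUG"] = "true"
    if include_request:
        # Only has an effect while COMPRESSION_DEBUG is also true.
        env["COMPRESSION_DEBUG_INCLUDE_REQUEST"] = "true"
    if api_logging:
        env["ENABLE_API_LOGGING"] = "true"
    return env


def run_agent_with_debug(agent_root: str = "/agent") -> int:
    """Start the agent with all three flags enabled and return its exit code."""
    result = subprocess.run(
        ["node", "dist/index.cjs"],
        cwd=agent_root,
        env=debug_env(include_request=True, api_logging=True),
    )
    return result.returncode
```

The same variables can equally be set in the container's environment; the launcher only keeps the dependency between the two compression flags in one place.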

Tuning Recommendation

The simplest way to tune prompts is to feed the following files into an AI application such as Claude and ask it for suggestions:

  • /agent/outputs/agent.log
  • /agent/outputs/api-log.json
  • /agent/outputs/test-results.json
  • Individual task files from /agent/tasks
  • Individual test log files from /agent/outputs (files ending with _log.json)
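
Gathering these files by hand gets tedious across runs. The sketch below lists whichever of them exist; the function name collect_tuning_files is hypothetical, and the *.md pattern assumes task files use the .md extension.

```python
from pathlib import Path


def collect_tuning_files(agent_root: str = "/agent") -> list[Path]:
    """List the run artefacts worth sharing when asking for prompt suggestions."""
    root = Path(agent_root)
    outputs = root / "outputs"
    candidates = [
        outputs / "agent.log",
        outputs / "api-log.json",  # only written when ENABLE_API_LOGGING is true
        outputs / "test-results.json",
    ]
    # Task prompts (assumed to end in .md) and per-test logs ending in _log.json.
    candidates += sorted((root / "tasks").glob("*.md"))
    candidates += sorted(outputs.glob("*_log.json"))
    # Skip anything that was not produced in this run.
    return [path for path in candidates if path.is_file()]
```

The returned paths can then be copied out of the container or attached to a conversation in whatever way suits your workflow.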

Repurpose the Agent

Update MCPs

The agent discovers available MCPs by reading /agent/config/mcp-config.json. New MCPs can be added either by installing them directly on the container or by using our image as a base image. Once installed, update mcp-config.json to expose them to the agent.

For example, you can replace the preinstalled Email MCP with a Gmail MCP by updating the mcpServers.email value in mcp-config.json.
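
A minimal sketch of that replacement follows. The helper replace_mcp_server is hypothetical, and the entry's shape (command plus args) and the package name example-gmail-mcp are placeholders rather than documented values; check the existing mcp-config.json for the fields the agent actually expects.

```python
import json
from pathlib import Path


def replace_mcp_server(config_path: str, name: str, server: dict) -> dict:
    """Overwrite one entry under mcpServers and write the file back."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Other servers are left untouched; only the named entry is replaced.
    config.setdefault("mcpServers", {})[name] = server
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Hypothetical Gmail MCP entry: the command, the package name, and the
# command/args shape are placeholders, not documented values.
GMAIL_SERVER = {"command": "npx", "args": ["-y", "example-gmail-mcp"]}

# Example call (not executed here):
# replace_mcp_server("/agent/config/mcp-config.json", "email", GMAIL_SERVER)
```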

After adding new MCPs, update /agent/config/system.prompt.md to provide explicit instructions on how the agent should use them.

Configure MCP Permissions

The agent runs as agentuser. The simplest approach is to make every new MCP accessible to agentuser, so that the agent can reach it without any further permission changes.

Update the Execution Pattern

The preconfigured run-qa command starts the agent once the two preinstalled MCPs are ready. After configuring new MCPs, you may need to update the agent start logic. The agent is a Node.js program and can be started directly with:

Start agent directly
cd /agent && node dist/index.cjs

Built-in Tools

The following built-in tools can be used directly through task prompts.

Context Manager

context-manager handles variables shared across individual tasks. If your tasks are split across multiple files and need to share data, context-manager is a useful alternative to global context, particularly for preserving values generated by the LLM at runtime.

Available actions: get, set, delete, has, keys.

Example usage in a task prompt:

"Generate a username, then use context-manager set to save it as username."

"Read username from context-manager and fill in the registration form."

Complete Verification

Message compression is triggered when the agent calls the complete_verification tool. To control when compression occurs, specify in your task prompt the conditions under which complete_verification should be called.

complete_verification requires a purpose property. For compression to trigger correctly, purpose must be set to verification.
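
The rule above can be expressed as a one-line check. This is a sketch of the documented condition only: the function name triggers_compression is hypothetical, the arguments are modelled as a plain dictionary, and any other properties the tool accepts are not covered.

```python
def triggers_compression(arguments: dict) -> bool:
    """Return True when a complete_verification call would trigger compression."""
    # purpose is required, and only the value "verification" triggers compression.
    return arguments.get("purpose") == "verification"


ok = triggers_compression({"purpose": "verification"})
missing = triggers_compression({})
```

A task prompt that wants compression after a given step should therefore ask for complete_verification to be called with purpose set to verification at that point.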