Tuning and Beyond QA
The Waterwheel agent is tuned out of the box for web testing. However, individual situations may require further tuning. More broadly, the agent is a general-purpose web browser agent capable of tasks beyond QA. Whatever your goal, we expose the necessary tools to tailor its behaviour and efficiency.
Prompt Tuning
Prompt Types
The agent uses three types of prompts to fulfil its objectives: the system prompt, the task prompt, and the compressed task prompt.
System Prompt
The system prompt contains up to three pieces of information:
- Default system prompt (required): content in /agent/config/system.prompt.md. Defines the agent's role and core behaviour.
- Global variables (optional): read from /agent/instructions/global-context.json by default. A JSON file containing values shared across all tasks in a run.
- Extra instructions (optional): read from /agent/instructions/extra-instructions.md by default. Provides any additional instructions needed for a specific run.
Task Prompt
A task prompt describes the objective and steps of a single task. Task prompts are stored as individual Markdown files under /agent/tasks. Refer to the Manage Test Tasks section for file format requirements.
Compressed Task Prompt
A compressed task prompt is generated automatically and replaces the task prompt whenever a compression occurs. It contains a summary of all completed verifications and known context values up to that point. See Token Efficiency Recommendations for details on how compression is triggered.
Prompt Analysis
By default, the agent prints the prompts used for all tasks to /agent/outputs/agent.log after each run. The following environment variables expose additional detail for prompt tuning:
| Variable | Description |
|---|---|
| COMPRESSION_DEBUG | When true, logs a compact compression summary for every compression lifecycle to agent.log |
| COMPRESSION_DEBUG_INCLUDE_REQUEST | When COMPRESSION_DEBUG is true, also logs the full compressed prompt to agent.log |
| ENABLE_API_LOGGING | When true, writes /agent/outputs/api-log.json detailing every API call made during the run |
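As an illustration, the variables above could be enabled for a single debugging run as follows. This is a minimal sketch that assumes the agent reads them from the environment of the process that starts it. If you launch the container yourself, you may need to pass them through your container runtime's environment options instead.

```shell
# Assumption: the agent picks these up from the environment of the shell
# that starts it. Adjust if your setup injects variables differently.
export COMPRESSION_DEBUG=true
# Only has an effect while COMPRESSION_DEBUG is true.
export COMPRESSION_DEBUG_INCLUDE_REQUEST=true
# Produces /agent/outputs/api-log.json for the run.
export ENABLE_API_LOGGING=true

# Start the run afterwards, for example with the preconfigured command:
# run-qa
```

Full compressed prompts and per-call API logs can be large, so it is probably worth unsetting these variables once tuning is finished.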
Tuning Recommendation
The simplest way to tune prompts is to feed the following files into an AI application such as Claude and ask it for suggestions:
- /agent/outputs/agent.log
- /agent/outputs/api-log.json
- /agent/outputs/test-results.json
- Individual task files from /agent/tasks
- Individual test log files from /agent/outputs (files ending with _log.json)
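Gathering these files by hand gets tedious across repeated runs. The helper below is one possible way to copy them into a single folder for upload. The function name collect_tuning_bundle and the default output directory are hypothetical, and the sketch assumes task files sit directly under /agent/tasks with an .md extension.

```shell
# collect_tuning_bundle [AGENT_ROOT] [DEST]
# Hypothetical helper: copies the files listed above into DEST.
collect_tuning_bundle() {
  agent_root="${1:-/agent}"
  dest="${2:-./tuning-bundle}"   # hypothetical default output directory
  mkdir -p "$dest/tasks" "$dest/test-logs" || return 1

  # Run-level outputs. api-log.json is only written when
  # ENABLE_API_LOGGING was true, so missing files are skipped.
  for name in agent.log api-log.json test-results.json; do
    if [ -f "$agent_root/outputs/$name" ]; then
      cp "$agent_root/outputs/$name" "$dest/"
    fi
  done

  # Individual task prompts (assumed to be top-level .md files).
  for file in "$agent_root"/tasks/*.md; do
    if [ -f "$file" ]; then cp "$file" "$dest/tasks/"; fi
  done

  # Per-test logs, identified by the _log.json suffix.
  for file in "$agent_root"/outputs/*_log.json; do
    if [ -f "$file" ]; then cp "$file" "$dest/test-logs/"; fi
  done
}
```

Calling collect_tuning_bundle with no arguments reads from /agent and writes to ./tuning-bundle. Pass a different root or destination as the first and second arguments if your layout differs.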
Repurpose the Agent
Update MCPs
The agent discovers available MCPs by reading /agent/config/mcp-config.json. New MCPs can be added either by installing them directly on the container or by using our image as a base image. Once installed, update mcp-config.json to expose them to the agent.
For example, you can replace the preinstalled Email MCP with a Gmail MCP by updating the mcpServers.email value in mcp-config.json.
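The snippet below sketches what such a replacement might look like by writing a candidate file to a scratch location for review. Only the mcpServers.email key comes from this documentation. The command/args entry shape and the package name example-gmail-mcp are hypothetical placeholders, so check the existing entries in mcp-config.json and your Gmail MCP's own documentation for the real values.

```shell
# Write a candidate config to a scratch path rather than editing the live
# file, so it can be reviewed before being merged into
# /agent/config/mcp-config.json.
candidate="${TMPDIR:-/tmp}/mcp-config.candidate.json"

# Assumptions: the "command"/"args" shape and the package name
# "example-gmail-mcp" are hypothetical. Copy the shape used by the entries
# already present in your mcp-config.json.
cat > "$candidate" <<'EOF'
{
  "mcpServers": {
    "email": {
      "command": "npx",
      "args": ["-y", "example-gmail-mcp"]
    }
  }
}
EOF
echo "Wrote $candidate"
```

The candidate shows only the email entry. When merging it into the real file, leave the other mcpServers entries as they are.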
After adding new MCPs, update /agent/config/system.prompt.md to provide explicit instructions on how the agent should use them.
Configure MCP Permissions
The agent runs as agentuser, so the simplest approach is to make sure agentuser can read and execute any new MCP, along with the files it depends on.
Update the Execution Pattern
The preconfigured run-qa command starts the agent once the two preinstalled MCPs are ready. After configuring new MCPs, you may need to update the agent start logic. The agent is a Node.js program and can be started directly with:
cd /agent && node dist/index.cjs
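A custom start script could follow the same pattern as run-qa: wait until the new MCPs are ready, then launch the agent. The sketch below shows one way to do that with a generic retry helper. The names wait_until, start_agent, and my_mcp_ready_check are hypothetical, and how readiness is detected depends entirely on the MCP you added.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: retries COMMAND once per second until it succeeds
# or TIMEOUT_SECONDS have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
  _timeout="$1"
  shift
  _elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$_elapsed" -ge "$_timeout" ]; then
      return 1
    fi
    sleep 1
    _elapsed=$((_elapsed + 1))
  done
  return 0
}

# Starts the agent with the documented command.
start_agent() {
  cd /agent && node dist/index.cjs
}

# Example (my_mcp_ready_check is a placeholder for your own readiness test):
# wait_until 60 my_mcp_ready_check && start_agent
```

Keeping the readiness check as a separate command makes it easy to swap in whatever your MCP exposes, such as a health endpoint or a pid file, without touching the start logic.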
Built-in Tools
The following built-in tools can be used directly through task prompts.
Context Manager
context-manager handles variables shared across individual tasks. If your tasks are split across multiple files and need to share data, context-manager is a useful alternative to global context — particularly for preserving values generated by the LLM at runtime.
Available actions: get, set, delete, has, keys.
Example usage in a task prompt:
- "Generate a username, then use context-manager set to save it as username."
- "Read username from context-manager and fill in the registration form."
Complete Verification
complete_verification is the tool the agent uses as the trigger point for message compression. If you want to control when compression occurs, specify the conditions under which complete_verification should be called in your task prompt.
complete_verification requires a purpose property. For compression to trigger correctly, purpose must be set to verification.