Tool and Response Assertions
Complete assertion guide with examples for tool checks, response checks, and semantic agent checks (also known as agent judge).
When To Use Tool vs Response Assertions
- Use tool assertions when behavior depends on action correctness (which tools were called, and in which order).
- Use response assertions when behavior depends on final answer quality or format.
- Use agent checks (agent judge) when the validation is semantic or fuzzy and strict string/regex checks would be too brittle.
- Use both together for high-confidence checks: action correctness plus answer correctness.
Tool Assertions
Tool assertions validate whether the agent used the right tools and, when needed, whether they were called in the right order.
These checks look at the observed tool-call list for a run. They do not inspect the final answer text.
- Use required_tools when a tool call is mandatory for the scenario to pass.
- Use forbidden_tools when a tool call would be unsafe, irrelevant, or should not appear in a valid solution.
- Use tool_sequence when the relative order matters, but other tool calls may happen between the listed tools.
- Repeated tools are matched from left to right, so tool_sequence: [search, search, summarize] requires two separate search calls before summarize.
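The matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation. The function names check_required_tools, check_forbidden_tools, and check_tool_sequence are hypothetical, and the sketch assumes the observed run is available as a plain list of tool names. The sequence check uses greedy left-to-right matching, which is enough to decide whether an ordered subsequence exists.

```python
from typing import List


def check_required_tools(calls: List[str], required: List[str]) -> bool:
    # Passes when every required tool appears at least once, in any order.
    return all(tool in calls for tool in required)


def check_forbidden_tools(calls: List[str], forbidden: List[str]) -> bool:
    # Passes only if none of the forbidden tools appears anywhere in the run.
    return not any(tool in calls for tool in forbidden)


def check_tool_sequence(calls: List[str], sequence: List[str]) -> bool:
    # Ordered subsequence check: walk the observed calls left to right and
    # advance through the expected sequence on each match. Unlisted calls in
    # between are skipped, and each observed call satisfies at most one
    # expected entry, so a repeated name needs a repeated call.
    position = 0
    for call in calls:
        if position < len(sequence) and call == sequence[position]:
            position += 1
    return position == len(sequence)


# Example: "fetch" between the two searches does not break the sequence.
observed = ["search", "fetch", "search", "summarize"]
print(check_tool_sequence(observed, ["search", "search", "summarize"]))  # True
print(check_tool_sequence(["search", "summarize"], ["search", "search", "summarize"]))  # False
```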
```yaml
eval:
  tool_constraints:
    required_tools: [lookup_account, verify_identity]
    forbidden_tools: [delete_account]
```

```yaml
eval:
  tool_sequence:
    - lookup_account
    - verify_identity
    - process_refund
```

Response Assertions
Response assertions validate the final answer text.
Literal string checks are case-insensitive by default. Use regex only when you need pattern matching, and use JSONPath when the model must return structured JSON.
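The following Python sketch shows how the literal checks and the regex check behave. It is not the product's implementation. It rests on three assumptions:

- Literal checks are plain substring, prefix, suffix, and equality comparisons after case-folding.
- Whitespace is not trimmed before comparing.
- Python's re module with IGNORECASE stands in for a JavaScript RegExp with the i flag. Simple patterns like the one below behave the same in both engines.

The helper names check_literal and check_regex are hypothetical.

```python
import re


def check_literal(kind: str, response: str, value: str) -> bool:
    # Case-insensitive literal comparison. Assumption: no whitespace trimming.
    text, needle = response.casefold(), value.casefold()
    if kind == "contains":
        return needle in text
    if kind == "not_contains":
        return needle not in text
    if kind == "starts_with":
        return text.startswith(needle)
    if kind == "ends_with":
        return text.endswith(needle)
    if kind == "equals":
        return text == needle
    raise ValueError(f"unknown assertion type: {kind}")


def check_regex(response: str, pattern: str) -> bool:
    # Case-insensitive search anywhere in the response. Python's re module is
    # a stand-in for JavaScript regular expressions in this sketch.
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


print(check_literal("contains", "Your refund PROCESSED successfully.", "refund processed"))  # True
print(check_regex("Refund   completed for order 42", r"refund\s+(processed|completed)"))  # True
```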
- Use contains for a phrase that should appear anywhere in the final response.
- Use not_contains for words or phrases that must not appear.
- Use starts_with, ends_with, or equals when the response format is mostly fixed.
- Use regex for variable text such as IDs, timestamps, alternations, or optional wording.
- Use jsonpath, jsonpath_exists, and jsonpath_not_exists only when the final response is valid JSON.
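The Python sketch below illustrates the JSONPath assertions. It shows why every JSONPath assertion depends on the final response parsing as JSON. It is a simplified illustration, not the product's implementation, and it makes several assumptions:

- resolve_simple_path and check_jsonpath are hypothetical helpers.
- Only "$" and dotted object paths such as $.data.id are supported. Real JSONPath also has array indexes, wildcards, and filters.
- equals is compared with strict equality.
- A jsonpath assertion with no equals is treated as an existence check.

```python
import json
from typing import Any, Optional, Tuple

# Sentinel meaning "path not found", distinct from a JSON null value.
_MISSING = object()


def resolve_simple_path(document: Any, path: str) -> Any:
    # Supports only "$" and dotted object paths such as "$.data.id".
    if path == "$":
        return document
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    current = document
    for key in path[2:].split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return _MISSING
    return current


def check_jsonpath(
    kind: str, response: str, path: str, equals: Optional[Any] = None
) -> Tuple[bool, str]:
    # All JSONPath assertions require the final response to parse as JSON.
    try:
        document = json.loads(response)
    except json.JSONDecodeError:
        return False, "invalid JSON"
    value = resolve_simple_path(document, path)
    if kind == "jsonpath_exists":
        return value is not _MISSING, ""
    if kind == "jsonpath_not_exists":
        return value is _MISSING, ""
    if kind == "jsonpath":
        if equals is None:
            # Assumption: without "equals", only require that the path matches.
            return value is not _MISSING, ""
        return value is not _MISSING and value == equals, ""
    raise ValueError(f"unknown assertion type: {kind}")


print(check_jsonpath("jsonpath", '{"status": "success"}', "$.status", equals="success"))  # (True, '')
print(check_jsonpath("jsonpath_not_exists", '{"data": {"id": 7}}', "$.error"))  # (True, '')
print(check_jsonpath("jsonpath_exists", "Refund processed.", "$.data.id"))  # (False, 'invalid JSON')
```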
```yaml
eval:
  response_assertions:
    - type: contains
      value: refund processed
```

```yaml
eval:
  response_assertions:
    - type: not_contains
      value: internal error
```

```yaml
eval:
  response_assertions:
    - type: starts_with
      value: hello
```

```yaml
eval:
  response_assertions:
    - type: ends_with
      value: thank you
```

```yaml
eval:
  response_assertions:
    - type: equals
      value: success
```

```yaml
eval:
  response_assertions:
    - type: regex
      pattern: 'refund\s+(processed|completed)'  # single quotes keep the backslash literal in YAML
```

```yaml
eval:
  response_assertions:
    - type: jsonpath
      path: $.status
    - type: jsonpath
      path: $.status
      equals: success
```

```yaml
eval:
  response_assertions:
    - type: jsonpath_exists
      path: $.data.id
```

```yaml
eval:
  response_assertions:
    - type: jsonpath_not_exists
      path: $.error
```

Agent Checks (Agent Judge)
Agent checks (also called agent judge) use a workspace-configured judge model to evaluate the final answer against a short freeform instruction. They are useful for semantic validation such as “does this answer include a valid time range?” when deterministic string checks are too rigid.
By default the judge receives only the final answer. Use the optional agent_context block to send additional context: the original scenario prompt, the list of called tool names, or both. This context is sent once and shared across all checks in the scenario.
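The Python sketch below models how agent_context might shape a batched judge request. Only context.scenario_prompt and context.tool_sequence come from this page. The function build_judge_request and the fields final_answer and checks are hypothetical, and the real request format may differ. The sketch reflects three documented behaviors:

- There is one request per scenario run.
- Both context flags default to false.
- The context is shared by every agent assertion.

```python
import json
from typing import Any, Dict, List


def build_judge_request(
    final_answer: str,
    agent_assertions: List[Dict[str, str]],
    scenario_prompt: str,
    called_tools: List[str],
    include_prompt: bool = False,
    include_tool_sequence: bool = False,
) -> Dict[str, Any]:
    # One request per scenario run: all agent assertions are batched together
    # and share the same final answer and the same optional context.
    request: Dict[str, Any] = {
        "final_answer": final_answer,  # hypothetical field name
        "checks": [  # hypothetical field name
            {"label": a["label"], "prompt": a["prompt"]} for a in agent_assertions
        ],
    }
    # Both flags default to False, so omitting agent_context sends no context.
    context: Dict[str, Any] = {}
    if include_prompt:
        context["scenario_prompt"] = scenario_prompt
    if include_tool_sequence:
        context["tool_sequence"] = list(called_tools)  # tool names only
    if context:
        request["context"] = context
    return request


checks = [
    {"label": "addresses_question",
     "prompt": "Confirm the answer directly addresses the original question."},
]
request = build_judge_request(
    "The outage ran from 09:00 to 09:45 UTC.",
    checks,
    scenario_prompt="When did the outage start and end?",
    called_tools=["search_logs", "summarize"],
    include_prompt=True,
    include_tool_sequence=True,
)
print(json.dumps(request, indent=2))
```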
```yaml
eval:
  agent_assertions:
    - label: logical_time_range
      prompt: Confirm the final answer includes an earliest and latest timestamp, and that both values are present and logically ordered.
```

```yaml
eval:
  agent_context:
    include_prompt: true          # send the scenario prompt to the agent judge as extra context
    include_tool_sequence: true   # send the called tool names to the agent judge as extra context
  agent_assertions:
    - label: addresses_question
      prompt: Confirm the answer directly addresses the original question.
    - label: logical_time_range
      prompt: Confirm the final answer includes an earliest and latest timestamp, and that both values are present and logically ordered.
```

Behavior Notes and Edge Cases
- required_tools passes when each listed tool appears at least once in the run, in any order.
- forbidden_tools fails when any listed tool appears in the run.
- tool_sequence is an ordered subsequence check: listed tools must appear in order, but other tools may appear between them.
- contains/not_contains/starts_with/ends_with/equals are literal, case-insensitive string checks.
- regex is case-insensitive by default and uses JavaScript regular expressions.
- jsonpath/jsonpath_exists/jsonpath_not_exists require valid JSON in the final response.
- If the final response is not valid JSON, JSONPath assertions fail with an invalid JSON error.
- Agent checks run in a single batched judge request per scenario run and require a default evaluation judge to be configured in workspace settings.
- agent_context is optional and applies to all agent_assertions in the scenario. Omit it to evaluate the final answer only (default).
- include_prompt sends the scenario prompt as context.scenario_prompt. include_tool_sequence sends the called tool names as context.tool_sequence.
- agent_context fields default to false; omitting the block entirely is equivalent to both fields being false.
- Agent checks are more flexible, but they are also less reproducible and more expensive than deterministic checks.