MCPLab


Tool and Response Assertions

A complete guide to assertions, with examples of tool checks, response checks, and semantic agent checks (also known as agent judge).

When To Use Tool vs Response Assertions

  • Use tool assertions when behavior depends on action correctness (which tools were called, and in which order).
  • Use response assertions when behavior depends on final answer quality or format.
  • Use agent checks (agent judge) when the validation is semantic or fuzzy and strict string/regex checks would be too brittle.
  • Combine tool and response assertions for high-confidence checks that cover both action correctness and answer correctness.

Tool Assertions

Tool assertions validate whether the agent used the right tools and, when needed, whether they were called in the right order.

These checks look at the observed tool-call list for a run. They do not inspect the final answer text.

  • Use required_tools when a tool call is mandatory for the scenario to pass.
  • Use forbidden_tools when a tool call would be unsafe, irrelevant, or should not appear in a valid solution.
  • Use tool_sequence when the relative order matters, but other tool calls may happen between the listed tools.
  • Repeated tools are matched from left to right, so tool_sequence: [search, search, summarize] requires two separate search calls before summarize.
required and forbidden tools
eval:
  tool_constraints:
    required_tools: [lookup_account, verify_identity]
    forbidden_tools: [delete_account]
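The rules for required_tools and forbidden_tools reduce to set membership over the observed tool-call list. The TypeScript code below sketches that logic under stated assumptions. The function name checkToolConstraints and its result shape are hypothetical, tool names are assumed to be compared exactly, and this is not MCPLab's internal implementation.

```typescript
// Hypothetical sketch of the documented tool_constraints semantics.
interface ToolCheckResult {
  pass: boolean;
  missing: string[]; // required tools that never appeared in the run
  forbidden: string[]; // forbidden tools that did appear in the run
}

function checkToolConstraints(
  observed: string[], // tool names from one run, in call order
  requiredTools: string[],
  forbiddenTools: string[],
): ToolCheckResult {
  const seen = new Set(observed);
  // required_tools: each listed tool must appear at least once, in any order.
  const missing = requiredTools.filter((tool) => !seen.has(tool));
  // forbidden_tools: any appearance of a listed tool fails the check.
  const forbidden = forbiddenTools.filter((tool) => seen.has(tool));
  return { pass: missing.length === 0 && forbidden.length === 0, missing, forbidden };
}

// Order does not matter for required_tools, so this run passes.
const result = checkToolConstraints(
  ["verify_identity", "lookup_account", "process_refund"],
  ["lookup_account", "verify_identity"],
  ["delete_account"],
);
console.log(result.pass); // true
```

Reporting the missing and forbidden names separately, rather than returning a bare boolean, makes a failing run easier to diagnose.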
ordered tool sequence
eval:
  tool_sequence:
    - lookup_account
    - verify_identity
    - process_refund
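Because tool_sequence is an ordered subsequence check, a single greedy left-to-right scan is enough. Each observed call can advance the expected sequence by at most one step, which is also why repeated entries need separate calls. This minimal TypeScript sketch assumes exact name matching. matchesToolSequence and the get_balance tool are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch of tool_sequence as an ordered subsequence check.
function matchesToolSequence(observed: string[], sequence: string[]): boolean {
  let next = 0; // index of the next expected tool in `sequence`
  for (const tool of observed) {
    // Each observed call satisfies at most one expected entry, so a
    // repeated entry such as [search, search] needs two separate calls.
    if (next < sequence.length && tool === sequence[next]) {
      next++;
    }
  }
  // Pass only if every expected entry was matched, in order.
  return next === sequence.length;
}

const expected = ["lookup_account", "verify_identity", "process_refund"];
// An unrelated call between listed tools is allowed.
console.log(matchesToolSequence(["lookup_account", "get_balance", "verify_identity", "process_refund"], expected)); // true
// The listed tools are present but out of order.
console.log(matchesToolSequence(["verify_identity", "lookup_account", "process_refund"], expected)); // false
// Only one search call, but the sequence lists search twice.
console.log(matchesToolSequence(["search", "summarize"], ["search", "search", "summarize"])); // false
```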

Response Assertions

Response assertions validate the final answer text.

Literal string checks are case-insensitive by default. Use regex only when you need pattern matching, and use JSONPath when the model must return structured JSON.

  • Use contains for a phrase that should appear anywhere in the final response.
  • Use not_contains for words or phrases that must not appear.
  • Use starts_with, ends_with, or equals when the response format is mostly fixed.
  • Use regex for variable text such as IDs, timestamps, alternations, or optional wording.
  • Use jsonpath, jsonpath_exists, and jsonpath_not_exists only when the final response is valid JSON.
contains
eval:
  response_assertions:
    - type: contains
      value: refund processed
not_contains
eval:
  response_assertions:
    - type: not_contains
      value: internal error
starts_with
eval:
  response_assertions:
    - type: starts_with
      value: hello
ends_with
eval:
  response_assertions:
    - type: ends_with
      value: thank you
equals
eval:
  response_assertions:
    - type: equals
      value: success
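The five literal checks can be read as plain string operations on lower-cased text, which is what makes them case-insensitive. The sketch below is an assumption about how such checks could work, not MCPLab's code. In particular, it applies no whitespace trimming or punctuation stripping, and this guide does not say whether MCPLab normalizes either. checkLiteral is a hypothetical name.

```typescript
// Hypothetical sketch of the case-insensitive literal response checks.
type LiteralCheck = "contains" | "not_contains" | "starts_with" | "ends_with" | "equals";

function checkLiteral(type: LiteralCheck, response: string, value: string): boolean {
  // Lower-casing both sides makes the comparison case-insensitive.
  // No trimming or punctuation stripping is applied (an assumption).
  const text = response.toLowerCase();
  const needle = value.toLowerCase();
  switch (type) {
    case "contains":
      return text.includes(needle);
    case "not_contains":
      return !text.includes(needle);
    case "starts_with":
      return text.startsWith(needle);
    case "ends_with":
      return text.endsWith(needle);
    case "equals":
      return text === needle;
    default:
      throw new Error(`unknown check type: ${String(type)}`);
  }
}

console.log(checkLiteral("contains", "Your Refund Processed today.", "refund processed")); // true
console.log(checkLiteral("not_contains", "All good.", "internal error")); // true
console.log(checkLiteral("starts_with", "Hello there!", "hello")); // true
console.log(checkLiteral("equals", "SUCCESS", "success")); // true
console.log(checkLiteral("ends_with", "Thank you.", "thank you")); // false: trailing period
```

Under this sketch, a single trailing period is enough to make ends_with fail. That is one reason regex can be a better fit when the ending of a response is loosely formatted.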
regex
eval:
  response_assertions:
    - type: regex
      pattern: 'refund\s+(processed|completed)'
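Regex assertions use JavaScript regular expressions and are case-insensitive by default, so evaluating one is roughly equivalent to compiling the pattern with the i flag. In this hypothetical sketch, checkRegex searches anywhere in the response (an assumption, since this guide does not say whether a full match is required). The pattern string is the value after YAML parsing, with a single backslash before s. In YAML, you can keep that backslash literal by single-quoting the pattern, because \s is not a valid escape sequence inside double quotes.

```typescript
// Hypothetical sketch of a regex response check using JavaScript RegExp.
function checkRegex(response: string, pattern: string): boolean {
  // The "i" flag mirrors the documented case-insensitive default.
  const re = new RegExp(pattern, "i");
  // test() finds a match anywhere unless the pattern is anchored with ^ or $.
  return re.test(response);
}

// The parsed pattern has one backslash before "s". A TypeScript
// double-quoted literal needs it doubled, just like a double-quoted YAML value.
const refundPattern = "refund\\s+(processed|completed)";
console.log(checkRegex("Your REFUND  completed successfully.", refundPattern)); // true
console.log(checkRegex("Refund pending.", refundPattern)); // false
// Variable text such as IDs is where regex beats literal checks.
console.log(checkRegex("Order ORD-1042 shipped", "ORD-\\d{4}")); // true
```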
jsonpath (exists or equals)
eval:
  response_assertions:
    - type: jsonpath
      path: $.status
    - type: jsonpath
      path: $.status
      equals: success
jsonpath_exists
eval:
  response_assertions:
    - type: jsonpath_exists
      path: $.data.id
jsonpath_not_exists
eval:
  response_assertions:
    - type: jsonpath_not_exists
      path: $.error
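JSONPath assertions first parse the final response as JSON and fail with an invalid JSON error if parsing fails. The hypothetical sketch below mirrors that flow using a deliberately tiny path resolver. It handles only dotted paths such as $.data.id, with no wildcards, filters, or bracket notation. A real implementation would use a full JSONPath library. Comparing equals with strict equality, which only works for primitive values, is also an assumption.

```typescript
// Hypothetical sketch of JSONPath response checks. The resolver supports
// only dotted paths like $.data.id; real JSONPath supports far more.
type JsonPathCheck =
  | { type: "jsonpath"; path: string; equals?: unknown }
  | { type: "jsonpath_exists"; path: string }
  | { type: "jsonpath_not_exists"; path: string };

function resolvePath(doc: unknown, path: string): { found: boolean; value?: unknown } {
  if (path !== "$" && !path.startsWith("$.")) {
    throw new Error(`unsupported path in this sketch: ${path}`);
  }
  let current: unknown = doc;
  // "$.data.id" -> ["data", "id"]; "$" -> [] (the whole document)
  const keys = path === "$" ? [] : path.slice(2).split(".");
  for (const key of keys) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return { found: false };
    }
    current = (current as Record<string, unknown>)[key];
  }
  return { found: true, value: current };
}

function checkJsonPath(response: string, check: JsonPathCheck): { pass: boolean; error?: string } {
  let doc: unknown;
  try {
    doc = JSON.parse(response);
  } catch {
    // Non-JSON final responses fail every JSONPath assertion.
    return { pass: false, error: "invalid JSON" };
  }
  const { found, value } = resolvePath(doc, check.path);
  if (check.type === "jsonpath_exists") return { pass: found };
  if (check.type === "jsonpath_not_exists") return { pass: !found };
  // jsonpath: existence check, plus strict equality when `equals` is set.
  return { pass: found && (check.equals === undefined || value === check.equals) };
}

const body = '{"status":"success","data":{"id":42}}';
console.log(checkJsonPath(body, { type: "jsonpath", path: "$.status", equals: "success" }).pass); // true
console.log(checkJsonPath(body, { type: "jsonpath_exists", path: "$.data.id" }).pass); // true
console.log(checkJsonPath(body, { type: "jsonpath_not_exists", path: "$.error" }).pass); // true
console.log(checkJsonPath("Refund processed.", { type: "jsonpath", path: "$.status" }).error); // invalid JSON
```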

Agent Checks (Agent Judge)

Agent checks (also called agent judge) use a workspace-configured judge model to evaluate the final answer against a short freeform instruction. They are useful for semantic validation such as “does this answer include a valid time range?” when deterministic string checks are too rigid.

By default, the judge receives only the final answer. Use the optional agent_context block to send additional context: the original scenario prompt, the list of called tool names, or both. This context is shared once across all checks in the scenario.

agent_assertions
eval:
  agent_assertions:
    - label: logical_time_range
      prompt: Confirm the final answer includes an earliest and latest timestamp, and that both values are present and logically ordered.
agent_assertions with agent_context
eval:
  agent_context:
    include_prompt: true          # sends the scenario prompt to the agent judge as evaluation context
    include_tool_sequence: true   # sends the called tool names to the agent judge as evaluation context

  agent_assertions:
    - label: addresses_question
      prompt: Confirm the answer directly addresses the original question.
    - label: logical_time_range
      prompt: Confirm the final answer includes an earliest and latest timestamp, and that both values are present and logically ordered.
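Agent checks run as one batched judge request per scenario run, and agent_context only controls which extra fields are attached to that request. The sketch below shows one way such a request could be assembled. Only context.scenario_prompt and context.tool_sequence come from this guide. The overall request shape, the final_answer and checks fields, buildJudgeRequest itself, and the sample answer, prompt, and query_logs tool are all hypothetical.

```typescript
// Hypothetical sketch of building one batched agent-judge request.
// Only context.scenario_prompt and context.tool_sequence are documented names.
interface AgentAssertion {
  label: string;
  prompt: string;
}

interface AgentContextConfig {
  include_prompt?: boolean; // defaults to false
  include_tool_sequence?: boolean; // defaults to false
}

interface JudgeContext {
  scenario_prompt?: string;
  tool_sequence?: string[];
}

interface JudgeRequest {
  final_answer: string;
  checks: AgentAssertion[]; // every agent assertion goes into the same request
  context?: JudgeContext;
}

function buildJudgeRequest(
  finalAnswer: string,
  assertions: AgentAssertion[],
  scenarioPrompt: string,
  calledTools: string[],
  agentContext?: AgentContextConfig, // omitted block: both flags treated as false
): JudgeRequest {
  const request: JudgeRequest = { final_answer: finalAnswer, checks: assertions };
  const context: JudgeContext = {};
  if (agentContext?.include_prompt === true) context.scenario_prompt = scenarioPrompt;
  if (agentContext?.include_tool_sequence === true) context.tool_sequence = calledTools;
  // Attach context only when at least one field was opted in.
  if (Object.keys(context).length > 0) request.context = context;
  return request;
}

const checks: AgentAssertion[] = [
  { label: "addresses_question", prompt: "Confirm the answer directly addresses the original question." },
  { label: "logical_time_range", prompt: "Confirm the final answer includes an earliest and a latest timestamp." },
];
const withContext = buildJudgeRequest(
  "Earliest: 09:00, latest: 17:30.",
  checks,
  "When did the job first and last run today?",
  ["query_logs"],
  { include_prompt: true, include_tool_sequence: true },
);
console.log(JSON.stringify(withContext.context));
```

Sending every check in one request keeps judge cost to a single call per scenario run. The trade-off is that all checks share the same context.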

Behavior Notes and Edge Cases

  • required_tools passes when each listed tool appears at least once in the run, in any order.
  • forbidden_tools fails when any listed tool appears in the run.
  • tool_sequence is an ordered subsequence check: listed tools must appear in order, but other tools may appear between them.
  • contains/not_contains/starts_with/ends_with/equals are literal, case-insensitive string checks.
  • regex is case-insensitive by default and uses JavaScript regular expressions.
  • jsonpath/jsonpath_exists/jsonpath_not_exists require valid JSON in the final response.
  • If the final response is not valid JSON, JSONPath assertions fail with an invalid JSON error.
  • Agent checks run in a single batched judge request per scenario run and require a default evaluation judge to be configured in workspace settings.
  • agent_context is optional and applies to all agent_assertions in the scenario. Omit it to evaluate the final answer only (default).
  • include_prompt sends the scenario prompt as context.scenario_prompt. include_tool_sequence sends the called tool names as context.tool_sequence.
  • agent_context fields default to false; omitting the block entirely is equivalent to both fields being false.
  • Agent checks are more flexible, but they are also less reproducible and more expensive than deterministic checks.