Targets Configuration

Targets define which agent or LLM provider to evaluate. AgentV uses one composable config graph across project manifests and eval files:

  • .agentv/config.yaml is the project-local discovery and composition root. It can hold targets, graders, tests, defaults, execution policy, results settings, and repo-local project policy.
  • $AGENTV_HOME/config.yaml is the user/operator config. Use it for defaults that apply across projects, project registry data, default result locations, and provider defaults that should not be copied into each repo.
  • eval.yaml is a focused, shareable slice of the same graph. Use it for a suite-specific target, grader, tests, evaluator settings, or run controls that should travel with the eval.

Any supported top-level field can stay inline or become a direct field reference such as targets: file://targets.yaml. Both forms normalize to the same config graph.

targets:
  - id: local-openai
    provider: openai
    runtime: host
    config:
      api_format: chat
      base_url: ${{ LOCAL_OPENAI_PROXY_BASE_URL }}
      api_key: ${{ LOCAL_OPENAI_PROXY_API_KEY }}
      model: ${{ LOCAL_OPENAI_PROXY_MODEL }}
  - id: codex-local
    provider: codex-app-server
    runtime: host
    config:
      command: ["codex", "app-server"]
      model: gpt-5-codex
graders:
  - id: openai-grader
    provider: openai
    config:
      model: gpt-5-mini
defaults:
  target: codex-local
  grader: openai-grader

Use id for the stable AgentV target identity. provider selects the adapter or control boundary. runtime describes where the provider runs; use host as the shorthand for the current machine, or object form when you need mode: host | profile | sandbox plus runtime-specific settings. Provider settings belong under config. Process-backed coding-agent providers use config.command as a non-empty argv array.
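
The two rules in the paragraph above (the `runtime: host` shorthand and the non-empty `config.command` argv array) can be illustrated with a small Python sketch. This is not AgentV's implementation: the function names are hypothetical, and it assumes `host` is the only string shorthand and that argv entries must be strings.

```python
# Hypothetical helpers illustrating the documented rules; not AgentV code.
VALID_MODES = {"host", "profile", "sandbox"}


def normalize_runtime(runtime):
    """Expand the `runtime: host` shorthand into object form."""
    if isinstance(runtime, str):
        # Assumption: "host" is the only accepted string shorthand.
        if runtime != "host":
            raise ValueError(f"unknown runtime shorthand: {runtime!r}")
        return {"mode": "host"}
    mode = runtime.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"runtime.mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    # Keep runtime-specific settings (engine, image, ...) untouched.
    return dict(runtime)


def validate_command(config):
    """Check that config.command is a non-empty argv array."""
    command = config.get("command")
    if not isinstance(command, list) or not command:
        raise ValueError("config.command must be a non-empty argv array")
    # Assumption: every argv entry is a string.
    if not all(isinstance(arg, str) for arg in command):
        raise ValueError("config.command entries must be strings")
    return command
```

Under these assumptions, `normalize_runtime("host")` and `normalize_runtime({"mode": "host"})` describe the same runtime, which is why the shorthand and the object form can be used interchangeably for host targets.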

Use runtime: host when you want AgentV to run the target exactly as it is installed on the current machine. This is the best fit for local research, subscription-auth workflows, and evaluating the same CLI profile an engineer uses manually.

Use runtime.mode: profile when the target still runs as a host process but should use an isolated home/config directory, such as a dedicated CODEX_HOME or HOME.

Use runtime.mode: sandbox when the target should run inside a separate execution substrate. The built-in sandbox runner currently supports Docker for provider: cli; provider-specific coding-agent adapters such as codex-cli, claude-cli, copilot-cli, and pi-cli return a structured unsupported target error until their transcript parsers are wired through sandbox-aware runners.

targets:
  - id: codex-sandbox
    provider: codex-cli
    runtime:
      mode: sandbox
      engine: docker
      image: ghcr.io/acme/codex-agent:sha256
      workdir: /workspace
      network: none
      mounts:
        - source: ./workspace
          target: /workspace
          access: rw
        - source: ./.agentv/results
          target: /results
          access: rw
      env:
        AGENTV_RESULT_DIR: /results
      secrets:
        OPENAI_API_KEY: ${{ OPENAI_API_KEY }}
    config:
      command: ["codex", "exec", "--json"]
      timeout_seconds: 300

Sandbox mode does not inherit host credentials by default. Mount only the workspace, results, cache, or credential paths the target needs, and pass only the environment variables and secrets listed under runtime.env and runtime.secrets. Install the target CLI by using an image that already contains it or by adding explicit setup under runtime.setup; locate the CLI with config.command.
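
To make the "nothing is inherited by default" rule concrete, the following Python sketch maps a sandbox `runtime` block onto a `docker run` argument list. The Docker flags used are standard ones, but the mapping itself is an assumption for illustration and is not taken from AgentV's built-in sandbox runner; `docker_run_args` is a hypothetical name.

```python
import os


def docker_run_args(runtime, command):
    """Build a docker argv plus the secret values the launcher must export.

    Hypothetical mapping from the runtime fields shown above to docker flags.
    Nothing from the host environment is forwarded implicitly.
    """
    args = ["docker", "run", "--rm"]
    if "network" in runtime:
        args += ["--network", runtime["network"]]
    if "workdir" in runtime:
        args += ["--workdir", runtime["workdir"]]
    for mount in runtime.get("mounts", []):
        # Bind mounts are given as absolute host paths.
        source = os.path.abspath(mount["source"])
        args += ["--volume", f"{source}:{mount['target']}:{mount['access']}"]
    for key, value in runtime.get("env", {}).items():
        args += ["--env", f"{key}={value}"]
    secrets = dict(runtime.get("secrets", {}))
    for key in secrets:
        # `--env KEY` without a value makes docker copy KEY from the
        # launching process, so the secret never appears in argv.
        args += ["--env", key]
    args.append(runtime["image"])
    args += list(command)
    return args, secrets
```

A caller would start the container with the returned argv and an environment that contains only the returned secrets (plus whatever the docker client itself needs), so unlisted host variables never reach the target.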

For CI, API-key or explicitly injected secret auth is the most reproducible path. Subscription OAuth can work in a sandbox only when you intentionally mount or seed the relevant profile directory into the sandbox. That makes the run less portable than API-key CI and should be reserved for workflows where matching a local subscription profile is the point of the evaluation.

Inline and decomposed forms are equivalent. This single-file config:

targets:
  - id: codex-local
    provider: codex-app-server
    runtime: host
    config:
      command: ["codex", "app-server"]
      model: gpt-5-codex
graders:
  - id: openai-grader
    provider: openai
    config:
      model: gpt-5-mini
tests:
  - id: smoke
    input: Fix the failing test.
defaults:
  target: codex-local
  grader: openai-grader

can be decomposed like this:

targets: file://targets.yaml
graders: file://graders.yaml
tests: file://tests.yaml
defaults: file://defaults.yaml

Referenced field files contain the field value directly. For example, targets.yaml contains a bare array, not an object wrapped in a targets: key:

# .agentv/targets.yaml
- id: codex-local
  provider: codex-app-server
  runtime: host
  config:
    command: ["codex", "app-server"]
    model: gpt-5-codex

# .agentv/defaults.yaml
target: codex-local
grader: openai-grader

File refs are optional. Use them when a config field is large, reused, or owned by a separate team; keep fields inline when that is easier to read.
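
The equivalence of inline and decomposed forms can be sketched in a few lines of Python. This is an illustration of the idea, not AgentV's loader: `resolve_field_refs` is a hypothetical name, and the `files` mapping stands in for reading and parsing each referenced YAML file.

```python
def resolve_field_refs(config, files):
    """Replace top-level `file://` references with the referenced value.

    `files` maps a path to the already-parsed content of that file.
    Because a field file holds the field value directly, the loaded
    value is assigned to the field as-is, with no unwrapping.
    """
    resolved = {}
    for field, value in config.items():
        if isinstance(value, str) and value.startswith("file://"):
            path = value[len("file://"):]
            resolved[field] = files[path]
        else:
            resolved[field] = value
    return resolved


inline = {
    "targets": [{"id": "codex-local", "provider": "codex-app-server", "runtime": "host"}],
    "defaults": {"target": "codex-local", "grader": "openai-grader"},
}
decomposed = {"targets": "file://targets.yaml", "defaults": "file://defaults.yaml"}
files = {
    "targets.yaml": inline["targets"],    # a bare array
    "defaults.yaml": inline["defaults"],  # a bare mapping
}
same_graph = resolve_field_refs(decomposed, files) == inline
```

In this sketch `same_graph` is true because each field file contributes exactly the value that would otherwise sit inline.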

Use ${{ VARIABLE_NAME }} syntax to reference values from your environment. AgentV reads exported process environment variables directly, and it also loads .env files from the eval directory hierarchy when present:

targets:
  - id: my-target
    provider: anthropic
    runtime: host
    config:
      api_key: ${{ ANTHROPIC_API_KEY }}
      model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files and avoids requiring a CI step that rewrites already-exported secrets into .env.
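
As a rough illustration of the `${{ VARIABLE_NAME }}` syntax, the Python sketch below substitutes placeholders from a mapping of environment values. It is not AgentV's resolver: the accepted variable-name pattern and the choice to raise an error for an unset variable are assumptions made for the example.

```python
import re

# Matches ${{ NAME }} with optional whitespace inside the braces.
# Assumption: names follow the usual shell identifier rules.
_PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value, env):
    """Replace every ${{ NAME }} in `value` with env[NAME]."""

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: an unset variable is an error, not an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, value)
```

Passing `os.environ` (optionally merged with values parsed from a `.env` file) as `env` would mirror the two sources described above; how AgentV orders those sources is not covered by this sketch.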

| Provider | Type | Description |
|---|---|---|
| azure | LLM | Azure OpenAI |
| anthropic | LLM | Anthropic Claude API |
| gemini | LLM | Google Gemini |
| claude-cli | Agent | Claude CLI subprocess |
| claude-sdk | Agent | Claude Agent SDK in an isolated child runner |
| codex-cli | Agent | Codex CLI subprocess |
| codex-app-server | Agent | Codex app-server subprocess |
| codex-sdk | Agent | Codex SDK in an isolated child runner |
| copilot-cli | Agent | Copilot CLI subprocess |
| copilot-log | Agent | Passive Copilot CLI session log reader |
| copilot-sdk | Agent | Copilot SDK in an isolated child runner |
| pi-sdk | Agent | Pi SDK in an isolated child runner |
| pi-cli | Agent | Pi CLI subprocess |
| pi-rpc | Agent | Pi RPC subprocess over stdio |
| vscode | Agent | VS Code with Copilot |
| vscode-insiders | Agent | VS Code Insiders |
| cli | Agent | Any CLI command; see CLI Provider |
| mock | Testing | Explicit mock target for examples and tests |

Select the system under test with defaults.target, top-level target, or CLI --target, depending on the command flow. Test cases do not choose targets; split target-specific cases into separate eval suites, select them with tags/filters, or run the same eval with different --target values.

target: local-openai
tests:
  - id: test-1
  - id: test-2

The string is a configured target id. Use object form when an eval needs a local target variant:

target:
  id: codex-high-reasoning
  provider: codex-app-server
  runtime: host
  config:
    command: ["codex", "app-server"]
    model: gpt-5-codex
    reasoning_effort: high

Use defaults.grader for the project default grader. A specific evaluator can still choose its own grader target when the evaluator supports that override.

Run non-provisioning setup at Promptfoo-compatible lifecycle points using top-level extensions. The harness materializes workspace.template and workspace.repos first, then runs beforeAll extensions. Use extensions for dependency installs, builds, fixture generation, and agent-rule staging. Use target hooks for runner-specific setup. Keep repo identity and checkout pins in workspace.repos; extensions must not become the default repo acquisition path.

extensions:
  - file://scripts/workspace.mjs:beforeAll
  - file://scripts/workspace.mjs:beforeEach
  - file://scripts/workspace.mjs:afterEach
  - file://scripts/workspace.mjs:afterAll
  - id: agentv:agent-rules
    hook: beforeAll
    skills: agent-rules/skills
    rules: agent-rules/AGENTS.md
workspace:
  template: ./workspace-templates/my-project
  hooks:
    after_each:
      reset: fast

| Field | Description |
|---|---|
| template | Directory to copy as workspace |
| extensions[] | file://...:beforeAll, beforeEach, afterEach, afterAll, or agentv:agent-rules |
| hooks.after_each.reset | Reset mode: none, fast, strict |

Lifecycle order: template copy → repo materialization → extensions.beforeAll → target hooks.before_all → git baseline → (extensions.beforeEach → target hooks.before_each → agent runs → file changes captured → target hooks.after_each → extensions.afterEach → workspace.hooks.after_each.reset) × N tests → target hooks.after_all → extensions.afterAll → cleanup

Shared workspace: The workspace is created once and shared across all tests in a suite. Use hooks.after_each.reset to reset state between tests (e.g., fast/strict).
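
The lifecycle order above can be written out as a small Python sketch that lists every step for a given set of tests. It only restates the documented sequence; `lifecycle_steps` is a hypothetical helper, not part of AgentV.

```python
PER_TEST_STEPS = (
    "extensions.beforeEach",
    "target hooks.before_each",
    "agent runs",
    "file changes captured",
    "target hooks.after_each",
    "extensions.afterEach",
    "workspace.hooks.after_each.reset",
)


def lifecycle_steps(test_ids):
    """Return the ordered lifecycle steps for one suite run."""
    steps = [
        "template copy",
        "repo materialization",
        "extensions.beforeAll",
        "target hooks.before_all",
        "git baseline",
    ]
    # The per-test block repeats once for each test in the suite.
    for test_id in test_ids:
        steps += [f"{test_id}: {step}" for step in PER_TEST_STEPS]
    steps += ["target hooks.after_all", "extensions.afterAll", "cleanup"]
    return steps
```

Printing `lifecycle_steps(["case-01", "case-02"])` makes it easy to see that setup runs extensions before target hooks, while teardown runs target hooks before extensions.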

Error handling:

  • beforeAll / beforeEach extension failure aborts the affected run with an error result
  • afterAll / afterEach extension failure is non-fatal
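
The two error-handling rules can be sketched as a single Python function. This is an illustration only: `run_extensions` is a hypothetical name, and the choice to keep running the remaining teardown extensions after a non-fatal failure is an assumption.

```python
FATAL_PHASES = ("beforeAll", "beforeEach")


def run_extensions(phase, extensions, context):
    """Run the extension callables for one phase.

    Setup-phase failures propagate and abort the affected run.
    Teardown-phase failures are collected and returned as warnings.
    """
    warnings = []
    for extension in extensions:
        try:
            extension(context)
        except Exception as error:
            if phase in FATAL_PHASES:
                raise RuntimeError(f"{phase} extension failed: {error}") from error
            # Assumption: later teardown extensions still run.
            warnings.append(f"{phase} extension failed (ignored): {error}")
    return warnings
```

A caller would convert the raised `RuntimeError` into an error result for the affected run, and would log the returned warnings without changing the test outcome.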

File hook context: Exported functions receive a JSON-compatible object with case context:

{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": { "repo": "sympy/sympy", "source_commit": "abc123" }
}

workspace.hooks remains the reset-policy home for after_each.reset. Legacy command hooks still parse for existing local suites, but new portable evals should use extensions for executable setup.

Materialize git repositories into the shared eval workspace. Repo entries declare provenance only: the repository identity and checkout pin. AgentV resolves acquisition separately using registered projects, configured mirrors, its git cache, and finally remote clone. Define repos at the suite level or per test:

workspace:
  repos:
    - path: ./my-repo
      repo: https://github.com/org/repo.git
      commit: main
      ancestor: 1 # check out the parent commit
  hooks:
    after_each:
      reset: fast # none | fast | strict
  scope: suite # suite (default) | attempt

repo declares the repository identity. Acquisition is harness-owned: AgentV first applies configured repo_resolvers, then uses the built-in git path of registered projects, configured mirrors, AgentV’s git cache, and remote clone. See Workspace Architecture for the resolver order, command resolver protocol, and git_cache.mirrors config.

| Field | Description |
|---|---|
| repos[].path | Directory within the workspace to clone into |
| repos[].repo | Repository identity: full clone URL or GitHub org/name shorthand |
| repos[].commit | Branch, tag, or SHA to check out (default: HEAD) |
| repos[].ancestor | Walk N commits back from the checked-out ref (e.g., 1 for parent) |
| repos[].sparse | Sparse checkout paths |
| hooks.after_each.reset | Reset policy after each test: none, fast, strict |
| scope | suite reuses one harness-managed workspace for the suite; attempt creates a clean workspace for each resolved execution attempt |
| hooks.enabled | Boolean (default: true). Set false to skip all lifecycle hooks. |
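
One way to read the commit and ancestor fields together is as a single git revision expression. The Python sketch below assumes that "walk N commits back" follows first parents, which is what git's `<rev>~<n>` syntax does; whether AgentV resolves the pin this way is an assumption, and `checkout_revision` is a hypothetical helper.

```python
def checkout_revision(commit="HEAD", ancestor=0):
    """Combine repos[].commit and repos[].ancestor into a git revision."""
    if ancestor < 0:
        raise ValueError("ancestor must be zero or a positive integer")
    if ancestor == 0:
        return commit
    # git reads `main~1` as the first parent of `main`.
    return f"{commit}~{ancestor}"
```

With the values from the example config, `checkout_revision("main", 1)` yields `main~1`, a revision that could be passed to `git rev-parse` to obtain the concrete SHA.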

Use scope: attempt when mutating agents need clean filesystem state for every prompt-target-test-repeat execution. Use scope: suite when the suite intentionally shares state across tests.
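
The difference between the two scopes is easiest to see by counting workspaces. The Python sketch below enumerates one workspace key per prompt-target-test-repeat combination for `scope: attempt` and a single shared key for `scope: suite`. The key format and function name are hypothetical and used only for illustration.

```python
from itertools import product


def workspace_keys(scope, prompts, targets, tests, repeats):
    """List the workspaces a run would need under the given scope."""
    if scope == "suite":
        # One harness-managed workspace shared by every test.
        return ["suite"]
    if scope == "attempt":
        # One clean workspace per resolved execution attempt.
        return [
            f"{prompt}/{target}/{test}/{repeat}"
            for prompt, target, test, repeat in product(
                prompts, targets, tests, range(1, repeats + 1)
            )
        ]
    raise ValueError(f"unknown scope: {scope!r}")
```

For one prompt, one target, two tests, and three repeats, `scope: attempt` needs six workspaces where `scope: suite` needs one, which is the cost of guaranteeing clean filesystem state for mutating agents.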

Existing local workspaces: do not commit local paths in eval YAML. Use --workspace-path /path/to/workspace for a one-off run, or put execution.workspace_path in .agentv/config.local.yaml.

Workspace command:

  • agentv workspace deps <eval-paths> — scan eval files and output a JSON manifest of required git repos (for CI pre-cloning)

Common patterns:

# Pinned commit
workspace:
  repos:
    - path: ./repo
      repo: https://github.com/org/repo.git
      commit: abc123def

# Multi-repo shared workspace with reset
workspace:
  repos:
    - path: ./frontend
      repo: https://github.com/org/frontend.git
    - path: ./backend
      repo: https://github.com/org/backend.git
  hooks:
    after_each:
      reset: fast

# GitHub shorthand with a pinned commit
workspace:
  repos:
    - path: ./repo
      repo: org/repo
      commit: abc123def

Default finish behavior:

  • Success: cleanup
  • Failure: keep

CLI overrides:

  • --retain-on-success keep|cleanup
  • --retain-on-failure keep|cleanup
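
The defaults and overrides above amount to a two-entry decision table, sketched here in Python. The function and parameter names are hypothetical; the default values mirror the documented behavior (cleanup on success, keep on failure).

```python
def should_keep_workspace(success, retain_on_success="cleanup", retain_on_failure="keep"):
    """Decide whether the workspace is kept after a run finishes."""
    policy = retain_on_success if success else retain_on_failure
    if policy not in ("keep", "cleanup"):
        raise ValueError(f"retention policy must be keep or cleanup, got {policy!r}")
    return policy == "keep"
```

Passing `retain_on_success="keep"` corresponds to running with `--retain-on-success keep`, which is useful when a passing run still needs to be inspected by hand.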

Use cwd on a target to run in an existing directory (shared across tests). If not set, the eval file’s directory is used as the working directory.

Eval files can define per-target hooks that run setup/teardown scripts to customize the workspace for each target variant. This enables comparing different harness configurations (e.g., baseline vs with-plugins) in a single eval file.

Targets do not declare repos. Repositories belong to the shared eval workspace so every target runs in the same world; target hooks customize the harness under evaluation. Use hooks for per-target setup such as enabling wrappers or changing provider-local config. Keep installs, builds, fixture generation, and case setup in top-level lifecycle extensions.

Target hooks can be scoped to an eval-local target object:

target:
  extends: default
  hooks:
    before_each:
      command: ["setup-plugins.sh", "skills"]

Target hooks run after workspace hooks on setup, before workspace hooks on teardown:

  1. Extension beforeAll
  2. Target before_all
  3. For each test:
    • Workspace before_each
    • Target before_each
    • Test executes
    • Target after_each
    • Workspace after_each
  4. Target after_all
  5. Workspace after_all

Target hooks follow the same schema as workspace hooks:

hooks:
  before_all:
    command: ["setup.sh"] # Command array or shell string
    timeout_ms: 60000 # Optional timeout
    cwd: "./scripts" # Optional working directory
  before_each:
    command: "echo setup" # String shorthand (runs via sh -c)
  after_each:
    command: ["cleanup.sh"]
  after_all:
    command: ["teardown.sh"]