Coding Agents

Coding agent targets evaluate assistants that run against a real workspace. Each target is defined by four fields: id, provider, runtime, and config. id is AgentV’s stable selection and artifact identity. provider names the adapter or control boundary. runtime names where the target runs. Provider-specific settings live under config.

Use defaults.grader, CLI --grader / --grader-target, or an evaluator-specific target override for LLM-based grading. Grader selection is separate from the coding-agent target, so target definitions do not carry a grader field.

targets:
  - id: codex-local
    provider: codex-app-server
    runtime: host
    config:
      command: ["codex", "app-server"]
      model: gpt-5-codex
      reasoning_effort: high
graders:
  - id: openai-grader
    provider: openai
    config:
      model: gpt-5-mini
defaults:
  target: codex-local
  grader: openai-grader
execution:
  max_concurrency: 3
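
As a rough illustration of how defaults point at entries by id, the sketch below resolves defaults.target and defaults.grader against an already-parsed config. It is not AgentV's loader: the names find_by_id and resolve_defaults are hypothetical, and rejecting a grader key on a target is an assumption drawn from the statement that target definitions do not carry a grader field.

```python
from typing import Any

# Hypothetical in-memory form of the config above, as if already parsed from YAML.
CONFIG: dict[str, Any] = {
    "targets": [
        {
            "id": "codex-local",
            "provider": "codex-app-server",
            "runtime": "host",
            "config": {"command": ["codex", "app-server"], "model": "gpt-5-codex"},
        }
    ],
    "graders": [
        {"id": "openai-grader", "provider": "openai", "config": {"model": "gpt-5-mini"}}
    ],
    "defaults": {"target": "codex-local", "grader": "openai-grader"},
}


def find_by_id(entries: list[dict[str, Any]], wanted: str) -> dict[str, Any]:
    """Return the single entry whose id matches, or raise KeyError."""
    matches = [entry for entry in entries if entry.get("id") == wanted]
    if len(matches) != 1:
        raise KeyError(f"expected one entry with id {wanted!r}, found {len(matches)}")
    return matches[0]


def resolve_defaults(config: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Look up the default target and grader by id (illustrative only)."""
    for target in config["targets"]:
        # Assumption: a grader key on a target is treated as a config error.
        if "grader" in target:
            raise ValueError(f"target {target['id']!r} must not define a grader")
    target = find_by_id(config["targets"], config["defaults"]["target"])
    grader = find_by_id(config["graders"], config["defaults"]["grader"])
    return target, grader
```

Keeping the two lookups separate mirrors the split between target selection and grader selection described above.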

Process-backed coding-agent providers use config.command as a non-empty argv array. The first element is the executable or shim, and the remaining elements are arguments.
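
The argv rule can be sketched as a small check. The function name split_command is hypothetical and this is not AgentV's own validation, only a minimal reading of "non-empty argv array": a list of strings whose first element is the executable.

```python
def split_command(command: object) -> tuple[str, list[str]]:
    """Validate an argv-style command and split it into (executable, arguments)."""
    if not isinstance(command, list) or not command:
        raise ValueError("config.command must be a non-empty array")
    if not all(isinstance(part, str) for part in command):
        raise ValueError("every config.command element must be a string")
    if not command[0]:
        raise ValueError("config.command[0] must name an executable or shim")
    # The first element is the executable; everything after it is passed as arguments.
    return command[0], command[1:]
```

Because the command is an array, no shell parsing is involved: a single string such as "codex exec --json" is rejected here rather than split on spaces.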

| Runtime | Boundary | Best fit |
| --- | --- | --- |
| host | Runs the installed CLI or child runner on the current machine. | Local research, subscription OAuth, and evaluating the same profile an engineer uses manually. |
| profile | Runs a host process with isolated home/config/env such as HOME, CODEX_HOME, or temp dirs. | Cleaner local evals without container cost. |
| sandbox | Runs through a separate substrate such as Docker or a managed sandbox. | CI, reproducibility, untrusted tasks, and stronger filesystem containment. |

Sandbox mode does not inherit host credentials. For CI, prefer API keys or explicit secrets injected through sandbox configuration. Subscription OAuth can be evaluated only by intentionally mounting or seeding the profile directory the agent needs, which trades portability for fidelity to the local agent setup.
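
For contrast with sandbox mode, profile isolation on the host can be pictured as building a child environment that redirects HOME, CODEX_HOME, and the temp directory. The sketch below is an assumption about how such an environment might be assembled, not AgentV's implementation: build_profile_env and its passthrough allowlist are hypothetical, and TMPDIR is used as the conventional POSIX temp-dir variable.

```python
from pathlib import Path
from typing import Mapping, Optional


def build_profile_env(
    host_env: Mapping[str, str],
    *,
    home: Optional[str] = None,
    codex_home: Optional[str] = None,
    tmp_dir: Optional[str] = None,
    passthrough: tuple[str, ...] = ("PATH",),
) -> dict[str, str]:
    """Return a child-process environment with isolated profile directories."""
    # Start from an allowlist so host credentials are not copied by accident.
    env = {name: host_env[name] for name in passthrough if name in host_env}
    overrides = {"HOME": home, "CODEX_HOME": codex_home, "TMPDIR": tmp_dir}
    for name, value in overrides.items():
        if value is not None:
            # Resolve to an absolute path so the child does not depend on its cwd.
            env[name] = str(Path(value).resolve())
    return env
```

Passing the result as the env argument of a subprocess call would give the agent its own config and temp directories while still running on the host.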

targets:
  - id: codex-clean-profile
    provider: codex-cli
    runtime:
      mode: profile
      codex_home: .agentv/profiles/codex-clean
      tmp_dir: .agentv/tmp/codex-clean
    config:
      command: ["codex", "exec", "--json"]
      model: gpt-5-codex
      sandbox_mode: workspace-write
      approval_policy: never

targets:
  - id: codex-ci-sandbox
    provider: codex-cli
    runtime:
      mode: sandbox
      engine: docker
      image: ghcr.io/acme/codex-agent:sha256
      workdir: /workspace
      mounts:
        - source: ./workspace
          target: /workspace
          access: rw
        - source: ./.agentv/results
          target: /results
          access: rw
      secrets:
        OPENAI_API_KEY: ${{ OPENAI_API_KEY }}
    config:
      command: ["codex", "exec", "--json"]
      model: gpt-5-codex
      timeout_seconds: 300
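
To make the sandbox fields concrete, here is one way a Docker-backed runtime block could translate into a docker run argv. How AgentV actually launches the container is not documented here, so treat docker_run_argv as a hypothetical helper; the flags it uses (--rm, --workdir, --volume, --env) are standard docker run options.

```python
from pathlib import Path
from typing import Any


def docker_run_argv(runtime: dict[str, Any], command: list[str]) -> list[str]:
    """Build a `docker run` argv from a sandbox runtime block (illustrative only)."""
    argv = ["docker", "run", "--rm", "--workdir", runtime["workdir"]]
    for mount in runtime.get("mounts", []):
        if mount["access"] not in ("ro", "rw"):
            raise ValueError(f"unsupported mount access: {mount['access']!r}")
        # Bind mounts need an absolute host path.
        source = Path(mount["source"]).resolve()
        argv += ["--volume", f"{source}:{mount['target']}:{mount['access']}"]
    for name in runtime.get("secrets", {}):
        # `--env NAME` without a value copies NAME from the docker CLI's own
        # environment, so the secret value never appears in the argv.
        argv += ["--env", name]
    argv.append(runtime["image"])
    argv += command
    return argv
```

Only variables named under secrets are forwarded, which matches the point that sandbox mode does not inherit host credentials. The caller would still need to place the resolved secret values in the environment of the docker process itself.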

| Provider | Boundary | Transcript and isolation notes |
| --- | --- | --- |
| codex-app-server | Codex app-server subprocess. | Preferred Codex path for rich protocol events, session control, cancellation, and structured transcripts. |
| codex-cli | Codex CLI subprocess. | Best for simple local/CI process isolation and evaluating an installed Codex shim or profile. |
| codex-sdk | Codex SDK in an AgentV child runner. | Explicit SDK path. SDK crashes, malformed child output, and missing optional SDK packages are target execution errors. |
| pi-rpc | Pi launched in RPC mode over stdio. | Preferred rich Pi control boundary; AgentV launches the configured command with RPC mode when needed. |
| pi-cli | Pi CLI subprocess. | Simple process boundary; transcript richness depends on Pi CLI output. |
| pi-sdk | Pi SDK in an AgentV child runner. | Explicit SDK path for SDK-native events with child-process isolation. |
| claude-cli | Claude CLI subprocess. | Default Claude path; captures structured stream output when available. |
| claude-sdk | Claude Agent SDK in an AgentV child runner. | Explicit SDK path; useful when SDK-native events matter more than matching a local CLI invocation. |
| copilot-cli | Copilot CLI subprocess/protocol path. | Active Copilot eval run through the installed process. |
| copilot-log | Passive Copilot session-log reader. | Zero-cost transcript grading for existing sessions; it does not run a new agent. |
| copilot-sdk | Copilot SDK in an AgentV child runner. | Explicit SDK path with child-process isolation. |

Every coding-agent provider returns a structured target execution envelope. Run bundles preserve target id, provider kind, runtime mode, command argv, cwd, stdout/stderr, transcripts or logs, final output when available, timing, timeouts, exit codes, signals, and partial artifacts on failure.
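
A minimal sketch of such an envelope for a process-backed target follows. The field names and the run_target helper are assumptions chosen to mirror the list above; the real envelope schema is not shown on this page. The sketch keeps partial stdout/stderr when the timeout fires, which is the "partial artifacts on failure" idea in miniature.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ExecutionEnvelope:
    target_id: str
    provider: str
    runtime_mode: str
    argv: list[str]
    cwd: str
    stdout: str
    stderr: str
    exit_code: Optional[int]   # None when the process was stopped by the timeout
    signal: Optional[int]      # POSIX only: set when the process died from a signal
    timed_out: bool
    duration_seconds: float


def _as_text(value: Union[str, bytes, None]) -> str:
    """Partial output on timeout may arrive as bytes or None; normalise to str."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode(errors="replace")
    return value


def run_target(target_id: str, provider: str, runtime_mode: str,
               argv: list[str], cwd: str, timeout_seconds: float) -> ExecutionEnvelope:
    """Run argv and record the outcome, including failures, in an envelope."""
    started = time.monotonic()
    exit_code: Optional[int] = None
    timed_out = False
    try:
        proc = subprocess.run(argv, cwd=cwd, capture_output=True,
                              text=True, timeout=timeout_seconds)
        stdout, stderr, exit_code = proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired as exc:
        timed_out = True
        stdout, stderr = _as_text(exc.stdout), _as_text(exc.stderr)
    # On POSIX a negative return code -N means the child was killed by signal N.
    signal_number = -exit_code if exit_code is not None and exit_code < 0 else None
    return ExecutionEnvelope(target_id, provider, runtime_mode, list(argv), cwd,
                             stdout, stderr, exit_code, signal_number, timed_out,
                             time.monotonic() - started)
```

Returning an envelope for the timeout case, instead of raising, lets a caller store the same record shape for successful and failed runs.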

Use codex-app-server when you want rich protocol control:

targets:
  - id: codex-local
    provider: codex-app-server
    runtime: host
    config:
      command: ["codex", "app-server"]
      model: gpt-5-codex
      reasoning_effort: high
      model_verbosity: medium

Use codex-cli when a simple CLI boundary is enough or when you want an operator-specific shim:

targets:
  - id: codex-eng
    provider: codex-cli
    runtime: host
    config:
      command: ["codex-eng", "exec", "--json"]
      model: ${{ CODEX_MODEL }}
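
The ${{ CODEX_MODEL }} value above looks like a placeholder filled in from the environment. Assuming that reading, the substitution could be sketched as below. The interpolate function, the accepted variable-name pattern, and the choice to fail on an unset variable are all assumptions, not documented AgentV behaviour.

```python
import re
from typing import Mapping

# Matches ${{ NAME }} with optional spaces around the variable name.
_PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Replace every ${{ NAME }} in value with env[NAME]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: an unset variable is an error rather than an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    # A function replacement avoids backslash-escape handling in the substituted text.
    return _PLACEHOLDER.sub(substitute, value)
```

Calling interpolate("${{ CODEX_MODEL }}", os.environ) before launching the target would keep model names and keys out of the checked-in config.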

Use codex-sdk only when you intentionally want the Codex SDK path:

targets:
  - id: codex-sdk-isolated
    provider: codex-sdk
    runtime: host
    config:
      model: gpt-5-codex

Common Codex config fields include command, model, reasoning_effort, model_verbosity, base_url, api_key, api_format, sandbox_mode, approval_policy, cwd, timeout_seconds, log_dir, stream_log, and system_prompt.

Use pi-rpc for the rich stdio/RPC boundary:

targets:
  - id: pi-rpc-local
    provider: pi-rpc
    runtime: host
    config:
      command: ["pi"]
      model: gpt-5-codex
      thinking: medium

Use pi-cli for simple subprocess execution:

targets:
  - id: pi-cli-local
    provider: pi-cli
    runtime:
      mode: profile
      home: .agentv/profiles/pi-local
    config:
      command: ["pi"]
      subprovider: openrouter
      model: ${{ OPENROUTER_MODEL }}
      api_key: ${{ OPENROUTER_API_KEY }}

Use pi-sdk only when you intentionally want the SDK path:

targets:
  - id: pi-sdk-isolated
    provider: pi-sdk
    runtime: host
    config:
      subprovider: openai-codex
      model: gpt-5.5
      thinking: medium

Pi config fields include command, subprovider, model, thinking, tools, api_key, base_url, cwd, timeout_seconds, log_dir, stream_log, and system_prompt. With pi-cli, the built-in OpenAI provider does not expose a CLI base-url option; use a Pi custom provider name or Pi’s Azure provider path for custom gateways.

targets:
  - id: claude-local
    provider: claude-cli
    runtime: host
    config:
      command: ["claude"]
      model: claude-sonnet-4-20250514
      max_turns: 10

Use claude-cli when you want AgentV to spawn the same Claude CLI a user runs locally. Use claude-sdk only when you intentionally want the Claude Agent SDK path:

targets:
  - id: claude-sdk-isolated
    provider: claude-sdk
    runtime: host
    config:
      model: claude-sonnet-4-20250514
      max_turns: 10

Claude config fields include command, model, cwd, timeout_seconds, max_turns, max_budget_usd, bypass_permissions, log_dir, stream_log, and system_prompt.

targets:
  - id: copilot-local
    provider: copilot-cli
    runtime: host
    config:
      command: ["copilot"]
      model: gpt-5-mini

Route Copilot through an OpenAI-compatible endpoint:

targets:
  - id: copilot-openai
    provider: copilot-cli
    runtime: host
    config:
      command: ["copilot"]
      subprovider: openai
      base_url: ${{ OPENAI_ENDPOINT }}
      api_key: ${{ OPENAI_API_KEY }}
      api_format: responses

Read an existing Copilot session log without running a new agent:

targets:
  - id: copilot-session-log
    provider: copilot-log
    runtime: host
    config:
      discover: latest

Use copilot-sdk only when you intentionally want the SDK path:

targets:
  - id: copilot-sdk-isolated
    provider: copilot-sdk
    runtime: host
    config:
      model: gpt-5-mini

Copilot config fields include command, model, cwd, timeout_seconds, subprovider, base_url, api_key, bearer_token, api_version, api_format, log_dir, stream_log, system_prompt, and session-log fields such as discover, session_id, and session_dir for copilot-log.
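
Neither the on-disk session layout nor the rule behind discover: latest is specified here. Purely as an illustration, if session_dir held one subdirectory per session, "latest" could mean the most recently modified one. The latest_session helper and that layout are both assumptions.

```python
from pathlib import Path
from typing import Optional


def latest_session(session_dir: Path) -> Optional[Path]:
    """Return the most recently modified subdirectory, or None if there is none."""
    # Assumed layout: one subdirectory per recorded session.
    candidates = [entry for entry in session_dir.iterdir() if entry.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry.stat().st_mtime)
```

A passive reader like this never starts an agent process, which is what makes grading existing sessions free of model cost.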

Agent providers receive file inputs as paths, not inline file content. The prompt includes a preread block with file:// URIs pointing to absolute paths on disk, then the user query references each file:

input:
  - role: user
    content:
      - type: file
        value: ./src/example.ts
      - type: text
        value: Review this code

The agent receives a prompt like:

Read all input files:
* [example.ts](file:///abs/path/src/example.ts).
If any file is missing, fail with ERROR: missing-file <filename> and stop.
Then apply system_instructions on the user query below.
[[ ## user_query ## ]]
<file: path="./src/example.ts">
Review this code
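
The prompt shape above can be approximated with a short builder. This is a sketch reconstructed from the single example shown, not AgentV's prompt code: build_agent_prompt is a hypothetical name, and the ordering used for multiple files or multiple text parts is an assumption.

```python
from pathlib import Path


def build_agent_prompt(content: list[dict[str, str]], workspace: Path) -> str:
    """Render file inputs as a preread block of file:// links plus the user query."""
    file_values = [item["value"] for item in content if item["type"] == "file"]
    text_values = [item["value"] for item in content if item["type"] == "text"]

    lines = ["Read all input files:"]
    for value in file_values:
        # Files are passed by absolute path, never inlined.
        absolute = (workspace / value).resolve()
        lines.append(f"* [{absolute.name}]({absolute.as_uri()}).")
    lines.append("If any file is missing, fail with ERROR: missing-file <filename> and stop.")
    lines.append("Then apply system_instructions on the user query below.")
    lines.append("[[ ## user_query ## ]]")
    for value in file_values:
        # The query refers back to each file by its original relative path.
        lines.append(f'<file: path="{value}">')
    lines.extend(text_values)
    return "\n".join(lines)
```

Only paths cross the boundary, so the agent has to read the files itself from the workspace it was given.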

LLM providers receive file content inline instead; see LLM providers.

For deterministic harness checks without a real provider:

targets:
  - id: mock-target
    provider: mock
    runtime: host
    config:
      response: ok
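
Conceptually, a mock target returns its configured response no matter what prompt it receives. The sketch below shows that idea only; run_mock_target and the shape of the returned record are hypothetical and do not describe AgentV's mock provider internals.

```python
from typing import Any


def run_mock_target(target: dict[str, Any], prompt: str) -> dict[str, Any]:
    """Return the configured response without calling any real provider."""
    if target.get("provider") != "mock":
        raise ValueError("run_mock_target only handles provider: mock")
    # The prompt is deliberately ignored so every run yields the same output.
    return {
        "target_id": target["id"],
        "provider": "mock",
        "runtime_mode": target["runtime"],
        "final_output": target["config"]["response"],
    }
```

Because the output is fixed, any change in a harness check's result points at the harness or grader rather than at model variance.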