Open Harness
The harness-agnostic framework for deep agents.
Build, evaluate, and deploy AI agents that operate computers like users do. Plug in Claude Code, Codex, OpenCode, or your own harness. No vendor lock-in.
What are Deep Agents?
Deep agents are AI systems that operate computers like users do — they run bash commands, read and write files, execute code, and interact with real infrastructure. Think Claude Code, Codex, or OpenCode.
These harnesses provide the basic tools (bash, file I/O, MCP) that any agent needs. Open Harness lets you build on top of them — and swap between them — without rewriting your agent logic.
The Problem with Agent Development
Building demos is easy
Wire up some prompts, add tools, and it works... maybe 60-70% of the time.
But how does it actually perform?
How does your prompt affect success rate? What's the cost? Does it work across models?
Iteration without data is just vibes
You can't improve what you can't measure. Every change is a guess.
Vendor lock-in is real
Your agent is tied to one harness. Switching providers means rewriting everything.
Open Harness: Build → Eval → Deploy
Build
Define agents with typed state, conditional activation, and signal-based coordination. Swap harnesses without changing your agent logic.
harness: new ClaudeHarness()
harness: new CodexHarness()Eval
Record agent runs, replay them deterministically. Test across harnesses. Measure cost, latency, and success rate with real data.
recording: { mode: "record" }
recording: { mode: "replay" }Deploy
Production-ready observability with structured logging. Signal traces for debugging. Run in CI with deterministic replay.
LOG_LEVEL=debug
result.signals import { createWorkflow, ClaudeHarness } from "@open-harness/core";
type State = {
issue: string;
diagnosis: string | null;
fix: string | null;
};
const { agent, runReactive } = createWorkflow<State>();
const diagnoser = agent({
prompt: `Analyze this issue and identify the root cause:
{{ state.issue }}
Use bash tools to inspect the codebase.`,
activateOn: ["workflow:start"],
emits: ["diagnosis:complete"],
updates: "diagnosis",
});
const fixer = agent({
prompt: `Apply a fix for this diagnosis:
{{ state.diagnosis }}
Edit the relevant files and run tests.`,
activateOn: ["diagnosis:complete"],
when: (ctx) => ctx.state.diagnosis !== null,
emits: ["fix:complete"],
updates: "fix",
});
// Run with Claude Code
const result = await runReactive({
agents: { diagnoser, fixer },
state: { issue: "Tests failing in auth module", diagnosis: null, fix: null },
harness: new ClaudeHarness(),
recording: { mode: "record", store }, // Capture for replay
});Why Open Harness?
Harness Agnostic
Plug in Claude Code, Codex, OpenCode, or build your own provider. Your agent logic stays the same. Test performance across harnesses with the same recordings.
Deterministic Evals
Record agent runs once, replay them forever. No more flaky tests. No more "it worked on my machine." Iterate with actual data, not vibes.
TypeScript-First
Full type safety for state, guards, and agent configuration. Autocomplete everywhere. Catch errors at compile time, not runtime.
Production Ready
Structured logging with Pino. Signal traces for debugging. Vitest matchers for CI. Built for real deployments, not just demos.
Open Harness is open source and free to use.