Open Harness

The harness-agnostic framework for deep agents.

Build, evaluate, and deploy AI agents that operate computers like users do. Plug in Claude Code, Codex, OpenCode, or your own harness. No vendor lock-in.

Get Started How It Works

What are Deep Agents?

Deep agents are AI systems that operate computers like users do — they run bash commands, read and write files, execute code, and interact with real infrastructure. Think Claude Code, Codex, or OpenCode.

These harnesses provide the basic tools (bash, file I/O, MCP) that any agent needs. Open Harness lets you build on top of them — and swap between them — without rewriting your agent logic.

The Problem with Agent Development

Building demos is easy

Wire up some prompts, add tools, and it works... maybe 60-70% of the time.

But how does it actually perform?

How does your prompt affect success rate? What's the cost? Does it work across models?

Iteration without data is just vibes

You can't improve what you can't measure. Every change is a guess.

Vendor lock-in is real

Your agent is tied to one harness. Switching providers means rewriting everything.

Open Harness: Build → Eval → Deploy

🔧

Build

Define agents with typed state, conditional activation, and signal-based coordination. Swap harnesses without changing your agent logic.

harness: new ClaudeHarness()
harness: new CodexHarness()

🧪

Eval

Record agent runs, replay them deterministically. Test across harnesses. Measure cost, latency, and success rate with real data.

recording: { mode: "record" }
recording: { mode: "replay" }

🚀

Deploy

Production-ready observability with structured logging. Signal traces for debugging. Run in CI with deterministic replay.

LOG_LEVEL=debug
result.signals

coding-agent.tsA deep agent that fixes bugs

import { createWorkflow, ClaudeHarness } from "@open-harness/core";

type State = {
  issue: string;
  diagnosis: string | null;
  fix: string | null;
};

const { agent, runReactive } = createWorkflow<State>();

const diagnoser = agent({
  prompt: `Analyze this issue and identify the root cause:
{{ state.issue }}

Use bash tools to inspect the codebase.`,
  activateOn: ["workflow:start"],
  emits: ["diagnosis:complete"],
  updates: "diagnosis",
});

const fixer = agent({
  prompt: `Apply a fix for this diagnosis:
{{ state.diagnosis }}

Edit the relevant files and run tests.`,
  activateOn: ["diagnosis:complete"],
  when: (ctx) => ctx.state.diagnosis !== null,
  emits: ["fix:complete"],
  updates: "fix",
});

// Run with Claude Code
const result = await runReactive({
  agents: { diagnoser, fixer },
  state: { issue: "Tests failing in auth module", diagnosis: null, fix: null },
  harness: new ClaudeHarness(),
  recording: { mode: "record", store }, // Capture for replay
});

Why Open Harness?

Harness Agnostic

Plug in Claude Code, Codex, OpenCode, or build your own provider. Your agent logic stays the same. Test performance across harnesses with the same recordings.

Deterministic Evals

Record agent runs once, replay them forever. No more flaky tests. No more "it worked on my machine." Iterate with actual data, not vibes.

TypeScript-First

Full type safety for state, guards, and agent configuration. Autocomplete everywhere. Catch errors at compile time, not runtime.

Production Ready

Structured logging with Pino. Signal traces for debugging. Vitest matchers for CI. Built for real deployments, not just demos.

Open Harness is open source and free to use.

Get Started GitHub