Recording & Replay

Record agent executions and replay them for deterministic testing.

Why Recording & Replay?

Live agent tests are:

Slow: Each test calls the AI harness
Expensive: You pay for each API call
Non-deterministic: LLM output varies

Recording & replay solves this:

Fast: Replay from memory, no API calls
Free: No harness costs during replay
Deterministic: Same signals every time

Basic Usage

Record

import { createWorkflow, ClaudeHarness, MemorySignalStore } from "@open-harness/core";

type State = { input: string; result: string | null };
const { agent, runReactive } = createWorkflow<State>();

const analyzer = agent({
  prompt: "Analyze: {{ state.input }}",
  activateOn: ["workflow:start"],
  updates: "result",
});

// Create a store
const store = new MemorySignalStore();

// Record execution
const result = await runReactive({
  agents: { analyzer },
  state: { input: "Hello", result: null },
  harness: new ClaudeHarness(),
  recording: { mode: "record", store },
  endWhen: (s) => s.result !== null,
});

// Save the recording ID
const recordingId = result.recordingId;
console.log("Recorded:", recordingId);

Replay

// Replay the same execution
const replay = await runReactive({
  agents: { analyzer },
  state: { input: "Hello", result: null },
  harness: new ClaudeHarness(), // Not called during replay
  recording: { mode: "replay", store, recordingId },
  endWhen: (s) => s.result !== null,
});

// Same result, no API calls
expect(replay.state.result).toBe(result.state.result);
expect(replay.signals).toEqual(result.signals);

Test Pattern

Use recordings for fast, deterministic tests:

analyzer.test.ts

import { describe, it, expect, beforeAll } from "vitest";
import { createWorkflow, ClaudeHarness, MemorySignalStore } from "@open-harness/core";

type State = { input: string; result: string | null };

describe("Analyzer", () => {
  const { agent, runReactive } = createWorkflow<State>();
  const store = new MemorySignalStore();
  let recordingId: string;

  const analyzer = agent({
    prompt: "Analyze: {{ state.input }}",
    activateOn: ["workflow:start"],
    updates: "result",
  });

  const initialState = { input: "Test input", result: null };

  beforeAll(async () => {
    // Record once before all tests
    const result = await runReactive({
      agents: { analyzer },
      state: initialState,
      harness: new ClaudeHarness(),
      recording: { mode: "record", store },
      endWhen: (s) => s.result !== null,
    });
    recordingId = result.recordingId!;
  });

  it("produces output", async () => {
    const result = await runReactive({
      agents: { analyzer },
      state: initialState,
      harness: new ClaudeHarness(),
      recording: { mode: "replay", store, recordingId },
      endWhen: (s) => s.result !== null,
    });

    expect(result.state.result).toBeDefined();
  });

  it("emits expected signals", async () => {
    const result = await runReactive({
      agents: { analyzer },
      state: initialState,
      harness: new ClaudeHarness(),
      recording: { mode: "replay", store, recordingId },
      endWhen: (s) => s.result !== null,
    });

    expect(result.signals).toContainSignal("agent:completed");
    expect(result.signals).toHaveSignalsInOrder([
      "workflow:start",
      "agent:activated",
      "workflow:end",
    ]);
  });
});

Recording Metadata

Add metadata to recordings for organization:

const result = await runReactive({
  agents: { analyzer },
  state: initialState,
  harness,
  recording: {
    mode: "record",
    store,
    name: "analyzer-happy-path",
    tags: ["unit", "analyzer"],
  },
});

Store Types

MemorySignalStore

In-memory storage, useful for tests:

import { MemorySignalStore } from "@open-harness/core";

const store = new MemorySignalStore();

Pros: Fast, no I/O Cons: Lost when process ends

Persistent Storage

For persisting recordings across test runs, you can serialize the store:

import { writeFileSync, readFileSync } from "fs";

// After recording
const recordings = store.export();
writeFileSync("recordings.json", JSON.stringify(recordings));

// Before replay
const data = JSON.parse(readFileSync("recordings.json", "utf-8"));
store.import(data);

CI/CD Integration

Strategy 1: Pre-recorded Fixtures

Record fixtures locally:

RECORD=true bun test

Commit fixtures to git:

git add recordings/
git commit -m "Update test recordings"

CI runs in replay mode:

bun test # Replays from committed fixtures

Strategy 2: Conditional Recording

const mode = process.env.RECORD ? "record" : "replay";

const result = await runReactive({
  agents: { analyzer },
  state: initialState,
  harness,
  recording: { mode, store, recordingId: mode === "replay" ? savedId : undefined },
});

Multi-Agent Recordings

Recordings capture the full workflow:

const result = await runReactive({
  agents: { analyzer, reviewer, fixer },
  state: initialState,
  harness,
  recording: { mode: "record", store },
});

// All agent interactions are recorded
const replay = await runReactive({
  agents: { analyzer, reviewer, fixer },
  state: initialState,
  harness,
  recording: { mode: "replay", store, recordingId: result.recordingId },
});

Re-recording

When agent behavior changes, re-record:

// Delete old recording
store.delete(recordingId);

// Record new execution
const fresh = await runReactive({
  agents: { analyzer },
  state: initialState,
  harness,
  recording: { mode: "record", store },
});

// Update saved recording ID
recordingId = fresh.recordingId!;

Gotchas

State Must Match

Initial state must be identical for replay:

// This will fail
const record = await runReactive({
  state: { input: "Hello", result: null },
  recording: { mode: "record", store },
});

const replay = await runReactive({
  state: { input: "Different!", result: null }, // ❌ Different state
  recording: { mode: "replay", store, recordingId },
});

Agent Configuration Must Match

Agent definitions must be identical:

// Record with one prompt
const analyzer = agent({ prompt: "Analyze: {{ state.input }}" });
const record = await runReactive({ agents: { analyzer }, ... });

// Can't replay with different prompt
const analyzer2 = agent({ prompt: "Different prompt" }); // ❌ Won't replay correctly

Harness Not Called During Replay

The harness is not called during replay—signals are returned from the store. You can pass any harness:

// Harness is ignored during replay
const replay = await runReactive({
  harness: new ClaudeHarness(), // Not called
  recording: { mode: "replay", store, recordingId },
});

Recording & Replay

Recording & Replay

Why Recording & Replay?

Basic Usage

Record

Replay

Test Pattern

Recording Metadata

Store Types

MemorySignalStore

Persistent Storage

CI/CD Integration

Strategy 1: Pre-recorded Fixtures

Strategy 2: Conditional Recording

Multi-Agent Recordings

Re-recording

Gotchas

State Must Match

Agent Configuration Must Match

Harness Not Called During Replay

Next Steps

Signal Assertions

Troubleshooting

On this page