Agentspan

Open-source runtime for durable AI agents

2026-05-18

Product Introduction

  1. Definition: Agentspan is an open-source, MIT-licensed server and SDK for building durable AI agents and multi-agent workflows. It functions as a durable execution layer that compiles agent definitions into persistent workflows, decoupling execution state from the application process.
  2. Core Value Proposition: Agentspan addresses the reliability and production-readiness problems of AI agents. It adds crash recovery, human-in-the-loop approvals, observability, and guardrails around existing agent frameworks and LLMs, so developers can ship agents that survive process failures and deployments and that pause for human oversight where required.

Main Features

  1. Durable Execution & Crash Recovery: Agentspan's core innovation is persisting agent execution state on its server, not in the application process. How it works: When you call start(agent, prompt), it compiles the agent logic into a Conductor workflow (Conductor is a battle-tested orchestration engine that originated at Netflix). If your application crashes, is killed (for example, by an out-of-memory condition), or is redeployed, the agent's state (e.g., "step 3: llm_call in-flight") remains stored on the server. A new worker can reconnect at any time and resume execution from the exact interrupted step, so no work is lost.
  2. Human-in-the-Loop (HITL) Approvals: Integrates human oversight directly into agent workflows. How it works: Developers can decorate any tool with @tool(approval_required=True). When the agent calls this tool, it automatically pauses, holding its full state on the server indefinitely. Approval or rejection can be triggered via Slack, a web portal, or programmatically via handle.approve()/.reject(), after which the agent resumes cleanly from the waiting point. This is essential for safety-critical operations like refunds or content publishing.
  3. Comprehensive Observability & Testing: Provides deep introspection into every agent run. How it works: The Agentspan UI and CLI expose a complete audit trail: every LLM call (with timing and token counts), every tool call with inputs/outputs, handoffs between agents, and guardrail results. Furthermore, its mock_run testing utility allows for deterministic unit testing of agent logic by scripting exact sequences of mock tool calls and LLM responses, enabling fast, reliable CI/CD pipelines without needing live LLMs or the server.
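The durability and approval behavior described above can be illustrated with a minimal, standard-library-only sketch. This is not the Agentspan API: the step names, the JSON state file, and the functions run, approve, load_state, and save_state are all hypothetical stand-ins for state that Agentspan is described as keeping on its server. The sketch only shows the idea that progress is stored outside the running process, so a restarted process picks up at the interrupted step and a gated step waits until it is approved.

```python
import json
import os
import tempfile

# Hypothetical toy model, not the Agentspan API: a run is a list of named
# steps, and progress is saved to a JSON file after every completed step.
STEPS = ["fetch_order", "issue_refund", "notify_customer"]
NEEDS_APPROVAL = {"issue_refund"}


def load_state(path):
    """Return the saved run state, or a fresh one if nothing is stored yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "approved": [], "log": []}


def save_state(path, state):
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, crash_before=None):
    """Advance the run until it finishes, pauses for approval, or 'crashes'."""
    state = load_state(path)
    while state["next_step"] < len(STEPS):
        step = STEPS[state["next_step"]]
        if step == crash_before:
            raise RuntimeError(f"simulated crash before {step}")
        if step in NEEDS_APPROVAL and step not in state["approved"]:
            return "WAITING_FOR_APPROVAL"  # state stays on disk while we wait
        state["log"].append(step)
        state["next_step"] += 1
        save_state(path, state)
    return "COMPLETED"


def approve(path, step):
    """Record a human approval so the next call to run() can pass the gate."""
    state = load_state(path)
    state["approved"].append(step)
    save_state(path, state)


state_file = os.path.join(tempfile.mkdtemp(), "run.json")

try:
    run(state_file, crash_before="issue_refund")
except RuntimeError:
    pass  # the process "dies"; fetch_order is already persisted

status_after_restart = run(state_file)  # resumes at issue_refund, then pauses
approve(state_file, "issue_refund")
final_status = run(state_file)          # continues from the waiting point
final_log = load_state(state_file)["log"]
```

In this toy run, fetch_order executes once even though the process is restarted twice, and issue_refund does not execute until approve has been called. A real server would also need to handle concurrent workers and in-flight LLM calls, which the sketch ignores.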

Problems Solved

  1. Pain Point: Agent fragility in production. Traditional agent implementations run ephemerally inside an application process. A crash, deployment, or infrastructure failure kills the agent and loses all in-flight state and context, which makes such agents unreliable for long-running or critical tasks.
  2. Target Audience: AI Engineers and Backend Developers building production AI applications; Platform Teams needing to provide a robust agent framework for their organization; Startups and Enterprises integrating autonomous workflows into customer-facing or internal operations.
  3. Use Cases: Long-running data analysis agents (e.g., "analyze 10k records") that must survive pod evictions; Multi-step customer support workflows requiring manager approval for escalations; Research and content generation pipelines where each step (research -> write -> edit) needs to be logged and resumable; AI coding agents that need to maintain state across multiple tool calls and potential interruptions.

Unique Advantages

  1. Differentiation: Pure agent frameworks (e.g., LangChain, LlamaIndex) focus on prompt and chaining logic, while general-purpose orchestration tools are not agent-aware. Agentspan bridges that gap: it wraps existing frameworks (OpenAI SDK, Google ADK, LangGraph) to add durability, whereas competitors leave state management as an exercise for the developer. Compared to building on raw workflow engines, it provides a high-level, agent-native Python API.
  2. Key Innovation: Agentspan compiles declarative agent definitions into durable Conductor workflows. Building on an open-source orchestration engine proven at hyperscale (Netflix, LinkedIn, Tesla) gives it production-grade features (persistent state, per-step retries, full execution history, and replay) as primitives, rather than rebuilding them from scratch.
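To make "compiling a declarative definition into a workflow" concrete, here is a minimal sketch under stated assumptions. It does not use Conductor or Agentspan; the Task and Workflow classes, the compile_agent function, and the dictionary format of the definition are hypothetical and exist only for illustration. The sketch shows how per-step retries and an execution history can come from the workflow layer rather than from each agent's own code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """One workflow step with its own retry budget (hypothetical shape)."""
    name: str
    func: Callable
    max_attempts: int = 1


@dataclass
class Workflow:
    tasks: list
    history: list = field(default_factory=list)

    def execute(self, value):
        """Run tasks in order, retrying each one and recording every attempt."""
        for task in self.tasks:
            for attempt in range(1, task.max_attempts + 1):
                try:
                    value = task.func(value)
                    self.history.append((task.name, attempt, "ok"))
                    break
                except Exception as exc:
                    self.history.append((task.name, attempt, f"error: {exc}"))
            else:
                # The inner loop ended without a successful attempt.
                raise RuntimeError(
                    f"{task.name} failed after {task.max_attempts} attempts"
                )
        return value


def compile_agent(definition, registry):
    """Turn a declarative definition (a plain dict) into an executable Workflow."""
    return Workflow([
        Task(step["name"], registry[step["name"]], step.get("retries", 0) + 1)
        for step in definition["steps"]
    ])


calls = {"write": 0}


def research(topic):
    return f"notes on {topic}"


def write(notes):
    # Fails on the first call to imitate a flaky model request.
    calls["write"] += 1
    if calls["write"] == 1:
        raise TimeoutError("flaky model call")
    return f"draft from {notes}"


definition = {"steps": [{"name": "research"}, {"name": "write", "retries": 2}]}
workflow = compile_agent(definition, {"research": research, "write": write})
result = workflow.execute("durable agents")
```

After this run, workflow.history holds one successful research attempt, one failed write attempt, and one successful write attempt. The step functions contain no retry or logging code of their own, which is the point of treating those features as primitives.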

Frequently Asked Questions (FAQ)

  1. How does Agentspan handle agent state and crash recovery? Agentspan persists all execution state, including partial LLM responses, tool call arguments, and intermediate results, on its central server (backed by a database). If your application process disconnects, the workflow is simply paused. Upon reconnection, the state is reloaded and execution resumes from the exact step that was in progress, so no in-flight work is lost.
  2. Can I use Agentspan with my existing OpenAI Agents or LangGraph setup? Yes, Agentspan is designed as a drop-in execution layer. For the OpenAI Agents SDK and LangGraph, you often only need to change your entry point from Runner.run_sync() or app.invoke() to Agentspan's run() function. Your agent definitions, tools, and graph structures remain identical while gaining durability and observability.
  3. Is the Agentspan server self-hosted or cloud-based? Agentspan is MIT-licensed and self-hostable. You can download and run the server (~50 MB) on your own infrastructure (e.g., agentspan server start). This provides full control over data and compliance. The team behind it (Orkes) may also offer a managed cloud service.
  4. What LLM providers and models does Agentspan support? Agentspan uses a provider-agnostic, unified format (provider/model-name). It supports Anthropic (anthropic/claude-...), OpenAI (openai/gpt-...), Google (google/gemini-...), Groq (groq/llama-...), and others. You can mix and match different providers and models within a single multi-agent pipeline.
  5. How do you test AI agents built with Agentspan? Agentspan provides a first-class testing SDK (agentspan.agents.testing). You can use mock_run to simulate specific tool call sequences and LLM responses without making actual API calls. This allows for fast, deterministic unit tests to validate agent logic, error handling, and tool routing in milliseconds, integrating seamlessly into CI/CD.
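The idea behind scripted, deterministic agent tests can be sketched without any live model. The example below does not use agentspan.agents.testing and does not reproduce the signature of mock_run, which is not shown here; ScriptedLLM, run_agent, and the response dictionaries are hypothetical names chosen for illustration. A scripted stand-in replaces the model, so the test exercises only the agent loop: tool routing, handling of an unknown tool, and the final answer.

```python
class ScriptedLLM:
    """Stand-in for a live model: returns pre-written responses in order."""

    def __init__(self, responses):
        self._responses = iter(responses)
        self.seen = []  # a copy of every message list sent to the "model"

    def __call__(self, messages):
        self.seen.append(list(messages))
        return next(self._responses)


def run_agent(prompt, llm, tools, max_turns=5):
    """Minimal agent loop: ask the model, run the requested tool, repeat."""
    messages = [("user", prompt)]
    for _ in range(max_turns):
        reply = llm(messages)
        if "final" in reply:
            return reply["final"], messages
        name, args = reply["tool"], reply["args"]
        if name in tools:
            output = tools[name](**args)
        else:
            output = f"error: unknown tool {name}"
        messages.append(("tool", name, output))
    raise RuntimeError("agent did not finish within max_turns")


def get_weather(city):
    return f"Sunny in {city}"


llm = ScriptedLLM([
    {"tool": "get_weather", "args": {"city": "Oslo"}},
    {"tool": "book_flight", "args": {}},  # not registered: tests error handling
    {"final": "It is sunny in Oslo."},
])
answer, transcript = run_agent(
    "Weather in Oslo?", llm, {"get_weather": get_weather}
)
```

Because every model response is fixed in advance, the transcript is identical on every run, which is what makes assertions on tool routing stable enough for CI.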
