Introducing ra
March 30, 2026
ra is an open-source agent harness. Give it a task, point it at an LLM, and let it work — reading files, running commands, calling APIs, and looping until the job is done. You stay in control through a config file, middleware hooks, and permission rules. Nothing is hidden behind abstractions you can't reach.
This post walks through the ideas behind ra and the features that make it useful.
The agent loop is the product
At the core of ra is a simple loop: call the model, execute the tool calls it returns, feed the results back, repeat. That's it. No planning graphs, no state machines, no orchestration DSL. The loop runs until the model says it's done — no arbitrary iteration caps.
User message → model → tool calls → execute → model → … → done

What makes it powerful is what you can hook into. Nine middleware hooks span the entire lifecycle — beforeModelCall, afterModelResponse, beforeToolExecution, afterToolExecution, and more. Every hook can inspect, modify, or block the step it wraps. Want to enforce a token budget? Write a beforeModelCall hook. Want to redact sensitive output? Write an afterToolExecution hook. Want to halt the loop after a certain duration? afterLoopIteration.
```typescript
// middleware/budget.ts — stop the agent when tokens run out
export default async function ({ context }) {
  if (context.totalTokens > 500_000) {
    return { stop: true, reason: 'Token budget exceeded' }
  }
}
```

Middleware is just TypeScript files. No plugin API to learn, no registration boilerplate.
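The redaction case mentioned above follows the same shape. Here is a hypothetical afterToolExecution hook that masks API-key-shaped strings before tool output re-enters the conversation — the `{ result }` argument and return shape are assumptions modeled on the budget example, not ra's documented API:

```typescript
// middleware/redact.ts — hypothetical afterToolExecution hook.
// Masks common API-key prefixes in tool output before the model sees them.
const SECRET = /\b(sk|ghp|xoxb)-[A-Za-z0-9_-]{8,}\b/g

export function maskSecrets(text: string): string {
  return text.replace(SECRET, '[REDACTED]')
}

// Assumed hook signature: receives the tool result, returns a modified one.
export default async function ({ result }: { result: { output: string } }) {
  return { result: { ...result, output: maskSecrets(result.output) } }
}
```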
Context engineering as a first-class concern
Most agent frameworks treat the context window as a bucket you throw messages into until it overflows. ra treats it as a resource to be managed.
Smart compaction. When the conversation grows too large, ra compacts it — either by truncating older messages (preserving cached prefixes for cost savings) or by summarizing them into a condensed history. The compaction strategy is configurable and respects provider-specific prompt caching (Anthropic, OpenAI, Google) so you don't blow away cache hits unnecessarily.
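The truncation strategy can be sketched in a few lines. This is an illustrative simplification, not ra's actual code — the message shape and token accounting are assumptions. The key property is that the cached prefix is never touched, so provider-side prompt caching keeps hitting:

```typescript
// Prefix-preserving truncation sketch: keep the first `prefixLen` messages
// intact (the cacheable prefix), then drop the oldest messages after it
// until the estimated token total fits the budget.
type Msg = { role: string; content: string; tokens: number }

export function compact(messages: Msg[], budget: number, prefixLen = 2): Msg[] {
  const prefix = messages.slice(0, prefixLen)
  let tail = messages.slice(prefixLen)
  const total = (ms: Msg[]) => ms.reduce((n, m) => n + m.tokens, 0)
  // Drop oldest tail messages first; the cached prefix is never dropped.
  while (tail.length > 0 && total(prefix) + total(tail) > budget) {
    tail = tail.slice(1)
  }
  return [...prefix, ...tail]
}
```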
Dynamic context window learning. If a provider rejects a request because it exceeded the real context limit, ra learns the actual size and adapts future compaction thresholds automatically.
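In spirit, that learning loop looks like the following sketch. The error-message format is an assumption for illustration; ra's actual parsing of provider rejections will differ by provider:

```typescript
// Sketch: learn the real context limit from a provider rejection and size
// future compaction thresholds off the learned value.
export class ContextWindow {
  constructor(public limit: number) {}

  // e.g. "This model's maximum context length is 200000 tokens" (assumed format)
  learnFromError(message: string): void {
    const m = message.match(/maximum context length is (\d+)/)
    if (m) this.limit = Number(m[1])
  }

  // Compact before hitting the hard limit, leaving headroom for the response.
  compactionThreshold(ratio = 0.8): number {
    return Math.floor(this.limit * ratio)
  }
}
```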
Automatic discovery. Drop a CLAUDE.md, AGENTS.md, or .cursorrules file in your project and ra picks it up — convention files are injected into the system prompt automatically.
Pattern resolution. Reference files inline in your prompts with @src/auth.ts or @src/**/*.ts, and ra expands them into the context. Same for URLs: url:https://example.com/docs fetches and inlines the content.
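Conceptually, @-expansion is a substitution pass over the prompt. A minimal sketch (the real resolver also handles globs and url: references; `readFile` is injected here so the example stays self-contained):

```typescript
// Replace each @path token in a prompt with a delimited copy of the
// referenced file's contents.
export function expandRefs(
  prompt: string,
  readFile: (path: string) => string,
): string {
  return prompt.replace(/@([\w./*-]+)/g, (_, path) => {
    return `\n--- ${path} ---\n${readFile(path)}\n`
  })
}
```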
See what the agent is doing
Every ra session produces structured logs, trace spans, and per-iteration token metrics — automatically, with no instrumentation code.
The built-in Inspector is a web dashboard that shows you every model call, every tool execution, every thinking block, and exactly how tokens were spent across the session. When an agent does something unexpected, you don't guess — you look.
Traces form a hierarchy: agent.loop → agent.iteration → agent.model_call / agent.tool_execution, each with duration, status, and attributes. Pipe the JSONL logs to your own tooling or just read them directly.
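Because the logs are JSONL, walking the span tree yourself is a few lines of code. The field names below (`parent`, `durationMs`) are illustrative, not ra's exact schema:

```typescript
// Parse JSONL trace output and return the direct children of a span.
type Span = { id: string; parent?: string; name: string; durationMs: number }

export function childrenOf(jsonl: string, parentId: string): Span[] {
  return jsonl
    .split('\n')
    .filter(Boolean)
    .map(line => JSON.parse(line) as Span)
    .filter(s => s.parent === parentId)
}
```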
The config is the agent
A single ra.config.yml turns any directory into a purpose-built agent. Define the provider, model, tools, permissions, middleware, and skills — all in one file.
```yaml
agent:
  provider: anthropic
  model: claude-sonnet-4-6
  thinking: adaptive
  maxTokenBudget: 500000
permissions:
  rules:
    - tool: Bash
      command:
        allow: ["^git ", "^bun "]
        deny: ["--force", "--hard"]
middleware:
  beforeModelCall:
    - "./middleware/budget.ts"
skillDirs:
  - ./skills
```

Permissions are regex-based allow/deny rules per tool, per field. The config above lets the agent run git and bun commands but blocks --force and --hard flags. No code required — just patterns.
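The evaluation semantics implied by the config are simple: deny rules win, and after that at least one allow rule must match. A sketch of that check — illustrative, not ra's source:

```typescript
// Evaluate a command against regex allow/deny rules. Deny takes precedence;
// otherwise the command must match at least one allow pattern.
export function permitted(
  command: string,
  allow: string[],
  deny: string[],
): boolean {
  if (deny.some(p => new RegExp(p).test(command))) return false
  return allow.some(p => new RegExp(p).test(command))
}
```

With the patterns from the config above, `git status` passes, while `git push --force` and `rm -rf /` are both rejected — the first by a deny rule, the second for matching no allow rule.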
Adaptive thinking deserves a mention. ra supports extended thinking where the model reasons deeply before responding. In adaptive mode, the agent uses high thinking budget for the first iterations (when the problem is being understood) and dials it back as execution progresses. You get deep reasoning when it matters without burning tokens on routine follow-up.
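One way to picture the adaptive schedule: a high budget for the first few iterations, decaying toward a floor as execution settles into routine follow-up. The numbers and decay curve below are invented for illustration — the post doesn't publish ra's actual schedule:

```typescript
// Hypothetical adaptive thinking schedule: full budget while the problem is
// being understood, then halving per iteration down to a floor.
export function thinkingBudget(iteration: number): number {
  const max = 16_000
  const floor = 1_000
  if (iteration < 3) return max // problem-understanding phase
  return Math.max(floor, Math.floor(max / 2 ** (iteration - 2)))
}
```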
Skills: composable agent roles
Skills are reusable instruction bundles — a directory with a SKILL.md file and optional scripts or reference files. Think of them as roles you can assign to the agent.
```shell
ra --skill code-review "Review these changes"
ra --skill debugger --file error.log "Why is this failing?"
```

Skills use progressive disclosure. The model initially sees only skill names and one-line descriptions. When it decides a skill is relevant, the full instructions are loaded. This keeps the context lean until depth is needed.
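Progressive disclosure boils down to a two-stage lookup: a cheap index first, lazy loading on activation. A minimal sketch of that interface, with `loadBody` standing in for reading the skill's SKILL.md from disk:

```typescript
// Stage 1: the model sees only names and one-line descriptions.
// Stage 2: full instructions load only when a skill is activated.
type Skill = { name: string; description: string; loadBody: () => string }

export function skillIndex(skills: Skill[]): string {
  return skills.map(s => `- ${s.name}: ${s.description}`).join('\n')
}

export function activate(skills: Skill[], name: string): string | undefined {
  return skills.find(s => s.name === name)?.loadBody()
}
```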
You can install skills from GitHub, npm, or local directories. And here's the interesting part: the agent can write new skills at runtime — extending its own capabilities as it works.
Recipes: shareable agent configurations
A recipe is a complete agent setup — config, skills, and middleware — packaged as a directory. Run a recipe and you get a fully configured agent:
```shell
ra --recipe coding-agent "Fix the failing tests and open a PR"
ra --recipe code-review-agent "Review this diff"
```

Recipes layer on top of your existing config. Skills and middleware from a recipe prepend to yours rather than replacing them. This means you can start from a recipe and customize from there.
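The prepend semantics can be sketched as a list merge — recipe entries come first, user entries follow, so nothing of yours is lost. Field names here are illustrative:

```typescript
// Merge a recipe into a user config: recipe skills and middleware prepend,
// user entries are preserved after them.
type Layer = { skillDirs: string[]; middleware: string[] }

export function applyRecipe(user: Layer, recipe: Layer): Layer {
  return {
    skillDirs: [...recipe.skillDirs, ...user.skillDirs],
    middleware: [...recipe.middleware, ...user.middleware],
  }
}
```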
Runs anywhere
ra is a single binary. The same agent runs across multiple interfaces:
- CLI — one-shot prompts, piping, chaining. `cat error.log | ra "Explain this error"` just works.
- REPL — interactive sessions with history, slash commands, file attachments.
- HTTP API — sync and streaming endpoints for building on top of ra.
- MCP server — `ra --mcp-stdio` exposes the agent to Cursor, Claude Desktop, or any MCP-compatible editor.
- Cron — scheduled autonomous jobs with isolated sessions and logs.
- GitHub Actions — run ra in CI/CD with no install step.
And it works with every major provider: Anthropic, OpenAI, Google, Ollama, AWS Bedrock, and Azure. Switch with a flag.
Sessions and memory
Conversations persist as JSONL files, scoped per-project. Start a session in the REPL, resume it later from the HTTP API. Sessions are the same format everywhere.
For longer-lived knowledge, ra has a built-in memory system backed by SQLite with full-text search. Agents can save facts, search them across sessions, and forget them when they're no longer relevant. This is how an agent remembers project conventions, past decisions, or user preferences across runs.
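The save/search/forget interface described above looks roughly like this. The sketch is backed by a plain Map with naive substring search so it runs standalone — ra's actual store is SQLite with a full-text index:

```typescript
// Minimal stand-in for the memory interface: save facts under a key,
// search across them, forget when stale.
export class Memory {
  private facts = new Map<string, string>()

  save(key: string, fact: string): void {
    this.facts.set(key, fact)
  }

  search(term: string): string[] {
    const t = term.toLowerCase()
    return [...this.facts.values()].filter(f => f.toLowerCase().includes(t))
  }

  forget(key: string): boolean {
    return this.facts.delete(key)
  }
}
```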
Get started
```shell
curl -fsSL https://raw.githubusercontent.com/chinmaymk/ra/main/install.sh | bash
export ANTHROPIC_API_KEY="sk-..."
ra "Summarize the key points of this file" --file report.pdf
```

Read the docs or browse the source on GitHub.