Introducing ra
March 30, 2026
ra is an open-source agent harness. Give it a task, point it at an LLM, and let it work — reading files, running commands, calling APIs, and looping until the job is done. You stay in control through a config file, middleware hooks, and permission rules. Nothing is hidden behind abstractions you can't reach.
This post walks through the ideas behind ra and the features that make it useful.
The agent loop is the product
At the core of ra is a simple loop: call the model, execute the tool calls it returns, feed the results back, repeat. That's it. No planning graphs, no state machines, no orchestration DSL. The loop runs until the model says it's done — no arbitrary iteration caps.
User message → model → tool calls → execute → model → … → done

What makes it powerful is what you can hook into. Nine middleware hooks span the entire lifecycle — beforeModelCall, afterModelResponse, beforeToolExecution, afterToolExecution, and more. Every hook can inspect, modify, or block the step it wraps. Want to enforce a token budget? Write a beforeModelCall hook. Want to redact sensitive output? Write an afterToolExecution hook. Want to halt the loop after a certain duration? afterLoopIteration.
```typescript
// middleware/budget.ts — stop the agent when tokens run out
export default async function ({ context }) {
  if (context.totalTokens > 500_000) {
    return { stop: true, reason: 'Token budget exceeded' }
  }
}
```

Middleware is just TypeScript files. No plugin API to learn, no registration boilerplate.
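The redaction case mentioned above follows the same shape. Here is a hypothetical afterToolExecution hook that masks API-key-shaped strings before tool output re-enters the conversation — the `{ result }` argument and return shape are assumptions modeled on the budget example, not ra's documented API:

```typescript
// middleware/redact.ts — hypothetical afterToolExecution hook.
// Masks common API-key prefixes in tool output before the model sees them.
const SECRET = /\b(sk|ghp|xoxb)-[A-Za-z0-9_-]{8,}\b/g

export function maskSecrets(text: string): string {
  return text.replace(SECRET, '[REDACTED]')
}

// Assumed hook signature: receives the tool result, returns a modified one.
export default async function ({ result }: { result: { output: string } }) {
  return { result: { ...result, output: maskSecrets(result.output) } }
}
```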
Context engineering as a first-class concern
Most agent frameworks treat the context window as a bucket you throw messages into until it overflows. ra treats it as a resource to be managed.
Smart compaction. When the conversation grows too large, ra compacts it — either by truncating older messages (preserving cached prefixes for cost savings) or by summarizing them into a condensed history. The compaction strategy is configurable and respects provider-specific prompt caching (Anthropic, OpenAI, Google) so you don't blow away cache hits unnecessarily.
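The truncation strategy can be sketched in a few lines. This is an illustrative simplification, not ra's actual code — the message shape and token accounting are assumptions. The key property is that the cached prefix is never touched, so provider-side prompt caching keeps hitting:

```typescript
// Prefix-preserving truncation sketch: keep the first `prefixLen` messages
// intact (the cacheable prefix), then drop the oldest messages after it
// until the estimated token total fits the budget.
type Msg = { role: string; content: string; tokens: number }

export function compact(messages: Msg[], budget: number, prefixLen = 2): Msg[] {
  const prefix = messages.slice(0, prefixLen)
  let tail = messages.slice(prefixLen)
  const total = (ms: Msg[]) => ms.reduce((n, m) => n + m.tokens, 0)
  // Drop oldest tail messages first; the cached prefix is never dropped.
  while (tail.length > 0 && total(prefix) + total(tail) > budget) {
    tail = tail.slice(1)
  }
  return [...prefix, ...tail]
}
```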
Dynamic context window learning. If a provider rejects a request because it exceeded the real context limit, ra learns the actual size and adapts future compaction thresholds automatically.
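In spirit, that learning loop looks like the following sketch. The error-message format is an assumption for illustration; ra's actual parsing of provider rejections will differ by provider:

```typescript
// Sketch: learn the real context limit from a provider rejection and size
// future compaction thresholds off the learned value.
export class ContextWindow {
  constructor(public limit: number) {}

  // e.g. "This model's maximum context length is 200000 tokens" (assumed format)
  learnFromError(message: string): void {
    const m = message.match(/maximum context length is (\d+)/)
    if (m) this.limit = Number(m[1])
  }

  // Compact before hitting the hard limit, leaving headroom for the response.
  compactionThreshold(ratio = 0.8): number {
    return Math.floor(this.limit * ratio)
  }
}
```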
Automatic discovery. Drop a CLAUDE.md, AGENTS.md, or .cursorrules file in your project and ra picks it up — convention files are injected into the system prompt automatically.
Pattern resolution. Reference files inline in your prompts with @src/auth.ts or @src/**/*.ts, and ra expands them into the context. Same for URLs: url:https://example.com/docs fetches and inlines the content.
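Conceptually, @-expansion is a substitution pass over the prompt. A minimal sketch (the real resolver also handles globs and url: references; `readFile` is injected here so the example stays self-contained):

```typescript
// Replace each @path token in a prompt with a delimited copy of the
// referenced file's contents.
export function expandRefs(
  prompt: string,
  readFile: (path: string) => string,
): string {
  return prompt.replace(/@([\w./*-]+)/g, (_, path) => {
    return `\n--- ${path} ---\n${readFile(path)}\n`
  })
}
```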
See what the agent is doing
Every ra session produces structured logs, trace spans, and per-iteration token metrics — automatically, with no instrumentation code.
The built-in Inspector is a web dashboard that shows you every model call, every tool execution, every thinking block, and exactly how tokens were spent across the session. When an agent does something unexpected, you don't guess — you look.
Traces form a hierarchy: agent.loop → agent.iteration → agent.model_call / agent.tool_execution, each with duration, status, and attributes. Pipe the JSONL logs to your own tooling or just read them directly.
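Because the logs are JSONL, walking the span tree yourself is a few lines of code. The field names below (`parent`, `durationMs`) are illustrative, not ra's exact schema:

```typescript
// Parse JSONL trace output and return the direct children of a span.
type Span = { id: string; parent?: string; name: string; durationMs: number }

export function childrenOf(jsonl: string, parentId: string): Span[] {
  return jsonl
    .split('\n')
    .filter(Boolean)
    .map(line => JSON.parse(line) as Span)
    .filter(s => s.parent === parentId)
}
```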
The config is the agent
A single ra.config.yml turns any directory into a purpose-built agent. Define the provider, model, tools, permissions, middleware, and skills — all in one file.
```yaml
agent:
  provider: anthropic
  model: claude-sonnet-4-6
  thinking: adaptive
  maxTokenBudget: 500000
permissions:
  rules:
    - tool: Bash
      command:
        allow: ["^git ", "^bun "]
        deny: ["--force", "--hard"]
middleware:
  beforeModelCall:
    - "./middleware/budget.ts"
skillDirs:
  - ./skills
```

Permissions are regex-based allow/deny rules per tool, per field. The config above lets the agent run git and bun commands but blocks --force and --hard flags. No code required — just patterns.
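The evaluation semantics implied by the config are simple: deny rules win, and after that at least one allow rule must match. A sketch of that check — illustrative, not ra's source:

```typescript
// Evaluate a command against regex allow/deny rules. Deny takes precedence;
// otherwise the command must match at least one allow pattern.
export function permitted(
  command: string,
  allow: string[],
  deny: string[],
): boolean {
  if (deny.some(p => new RegExp(p).test(command))) return false
  return allow.some(p => new RegExp(p).test(command))
}
```

With the patterns from the config above, `git status` passes, while `git push --force` and `rm -rf /` are both rejected — the first by a deny rule, the second for matching no allow rule.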
Adaptive thinking deserves a mention. ra supports extended thinking where the model reasons deeply before responding. In adaptive mode, the agent uses high thinking budget for the first iterations (when the problem is being understood) and dials it back as execution progresses. You get deep reasoning when it matters without burning tokens on routine follow-up.
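One way to picture the adaptive schedule: a high budget for the first few iterations, decaying toward a floor as execution settles into routine follow-up. The numbers and decay curve below are invented for illustration — the post doesn't publish ra's actual schedule:

```typescript
// Hypothetical adaptive thinking schedule: full budget while the problem is
// being understood, then halving per iteration down to a floor.
export function thinkingBudget(iteration: number): number {
  const max = 16_000
  const floor = 1_000
  if (iteration < 3) return max // problem-understanding phase
  return Math.max(floor, Math.floor(max / 2 ** (iteration - 2)))
}
```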
Skills: composable agent roles
Skills are reusable instruction bundles — a directory with a SKILL.md file and optional scripts or reference files. Think of them as roles you can assign to the agent.
```shell
ra --skill code-review "Review these changes"
ra --skill debugger --file error.log "Why is this failing?"
```

Skills use progressive disclosure. The model initially sees only skill names and one-line descriptions. When it decides a skill is relevant, the full instructions are loaded. This keeps the context lean until depth is needed.
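Progressive disclosure boils down to a two-stage lookup: a cheap index first, lazy loading on activation. A minimal sketch of that interface, with `loadBody` standing in for reading the skill's SKILL.md from disk:

```typescript
// Stage 1: the model sees only names and one-line descriptions.
// Stage 2: full instructions load only when a skill is activated.
type Skill = { name: string; description: string; loadBody: () => string }

export function skillIndex(skills: Skill[]): string {
  return skills.map(s => `- ${s.name}: ${s.description}`).join('\n')
}

export function activate(skills: Skill[], name: string): string | undefined {
  return skills.find(s => s.name === name)?.loadBody()
}
```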
You can install skills from GitHub, npm, or local directories. And here's the interesting part: the agent can write new skills at runtime — extending its own capabilities as it works.
Recipes: shareable agent configurations
A recipe is a complete agent setup — config, skills, and middleware — packaged as a directory. Run a recipe and you get a fully configured agent:
```shell
ra --recipe coding-agent "Fix the failing tests and open a PR"
ra --recipe code-review-agent "Review this diff"
```

Recipes layer on top of your existing config. Skills and middleware from a recipe prepend to yours rather than replacing them. This means you can start from a recipe and customize from there.
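The prepend semantics can be sketched as a list merge — recipe entries come first, user entries follow, so nothing of yours is lost. Field names here are illustrative:

```typescript
// Merge a recipe into a user config: recipe skills and middleware prepend,
// user entries are preserved after them.
type Layer = { skillDirs: string[]; middleware: string[] }

export function applyRecipe(user: Layer, recipe: Layer): Layer {
  return {
    skillDirs: [...recipe.skillDirs, ...user.skillDirs],
    middleware: [...recipe.middleware, ...user.middleware],
  }
}
```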
Runs anywhere
ra is a single binary. The same agent runs across multiple interfaces:
- CLI — one-shot prompts, piping, chaining. `cat error.log | ra "Explain this error"` just works.
- REPL — interactive sessions with history, slash commands, file attachments.
- HTTP API — sync and streaming endpoints for building on top of ra.
- MCP server — `ra --mcp-stdio` exposes the agent to Cursor, Claude Desktop, or any MCP-compatible editor.
- Cron — scheduled autonomous jobs with isolated sessions and logs.
- GitHub Actions — run ra in CI/CD with no install step.
And it works with every major provider: Anthropic, OpenAI, Google, Ollama, AWS Bedrock, and Azure. Switch with a flag.
Sessions and memory
Conversations persist as JSONL files, scoped per-project. Start a session in the REPL, resume it later from the HTTP API. Sessions are the same format everywhere.
For longer-lived knowledge, ra has a built-in memory system backed by SQLite with full-text search. Agents can save facts, search them across sessions, and forget them when they're no longer relevant. This is how an agent remembers project conventions, past decisions, or user preferences across runs.
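The save/search/forget interface described above looks roughly like this. The sketch is backed by a plain Map with naive substring search so it runs standalone — ra's actual store is SQLite with a full-text index:

```typescript
// Minimal stand-in for the memory interface: save facts under a key,
// search across them, forget when stale.
export class Memory {
  private facts = new Map<string, string>()

  save(key: string, fact: string): void {
    this.facts.set(key, fact)
  }

  search(term: string): string[] {
    const t = term.toLowerCase()
    return [...this.facts.values()].filter(f => f.toLowerCase().includes(t))
  }

  forget(key: string): boolean {
    return this.facts.delete(key)
  }
}
```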
Get started
```shell
curl -fsSL https://raw.githubusercontent.com/chinmaymk/ra/main/install.sh | bash
export ANTHROPIC_API_KEY="sk-..."
ra "Summarize the key points of this file" --file report.pdf
```

Read the docs or browse the source on GitHub.