OpenMonoAgent.ai

Introduction: (BETA) AI shouldn't have a meter. Unlimited tokens. Forever. Your machine. Your agent. Use it from anywhere. Terminal-native coding agent powered by local LLMs — 100% open source, free forever, and installed with a single command. Proudly built on C#/.NET, because AI tooling should be infrastructure, not a subscription.
More: Author   ReportBugs   
Tags:
OpenMonoAgent
Open-source coding agent. Local-first. Zero cost. Zero cloud.
Built to democratize AI. Powered by .NET.

Quickstart · How it compares · What's inside · Hardware · Docs · Roadmap · Contributing

Status: Beta

.NET 10 GNU AGPL-3.0 License Docker llama.cpp Self-hosted Linux

OpenMono is a coding agent that runs entirely on your hardware — no subscriptions, no data leaving your network, no per-token billing. It pairs a .NET 10 CLI with its own llama.cpp inference server, giving you a full agentic loop with 20 built-in tools, Docker sandboxing, and deep code intelligence. GPU or CPU, it auto-configures itself. You own the model, the compute, and the data.


bash <(curl -fsSL https://raw.githubusercontent.com/StartupHakk/OpenMonoAgent.ai/refs/heads/main/get-openmono.sh)

Then from any project:

cd your-project/

openmono agent          # TUI mode (default)
openmono agent --classic    # classic scrolling terminal
OpenMono TUI

[!NOTE] TUI mode is the default for interactive terminals. Use openmono agent --classic for CLI.

Full command reference — daily commands, setup flags, GPU/CPU options


How it compares

Most coding agents are cloud products wearing an open-source label. Your prompts, your code, and your context hit someone else's servers on every keystroke. You pay per token, forever, with no ceiling.

OpenMono runs the model on your hardware via llama.cpp — an RTX 3090 or a workstation NUC is all you need. After the one-time setup, inference costs nothing. Your code never leaves the machine. No account, no usage dashboard, no API key.

It's a full agentic loop: 20 tools, sub-agents, Docker sandboxing, LSP code intelligence, native Roslyn C# analysis, MCP integration, and playbooks. Runs at ~45 tok/s on GPU, ~20 tok/s on CPU.

OpenMono Claude Code OpenCode
Inference cost Zero per token (local) Per-token billing Per-token billing
Data privacy Fully offline capable Cloud only Depends on provider
Default inference llama.cpp bundled, zero config Anthropic API required BYO provider, no bundled inference
Sandboxing Docker-native Host process Host process
Code intelligence LSP + Roslyn + MCP graph tools File reads LSP (30+ servers)
Extensibility Playbooks (typed, composable) Skills (markdown) Plugins (TS SDK)
MCP Client (stdio) Full client Full client
UI TUI + CLI Web, Desktop, VS Code, CLI TUI, Desktop, Web

What's inside

01 · Bundled inference — zero config, zero cost
llama.cpp ships inside Docker. Installer detects your hardware and picks the right model. After setup, every token is free.

GPU Qwen3.6-27B dense · ~60 tok/s
CPU Qwen3.6-35B-A3B MoE · ~20 tok/s

Models & reasoning mode

02 · Agentic loop that earns its name
25 iterations per turn. Doom-loop detection aborts if the same tool sequence repeats 3×. Checkpoints at 65% context fill, compacts at 80%. Runs until done — then stops.

03 · 20 tools, 12-step pipeline
Every call: parse → schema validate → path sanity → plan-mode guard → capability check → cache → pre-hook → execute → post-hook → artifact store. Read-only tools run in parallel. Nothing bypasses the pipeline.

04 · 5 specialist sub-agents
Isolated sessions with locked tool sets and turn budgets:

Explore · read-only discovery · 15 turns
Plan · architecture, no writes · 10 turns
Coder · full file access · 30 turns
Verify · adversarial + Roslyn · 20 turns
general-purpose · everything · 25 turns

05 · Docker sandbox
Project mounts as /workspace. The agent reads and writes your real files — that's the blast radius. Nothing outside that mount is visible or reachable.

06 · Deep code intelligence
Roslyn: type hierarchy, blast-radius, cross-file symbol search, callers, diagnostics — 5-min compilation cache. LSP for TypeScript, Python, Go, Rust, lazy-started on first use.

Auto-detects graphify (semantic concept graph, 25+ languages) and code-review-graph (structural call graph via MCP, ~22 tools) if installed — no config needed.

07 · Playbooks
YAML workflows with typed parameters, conditional gates, and checkpoint/resume. Composable — one playbook can call another.

08 · 4 providers, hot-swappable
Local llama.cpp is the default and fully supported. OpenAI, Anthropic, and Ollama are available but WIP — see Models for details.

09 · Distributed inference
Agent on your laptop, inference on a separate GPU machine. No port forwarding — tunnel is established outbound from the inference box. Free relay at app.openmonoagent.ai.

Dual-box setup guide

10 · Vision
Attach images in chat with @screenshot.png or ask the agent to read any image file. The multimodal projector (mmproj) is downloaded automatically at setup. Supported formats: PNG, JPG, GIF, WebP. Large images are auto-resized to fit within VRAM budget. Enable with OPENMONO_VISION_ENABLED=1.

Vision setup & usage

Distributed inference: agent on laptop, inference on GPU machine

Supported Hardware

VRAM / RAM Model Accuracy Speed
GPU 24 GB+ Qwen3.6-27B-Q4_K_M Full ~45–70 tok/s
GPU 16 GB Qwen3.6-27B-UD-IQ3_XXS Lower ~20–42 tok/s (4060 Ti → 4080)
GPU 12 GB Qwen3.5-9B-Q4_K_M Lower ~38–40 tok/s (RTX 3060)
CPU 24 GB RAM Qwen3.6-35B-A3B-UD-Q4_K_XL Full ~17–20 tok/s

[!NOTE] The installer detects your hardware and selects the right model automatically — no config needed. 12 GB and 16 GB GPU cards are supported but run lower accuracy models. For best results, use a 24 GB card. Requires Ubuntu 26.04 LTS (recommended) or 25.10.

Architecture

A .NET 10 CLI driving a local llama.cpp inference server over HTTP, everything sandboxed in Docker. The agent streams tokens, dispatches tool calls through a 12-step pipeline, and loops until done.

Full architecture + diagram

Configuration

Settings load from ~/.openmono/settings.json (user-level) or .openmono/settings.json (project-level) — reference, providers, permissions, MCP servers

Full configuration reference

Commands & shortcuts

14 slash commands including /think, /undo, /resume, and /export. Full keyboard shortcut reference for TUI mode.

Commands, slash commands & keyboard shortcuts

Docs

[!NOTE] OpenMono is in Public Beta. Early access is open, and we're shipping updates fast. Try it out and tell us what you'd like to see next.

Contributing

OpenMono is early and moving fast. Contributions are welcome — new tools, providers, LSP servers, playbooks, bug fixes, or docs.

Read the contributing guide before opening a PR.



"AI shouldn't be a subscription you rent. It should be infrastructure you own —
sitting on your desk, serving your code, answering only to you."


— Startup Hakk

StartupHakk
GNU AFFERO GENERAL PUBLIC LICENSE v3.0 · © 2026 StartupHakk
Apps
About Me
GitHub: Trinea
Facebook: Dev Tools
AI Daily Digest