OpenMonoAgent.ai (BETA) AI shouldn't have @codeKK c-sharpOpen Source Website

OpenMonoAgent.ai

Project Url: StartupHakk/OpenMonoAgent.ai

Introduction: (BETA) AI shouldn't have a meter. Unlimited tokens. Forever. Your machine. Your agent. Use it from anywhere. Terminal-native coding agent powered by local LLMs — 100% open source, free forever, and installed with a single command. Proudly built on C#/.NET, because AI tooling should be infrastructure, not a subscription.

More: Author ReportBugs

Tags:

Open-source coding agent. Local-first. Zero cost. Zero cloud.
_{Built to democratize AI. Powered by .NET.}

Quickstart · How it compares · What's inside · Hardware · Docs · Roadmap · Contributing

OpenMono is a coding agent that runs entirely on your hardware — no subscriptions, no data leaving your network, no per-token billing. It pairs a .NET 10 CLI with its own llama.cpp inference server, giving you a full agentic loop with 20 built-in tools, Docker sandboxing, and deep code intelligence. GPU or CPU, it auto-configures itself. You own the model, the compute, and the data.

bash <(curl -fsSL https://raw.githubusercontent.com/StartupHakk/OpenMonoAgent.ai/refs/heads/main/get-openmono.sh)

Then from any project:

cd your-project/

openmono agent          # TUI mode (default)
openmono agent --classic    # classic scrolling terminal

[!NOTE] TUI mode is the default for interactive terminals. Use openmono agent --classic for CLI.

→ Full command reference — daily commands, setup flags, GPU/CPU options

How it compares

Most coding agents are cloud products wearing an open-source label. Your prompts, your code, and your context hit someone else's servers on every keystroke. You pay per token, forever, with no ceiling.

OpenMono runs the model on your hardware via llama.cpp — an RTX 3090 or a workstation NUC is all you need. After the one-time setup, inference costs nothing. Your code never leaves the machine. No account, no usage dashboard, no API key.

It's a full agentic loop: 20 tools, sub-agents, Docker sandboxing, LSP code intelligence, native Roslyn C# analysis, MCP integration, and playbooks. Runs at ~45 tok/s on GPU, ~20 tok/s on CPU.

	OpenMono	Claude Code	OpenCode
Inference cost	Zero per token (local)	Per-token billing	Per-token billing
Data privacy	Fully offline capable	Cloud only	Depends on provider
Default inference	llama.cpp bundled, zero config	Anthropic API required	BYO provider, no bundled inference
Sandboxing	Docker-native	Host process	Host process
Code intelligence	LSP + Roslyn + MCP graph tools	File reads	LSP (30+ servers)
Extensibility	Playbooks (typed, composable)	Skills (markdown)	Plugins (TS SDK)
MCP	Client (stdio)	Full client	Full client
UI	TUI + CLI	Web, Desktop, VS Code, CLI	TUI, Desktop, Web

What's inside

01 · Bundled inference — zero config, zero cost llama.cpp ships inside Docker. Installer detects your hardware and picks the right model. After setup, every token is free. `GPU` Qwen3.6-27B dense · ~60 tok/s `CPU` Qwen3.6-35B-A3B MoE · ~20 tok/s → Models & reasoning mode	02 · Agentic loop that earns its name 25 iterations per turn. Doom-loop detection aborts if the same tool sequence repeats 3×. Checkpoints at 65% context fill, compacts at 80%. Runs until done — then stops.
03 · 20 tools, 12-step pipeline Every call: parse → schema validate → path sanity → plan-mode guard → capability check → cache → pre-hook → execute → post-hook → artifact store. Read-only tools run in parallel. Nothing bypasses the pipeline.	04 · 5 specialist sub-agents Isolated sessions with locked tool sets and turn budgets: `Explore` · read-only discovery · 15 turns `Plan` · architecture, no writes · 10 turns `Coder` · full file access · 30 turns `Verify` · adversarial + Roslyn · 20 turns `general-purpose` · everything · 25 turns
05 · Docker sandbox Project mounts as `/workspace`. The agent reads and writes your real files — that's the blast radius. Nothing outside that mount is visible or reachable.	06 · Deep code intelligence Roslyn: type hierarchy, blast-radius, cross-file symbol search, callers, diagnostics — 5-min compilation cache. LSP for TypeScript, Python, Go, Rust, lazy-started on first use. Auto-detects graphify (semantic concept graph, 25+ languages) and code-review-graph (structural call graph via MCP, ~22 tools) if installed — no config needed.
07 · Playbooks YAML workflows with typed parameters, conditional gates, and checkpoint/resume. Composable — one playbook can call another.	08 · 4 providers, hot-swappable Local llama.cpp is the default and fully supported. OpenAI, Anthropic, and Ollama are available but WIP — see Models for details.
09 · Distributed inference Agent on your laptop, inference on a separate GPU machine. No port forwarding — tunnel is established outbound from the inference box. Free relay at app.openmonoagent.ai. → Dual-box setup guide	10 · Vision Attach images in chat with `@screenshot.png` or ask the agent to read any image file. The multimodal projector (mmproj) is downloaded automatically at setup. Supported formats: PNG, JPG, GIF, WebP. Large images are auto-resized to fit within VRAM budget. Enable with `OPENMONO_VISION_ENABLED=1`. → Vision setup & usage

Distributed inference: agent on laptop, inference on GPU machine

Supported Hardware

VRAM / RAM	Model	Accuracy	Speed
GPU 24 GB+	Qwen3.6-27B-Q4_K_M	Full	~45–70 tok/s
GPU 16 GB	Qwen3.6-27B-UD-IQ3_XXS	Lower	~20–42 tok/s (4060 Ti → 4080)
GPU 12 GB	Qwen3.5-9B-Q4_K_M	Lower	~38–40 tok/s (RTX 3060)
CPU 24 GB RAM	Qwen3.6-35B-A3B-UD-Q4_K_XL	Full	~17–20 tok/s

[!NOTE] The installer detects your hardware and selects the right model automatically — no config needed. 12 GB and 16 GB GPU cards are supported but run lower accuracy models. For best results, use a 24 GB card. Requires Ubuntu 26.04 LTS (recommended) or 25.10.

Architecture

A .NET 10 CLI driving a local llama.cpp inference server over HTTP, everything sandboxed in Docker. The agent streams tokens, dispatches tool calls through a 12-step pipeline, and loops until done.

→ Full architecture + diagram

Configuration

Settings load from ~/.openmono/settings.json (user-level) or .openmono/settings.json (project-level) — reference, providers, permissions, MCP servers

→ Full configuration reference

Commands & shortcuts

14 slash commands including /think, /undo, /resume, and /export. Full keyboard shortcut reference for TUI mode.

→ Commands, slash commands & keyboard shortcuts

Docs

Roadmap
Setup & commands — daily commands, TUI vs classic, flags
Architecture — .NET CLI + llama.cpp + Docker, full diagram
Models & reasoning mode
Configuration — settings.json, providers, permissions, MCP servers
Tools
Playbooks
graphify — semantic code graph, 25+ languages
code-review-graph — structural call graph via MCP
Contributing

[!NOTE] OpenMono is in Public Beta. Early access is open, and we're shipping updates fast. Try it out and tell us what you'd like to see next.

Contributing

OpenMono is early and moving fast. Contributions are welcome — new tools, providers, LSP servers, playbooks, bug fixes, or docs.

Read the contributing guide before opening a PR.

"AI shouldn't be a subscription you rent. It should be infrastructure you own —
sitting on your desk, serving your code, answering only to you."

_{— Startup Hakk}

Apps

Android Developer Tools

Android Developer Tools Pro

About Me

Tools: TimeShining

GitHub: Trinea

Facebook: Dev Tools

AI Daily Digest

Daily AI News & Insights

JSON Format, Support error correction

MD5/SHA Encode, Support batch

Text Process

CSS Format and Compress