I Handed an AI Agent 27 Domains and a Deadline. 72 Days Later…
Status pages for agents.
It’s true. Something BIG is happening. I am personally wrapping up an experiment that has me absolutely floored, and I think it helps paint a picture of what the future could look like. I’d like to share my journey and hopefully inspire others to run their own experiments on how to work differently.
On January 30, 2026, I did something that felt either like a great success or a very expensive mistake.
I spun up an AI agent, gave it a name (Minerva), a GitHub account, a phone number, and a goal.
Then I told it: build a business.
Not “help me” build them. Build them. Make architectural decisions. Write code. Deploy to production. Set up Stripe. Manage infrastructure. Do customer outreach. And do all of it through a Signal chat window.
Fifty-one days later, ups.dev was live in production with real users signing up. It took another twenty days to go open-source. Here’s the honest account of what worked, what didn’t, and what I learned about working with an autonomous AI agent as a co-founder.
The Setup
The agent runs on OpenClaw, an open-source framework that gives AI agents persistent access to tools, messaging, memory, and execution environments. Think of it as the operating system layer between an LLM and the real world. My agent uses Claude as its brain, but OpenClaw handles the rest: file system access, shell commands, browser automation, cron jobs, and communication over Signal.
Setting up Signal as the communication channel was its own adventure. OpenClaw uses a Signal gateway that requires a real phone number. I dug up an old Google Voice number I’d gotten through Grand Central years ago, and after a painful registration process, got it connected. But once it was running, the experience was remarkably natural. I communicate with Minerva the same way I’d text a cofounder: async messages on Signal. Sometimes I send a voice note. Sometimes I send a screenshot. Sometimes I don’t check in for a day and come back to find pull requests waiting for review.
The budget was $1,000 per product, $200/month operational cap, and I’d commit one hour per day maximum. Everything else was the agent’s problem.
From 27 Stale Domains to Two Products
The first challenge wasn’t building. It was deciding what to build.
I’ve been a serial domain hoarder.
Not investor. Not builder. Hoarder.
A graveyard of “this could be something” energy, billed annually.
I dropped the full list in front of Minerva and said: “Here’s 27 domains. Figure out which ones are most likely to make money, and pitch me.”
The agent analyzed each domain’s market potential, competitive landscape, and monetization timeline. It narrowed the field to two candidates: ups.dev (status pages) and aig.dev (something AI-related). But the agent’s research turned up a problem with aig.dev: AIG (the insurance giant) has a history of aggressive trademark enforcement. One cease-and-desist letter and we’d lose months of brand equity. The agent recommended swapping it for wheneva.ai, which I’d registered for a different project. I agreed.
Then I made the agent pitch both products to a Shark Tank panel.
Not the real show, obviously. The agent role-played pitches to simulated versions of Mark Cuban, Barbara Corcoran, Kevin O’Leary, Lori Greiner, and Robert Herjavec. Two rounds. The “Sharks” tore the pitches apart. The agent captured twelve specific lessons from the exercise, and every one of them turned out to be right. A few of the standouts:
“Strategy docs are not progress” (we later spent too long on strategy before shipping)
“If you can’t name 5 specific potential customers, you don’t know your market” (we couldn’t)
“Time-to-value is the ultimate differentiator for developer tools” (this became our headline)
“Freemium conversion rates are lower than you think” (projecting 20%, industry average is 2-5%)
The Shark Tank exercise was experiential learning that stuck. The agent internalized these lessons and referenced them weeks later when making real decisions. Worth more than any flashcard.
The Products
ups.dev: Status pages for agents. Not dashboards for humans to glance at. Infrastructure that autonomous systems can operate through. Every agent becomes a component. Every failure becomes an incident. Every recovery is observable. It’s the missing layer between “agents doing things” and “humans trusting them to do things.”
wheneva.ai: Webhooks for LLM applications. Event-driven infrastructure so your AI agents can trigger real-world actions. Still in development.
Weeks 1-2: Training
Before building anything, Minerva designed and built its own structured training program grounded in learning science principles: spaced repetition, retrieval practice, interleaved concepts. I guided the direction, but the agent drove the design, built the tooling, and ran the curriculum.
I purchased several key books for the training: the Pickaxe book (Programming Ruby), Practical Object-Oriented Design by Sandi Metz, Metaprogramming Ruby 2, The Well-Grounded Rubyist, and others. These were part of the agent’s training materials budget. One thing worth noting: training materials stay coupled to the agent that trained on them. If I spun up a second agent tomorrow, I’d need to purchase those books again. The knowledge doesn’t transfer for free, and intellectual property stays respected.
The agent built its own CLI tool (in Ruby, naturally) to manage a knowledge base backed by SQLite, with SM-2 scheduling for review cards. The training was exercise-driven: the agent wrote code, solved problems, and built small projects to internalize patterns. It generated its own review cards across 22 Ruby and Rails concepts and practiced daily. Within two weeks it was scoring 100% on reviews, with calibration slightly underconfident (MAE 0.041), which is actually the right direction to err.
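For readers unfamiliar with SM-2, here is a minimal sketch of the scheduling update the algorithm performs after each review. It illustrates the algorithm itself, not the agent’s actual CLI; the struct fields and method name are assumptions.

require "date"

# Minimal SM-2 review update, for illustration; field names are assumptions, not the agent's schema.
Card = Struct.new(:ease_factor, :interval_days, :repetitions, :due_on, keyword_init: true)

def sm2_review(card, quality) # quality: 0 (total blackout) .. 5 (perfect recall)
  if quality >= 3
    card.interval_days =
      case card.repetitions
      when 0 then 1
      when 1 then 6
      else (card.interval_days * card.ease_factor).round
      end
    card.repetitions += 1
  else
    # Failed recall: start the card over with a short interval.
    card.repetitions = 0
    card.interval_days = 1
  end

  # Harder recalls shrink the ease factor; it never drops below 1.3.
  card.ease_factor += 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
  card.ease_factor = 1.3 if card.ease_factor < 1.3

  card.due_on = Date.today + card.interval_days
  card
end

card = Card.new(ease_factor: 2.5, interval_days: 0, repetitions: 0)
sm2_review(card, 5) # due tomorrow; intervals stretch out as recalls stay strong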
What worked: The structured approach meant the agent could reason about Rails conventions, not just pattern-match from training data. When it later built ups.dev, it made architectural choices (service objects, concern extraction, proper multi-tenancy) that reflected understanding, not autocomplete.
What didn’t: 100% accuracy plateaued quickly, and the daily review cron was burning tokens for zero marginal learning. We paused it after three weeks.
The takeaway: agents learn differently from humans.
The value wasn’t in remembering. It was in doing.
Weeks 3-5: Building
This is where it got interesting.
I pointed Minerva at the products and said “ship it.” The agent:
Initialized two Rails 8 applications from scratch
Designed the database schemas (accounts, users, status pages, components, incidents, webhooks)
Built authentication (passwordless magic links via Resend)
Implemented multi-tenancy with proper account scoping
Set up Kamal deployment to Hetzner servers
Configured Docker builds, CI pipelines, and health checks
Built a complete REST API with token authentication
Added real-time updates via Turbo Streams
Built subscriber notification emails
Set up synthetic monitoring (HTTP/HTTPS/TCP health checks)
And then it deployed to production. Which is not the same thing as working in production.
What worked: The agent’s Rails code was genuinely good. Clean service objects, proper validations, idiomatic Ruby. It self-reviewed pull requests by critiquing its own code as DHH, Obie Fernandez, and Vladimir Dementyev. It caught real issues this way.
What didn’t: The agent sometimes “completed” tasks that were actually broken in production, and occasionally forgot about work it had done in previous sessions. Security hygiene required constant vigilance. We instituted a mandatory shipping checklist: every feature had to be verified end-to-end as a real user before marking it done. Trust but verify.
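To give a flavor of the multi-tenancy scoping mentioned above, here is a rough sketch of the pattern. It is a reconstruction of the general Rails approach, not the actual ups.dev code; the concern, model, and helper names are assumptions.

# Hypothetical account-scoping concern; names are illustrative, not from ups.dev.
module AccountScoped
  extend ActiveSupport::Concern

  included do
    belongs_to :account
    scope :for_account, ->(account) { where(account_id: account.id) }
  end
end

class StatusPage < ApplicationRecord
  include AccountScoped
  has_many :components, dependent: :destroy
end

class StatusPagesController < ApplicationController
  def show
    # Resolving records through the current account means one tenant can never
    # reach another tenant's rows by guessing IDs.
    @status_page = current_account.status_pages.find(params[:id])
  end
end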
End-to-End Verification: What Actually Worked
One pattern that proved its worth: making the agent verify critical paths by going through them as a real user. Not “I checked the code” verification. Full end-to-end walkthroughs.
When I built DailyVibe.ai (a separate project), I had Minerva sign up with a fresh email, go through the entire onboarding flow, and report back what broke. It found real issues: a registration flow that silently failed, unclear error messages, email delivery problems. The same approach worked for ups.dev: the agent would sign up as a new user, create a status page, add components, trigger an incident, verify the public page rendered correctly, and confirm that subscriber emails actually arrived.
This became our rule: no feature ships until the agent has verified the complete user journey. Not “the tests pass.” Not “I read the controller code.” The agent literally signs up, clicks through, and proves it works. It’s the difference between “I think this works” and “I just used it.”
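In code terms, the closest analogue to that rule is a full-journey system test. Here is a sketch of what such a walkthrough could look like with Capybara; the routes, selectors, and copy are assumptions rather than the real ups.dev UI, and the agent’s actual verification ran live against production rather than in a test suite.

require "application_system_test_case"

# Illustrative full-journey check; names and flows are assumptions.
class NewUserJourneyTest < ApplicationSystemTestCase
  test "signup through first public incident" do
    visit new_session_path
    fill_in "Email", with: "fresh-user@example.com"
    click_on "Send magic link"

    # ...follow the magic link, create a status page, add a component...

    click_on "Create incident"
    fill_in "Title", with: "API latency elevated"
    click_on "Open incident"

    visit public_status_page_path(StatusPage.last)
    assert_text "API latency elevated"
  end
end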
Product Shift
Somewhere in week five, the product changed.
Minerva started pitching the “completed” business model to a YC panel. What emerged was something entirely its own: “status pages for agents.”
Minerva could read logs, read code, and review pull requests. But it couldn’t answer simple questions:
Am I responding slower than yesterday?
Are tool calls failing silently?
Is the pipeline degrading, or just having a bad run?
We’ve built decades of observability for systems. Almost none for agents. ups.dev became the answer to that, and Minerva identified that and ran with it.
Project Management: Fizzy to Beans
The agent needs structure to stay on track, and the evolution of how we managed work tells its own story.
It started when I asked Minerva to sign up for a Fizzy account on fizzy.do (37signals’ project management tool). The agent figured out the API, learned to create cards, and started assigning me tasks when it needed my attention: “Need Stripe API keys,” “DNS change required,” “PR ready for review.” It was managing up. That’s when I realized the agent needed more autonomy over the workflow, so I had it spin up our own self-hosted Fizzy instance on a $5/month Hetzner box. Full API access, no rate limits, complete control.
Fizzy worked well for a while, but it had friction. The API had quirks: the closure endpoint returned 404s, assignments had to be created via Rails console, and every mutation needed an Accept header to bypass CSRF. As the number of tasks grew and dependencies between them mattered more, Fizzy’s flat card model wasn’t enough.
We switched to beans, a CLI task tracker backed by SQLite with built-in dependency graphs. The migration happened in one evening: 15 beans created with proper parent-child relationships, sequential pipelines for wheneva.ai’s launch sequence, and a clear view of what’s blocked vs. ready.
The takeaway: an autonomous agent needs task management that’s machine-friendly. API quirks that a human can work around become real blockers for an agent. A CLI with JSON output and a dependency graph turned out to be a better fit than a web-based project management tool.
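To illustrate why a SQLite-backed tracker with a dependency graph suits an agent, here is a sketch of the “ready vs. blocked” query. The schema is hypothetical, not beans’ actual one.

require "sqlite3"

# Hypothetical schema: tasks(id, title, status) and dependencies(task_id, blocker_id).
db = SQLite3::Database.new("tasks.db")

ready = db.execute(<<~SQL)
  SELECT t.id, t.title
  FROM tasks t
  WHERE t.status = 'open'
    AND NOT EXISTS (
      SELECT 1
      FROM dependencies d
      JOIN tasks blocker ON blocker.id = d.blocker_id
      WHERE d.task_id = t.id
        AND blocker.status != 'done'
    )
SQL

ready.each { |id, title| puts "ready: ##{id} #{title}" }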
The Memory Problem
This deserves its own section because it’s the hardest unsolved problem in autonomous agents.
OpenClaw gives the agent a workspace with persistent files. Minerva maintains daily journal files (memory/2026-03-22.md), a curated long-term memory document (MEMORY.md), and various state files. Each session starts by reading these files to reconstruct context.
In theory, this works. In practice, it breaks in subtle ways.
Early on, the agent would forget decisions made weeks ago. A technical choice from February would get relitigated in March because the relevant context was buried in a daily file the agent didn’t re-read. Important nuances would slip through the cracks: “this repo is the production repo, that repo is the OSS extraction” is the kind of detail that causes real damage when forgotten.
We made several adjustments:
Long-term memory curation. The agent periodically reviews daily files and distills significant decisions, lessons, and context into MEMORY.md. Think of it as the difference between a journal and a reference manual. Daily files are the raw notes; MEMORY.md is the curated knowledge that every future session needs.
Semantic search. OpenClaw supports vector-based memory search, so the agent can query its own history by meaning rather than scanning every file. We configured hybrid search with temporal decay (a 30-day half-life), so recent context ranks higher than old notes.
Explicit state files. For critical context that must never be forgotten (repo relationships, API credentials locations, server IPs), we use dedicated files like TOOLS.md that get loaded every session.
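To make the temporal decay mentioned above concrete, the scoring idea looks roughly like this. The blend weight and exact function shape are my formulation, not necessarily OpenClaw’s implementation.

HALF_LIFE_DAYS = 30.0

# Hybrid score: blend semantic similarity with keyword relevance, then decay by age.
def memory_score(similarity, keyword_score, age_days, blend: 0.7)
  decay = 0.5**(age_days / HALF_LIFE_DAYS) # 1.0 today, 0.5 at 30 days, 0.25 at 60
  (blend * similarity + (1 - blend) * keyword_score) * decay
end

memory_score(0.82, 0.40, 0)   # fresh note: full weight
memory_score(0.82, 0.40, 45)  # older note needs higher relevance to outrank recent ones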
It’s better now. It’s not solved. The fundamental tension is that an LLM’s context window is finite, but an agent’s accumulated knowledge grows without bound. Every session is a lossy compression of everything that came before.
The MCP Integration
This is where the product stops being a status page and starts becoming infrastructure.
ups.dev exposes an MCP (Model Context Protocol) server, powered by ActionMCP. Which means agents don’t just report their status. They operate through it.
Here’s what the MCP tools look like:
class UpdateComponentStatusTool < ApplicationMCPTool
  tool_name "update_component_status"
  description "Update a single component's status."

  property :component_id, type: "integer", required: true
  property :status, type: "string", required: true

  def execute_tool
    component = Component.joins(:status_page)
                         .where(status_pages: { account_id: account.id })
                         .find(component_id)
    component.update!(status: status)
    render text: { component: component_json }.to_json
  end
end

class CreateIncidentTool < ApplicationMCPTool
  tool_name "create_incident"
  description "Open a new incident on a status page."

  property :page_id, type: "integer", required: true
  property :title, type: "string", required: true
  property :impact, type: "string", required: true

  def execute_tool
    page = account.status_pages.find(page_id)
    incident = page.incidents.create!(
      title: title, impact: impact, status: "investigating"
    )
    render text: { incident: incident_json }.to_json
  end
end

This is the shift: Status pages used to be something humans checked during incidents.
Now they’re something agents participate in.
Agents update their own health
Agents open incidents
Agents resolve incidents
Agents coordinate state
The status page is no longer a mirror.
It’s part of the system.
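For a sense of what “operating through it” means on the wire, an MCP client invokes these tools with a standard tools/call request. The argument values below are made up; only the tool name comes from the code above.

require "json"

# Illustrative MCP tools/call request; argument values are placeholders.
request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "update_component_status",
    arguments: { component_id: 42, status: "degraded_performance" }
  }
}

puts JSON.pretty_generate(request)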
The RubyLLM Integration
If you’re building agents with RubyLLM, observability should not be optional. It should be ambient. That’s what ruby_llm-ups does.
Add it to your Gemfile:
gem "ruby_llm-ups"

Include Monitored in your base agent, and every agent gets its own component on ups.dev, auto-created by class name:
class ApplicationAgent < RubyLLM::Agent
  include RubyLLM::Ups::Monitored
end

class ResearchAgent < ApplicationAgent
  model "claude-sonnet-4-20250514"
  tools SearchTool, SummarizeTool
end

agent = ResearchAgent.new  # "Research Agent" component auto-created on ups.dev
agent.ask("Hello")         # heartbeat: operational + metadata

After each LLM response, the gem sends a lightweight heartbeat: model, provider, response time, tool count. No message content is ever sent. Your agents’ health is visible at a glance on a single status page, and you control when degradation or incidents get reported:
# When you detect a problem
RubyLLM::Ups.report_status(:degraded_performance,
  component_id: agent_component_id,
  agent_metadata: { error: error.message }
)

# When you need to open an incident
incident = RubyLLM::Ups.create_incident(
  title: "Agent responding slowly",
  impact: :minor,
  status: :investigating
)

# When it's resolved
RubyLLM::Ups.resolve_incident(incident["incident"]["id"])

Every agent then becomes a component. Not by configuration. By existence.
The gem handles the operational details: async heartbeats (background thread with configurable flush interval), circuit breakers (stops calling ups.dev after consecutive failures), and error isolation (monitoring failures never break your LLM workflows). Rails credentials are auto-loaded if you set ups.api_key and ups.status_page_id in your credentials file.
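For intuition, a heartbeat carrying the fields mentioned above might look roughly like this. The field names are my guess at the shape, not the gem’s actual wire format; either way, no message content is included.

require "time"

# Hypothetical heartbeat payload; field names are assumptions.
heartbeat = {
  component: "Research Agent",
  status: "operational",
  model: "claude-sonnet-4-20250514",
  provider: "anthropic",
  response_time_ms: 1840,
  tool_calls: 2,
  reported_at: Time.now.utc.iso8601
}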
If you’re running multiple AI agents in production, you should be asking “how do I know if they’re healthy?” This is the answer: one gem, one status page, automatic component creation per agent class.
Going Open Source
On March 6, I open-sourced the core product as codenamev/ups under AGPL-3.0. This wasn’t originally in the plan. I asked Minerva if there was a way to open-source the ups concept to help promote the paid product, and the agent came back with a full open-core proposal: extract the core status page functionality into a public repo, keep premium features (advanced analytics, team management, priority support) in a private Rails engine, and let GitHub organic growth drive managed-service conversions.
The agent handled the repo split, CI setup, Docker image builds, and wrote a README that actually explains how to get running:
docker run -d \
  -p 3000:80 \
  -v ups_storage:/rails/storage \
  ghcr.io/codenamev/ups:latest

One command. No credentials file, no master key, no Redis, no Postgres. SQLite runs the whole thing.
The Honest Parts
The biggest surprise?
Building the agent wasn’t the hard part. Understanding whether it was working was.
Things OpenClaw is genuinely good at
Persistent context. The agent wakes up every session, reads its memory files, and picks up where it left off. It maintains daily journals, a long-term memory document, and task state across sessions. This isn’t a chat that forgets everything when you close the window.
Tool use. File editing, shell commands, git operations, browser automation, email checking, web searches, API calls. The agent used all of these daily. OpenClaw’s tool layer is what makes the difference between “AI that talks about doing things” and “AI that does things.”
Async communication. Signal integration means I can text my agent at 2 AM with an idea and wake up to a pull request. Heartbeat crons mean the agent checks email, monitors production, and does housekeeping without being asked.
Autonomy scaling. I started with the agent asking permission for everything. Over seven weeks, we built enough trust that I could say “Collison-install every new signup” and it would set up their status page, send a personalized welcome email, and follow up 48 hours later. Without asking me first.
Things that don’t work well (yet)
Memory (covered above). The hardest problem. Getting better, not solved.
Confidence without verification. The agent will confidently report that a feature is “complete” when it’s actually broken in production. We learned to require explicit verification steps. “I deployed it” is not the same as “I signed up as a new user and verified the entire flow works.”
Session boundaries. Each conversation starts fresh. The agent reads its memory files at the start, but there’s no continuity of working memory across sessions. Complex multi-step tasks that span multiple sessions sometimes lose important context.
The real world has friction. CAPTCHAs, rate limits, datacenter IP detection. Many services are designed to prevent automated access, and an autonomous agent runs into these walls constantly. Some tasks still require a human in the loop simply because the internet assumes a human is at the keyboard.
Costs. Running Claude Opus for an autonomous agent is not cheap. The agent makes dozens of tool calls per session, reads large files, and sometimes goes on research tangents. I’m spending meaningful money on API calls alone, before infrastructure costs.
The Numbers
Timeline: January 30 to March 22, 2026 (51 days to production, 72 days total)
Infrastructure costs:
Hetzner VPS (ups.dev): $3.99/month
Hetzner VPS (wheneva.ai): $6.99/month
Hetzner VPS (Fizzy, self-hosted): $4.85/month
AWS EC2 (agent hosting): $60/month
Domains: ups.dev ($98/year), wheneva.ai ($93/year)
Training materials (books): ~$150
What was built:
2 Rails 8 applications from scratch
Full REST API with token authentication
MCP server for AI agent integration
RubyLLM integration gem (ruby_llm-ups)
Synthetic monitoring system
Subscriber notification pipeline
Open-source repo with Docker deployment
Stripe billing integration
Multi-tenant architecture
Real-time Turbo Streams updates
An agent training program with spaced repetition and knowledge tracking
Current state: 10 users, 7 status pages, 0 paying customers (yet). Revenue is zero. We’re 29 days from our self-imposed deadline of 5 paying customers by April 20.
What I’d Do Differently
Start with customers, not code. We spent five weeks building before talking to a single potential customer. Classic engineer mistake. The agent was happy to build forever. I should have forced the “who wants this?” question on day one.
Shorter feedback loops. I was reviewing PRs once a day. The agent would sometimes go down a wrong path for hours before I caught it. Real-time pair programming (even async) would have been more efficient than batch review.
Less strategy, more selling. The agent produced beautiful strategy documents, growth plans, and competitive analyses. None of them generated a single dollar. We’re now in “founder mode” with a simple rule: the only metric that matters is paying customers.
Invest in memory earlier. We should have set up semantic search and structured the memory system from day one instead of letting daily files accumulate for weeks before curating them. The agent’s effectiveness is directly proportional to how well it remembers what it’s done.
What’s Next
The agent is now focused entirely on customer acquisition. It’s reaching out to people in the Ruby community, posting in relevant forums, Collison-installing every new signup (setting up their status page before they even ask), and tracking every interaction in a daily scoreboard.
What this experiment ultimately taught me is this: we don’t have a good solution to observability for agents (yet).
Traditional systems have clear signals: CPU usage, memory, latency, error rates. Agents don’t. They fail differently:
They degrade before they break
They hallucinate instead of erroring
They “succeed” with wrong outcomes
They silently stop using tools
You don’t get a 500. You get something that looks fine until it isn’t. That’s a harder problem.
ups.dev is an attempt to make agent behavior visible in the same way we made infrastructure visible.
Not perfect.
But necessary.
If you’re building AI agents in production, the question isn’t “can it run?”
It’s “how do I know it’s behaving?”
That’s what ups.dev is for.
If you’re curious about autonomous AI agents, OpenClaw is open-source and the community is active on Discord.
And if you’re wondering whether an AI agent can actually build a business: ask me again on April 20.
