I’m going to say something that might sound strange coming from someone who builds AI agents for a living: nobody knows what the right architecture for AI agents is yet. Not OpenAI. Not Anthropic. Not Google. Not us.
What I do know — what the data makes undeniable — is that the inflection point has arrived. Gartner reports enterprise inquiries about multi-agent systems surged 1,445% between Q1 2024 and Q2 2025. Microsoft surveyed 31,000 workers across 31 countries and found 82% of leaders expect to deploy “digital labor” to expand workforce capacity within 12-18 months. McKinsey projects generative AI could unlock $200-340 billion annually in banking alone — 9-15% of operating profits. In February 2026, Factory AI shipped Missions — autonomous agents that run for hours, days, sometimes weeks on complex software projects. Their longest mission ran for 40 days.
The exponential growth phase for AI agents isn’t coming. It’s here. The question has shifted from “will agents work?” to “what kind of agents, for what kind of work, governed how?”
And that question is being answered very differently by very different companies — each making bets that reveal what they believe the future looks like. As someone building in this space for credit unions specifically, I find the landscape genuinely fascinating. Not because I think we have all the answers. Because I think understanding how others are approaching it sharpens our own thinking.
The Personal Agent Layer: OpenClaw and the Messaging Gateway
Start at the most intimate scale: your personal AI agent.
OpenClaw — created by Peter Steinberger, with over 186,000 GitHub stars and growing — is the most impressive open-source project I’ve seen in this space. It’s a messaging gateway that connects AI models to every communication channel you use: WhatsApp, iMessage, Telegram, Discord, Slack. Over 5,700 community-built skills on ClawHub. Docker sandboxing. Voice and wake word support. Cron scheduling for autonomous tasks. Support for every major model — Claude, GPT, Gemini, Grok.
What OpenClaw gets right is the plumbing. The hardest problem in personal AI isn’t the intelligence — it’s the connectivity. How do you get an AI agent that can reach you where you actually communicate, access the tools you actually use, and operate on your behalf across the platforms that fragment your digital life? OpenClaw solves this elegantly. It’s infrastructure, not interface.
We use OpenClaw internally at Runline. Our executive assistant agent, Emila, runs on OpenClaw as the messaging layer — what we call “Option B: OpenClaw is the plumbing, Emila is the brain.” OpenClaw handles “how do I get messages from WhatsApp to an LLM and back?” Emila handles “how do I be an excellent executive assistant who improves over time?”
The distinction matters because it reveals a foundational architectural question the entire industry is wrestling with: where does the gateway end and the intelligence begin? OpenClaw is deliberately agnostic about what the AI does once it receives a message. That’s its strength — and its boundary. The reinforcement loops, the multi-organization routing, the context accumulation, the approval gates for high-stakes actions — those live in a layer above the gateway. Both layers are essential. Neither is sufficient alone.
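To make the gateway/brain split concrete, here is a minimal sketch of the layering. All names here (Gateway, brain, the approval-gate keyword list) are illustrative assumptions, not the actual APIs of OpenClaw or Emila:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class InboundMessage:
    channel: str   # e.g. "whatsapp", "imessage" -- illustrative channel names
    sender: str
    text: str

class Gateway:
    """Connectivity only: normalize messages in, deliver replies out.
    The gateway is deliberately agnostic about what the brain does."""
    def __init__(self, brain: Callable[[InboundMessage], str]):
        self.brain = brain

    def receive(self, channel: str, sender: str, text: str) -> str:
        return self.brain(InboundMessage(channel, sender, text))

# Hypothetical keyword gate standing in for a real approval-gate policy.
HIGH_STAKES = {"wire", "transfer", "delete"}

def brain(msg: InboundMessage) -> str:
    """Intelligence layer: context, routing, and approval gates live here,
    above the gateway."""
    if any(word in msg.text.lower() for word in HIGH_STAKES):
        return "Held for human approval."
    return f"[{msg.channel}] acknowledged: {msg.text}"

gw = Gateway(brain)
gw.receive("whatsapp", "sean", "wire $500 to vendor")  # held for approval
```

Swapping out the brain changes nothing about connectivity, and swapping out the gateway changes nothing about judgment; that is the point of the split.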
The personal agent space is exploding because the tools are finally good enough for individual developers and power users to build agents that genuinely work. But the gap between “works for a technical founder” and “works for a credit union compliance officer” is enormous. That gap is where the enterprise and vertical agent layers come in.
The Enterprise Agent Layer: Factory and the Droid Model
Factory AI represents the most ambitious vision for what enterprise agents look like today.
Their Missions system, launched in late February 2026, is multi-day autonomous agent orchestration for software development. The architecture is elegant: an Orchestrator Droid decomposes a mission into milestones and features, Worker Droids execute each feature with fresh context, Validators verify the output, and the whole system runs under what Factory calls Mission Control — a terminal interface where a human product manager monitors, directs, and course-corrects.
The numbers are striking. Median mission runtime is about two hours. Fourteen percent run longer than 24 hours. The longest ran 40 days — a multi-week software project executed by agents with human oversight. They route different models to different roles: Opus for orchestration, Sonnet for worker tasks, GPT-5.3 Codex for validation, Kimi K2.5 for research. Multi-model routing based on the cognitive requirements of each subtask.
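The orchestrate/execute/validate loop with role-based model routing can be sketched in a few lines. This is an illustrative reconstruction of the pattern described above, not Factory's actual code; the routing table reflects the role-to-model assignments mentioned in the text, and the function names are assumptions:

```python
# Role-based model routing: each role gets the model suited to its task.
MODEL_FOR_ROLE = {
    "orchestrator": "opus",
    "worker": "sonnet",
    "validator": "gpt-5.3-codex",
    "researcher": "kimi-k2.5",
}

def execute(feature: str) -> str:
    """Worker step: each worker spins up with fresh context,
    so no state carries over between features."""
    return f"implementation of {feature} via {MODEL_FOR_ROLE['worker']}"

def validate(feature: str, output: str) -> bool:
    """Validator step: a separate model checks the worker's output
    against the feature spec before it is accepted."""
    return feature in output

def run_mission(goal: str, features: list[str]) -> dict[str, bool]:
    """Orchestrator step: in the real pattern the orchestrator would
    decompose `goal` into features itself; here they are passed in."""
    results = {}
    for feature in features:
        output = execute(feature)
        results[feature] = validate(feature, output)
    return results

run_mission("ship auth service", ["login", "billing"])
```

The essential move is separating execution from validation: the agent that produces the work is never the agent that signs off on it.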
What Factory validates for the broader industry is that the pattern works: decompose complex goals into subtasks, assign specialized agents, validate outputs, keep humans in the loop as directors rather than doers. Harvard Business Review coined the term “Agent Manager” in February 2026 — leaders responsible for orchestrating AI agent learning, collaboration, and performance. Factory is building the tooling that makes that role real.
But Factory’s Droids are ephemeral. Each Worker Droid spins up fresh for a feature, executes, and disappears. The Orchestrator captures reusable patterns — “skill capture,” they call it — but the workers themselves don’t accumulate institutional knowledge. This works beautifully for software development, where codebases are version-controlled and context can be reconstructed from git history, documentation, and test suites.
It works less well for domains where the context lives in people’s heads, not in repositories. Domains like credit union operations.
The Vertical Agent Layer: Interface and Credit Union-Specific AI
Interface AI is the most established AI vendor focused specifically on credit unions, and their trajectory is worth understanding.
Founded in 2019, Interface has grown to roughly $40 million in annual recurring revenue, serving over 100 credit unions with a team of around 234 people. They’ve raised approximately $30 million in total funding. Their product focus is squarely on the member interaction layer — AI-powered voice and chat automation for contact centers. When a member calls your credit union, Interface’s AI handles the conversation, processes the request, and escalates to a human when needed.
Interface is doing important work. Contact center automation is a real pain point — member service rep turnover runs 30-40% annually because the work is repetitive and draining. Automating routine member inquiries frees up staff for the conversations that actually require human judgment and empathy. That’s aligned with the “people helping people” mission.
But Interface’s approach reveals a strategic choice that I think about constantly: do you start at the member-facing front door, or do you start in the operational back office?
I wrote about this in Article 7 — “Stop Buying Chatbots. Start Building Infrastructure.” The data is sobering. Fifty-eight percent of credit unions have deployed a chatbot, making it their most common AI investment. Yet satisfaction rates hover around 29%. Only 27% of consumers trust AI chatbots for financial information. Seventy-eight percent of chatbot interactions require human escalation. The front door is where AI is most visible — and most fragile.
Runline made the opposite bet. We started with the back office: BSA compliance, HR workflows, loan processing, internal operations. Not because the front door doesn’t matter — it does — but because back-office AI is where the ROI is measurable, the risk is manageable, and the compliance infrastructure you build for internal operations becomes the foundation for everything you deploy later, including member-facing services.
Interface validates the market — credit unions will pay for AI that works. The question is what “works” means when you need audit trails, examiner-ready documentation, and kill-switch capability on every agent action.
The Platform Layer: The Agent OS Race
Underneath all of these vertical and enterprise plays, a platform war is underway.
Anthropic shipped the Agent SDK and Claude Code — the tool I use to build Runline every day. OpenAI launched the Agents SDK, Codex, and Operator. Google released Agentspace. Each is building what amounts to an “agent operating system” — the foundational layer that agent companies build on top of.
Andrew Ng’s advice to enterprises cuts through the noise: focus on agentic workflows, not on chasing the most powerful models. The model is becoming a commodity. The orchestration, governance, and domain context around the model — that’s where the value accrues.
This is why we built Runline to be harness-agnostic. Our agent infrastructure — what we call Arc — supports any underlying AI harness. Claude Code today. Something else tomorrow. The models will keep getting better, and the best harness will change. What won’t change is the need for monitoring, control, audit trails, and institutional context. The governance layer is durable. The execution layer is fluid.
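"Harness-agnostic" reduces to a small design rule: the durable governance layer depends only on a narrow interface, never on a specific harness. Here is a minimal sketch of that rule; the class names are illustrative assumptions, not Runline's or Anthropic's actual APIs:

```python
from abc import ABC, abstractmethod

class Harness(ABC):
    """Anything that can execute an agent task: one harness today,
    a different one tomorrow."""
    @abstractmethod
    def run(self, task: str) -> str: ...

class ClaudeCodeHarness(Harness):
    """Stand-in for today's execution layer (illustrative, not a real SDK call)."""
    def run(self, task: str) -> str:
        return f"claude-code result for: {task}"

class Governed:
    """The durable layer: audit every action regardless of which
    harness executes it."""
    def __init__(self, harness: Harness):
        self.harness = harness
        self.audit_log: list[str] = []

    def run(self, task: str) -> str:
        self.audit_log.append(f"START {task}")
        result = self.harness.run(task)
        self.audit_log.append(f"DONE {task}")
        return result

agent = Governed(ClaudeCodeHarness())
agent.run("summarize BSA alerts")
# Swapping in a different Harness subclass leaves Governed untouched.
```

The audit log, monitoring hooks, and kill logic all live in the wrapper; replacing the execution layer is a one-line change.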
The platform race actually benefits companies like ours. As Anthropic, OpenAI, and Google compete to build the best foundation, the cost and capability of the underlying models improve for everyone. Our job isn’t to build a better model. It’s to build the infrastructure that makes any model safe, auditable, and genuinely useful inside a regulated institution.
Four Architectural Bets the Industry Is Making
Zoom out from individual companies and you see four fundamental bets playing out across the agent landscape. Each represents a genuine theory about how AI agents should work. None has been proven definitively right or wrong.
Bet 1: Ephemeral vs. Persistent Agents. Factory’s Droids spin up fresh for each task, execute, and vanish. OpenClaw’s agent swarm mode spawns specialists dynamically. The argument for ephemeral: clean context, no accumulated bias, easier to reason about. Runline bets the opposite — persistent agents that accumulate institutional knowledge over months. An agent that’s worked with your credit union for six months knows your examiner’s documentation preferences, your seasonal cash flow patterns, your BSA officer’s escalation thresholds. That context is genuinely more valuable than a fresh start. The founder of one ephemeral-agent company stated publicly that his biggest gap is exactly this: “The system doesn’t remember that last week’s Financial Advisor was brilliant.” We think persistence is the moat. But we’ll see.
Bet 2: Interface-First vs. Infrastructure-First. Interface starts with the member-facing conversation. Most chatbot vendors start at the front door. Runline starts with the control plane and works outward. The historical precedent I keep returning to: Stripe started with infrastructure (payment processing API) while Square started with interface (the card reader). Both succeeded, but Stripe’s infrastructure-first approach created deeper lock-in and higher margins. In regulated industries, I believe infrastructure wins — because you can’t build a trusted interface on untrusted infrastructure, but trusted infrastructure naturally extends to any interface.
Bet 3: General-Purpose vs. Vertical. ChatGPT knows everything about nothing specific to your credit union. It can’t access your core processor data, doesn’t know your SOPs, has never seen your examiner’s follow-up questions. Gartner projects domain-specific generative AI will grow from 1% of deployments in 2023 to over 50% by 2028. The a16z thesis evolution tells the same story — in 2019 they wrote “The Empty Promise of Data Moats,” arguing generic data wasn’t defensible. By 2025, the same firm published “Context Is King,” arguing that domain-specific institutional context accumulated through operational presence is the real competitive advantage. We believe vertical wins in regulated industries. Generic AI can’t satisfy an NCUA examiner. Domain-contextualized AI can.
Bet 4: Replacement vs. Amplification. Klarna replaced 700 customer service agents with AI, announced it proudly, then quietly walked it back when satisfaction scores fell. The Harvard/BCG study found that consultants who fully delegated to AI — the “Self-Automators” — got worse at both domain expertise and AI skills over time. The ones who strategically divided work — the “Centaurs” — maintained their edge. Credit unions are structurally positioned for amplification over replacement. Headcount is sacred at institutions with 30-200 employees. The mission is “people helping people,” not “AI helping people.” And FinCEN and NCUA require human sign-off on all AI-assisted compliance work — replacement isn’t just undesirable, it’s not permitted.
Where Runline Sits — and What We’re Still Figuring Out
I want to be honest about what we know and what we don’t.
What we know: the infrastructure-first approach works. We run Runline on its own agents — what we call “eating our own cooking.” Emila runs executive operations. Woz handles development. Linus builds and fixes. Ada does intelligence analysis. Byron writes. Five agents, each with trust tiers that progress from training wheels to autonomous based on demonstrated performance. If we can’t trust our own AI to run Runline, why should you trust it to run your credit union?
What we know: compliance-native architecture produces better AI, not just more regulated AI. Every capability the NCUA requires — monitoring, control, termination, audit trails — makes the agent more trustworthy for the humans who work alongside it. Our Grid control plane proxies all agent traffic, logs every action, and can kill any agent in under 100 milliseconds. That’s not overhead. That’s the foundation that makes everything else possible.
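The proxy-log-kill pattern is simple enough to sketch in a toy form. Grid's internals are not public, so everything here (names, data shapes) is an assumption for illustration only; the point is the shape of the control plane, not its implementation:

```python
import time

class ControlPlane:
    """Toy control plane: every agent action is proxied and logged,
    and any agent can be killed immediately."""
    def __init__(self):
        self.killed: set[str] = set()
        self.log: list[tuple[float, str, str]] = []

    def proxy(self, agent_id: str, action: str) -> bool:
        """All agent traffic passes through here; nothing executes unlogged."""
        if agent_id in self.killed:
            self.log.append((time.time(), agent_id, f"BLOCKED {action}"))
            return False
        self.log.append((time.time(), agent_id, action))
        return True

    def kill(self, agent_id: str) -> None:
        # A set lookup makes the kill effective on the very next action,
        # comfortably inside a sub-100ms budget.
        self.killed.add(agent_id)

cp = ControlPlane()
cp.proxy("emila", "draft email")   # allowed, logged
cp.kill("emila")
cp.proxy("emila", "send email")    # blocked, logged
```

Because the agent never talks to anything except through the proxy, the audit trail and the kill switch come for free from the same chokepoint.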
What we know: context compounds. An agent trained on your SOPs, your member communication style, your examiner relationships, and your institutional risk tolerance performs fundamentally differently than a generic AI with the same model weights. Month one, our agents do what you tell them. Month six, they start telling you what you should be doing differently. That flywheel — better context, smarter agents, better outcomes, more trust, more context shared — is the moat.
What we’re still figuring out: the right balance between agent autonomy and human oversight. Our trust tier system — training wheels, supervised, semi-autonomous, autonomous — provides a framework, but the criteria for progressing an agent from one tier to the next are still being refined through real deployments. How many successful tasks before you trust an agent to operate without review? We say 90% success rate over 20-plus tasks with zero security incidents. That number might be too conservative. It might not be conservative enough. We’ll know more after the Heartland pilot.
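The promotion criterion stated above (90%+ success over 20-plus tasks, zero security incidents) is concrete enough to express as a check. The tier names come from the text; the functions themselves are an illustrative sketch, not production policy code:

```python
TIERS = ["training_wheels", "supervised", "semi_autonomous", "autonomous"]

def eligible_for_promotion(successes: int, total: int, incidents: int) -> bool:
    """Promote only after 20+ completed tasks at a 90%+ success rate
    with zero security incidents."""
    if total < 20 or incidents > 0:
        return False
    return successes / total >= 0.90

def next_tier(current: str, successes: int, total: int, incidents: int) -> str:
    """Advance one tier at a time; an agent never skips a tier."""
    idx = TIERS.index(current)
    if idx < len(TIERS) - 1 and eligible_for_promotion(successes, total, incidents):
        return TIERS[idx + 1]
    return current

next_tier("supervised", 19, 20, 0)  # promotes: 95% over 20 tasks, no incidents
next_tier("supervised", 19, 20, 1)  # stays put: one incident blocks promotion
```

Note that the incident count is a hard veto, not a weighted factor; a single security incident resets the conversation regardless of the success rate.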
What we’re still figuring out: how fast context transfer works across credit unions on the same CUSO network. Our thesis — from Article 13 — is that a BSA workflow validated at one credit union improves BSA workflows at every credit union on the network. The cooperative distribution model should create compound intelligence that competitive institutions can’t replicate. The theory is sound. The proof is in the deployment.
What we’re still figuring out: where the model capability curve flattens. Today’s models are dramatically better than last year’s. Next year’s will be better still. But does the rate of improvement continue exponentially, or does it plateau? If it plateaus, then domain context and governance infrastructure become even more important — the differentiation shifts from “whose model is smarter” to “whose system understands my institution.” If it continues exponentially, then the governance layer becomes critical for safety — because more capable models without more capable controls is the recipe for catastrophic failure that I described in Article 8.
The Honest Assessment
Here’s my honest read of where we are in March 2026.
The agent infrastructure is real. Factory’s 40-day Missions, Interface’s 100-plus credit union deployments, OpenClaw’s 186,000-star open-source ecosystem, Anthropic and OpenAI shipping agent SDKs — this isn’t a demo anymore. These are production systems handling real work for real organizations.
The hype is also real. Gartner predicts over 40% of agentic AI projects will be canceled by end of 2027 due to unclear value, excessive cost, or inadequate risk controls. Ninety-five percent of AI pilot projects fail to reach production. The gap between “this works in a demo” and “this works at 2 AM when your BSA analyst isn’t watching” is vast.
The winners won’t be determined by who has the best model. Models are converging. The winners will be determined by three things: who builds the deepest domain context, who designs the most trustworthy governance infrastructure, and who earns the trust of the humans they serve.
For credit unions specifically, I believe the cooperative model creates a structural advantage that no other segment of financial services can match. One CUSO integration serving hundreds of credit unions. Shared learning across the network. Trust built cooperatively over decades, not through cold vendor pitches. Outcome-based pricing that aligns the vendor’s incentives with the credit union’s results. And a regulatory framework — the NCUA’s AI compliance guidance — that doubles as a design specification for doing this right.
But I hold that belief loosely. The verdict is still out. We’re in the first inning of a game whose rules are being written as we play. What I’m certain of is this: the credit unions that start building now — with infrastructure they control, agents they can stop, and compliance they can defend — will compound their advantage every month that passes. The ones who wait for certainty will wait forever, because certainty isn’t coming. The tsunami is here. The question is whether you’re building the boat or still debating the weather forecast.
Sean Hsieh is the Founder & CEO of Runline, the secure agentic platform for credit unions. Previously, he co-founded Flowroute (acquired by Intrado, 2018) and Concreit, an SEC-regulated WealthTech platform managing real securities under dual federal regulatory frameworks.