How one team built the knowledge infrastructure that turns generic AI into institutional intelligence — and what credit union leaders need to know before their next board meeting.
Every AI vendor promises intelligence. None of them ship context.
This paper describes the architecture, deployment, and compliance implications of a Company Context Layer — a semantic search service that indexes institutional knowledge and makes it available to every AI agent in an organization. We built one. Here’s what we learned.
1. The $4,000 Cash Deposit
Maria owns a flower shop on Main Street. Every Tuesday, she deposits roughly $4,000 in cash. Your BSA analyst knows this. She glances at the alert, recognizes the pattern, and clears it in three seconds. She’s done it hundreds of times.
Now ask ChatGPT: “Is a $4,000 weekly cash deposit suspicious?”
It gives you a textbook answer. Structuring thresholds. CTR requirements. Red flags for money laundering. Accurate. Comprehensive. Useless.
The answer your BSA analyst gives — “That’s Maria, she runs the flower shop, this is her normal Tuesday deposit” — requires something ChatGPT doesn’t have. Not more intelligence. More context.
This gap between AI that knows everything and AI that knows you is the central challenge of deploying AI in regulated financial services. And it’s a gap that no amount of model improvement will close, because the knowledge that matters most — your SOPs, your examiner’s preferences, your members’ patterns — isn’t on the internet. It’s in your institution.
In our experience working with credit unions, we estimate 80% or more of operational knowledge is undocumented. It lives in people’s heads. In institutional habits. In the way Linda in compliance has always handled wire transfer reviews. In the fact that your examiner flagged weak CTR documentation last cycle, so your team has been over-documenting ever since.
The most valuable knowledge for AI to have is precisely the knowledge that generic AI cannot have.
This paper introduces a specific piece of infrastructure — what I call the Company Context Layer — that bridges this gap. It’s not a product pitch. It’s an architecture we built, deployed, and tested against real credit union operations. I’ll walk through why it matters, how it works, what we learned, and what you can do about it starting Monday morning.
2. Why Generic AI Fails in Regulated Industries
Generic AI fails in credit unions for three structural reasons. Not because the models are bad — they’re extraordinary. But because intelligence without context is liability in a regulated environment.
The Knowledge Gap
Foundation models are trained on the internet. Your BSA policy, your lending guidelines, your member communication templates, your examiner’s prior findings — none of that is on the internet. The most capable model in the world can’t reference a document it’s never seen.
This isn’t a solvable problem at the model layer. OpenAI can’t train GPT on your internal procedures. Anthropic can’t include your examiner’s preferences in Claude’s weights. The knowledge gap is structural, and it requires infrastructure — not better models — to close.
Morgan Stanley understood this. They indexed 350,000 internal documents — research reports, product guides, regulatory filings, client communication templates — and gave 16,000 financial advisors AI-powered access. Adoption hit 98% within months. Research that used to take 30 minutes took seconds. The AI wasn’t smarter than ChatGPT. It was contextualized.
The Hallucination Liability
In unregulated domains, AI hallucination is annoying. In financial services, it’s a compliance violation.
Production AI hallucination rates range from 3% to 27% depending on the model and context. Researchers have documented over 480 cases of lawyers submitting AI-hallucinated citations to courts — fake case law that sounded authoritative but didn’t exist. Over 120 lawyers have been sanctioned.
In credit union compliance, a plausible-sounding-but-wrong policy interpretation is worse than no answer at all. When your AI agent tells a BSA analyst that a transaction pattern is consistent with your exemption policy — and it’s wrong because it doesn’t actually know your exemption policy — you don’t have a technology problem. You have a regulatory exposure.
The Air Canada case made this concrete: a chatbot told a customer about a nonexistent bereavement fare discount. The court held Air Canada liable, not the chatbot vendor. The same principle applies here: the credit union retains the liability, not the AI vendor.
TD Bank’s $3.09 billion penalty in October 2024 — the largest BSA/AML enforcement action in U.S. history — wasn’t about bad actors. It was about bad infrastructure. $18 trillion in transactions went unmonitored. $671 million in money laundering flowed through unchecked. The penalty was for the monitoring gap, not the criminal activity. TD Bank is a $400 billion institution, but the NCUA applies the same BSA/AML examination standards to a $200 million credit union. The infrastructure gap that cost TD $3 billion exists at every asset tier — it’s just less visible until the examiner finds it.
The Retirement Cliff
11,200 Americans turn 65 every day through 2027. When your 20-year BSA analyst retires, she takes with her every pattern she recognizes, every examiner quirk she’s learned, every judgment call she makes in three seconds that would take a new hire thirty minutes.
That knowledge — the deep institutional context that takes years to accumulate — walks out the door. No exit interview captures it. No training manual documents it. And no generic AI replicates it.
The question isn’t whether AI can help. It’s whether you’ll have the institutional knowledge infrastructure in place to make AI useful before the people who hold that knowledge leave.
3. The Five Layers of Institutional Context
Not all context is created equal. I think about institutional knowledge in five layers, each progressively harder to capture, more valuable to have, and more at risk of being lost.
Layer 1: SOPs and Policies
Written procedures — BSA policy, lending guidelines, HR handbook, member service protocols. This is the context most people think of first, and it’s the easiest to index.
But “easy” is relative. At most credit unions I’ve worked with, SOPs are scattered: PDFs on shared drives, Word documents on someone’s desktop, a binder in the compliance office that hasn’t been updated since 2019. A CUSO partner described their situation as SOPs “sprinkled across people’s computers, tribal knowledge in people’s heads.” No centralized, searchable, AI-accessible library. This is the norm, not the exception.
Layer 2: Communication Style
How your credit union talks to members is a competitive differentiator. “Dear Member” or “Hi Sarah”? Warm and casual or professional and precise? Sign off with “Your CU Team” or individual names?
An AI agent drafting member communications without absorbing your voice produces generic financial-services boilerplate. Members notice. Communication style is context that shapes every member interaction, and most CUs have never documented it.
Layer 3: Operational Patterns
Maria’s Tuesday deposits. The construction company’s seasonal revenue cycle. The university town where student loan disbursements spike every August and January.
These patterns don’t exist in any database. They’re observations accumulated over years of experience — the reason a 20-year BSA analyst can glance at an alert and clear it in three seconds while a new hire would spend thirty minutes investigating. Layer 3 is where AI stops being a filing cabinet and starts being a colleague.
Layer 4: Regulatory Relationships
Every credit union has a relationship with its examiner. Examiners have preferences, areas of focus, and expectations shaped by prior findings. Your examiner flagged weak CTR documentation last cycle? Your AI should know that. Your examiner cares more about SAR narrative quality than CTR timeliness because of a finding from three years ago? That context shapes every compliance decision your team makes.
No generic AI vendor can deliver Layer 4 context. It’s unique to your institution.
Layer 5: Risk Tolerance and Institutional Values
Board appetite for indirect lending. Conservatism on real estate concentration. Commitment to small-dollar consumer loans that larger institutions won’t touch. These aren’t written policies — they’re cultural values that shape every operational decision.
An AI agent making recommendations without understanding your institutional risk tolerance is like a financial advisor who’s never met the client. Technically capable. Contextually blind.
Here’s the structural insight: each layer is harder to replicate, more valuable, and more at risk of retirement loss. Anyone can index your written SOPs — that’s Layer 1. But understanding that your examiner cares more about SAR narrative quality than CTR timeliness because of a finding from three years ago? That’s Layer 4. Most AI-for-credit-unions vendors stop at Layer 1. Layer 5 is the moat.
a16z (Andreessen Horowitz), one of the most prominent venture capital firms in technology, published “Context Is King” in August 2025, arguing that AI itself is not a moat but context is. Generic foundation models are commoditizing. What’s defensible is the proprietary context layer that makes AI useful for a specific organization. Their own thinking evolved: in 2019, they wrote “The Empty Promise of Data Moats,” arguing that generic data isn’t defensible. By 2025, they recognized that domain-specific institutional context, accumulated through operational presence, absolutely is.
Gartner projects domain-specific AI deployments will grow from 1% in 2023 to over 50% by 2028. The market is moving from generic intelligence to contextualized intelligence. The question is whether your institution will have the context infrastructure in place when it does.
4. Why “Just RAG” Isn’t Enough
If you’ve been following AI developments, you’ve heard of RAG — Retrieval-Augmented Generation. The concept is straightforward: instead of relying solely on a model’s training data, retrieve relevant documents from a knowledge base and include them in the prompt. The model gets context it wouldn’t otherwise have.
RAG is the right starting direction. But naive RAG — dump your documents into a vector database, embed everything, retrieve the top chunks by similarity — fails for regulated industries in specific, predictable ways.
The Needle Problem
Vector similarity search finds semantically related content, not necessarily the right content. Ask “What’s our CTR exemption policy?” and a vector search might return the BSA training manual’s general description of CTRs instead of your specific exemption list — because the training manual has more text about CTRs and scores higher on semantic similarity.
Meanwhile, a keyword search for “CTR exemption” would find the right document instantly.
We tested this empirically. In a benchmark of 2,001 queries against a 103-file credit union knowledge base spanning BSA/AML, lending, governance, IT security, and operations, vector-only search scored 0.183 on NDCG@10, a standard measure of search quality where 1.0 is perfect and 0.0 is random. Plain keyword search (BM25) scored 0.232. Hybrid search scored 0.245, beating both baselines. Neither retriever alone is sufficient: in a domain-specific corpus where regulatory acronyms matter as much as semantic meaning, you need both signals working together.
Regulatory content is full of specific identifiers — “31 CFR 1020.320(d),” “Form 8300,” “SAR-DI” — where exact matching matters more than semantic similarity. You need both vector search (for conceptual understanding) and keyword search (for precision), but the weighting matters enormously. Get it wrong — weight vectors too heavily — and you destroy the keyword signal that actually finds the right documents.
The Stale Document Problem
Your BSA policy from 2019 and your BSA policy from 2024 both match the query “BSA policy.” Without document versioning or supersession detection, the AI might cite the 2019 version. In compliance, citing an outdated policy isn’t just unhelpful — it’s a specific, auditable risk.
Naive RAG treats all documents as equally current. In a regulated environment, document status — active, superseded, draft — isn’t metadata decoration. It’s a compliance control.
The Context Window Problem
Retrieve 10 chunks and stuff them into a prompt. The model now has more context — and more noise. It has to figure out which of the 10 chunks actually answers the question, which are tangentially related, and which are contradictory because one is from 2019 and another from 2024.
Pre-search filtering — narrowing the search to documents of a specific type, with a specific status, tagged with specific topics — reduces noise before it reaches the model. When a BSA Runner asks for “current CTR exemption policy,” filtering to type: sop, status: active before the search eliminates the entire class of “found the right topic but wrong version” errors.
The Single-Agent Problem
One agent with one vector database works. Five agents across three departments — BSA, lending, member service — sharing a common organizational knowledge base but maintaining per-agent private memory? That’s not a prompt engineering problem. That’s an infrastructure problem.
The BSA Runner needs access to compliance procedures. The lending Runner needs access to underwriting guidelines. Both need access to the same member data policies. Neither should see the other’s private session history or working notes. Managing these access patterns across a fleet of agents requires a service layer, not a shared file.
Here’s the reframe: RAG is a retrieval technique. A Company Context Layer is an organizational capability. The difference is infrastructure — indexing pipelines, hybrid search, metadata filtering, multi-agent access, graceful degradation, versioning, and audit trails. Technique gets you a demo. Infrastructure gets you production.
5. Architecture of a Company Context Layer
What follows is not a theoretical framework. It’s a real architecture that we built, deployed, and tested with our initial partner institutions — credit unions ranging from $200 million to $2 billion in assets. I’m including specific parameters and design decisions because I think the credit union industry deserves practitioner-level detail, not vendor abstractions.
5.1 Three-Tier Knowledge Model
Before you build search, you need to decide what gets searched. We organize institutional knowledge into three tiers:
Tier 1: Agent Memory (private, per-agent). Each AI agent maintains its own workspace — persona, learning history, session notes, working state. Only that agent reads it. This is the equivalent of a new hire’s personal notebook. The BSA Runner’s observations about alert patterns stay in the BSA Runner’s memory. The HR Runner’s notes about benefits inquiries stay in the HR Runner’s memory.
Tier 2: Company Knowledge (shared, version-controlled). Organizational truth that any agent or human should access. SOPs, policy documents, architecture decisions, playbooks, stakeholder profiles, meeting notes, regulatory context. This tier is the heart of the Company Context Layer — version-controlled in Git, with structured metadata on every document.
Tier 3: Cross-Agent Coordination (shared state). Task boards, project status, handoff notes between agents. When your BSA Runner flags a transaction for the lending Runner to review, that handoff happens through Tier 3. This tier matters at scale — when you’re running five or ten agents across departments — but the architecture accounts for it from day one.
Three design decisions deserve explanation:
First, Markdown as the source of truth. Not a database. Not a proprietary format. Plain Markdown files in a Git repository. This means every document is human-readable, Git-diffable, and natively consumable by language models. Your compliance officer can read it. Your AI agent can read it. Your version history shows exactly what changed, when, and by whom.
Second, YAML frontmatter for structured metadata. Every document carries its own classification — document type (SOP, policy, decision record, meeting notes), status (active, superseded, draft), tags, and an optional agent_context field that tells agents what this document is for:
```yaml
---
title: "CTR Exemption Policy"
type: sop
status: active
tags: [bsa, ctr, exemptions]
agent_context: "Reference for BSA Runner when evaluating CTR filing exemptions"
---
```
This metadata travels with the document. When someone updates the policy, they update the frontmatter. When the old version is superseded, changing status: active to status: superseded propagates to every agent’s search results automatically.
Third, tier separation prevents leakage. An agent’s private session history (Tier 1) never contaminates shared organizational knowledge (Tier 2). This isn’t just data hygiene — it’s a compliance control. When an examiner asks “what knowledge did the AI use to make this recommendation?” the answer comes from Tier 2: documented, version-controlled, auditable organizational knowledge.
5.2 The Indexing Pipeline
Raw Markdown files become searchable knowledge through a five-stage pipeline:
Stage 1: File Discovery. The system scans configured content directories for Markdown files. It tracks file modification times (mtime) so subsequent runs only re-index changed files. On a 103-file knowledge base, a full index takes about 90 seconds. An incremental re-index after editing one file takes under a second.
Stage 2: Frontmatter Extraction. Before chunking the content, the pipeline parses YAML frontmatter to extract structured metadata — document type, status, tags, title, and agent context. This metadata is stored separately from the content chunks, enabling pre-search filtering.
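To ground Stages 1 and 2, here’s a minimal TypeScript sketch (the service layer is Node.js). It assumes the glob and gray-matter packages and a simple JSON manifest for mtime tracking; every file and field name here is illustrative, not our production code:

```typescript
// Sketch of Stages 1-2: incremental file discovery plus frontmatter extraction.
// Assumes Node.js with the `glob` and `gray-matter` packages.
import { readFileSync, writeFileSync, statSync, existsSync } from "node:fs";
import { globSync } from "glob";
import matter from "gray-matter";

interface DocMetadata {
  title?: string;
  type?: string;          // sop | policy | decision | meeting-notes
  status?: string;        // active | superseded | draft
  tags?: string[];
  agent_context?: string;
}

const MANIFEST = ".index-manifest.json"; // hypothetical mtime cache

export function discoverChangedFiles(contentDir: string): string[] {
  const seen: Record<string, number> = existsSync(MANIFEST)
    ? JSON.parse(readFileSync(MANIFEST, "utf8"))
    : {};
  const changed = globSync(`${contentDir}/**/*.md`).filter((path) => {
    const mtime = statSync(path).mtimeMs;
    if (seen[path] === mtime) return false; // unchanged since the last index run
    seen[path] = mtime;
    return true;
  });
  writeFileSync(MANIFEST, JSON.stringify(seen));
  return changed;
}

export function extractFrontmatter(path: string): { meta: DocMetadata; body: string } {
  // gray-matter splits the YAML frontmatter block from the Markdown body
  const { data, content } = matter(readFileSync(path, "utf8"));
  return { meta: data as DocMetadata, body: content };
}
```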
Stage 3: Paragraph-Aware Chunking. The content (minus frontmatter) is split into chunks of approximately 500 tokens with 50 tokens of overlap between chunks. Critically, the chunker respects paragraph boundaries. A compliance policy that says “Exception: if the member has filed Form X within 30 days, this requirement does not apply” stays in the same chunk as the rule it excepts. Fixed-size chunking — split every 512 tokens regardless of content — would frequently separate exceptions from their rules. In compliance documents, that separation creates exactly the kind of context loss that leads to wrong answers.
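Here’s a minimal sketch of that chunker, approximating token counts at roughly four characters per token (the real tokenizer is an implementation detail) and realizing the 50-token overlap by carrying trailing paragraphs across the chunk boundary:

```typescript
// Sketch of Stage 3: paragraph-aware chunking (~500 tokens, 50-token overlap).
// Token counts are approximated at ~4 chars/token; the real tokenizer may differ.
const approxTokens = (s: string) => Math.ceil(s.length / 4);

export function chunkByParagraph(body: string, maxTokens = 500, overlapTokens = 50): string[] {
  const paragraphs = body.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current: string[] = [];
  let budget = 0;

  for (const para of paragraphs) {
    const cost = approxTokens(para);
    if (budget + cost > maxTokens && current.length > 0) {
      chunks.push(current.join("\n\n"));
      // Carry trailing paragraphs forward as overlap, so a rule and its
      // exception stay visible across the chunk boundary.
      const tail: string[] = [];
      let carried = 0;
      for (const p of [...current].reverse()) {
        if (carried >= overlapTokens) break;
        tail.unshift(p);
        carried += approxTokens(p);
      }
      current = tail;
      budget = carried;
    }
    current.push(para); // paragraphs are never split mid-way
    budget += cost;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```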
Stage 4: Dual Indexing. Each chunk gets indexed twice, into two different systems optimized for different types of retrieval:
- DuckDB with BM25 full-text search — keyword precision. When someone searches for “31 CFR 1020.320(d),” BM25 finds the exact regulatory citation. No embedding API required. Works offline. Fast.
- LanceDB with vector embeddings — semantic understanding. When someone searches for “what are our obligations when a member’s transaction looks like structuring,” vector search understands the conceptual meaning even if the document doesn’t contain the word “structuring.” We use Voyage AI’s multimodal-3.5 model, which produces 1024-dimensional embeddings.
Stage 5: FTS Index Rebuild. After bulk inserts, DuckDB’s full-text search index is rebuilt for optimal query performance.
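A condensed sketch of Stages 4 and 5, assuming the duckdb and @lancedb/lancedb Node packages; the Voyage AI embedding call is left as a placeholder, and table names are illustrative:

```typescript
// Sketch of Stages 4-5: every chunk lands in both indexes, then the BM25
// index is rebuilt after the bulk insert.
import duckdb from "duckdb";
import * as lancedb from "@lancedb/lancedb";

// Placeholder for the Voyage AI client (batched, ~20 chunks per call in practice)
declare function embedBatch(texts: string[]): Promise<number[][]>;

interface Chunk { chunkId: string; docPath: string; type: string; status: string; text: string; }

export async function indexChunks(chunks: Chunk[]) {
  // [4a] DuckDB: metadata plus full-text content for BM25
  const con = new duckdb.Database("context.duckdb").connect();
  con.run(`CREATE TABLE IF NOT EXISTS chunks
           (chunk_id TEXT, doc_path TEXT, type TEXT, status TEXT, content TEXT)`);
  const insert = con.prepare("INSERT INTO chunks VALUES (?, ?, ?, ?, ?)");
  for (const c of chunks) insert.run(c.chunkId, c.docPath, c.type, c.status, c.text);
  insert.finalize();

  // [4b] LanceDB: 1024-dim vectors for semantic retrieval. A real incremental
  // pipeline would open the table and add/delete rows; overwrite keeps the sketch short.
  const vectors = await embedBatch(chunks.map((c) => c.text));
  const lance = await lancedb.connect("data/lancedb");
  await lance.createTable(
    "chunks",
    chunks.map((c, i) => ({ chunkId: c.chunkId, type: c.type, status: c.status, vector: vectors[i] })),
    { mode: "overwrite" },
  );

  // [5] Rebuild the full-text index after bulk inserts
  con.run("INSTALL fts");
  con.run("LOAD fts");
  con.run("PRAGMA create_fts_index('chunks', 'chunk_id', 'content', overwrite=1)");
}
```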
5.3 Hybrid Search — What We Tested and What Won
When an agent queries the knowledge base, both search systems run in parallel:
- The query is embedded using the same model used for indexing (Voyage multimodal-3.5, in query mode).
- LanceDB returns the top N chunks ranked by vector similarity.
- DuckDB returns the top N chunks ranked by BM25 keyword relevance.
- The two ranked lists are merged using a fusion algorithm.
The question is which fusion algorithm and what weighting. We didn’t guess. We benchmarked.
We generated 2,001 test queries across five categories — exact section headings, entity lookups (form numbers, regulatory acronyms, regulatory citations), exact phrases from document bodies, opening sentences of chunks, and regulatory questions. Each query has a known ground truth: the document the query was generated from should appear in the results. We tested 14 configurations across two fusion algorithms: Reciprocal Rank Fusion (RRF) and Convex Combination. That’s 28,014 individual retrieval evaluations.
The results validated hybrid search — and revealed that weighting matters more than algorithm choice.
Hybrid search beats both baselines. The best hybrid configuration (Convex α=0.4) achieved NDCG@10 of 0.245 — outperforming BM25-only (0.232) by 5.5% and vector-only (0.183) by 33.6%.
BM25 is strong but incomplete. Keyword search scores well on regulatory acronyms (CTR, SAR, OFAC) and exact policy titles, but misses semantically related content. Vector search captures meaning but struggles with the precise terminology that compliance queries demand.
BM25-heavy weighting is optimal. RRF with 2:1 vector-to-BM25 weighting scored 0.217. Flipping to BM25-heavy weighting (Convex α=0.3–0.4, 60–70% BM25) raised NDCG@10 to 0.243–0.245. In compliance content, keywords are the dominant signal and vectors are the supplement.
Convex Combination consistently outperforms RRF. Every Convex configuration beat its RRF equivalent. The best RRF scored 0.230; the best Convex scored 0.245 — a 6.5% improvement with no additional compute cost. This validates Bruch et al. (ACM Transactions on Information Systems, 2023), who demonstrated that convex combination outperforms RRF on the BEIR benchmark suite.
The critical insight is why convex combination wins: it preserves the confidence level of each search engine’s results, while the industry-standard approach (RRF) throws that information away. When your keyword search is highly confident it found the right document — as it is for compliance queries with specific form numbers and regulatory citations — that confidence signal survives the fusion. The standard approach treats a high-confidence match the same as a marginal one. (See Technical Appendix A for the mathematical details.)
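For readers who want to see the difference in code, here’s a compact TypeScript sketch of both fusion approaches. The convex formula matches Appendix A; the type names and the choice of min-max normalization are illustrative, not a prescription:

```typescript
// Sketch of both fusion algorithms over the top-40 candidate lists.
interface Hit { chunkId: string; score: number; }

// Min-max normalize so BM25 and cosine scores share a 0..1 scale
function normalize(hits: Hit[]): Map<string, number> {
  if (hits.length === 0) return new Map();
  const scores = hits.map((h) => h.score);
  const [min, max] = [Math.min(...scores), Math.max(...scores)];
  return new Map(hits.map((h) => [h.chunkId, max > min ? (h.score - min) / (max - min) : 1]));
}

// Convex combination: score = alpha * norm(vector) + (1 - alpha) * norm(bm25).
// alpha = 0.4 (60% BM25) was the benchmark winner; score magnitudes survive fusion.
export function convexFuse(vectorHits: Hit[], bm25Hits: Hit[], alpha = 0.4): Hit[] {
  const v = normalize(vectorHits);
  const b = normalize(bm25Hits);
  const ids = new Set([...v.keys(), ...b.keys()]);
  return [...ids]
    .map((id) => ({ chunkId: id, score: alpha * (v.get(id) ?? 0) + (1 - alpha) * (b.get(id) ?? 0) }))
    .sort((x, y) => y.score - x.score);
}

// RRF, for contrast: only rank positions survive, so a dominant BM25 match and
// a marginal one contribute identically from the same rank position.
export function rrfFuse(vectorHits: Hit[], bm25Hits: Hit[], K = 60): Hit[] {
  const acc = new Map<string, number>();
  for (const hits of [vectorHits, bm25Hits])
    hits.forEach((h, rank) => acc.set(h.chunkId, (acc.get(h.chunkId) ?? 0) + 1 / (K + rank + 1)));
  return [...acc].map(([chunkId, score]) => ({ chunkId, score })).sort((x, y) => y.score - x.score);
}
```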
In practice, hybrid search catches what either system alone misses. A query about “currency transaction reporting obligations” finds the right policy through semantic similarity (vector) even though the document uses “CTR” not “currency transaction reporting.” Simultaneously, a query for “Form 8300” finds the exact regulatory form through keyword matching (BM25) even though the concepts are semantically distant from the broader compliance context. The key is weighting BM25 heavily — not equally, and certainly not subordinate to vectors.
The lesson for any team deploying RAG in regulated industries: benchmark your retrieval before tuning your prompts. A mid-tier model with excellent retrieval will outperform a frontier model with broken retrieval. Every time. The best model doesn’t win. The best context wins.
Pre-search filtering makes this even more precise. Before running either search, the system can filter by document type, status, or tags:
```
GET /search?q=CTR+exemption+policy&type=sop&status=active
```
This query only searches active SOPs — eliminating superseded policies, draft documents, meeting notes, and everything else that might be semantically similar but operationally irrelevant. Filtering happens at the database level, before embeddings are computed, so it’s fast and reduces noise at the source.
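On the DuckDB side, that filter is an ordinary WHERE clause evaluated before BM25 scoring. This hypothetical query template follows the table names used in the indexing sketch above (LanceDB applies an equivalent where() predicate on the vector side):

```typescript
// Hypothetical pre-filtered BM25 query: superseded and draft documents are
// excluded before any ranking happens, so they can never reach the agent.
const FILTERED_BM25 = `
  SELECT * FROM (
    SELECT chunk_id, fts_main_chunks.match_bm25(chunk_id, ?) AS score
    FROM chunks
    WHERE type = ?      -- e.g. 'sop'
      AND status = ?    -- e.g. 'active'
  ) WHERE score IS NOT NULL
  ORDER BY score DESC
  LIMIT 40`;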
5.4 Multi-Agent Access
The Company Context Layer runs as a containerized HTTP service. Any agent on the network can query it. This is a deliberate architectural choice: knowledge is a service, not a library that each agent bundles internally.
In our deployment, the service runs in a container on the primary workstation. Agents on the same machine query it directly. Remote agents — running in cloud containers — reach it through a secure mesh VPN that provides encrypted inter-machine communication without exposing any ports to the public internet.
Every agent queries the same shared Tier 2 knowledge base, but each maintains its own private Tier 1 memory. The BSA Runner’s session history — its observations, its drafts, its working notes — stays in the BSA Runner’s private workspace. The Company Context Layer doesn’t store per-agent state. It serves organizational knowledge.
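A minimal sketch of that service boundary, assuming the retrieval helpers from the earlier sketches (declared here rather than implemented) and Fastify as the HTTP layer:

```typescript
// Sketch of the service layer: one Fastify route, consumable by any agent.
import Fastify from "fastify";

declare function vectorSearch(q: string, f: { type?: string; status?: string }): Promise<Hit[]>;
declare function keywordSearch(q: string, f: { type?: string; status?: string }): Promise<Hit[]>;

const app = Fastify({ logger: true }); // request logging doubles as the audit trail

app.get<{ Querystring: { q: string; type?: string; status?: string } }>(
  "/search",
  async (req) => {
    const { q, type, status } = req.query;
    const [vectorHits, bm25Hits] = await Promise.all([
      vectorSearch(q, { type, status }),  // 40 candidates from LanceDB
      keywordSearch(q, { type, status }), // 40 candidates from DuckDB BM25
    ]);
    return convexFuse(vectorHits, bm25Hits, 0.4).slice(0, 10); // top 10 to the agent
  }
);

await app.listen({ port: 8080, host: "0.0.0.0" });
```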
Graceful degradation is built in. If the embedding API (Voyage AI) is unavailable — network issue, rate limit, service outage — the system falls back to BM25-only search automatically. The BM25 fallback (NDCG@10: 0.232) scores within 5% of the hybrid configuration (0.245) on our compliance corpus. You lose semantic recall for paraphrased queries and conceptual searches, but keyword matching remains strong for the regulatory acronyms and exact terms that dominate compliance work. In a regulated environment, “the AI couldn’t answer because a third-party API was down” is not an acceptable explanation. With this architecture, it’s also not a realistic scenario.
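A sketch of the fallback logic, with the embedding client and cache as labeled placeholders:

```typescript
// Sketch of the degradation chain: hybrid -> BM25-only -> cached results.
// All helpers are placeholders; Hit and convexFuse come from the fusion sketch.
type Filters = { type?: string; status?: string };
declare function embedQuery(q: string): Promise<number[]>;             // Voyage AI, query mode
declare function annSearch(v: number[], f: Filters): Promise<Hit[]>;   // LanceDB
declare function keywordSearch(q: string, f: Filters): Promise<Hit[]>; // DuckDB BM25
declare function cachedResults(q: string): Hit[];                      // last-known-good cache

export async function searchWithFallback(q: string, filters: Filters): Promise<Hit[]> {
  let vectorHits: Hit[] = [];
  try {
    vectorHits = await annSearch(await embedQuery(q), filters);
  } catch {
    // Embedding API down or rate-limited: continue with BM25 only, which
    // benchmarked within ~5% of hybrid on this corpus (0.232 vs 0.245).
  }
  try {
    const bm25Hits = await keywordSearch(q, filters);
    return vectorHits.length > 0 ? convexFuse(vectorHits, bm25Hits, 0.4) : bm25Hits;
  } catch {
    return cachedResults(q); // emergency: serve last-known-good responses
  }
}
```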
6. Compliance Implications — Why This Architecture Is Examiner-Ready
Every credit union CEO I talk to asks the same question behind closed doors: “How do we get AI past the examiner?” Wrong question. The right question is: “How do we build AI infrastructure that the examiner wishes every credit union had?”
Drawing from NCUA supervisory priorities, risk management guidance, and emerging examination expectations, we identify five areas of focus for AI in credit unions. Each maps directly to specific architectural features of the Company Context Layer.
Risk Management — The three-tier knowledge model isolates risk by design. Agent memory (Tier 1) is private and ephemeral. Company knowledge (Tier 2) is version-controlled in Git — every change tracked, every revision accessible. Document supersession is metadata, not a manual process. When a policy is updated, marking the old version as status: superseded ensures no agent cites outdated guidance.
Monitoring and Control — Every search query is loggable. Which agent queried which knowledge, when, with what search terms, and what results were returned. The audit trail exists at the infrastructure level, not bolted on after the fact. When an examiner asks “what knowledge did the AI reference when it drafted this SAR narrative?” the answer is a database query, not a guess.
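Assuming a hypothetical search_log table the service writes on every request, that database query looks something like:

```typescript
// Hypothetical audit query: which knowledge did the BSA Runner reference, and
// when? Schema is illustrative; the service writes one row per search request.
const AUDIT_QUERY = `
  SELECT ts, agent_id, query, returned_chunk_ids
  FROM search_log
  WHERE agent_id = 'bsa-runner'
    AND ts BETWEEN ? AND ?   -- the window when the SAR narrative was drafted
  ORDER BY ts`;
```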
Termination Capability — The Company Context Layer is a service. It can be shut down entirely, restricted to specific agents, scoped to specific document types, or rate-limited. Combine this with the Grid — Runline’s AI Control Plane that routes all agent traffic — and you get kill-switch capability in under 100 milliseconds. The examiner doesn’t have to trust the AI. They can verify exactly what it accessed and stop it instantly.
Governance — YAML frontmatter with status: active | superseded | draft means knowledge governance is metadata, not a manual review process. When a compliance officer updates a policy, she updates the frontmatter status. That change propagates to every agent’s search results automatically. No separate workflow. No email notification chain. No risk that one agent is still citing the 2019 version.
Vendor Transparency — All knowledge is in your Markdown files, in your Git repository, on your infrastructure. You can read every document, audit every search result, and understand exactly why the AI said what it said. There’s no black box. No proprietary knowledge format. No vendor lock-in on the knowledge layer. If you switch vendors tomorrow, your knowledge base — and every version of every document — stays with you.
The GAO has noted that NCUA currently lacks examination authority over third-party AI systems used by credit unions. This means the credit union retains full responsibility for AI outcomes, regardless of which vendor provided the technology. Building the Company Context Layer on your own infrastructure, with your own documents, under your own version control isn’t just good architecture. It’s regulatory self-defense.
Research consistently shows the cost of non-compliance runs 2.7 times higher than the cost of compliance — $14.82 million versus $5.47 million on average. The Company Context Layer isn’t a compliance cost. It’s a compliance investment that pays for itself by making every AI interaction auditable, every knowledge source traceable, and every document version recoverable.
A note on member privacy. The Company Context Layer indexes SOPs, policies, decision records, and operational playbooks — not member PII. Transaction patterns and member behaviors referenced in Layer 3 and above are anonymized observations captured by agents during their work, not raw data exports from your core system. The architecture runs on your infrastructure — no member data leaves your network. This is a deliberate design constraint, not a limitation. The knowledge that makes AI useful in compliance is institutional knowledge about how your team works, not personal information about who your members are.
The NCUA’s AI requirements aren’t a burden on your Context Layer. They’re a design specification for building one correctly.
7. What We Learned Deploying This
Theory is clean. Deployment is messy. Here’s what we learned that I wish someone had told us before we started.
Hybrid search matters more than model choice — but weighting matters more than both. We tested 14 configurations across two fusion algorithms — 28,014 individual evaluations on 2,001 queries. Our original production configuration (RRF with 2:1 vector weighting) scored an NDCG@10 of 0.217. The best hybrid (Convex α=0.4, meaning 60% BM25 / 40% vector) scored 0.245 — a 33.6% improvement over vector-only and 5.5% over BM25-only. The lesson isn’t that you don’t need vectors. It’s that in domain-specific compliance corpora, keywords are the dominant signal and vectors are the supplement, not the other way around.
Frontmatter filtering is the highest-leverage optimization. We spent weeks tuning embedding models, chunk sizes, and RRF weights. The single change that improved result quality the most was adding pre-search filtering by document type and status. When a BSA Runner asks for “current CTR exemption policy,” filtering to type: sop, status: active before the search eliminates the entire class of “found the right topic but wrong version” errors. It’s not sophisticated. It’s metadata. And it works better than any re-ranking algorithm we tested.
Paragraph-aware chunking preserves compliance context. This one surprised us. Our first implementation used fixed-size chunking — split every 500 tokens regardless of content structure. It worked fine for narrative documents. It failed for policy documents. A compliance policy that says “All cash transactions over $10,000 require CTR filing. Exception: if the member has a Phase II exemption on file, follow the exemption procedures in Appendix B” — that exception needs to stay in the same chunk as the rule. Fixed-size chunking would split them apart about 30% of the time. Paragraph-aware chunking solved it.
Graceful degradation is non-negotiable in regulated environments. Our embedding provider had two outages during our first month of deployment. Without BM25 fallback, our agents would have lost access to institutional knowledge for hours. With fallback, they switched to keyword-only search automatically. Less intelligent, but reliable. In compliance, “the AI couldn’t answer because Voyage AI had a rate limit issue” is not an explanation your examiner will accept.
The indexing pipeline matters more than the search algorithm. We spent 70% of our engineering time on indexing — file discovery, frontmatter parsing, chunk quality, relationship detection — and 30% on search. That ratio felt wrong at first. In retrospect, it was exactly right. The quality of what goes into the index determines the quality of what comes out. No search algorithm compensates for poorly chunked documents with missing metadata.
Context accumulates — and that accumulation is the moat. Month one, the Company Context Layer has your SOPs. Month six, it has your SOPs plus six months of decisions, meeting notes, examiner feedback, and operational patterns captured by agents during their work. A BSA Runner with 1,000 SAR investigations behind it starts surfacing patterns: “This member’s deposit behavior matches three previously confirmed fraud cases.” “This alert category has a 98% false positive rate — recommend adjusting the threshold.”
The switching cost of a Company Context Layer isn’t vendor lock-in. It’s accumulated institutional intelligence. The same reason you don’t casually replace a 20-year employee — not because of a contract, but because of everything they know that no replacement can replicate overnight.
8. The Compounding Effect — Why Context Is the Moat
The Company Context Layer creates a flywheel that accelerates over time:
Better context leads to smarter agents. Smarter agents produce better outcomes. Better outcomes build trust. More trust means more institutional knowledge gets captured and indexed. More knowledge makes agents even smarter.
Month one, your agents do what you tell them. Month six, they start telling you what you should be doing differently.
This is the retirement preservation play. You don’t capture Linda’s 20 years of BSA knowledge by sitting her down for an exit interview — that captures maybe 20% of what she knows, the parts she can articulate. You capture it by running AI agents alongside her for twelve months, observing her decisions, learning her patterns, and indexing that learning into the shared knowledge base. When she retires, Layers 3 through 5 of her institutional context don’t walk out the door. They’re in the system.
a16z’s evolution on this topic is instructive. In 2019, they wrote “The Empty Promise of Data Moats” — the argument that merely having data isn’t defensible because data can be replicated. By 2025, they published “Context Is King” — the recognition that domain-specific institutional context, accumulated through operational presence, is defensible precisely because it can’t be replicated. It’s unique to the institution. It compounds over time. And it makes every AI interaction more valuable.
Morgan Stanley’s experience validates the pattern. They didn’t build smarter AI. They built contextualized AI. 350,000 documents indexed. 98% adoption within months. Research that took 30 minutes reduced to seconds. The AI wasn’t more intelligent than what anyone else could deploy. It was more contextualized. And that contextualization — accumulated over years of institutional document production — is what made it valuable.
Six months from now, when Maria makes her Tuesday deposit, your BSA Runner clears the alert in three seconds — not because it memorized a rule, but because it learned the pattern from the analyst who used to do it by hand. That’s context working.
The credit union that starts building its Company Context Layer today will have six months of accumulated institutional intelligence by the time its competitor starts evaluating vendors. That gap widens every month. Not because of technology differences — the models are available to everyone — but because of context differences that can only be built through time and operational presence.
9. Getting Started — A Framework for Your Next Board Meeting
I don’t believe in white papers that end with theory. Here’s what you can do starting Monday morning.
Phase 1: Audit Your Knowledge (Weeks 1-4)
Start by understanding what you have and where it lives. Where are your SOPs? How many are current? How many are scattered across shared drives, email attachments, and binders that haven’t been opened since the last exam?
Map your five context layers. For each, ask: What’s documented? What’s in people’s heads? What’s at risk of being lost to retirement?
Identify your top three retirement-risk employees — the people who hold the most institutional knowledge and are closest to leaving. Not to pressure them. To start the knowledge capture process while they’re still here.
This phase requires no technology. No vendor. No budget approval. Just honesty about the current state.
Phase 2: Build the Foundation (Months 2-4)
Centralize your documents. Move SOPs, policies, and procedures into a single, version-controlled repository. Add frontmatter metadata: what type of document is this? Is it current? What topics does it cover?
This is organizational hygiene that pays off whether you use Runline, another vendor, or no AI at all. A centralized, classified, version-controlled knowledge base makes your compliance team more effective immediately. It makes examiner documentation easier. And it creates the substrate that a Company Context Layer needs to function.
Phase 3: Deploy the Context Layer (Months 3-6)
Index your centralized knowledge. Deploy hybrid search. Connect your first AI agent. Start with one department.
I recommend BSA. It’s the best candidate for three reasons: the processes are well-defined (SOPs exist, even if scattered), the manual burden is high (95% false positive alerts, 60-hour weeks, 125% capacity), and the audit requirements make every improvement measurable. When your BSA Runner processes an alert in seconds that used to take an analyst minutes, you can quantify the value. When it cites the specific, current policy it used to make a determination, you can show the examiner.
Phase 3 is where you typically need a technology partner. Phase 1 requires zero budget — just time from your compliance lead. Phase 2 is a documentation project your existing team can handle. Phase 3 is an infrastructure project, and the investment scales with scope — a single-department pilot can be operational in weeks, not months, at a fraction of what you’d spend on a traditional core system integration.
Three Questions for Your Board
If your board is evaluating AI strategy — and they should be — ask these three questions:
1. If our top BSA analyst retired tomorrow, how much of her knowledge is captured in a system an AI agent can access?
If the answer is “very little,” you have a context gap that no model purchase will close. Start with Phase 1.
2. When we deploy AI, will it know our SOPs, our examiner’s preferences, and our risk tolerance — or will it give us generic internet answers?
If the answer is “generic answers,” you’re buying a chatbot, not building infrastructure. The Company Context Layer is the difference.
3. Can we show our examiner exactly what knowledge our AI used to make a recommendation, and verify that knowledge is current?
If the answer is “no,” you have an audit problem. The architecture described in this paper — version-controlled documents, frontmatter metadata, queryable search logs — makes the answer “yes” by design.
The gap between AI that knows everything and AI that knows you isn’t closing on its own. Models will keep getting smarter. But smarter models with generic context will always lose to adequate models with excellent institutional context. The best model doesn’t win. The best context wins.
Building a Company Context Layer is how you make sure your context wins.
Technical Appendix A: Architecture Reference
Three-Tier Knowledge Model
| Tier 1: Agent Memory (private, per-agent) | Tier 2: Company Knowledge (shared, version-controlled) | Tier 3: Coordination (cross-agent state) |
|---|---|---|
| BSA Runner Memory: session history, alert observations, working notes | Git repository: SOPs & policies, decision records, playbooks, meeting notes, regulatory context | Task boards |
| HR Runner Memory: benefits inquiries, onboarding state | Each doc has YAML frontmatter: type, status, tags, agent_context | Handoff notes |
| Lending Runner Memory: application drafts, underwriting notes | | Project status, run progress |
Indexing Pipeline
```
Markdown Files (with YAML frontmatter)
        │
        ▼
[1] File Discovery ──── mtime tracking (incremental re-index)
        │
        ▼
[2] Frontmatter Extraction ──── type, status, tags, title, agent_context
        │
        ▼
[3] Paragraph-Aware Chunking ──── ~500 tokens, 50 overlap
        │
        ├────────────────────────────────────┐
        ▼                                    ▼
[4a] DuckDB (BM25 FTS)               [4b] LanceDB (Vectors)
     Keyword precision                    Semantic understanding
     Exact matches                        Conceptual similarity
     No API required                      Voyage multimodal-3.5, 1024 dims
        │                                    │
        └──────────────────┬─────────────────┘
                           ▼
[5] Convex Combination Fusion
    score = α · norm(vector) + (1-α) · norm(bm25)
    α = 0.3–0.4 (60–70% BM25, 30–40% vector)
    Benchmarked: NDCG@10 0.245 (vs 0.217 for RRF 2:1)
        │
        ▼
Merged Results
```
Configuration Parameters
| Parameter | Value | Purpose |
|---|---|---|
| Chunk size | ~500 tokens | Balance between context preservation and retrieval precision |
| Chunk overlap | 50 tokens | Cross-chunk continuity at paragraph boundaries |
| Fusion algorithm | Convex Combination | Score-based fusion; preserves score magnitude (Bruch et al., 2023) |
| Convex α | 0.3–0.4 | 60–70% BM25, 30–40% vector — benchmarked optimal for domain-specific corpora |
| Embedding model | Voyage multimodal-3.5 | 1024-dimensional text embeddings |
| Embedding dimensions | 1024 | Balance between quality and storage/compute cost |
| Batch size (indexing) | 20 chunks | Embedding API efficiency within rate limits |
| Candidate limit | 40 per retriever | Pre-fusion retrieval depth; top 40 from each system before fusion |
Note: We originally deployed RRF K=60 with 2:1 vector weighting. Benchmarking across 28,014 evaluations revealed Convex Combination outperforms RRF by 6.5% (NDCG@10: 0.245 vs 0.230). Always benchmark your retrieval.
Technology Stack
| Component | Technology | Role |
|---|---|---|
| Full-text search | DuckDB + FTS extension | BM25 keyword search (NDCG@10: 0.232), metadata storage, frontmatter filtering |
| Vector search | LanceDB | Approximate nearest neighbor search on embeddings |
| Fusion | Convex Combination (α=0.3–0.4) | Score-based hybrid fusion (NDCG@10: 0.245); validated per Bruch et al., 2023 |
| Embeddings | Voyage AI (multimodal-3.5) | Text → 1024-dimensional vector conversion |
| API server | Fastify (Node.js) | HTTP service layer for agent consumption |
| Containerization | Docker / Podman | Portable, reproducible deployment |
| Networking | Secure mesh VPN | Encrypted multi-machine agent access |
| Version control | Git | Document history, change tracking, audit trail |
| Document format | Markdown + YAML frontmatter | Human-readable, LLM-native, Git-diffable |
Graceful Degradation Chain
```
Convex Hybrid (60–70% BM25 + 30–40% Vector)   ← Default: NDCG@10 0.245
        │
        │ Embedding API unavailable?
        ▼
BM25-Only (Keyword Search)                    ← Fallback: NDCG@10 0.232 (5% lower)
        │
        │ DuckDB unavailable?
        ▼
Cached Results                                ← Emergency: last-known-good responses
```
NCUA Compliance Mapping
| NCUA Requirement | Architecture Feature | Evidence |
|---|---|---|
| Risk management | Tiered access model; Git versioning; supersession detection | Document history in Git; status field in frontmatter |
| Monitoring & control | Query logging; per-agent tracking | Search logs: agent, query, results, timestamp |
| Termination capability | Service-level controls; kill-switch integration | Container stop; API restrictions; Grid kill-switch (<100ms) |
| Governance | Frontmatter status propagation; council review gates | status: active/superseded/draft on every document |
| Vendor transparency | All knowledge on CU infrastructure; open formats | Markdown in Git; no proprietary knowledge formats |
Technical Appendix B: Glossary
Runner — A purpose-built AI agent aligned to a specific team or domain. Not a generic chatbot — a specialized worker trained on relevant SOPs and institutional context.
Playbook — A complex workflow or standard operating procedure encoded for AI execution. Defines the sequence of skills, approval gates, and compliance checkpoints for a given process (e.g., SAR investigation, loan pre-screening).
Skill — An individual executable capability. The smallest unit of agent work — a single, composable action (e.g., “pull credit report,” “draft SAR narrative,” “check OFAC list”).
The Grid — Runline’s AI Control Plane. All agent traffic traverses the Grid. Provides per-agent authentication, rate limiting, kill-switch capability, and comprehensive audit logging.
The Tower — The command surface where staff observe, direct, and intervene in agent activity. Timeline-based visibility into every Runner’s work, costs, and outcomes.
Run — A live execution of a Playbook — stateful, time-bounded, with validation gates and human oversight. A SAR investigation is a Run. A loan processing workflow is a Run.
Company Context Layer — The semantic search infrastructure described in this paper. Indexes institutional knowledge and makes it available to every agent via HTTP API.
This paper is part of a series on AI infrastructure for credit unions. Previous articles: “Stop Buying Chatbots. Start Building Infrastructure” (Article 7), “Context Is King” (Article 9), “The Agentic Workforce: Your Department-by-Department AI Strategy” (Article 12), and “Examiner-Ready by Design” (Article 14).
The complete benchmark data — 28,014 evaluations across 14 configurations — is available as an interactive visual with sortable tables, per-category breakdowns, and methodology details.
For questions or to discuss how a Company Context Layer applies to your institution, contact sean@runlineai.com.


