Deep Research Report: Knowledge Architecture in Agentic Systems
The transition from human-driven software engineering to autonomous and semi-autonomous AI-augmented development necessitates a fundamental restructuring of how organizations manage knowledge, state, and context. Historically, documentation was authored by humans, for humans, optimizing for pedagogical onboarding, narrative comprehension, and visual layout. However, in systems where artificial intelligence agents operate as the primary consumers of documentation, the economic, structural, and architectural realities of knowledge transfer are entirely inverted. Human readers are high-context, slow-reading consumers; AI agents are zero-context, extraordinarily fast-reading consumers heavily constrained by inference compute costs and strict context-window token limits.
To resolve the friction between legacy documentation practices and the operational realities of agentic workflows, modern software systems are increasingly adopting a strict three-layer knowledge architecture: the Source, the Lens, and the Snapshot. The Source represents the immutable, definitive origin of truth—pure state, canonical event logs, and raw code, optimized for machine accuracy. The Lens serves as the deterministic transformation mechanism—a derivation recipe, query, or context-engineering prompt that extracts and filters data. Finally, the Snapshot is the ephemeral, point-in-time view generated by the Lens, optimized entirely for the immediate consumer's context window. This report provides an exhaustive analysis of the thirty-year historical evolution, the quantitative scaling challenges, the contemporary agentic methodologies, and the formal architectural frameworks that inform, validate, and necessitate the Source/Lens/Snapshot paradigm.
1. Classical Knowledge Management History (1995–2025)
The conceptual foundation of the Source/Lens/Snapshot model is not a novel invention of the artificial intelligence era; rather, it is the culmination of a thirty-year trajectory in software engineering and information systems design. Over three decades, the industry has relentlessly attempted to move human-readable semantic knowledge closer to machine-executable deterministic logic.
The concept of a Single Source of Truth (SSoT) originated in the rigorous mathematics of database normalization proposed by Edgar F. Codd in the 1970s, which sought to eliminate data anomalies and update friction through strict schema design.1 In Codd's normalized databases, a piece of information existed in exactly one place; any redundant copy was considered a structural vulnerability. By the early 1990s, as the Gartner Group coined the term Enterprise Resource Planning (ERP), organizations faced catastrophic data fragmentation across siloed departments.3 In 1995, working alongside Ralph Kimball's Red Brick Systems, Business Intelligence (BI) vendors like Pilot Software began heavily promoting the "Single Version of the Truth" (SVOT).4 The objective was to establish a centralized database where complex executive queries would yield mathematically consistent answers, regardless of which department asked the question.4 As information technology matured, SSoT evolved from a strict data engineering principle into a broader organizational knowledge strategy. SSoT architecture dictates that every data element is mastered in exactly one place; all other locations merely reference it, whether through transclusion, foreign keys, or dynamic queries.2 If tribal knowledge is kept in ephemeral chat applications like Slack or isolated spreadsheets, it violates the SSoT, forcing new developers into "archaeological digs" through unstructured history to find the truth.5 In the context of the three-layer knowledge model, the SSoT is the absolute, defining characteristic of the Source layer.
Building upon this philosophy, in 1999, Andrew Hunt and Dave Thomas published The Pragmatic Programmer, introducing the DRY (Don't Repeat Yourself) principle. Hunt and Thomas declared: "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system".6 While novice developers initially interpreted DRY merely as an instruction to avoid copy-pasting source code, Hunt and Thomas explicitly applied the principle much more broadly to include database schemas, test plans, build systems, and crucially, documentation.8 They recognized that "knowledge isn't stable"—it changes rapidly due to shifting business requirements, regulatory updates, or algorithmic discoveries.7 Duplicating knowledge across a codebase and a separate documentation wiki invites a severe maintenance nightmare.7 The DRY principle provides the primary philosophical justification for the separation of the Lens and Snapshot layers. Because knowledge changes continuously, writing static documentation (a manual Snapshot) violates DRY if the underlying system (the Source) already contains that knowledge. Instead, a Lens must dynamically generate the Snapshot from the Source, ensuring that the knowledge is authored and maintained in only one place.
Long before the Docs-as-Code movement, Donald Knuth introduced the concept of "Literate Programming" in 1984.10 Knuth envisioned a paradigm in which computer programs are interleaved directly with natural language documentation. Knuth's original WEB tool utilized two distinct processors: a WEAVE processor to generate nicely typeset, human-readable documents, and a TANGLE processor to extract and compile the machine-executable code.11 For human engineers, literate programming largely failed to achieve mainstream dominance in general-purpose software development because human cognition struggles with the constant context-switching between declarative prose and imperative code syntax. While it survived in scientific computing through the Jupyter computational notebook paradigm 10, it was deemed too cumbersome for standard enterprise engineering. However, the agentic era has resurrected Knuth's vision. AI agents can parse interleaved logic and prose without the cognitive fatigue that hampers human readers. Modern implementations of this concept manifest as "Briefing Packs"—information-dense, markdown-based executable knowledge graphs where human-readable intent and machine-executable code coexist natively.14 In our model, Literate Programming represents the earliest mechanical attempt to physically fuse the Source and the Snapshot, relying on crude compilers to act as early Lenses.
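Knuth's two-processor pipeline can be sketched in a few lines. This is a toy analogue, not WEB itself: it uses noweb-style chunk markers (`<<name>>=` ... `@`, an assumption for illustration) and shows TANGLE extracting the executable projection from a single literate Source.

```python
import re

def tangle(literate_text: str) -> str:
    """Toy TANGLE: extract machine-executable chunks, discarding the prose."""
    chunks = re.findall(r"<<.*?>>=\n(.*?)\n@", literate_text, re.DOTALL)
    return "\n".join(chunks)

doc = """We need a successor function.

<<inc>>=
def inc(x):
    return x + 1
@

That is the whole program."""

# The same Source yields two projections: WEAVE would typeset `doc` for
# humans; TANGLE yields the code the machine runs.
namespace = {}
exec(tangle(doc), namespace)
print(namespace["inc"](41))   # prints 42
```

The prose and the code never drift apart because both views are derived from the one literate document.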
As software engineering scaled, the industry recognized that decoupling documentation from code led to immediate obsolescence. The early 2010s saw the rise of the "Docs-as-Code" movement, heavily championed by technical writers like Tom Johnson and the expansive Write the Docs community.16 This movement recognized that traditional documentation platforms, such as isolated Word documents or decoupled enterprise wikis, were structurally isolated from the software development lifecycle. Docs-as-code mandated that documentation be written in plaintext markup languages (such as Markdown or AsciiDoc), stored in the exact same Git version control repositories as the source code, and deployed automatically via Continuous Integration and Continuous Deployment (CI/CD) pipelines.16 This enabled a culture where technical writers and developers shared ownership of the knowledge architecture, utilizing the exact same pull request and review mechanisms.16 Docs-as-Code effectively shifted documentation from an unstructured, unversioned afterthought into a structured artifact, making it computable and prepared for programmatic manipulation by Lenses.
Taking the Docs-as-Code philosophy to its logical extreme, Cyrille Martraire published Living Documentation in 2019.20 Martraire argued that documentation should not be written manually at all, but rather generated dynamically from "established formalisms" and the code itself.22 Grounded in Domain-Driven Design (DDD), Martraire proposed that if a domain model utilizes a mathematical structure—such as a Monoid or a Vector Space—the code should simply be annotated with @Monoid.22 Because the mathematical properties of a Monoid are universally established and unchanging, the documentation system can automatically generate exhaustive explanations and constraints without human intervention.22 Martraire explicitly relies on the concept of a "Lens" to project different views of the underlying domain model based on the specific needs of the viewer.20 This represents the exact conceptual precursor to our modern Lens layer: algorithmic extraction mechanisms acting upon a well-annotated Source to synthesize highly accurate, human-readable or machine-readable Snapshots on demand.
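The annotation-driven idea can be sketched as follows. This is an illustrative rendering of Martraire's @Monoid concept, not his actual tooling; the decorator name, registry, and generated wording are assumptions.

```python
KNOWN_FORMALISMS = {  # universally established properties, written once
    "Monoid": ("an associative binary operation with an identity element; "
               "values combine in any grouping, and folds over empty "
               "collections are well-defined."),
}

_registry = []

def formalism(name):
    """Class decorator: annotate a domain type with a named formalism."""
    def wrap(cls):
        _registry.append((cls.__name__, name))
        return cls
    return wrap

@formalism("Monoid")
class Money:
    def __init__(self, cents=0):
        self.cents = cents
    def combine(self, other):
        return Money(self.cents + other.cents)

def generate_docs():
    """The Lens: project documentation from annotations, not hand-written prose."""
    return [f"{cls} is a {name}: {KNOWN_FORMALISMS[name]}"
            for cls, name in _registry]

print(generate_docs()[0])
```

Because the explanation lives in one table keyed by the formalism, annotating a second class with @formalism("Monoid") documents it for free.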
Finally, the classical era recognized that not all knowledge is represented in live code. In 2011, Michael Nygard popularized the concept of Architecture Decision Records (ADRs). As software systems became too complex to document exhaustively in static architectural overviews, the industry shifted toward documenting historical decisions rather than the current state. ADRs capture the specific context, the considered options, and the anticipated consequences of an architectural choice at a precise moment in time. Because ADRs are immutable historical logs—appended to the repository rather than edited over time—they do not suffer from documentation drift. They function as a pristine, decay-proof Source of historical context, allowing developers (and eventually, AI agents) to understand the underlying "why" behind a codebase's structure, rather than just observing "what" it currently is.23
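A minimal sketch of Nygard-style ADR generation follows. The section headings (Status, Context, Decision, Consequences) follow Nygard's well-known template; the function and log names are illustrative.

```python
from datetime import date

def render_adr(number, title, context, decision, consequences,
               status="Accepted", day=None):
    """Render one immutable decision record in Nygard's format."""
    day = day or date.today().isoformat()
    return "\n".join([
        f"# {number}. {title}",
        f"Date: {day}",
        "",
        "## Status", status,
        "## Context", context,
        "## Decision", decision,
        "## Consequences", consequences,
    ])

adr_log = []  # append-only: records are added, never edited, so they never drift
adr_log.append(render_adr(
    1, "Use an append-only event log",
    context="Agent state must be reconstructible after a crash.",
    decision="Persist every state change as an immutable event.",
    consequences="Reads require projections; storage grows monotonically.",
    day="2025-01-15"))

print(adr_log[0].splitlines()[0])   # prints "# 1. Use an append-only event log"
```

The append-only discipline is the whole trick: a record captures the "why" at a moment in time, so later code changes cannot make it wrong, only superseded.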
| Claim | Source (Author, Title, Year, URL) | Relevance to Source/Lens/Snapshot Model |
|---|---|---|
| SSoT Origins: The Single Source of Truth originated in database normalization and 1990s ERP/BI systems before becoming a broader organizational knowledge principle. | Ralph Kimball / Pilot Software, Red Brick Systems, 1995 3; Edgar F. Codd.1 | SSoT defines the mandatory architectural prerequisite for the Source layer: a definitive, un-duplicated repository of state. |
| DRY Expansion: DRY applies not just to code, but to schemas, testing, and documentation; "Every piece of knowledge must have a single representation." | Andrew Hunt & Dave Thomas, The Pragmatic Programmer, 1999.6 | DRY strictly necessitates the Lens layer; manually updating Snapshots violates DRY by duplicating knowledge already embedded in the Source. |
| Literate Programming: Interleaving machine code and human documentation failed for human authoring but is highly effective for AI agent "Briefing Packs." | Donald Knuth, Literate Programming, 1984.10 | Demonstrates an early, static fusion of Source and Snapshot, which LLMs can now consume efficiently. |
| Docs-as-Code: Treating documentation like software (using Git, CI/CD, Markdown) integrates technical writers with engineering workflows and prevents version mismatch. | Tom Johnson, Docs as Code, 2020.16 | Shifted documentation from unstructured, decoupled wikis to structured repository artifacts, making them computable for Lenses. |
| Living Documentation: Documentation should be generated dynamically from code and established mathematical formalisms rather than written manually. | Cyrille Martraire, Living Documentation, 2019.20 | Acts as the exact mechanical precursor to the Lens—extracting contextual Snapshots directly from the underlying Source. |
| Architecture Decision Records: Documenting immutable architectural decisions reduces PR review cycles by 30-40% and perfectly preserves historical context without drift. | Michael Nygard, Architecture Decision Records, 2011; DX Institute.23 | ADRs serve as an immutable, append-only Source of truth for system context, impervious to the decay of state drift. |
2. The Documentation Problem at Scale
The primary catalyst for adopting a dynamic, agentic knowledge architecture is the mathematical and economic impossibility of maintaining static documentation at enterprise scale. When documentation is physically or temporally decoupled from execution, it becomes subject to an inescapable phenomenon known as "documentation drift." Documentation drift occurs when the actual implementation of a system steadily diverges from its outdated specifications, creating a dangerous delta between what the system does and what developers believe it does.25
Research from the Developer Experience (DX) Institute provides devastating quantitative evidence of the economic impacts stemming from this drift. Organizations drastically underestimate the ongoing maintenance cost of documentation, frequently viewing it as a one-time creation task associated with feature launches.23 According to DX benchmarking data across hundreds of tech organizations, static developer documentation begins to decay the exact moment the underlying code changes. Within a mere six months of creation, documentation is generally viewed by engineers as "suspect." After one year, it crosses a critical threshold and becomes "actively misleading".23 Actively misleading documentation is cognitively and economically far worse than having no documentation at all. It sends developers down false technical paths, causes them to implement deprecated patterns, and significantly extends Mean Time to Restore (MTTR) during critical production incidents because operators cannot rely on operational runbooks.23
The economic toll of documentation drift is severe and measurable. Developers spend between 3 and 10 hours per single work week merely searching for undocumented or poorly documented information, or verifying whether existing documentation is still accurate.23 For a mid-sized engineering team of 100 people, this equates to a staggering 300 to 1,000 lost hours weekly—the equivalent of 8 to 25 full-time engineers producing absolutely zero value.23 Consequently, poor documentation acts as a massive, silent technical debt that costs a mid-sized organization between $500,000 and $2,000,000 annually in lost productivity.23 Organizations attempting to fight this entropy manually find themselves trapped in a Sisyphean cycle; to sustain high-quality documentation, they must dedicate 5% to 10% of their total engineering capacity purely to documentation debt paydown rituals.23
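The arithmetic behind these figures is straightforward to reproduce. A minimal sketch using the DX numbers quoted above, assuming a 40-hour work week:

```python
# Reproducing the DX Institute arithmetic for a 100-person engineering team.
team_size = 100
hours_lost_per_dev = (3, 10)   # hours per developer per week (low, high)
work_week = 40                 # assumed hours per full-time engineer

lost_hours = tuple(h * team_size for h in hours_lost_per_dev)
lost_ftes = tuple(h / work_week for h in lost_hours)

print(lost_hours)   # prints (300, 1000)
print(lost_ftes)    # prints (7.5, 25.0)  -- i.e., roughly 8 to 25 engineers
```

At fully loaded engineering costs, 8 to 25 zero-output FTEs lands comfortably in the $500K–$2M annual range the research cites.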
The traditional enterprise response to the knowledge management crisis has been the deployment of centralized knowledge bases and wikis, most notably Atlassian Confluence.27 However, these systems inevitably devolve into what practitioners term "wiki graveyards." Because the benefits of comprehensive documentation are distributed broadly across the organization, but the intense maintenance burden falls on localized, individual authors, wikis suffer from a classic "tragedy of the commons".23 Pages are created with massive enthusiasm during project kick-offs, but they are rapidly abandoned as soon as feature urgency and deadline pressures overtake the desire for maintenance.23 Furthermore, without strict version control discipline tying the wiki to the live system, crucial operational changes—such as a weekend wiring fix, a database schema alteration, or a PLC tweak—are never recorded in the wiki, exponentially raising the costs of future modernization efforts.28
To mitigate this systemic failure, industry leaders have pioneered various approaches to mechanically or culturally enforce synchronization. Google engineered its way out of the documentation drift problem by physically coupling the Source and the Snapshot via its internal g3doc infrastructure.29 At Google, all documentation is written in a Markdown-like syntax and stored directly interleaved across the exact same directory structures as the monolithic source code repository.30 When a Google engineer alters the behavior of a codebase, they are required to update the associated documentation in the exact same commit. Because the code and the documentation share a single, unified version control history, rolling back a faulty code change automatically and simultaneously rolls back the documentation change. The g3doc files are then continuously rendered internally as a searchable website. This architectural constraint ensures that the human-readable Snapshot perfectly reflects the programmatic Source at any given point in history, eliminating temporal drift.30
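The enforcement idea behind this coupling can be sketched as a commit check. This is a hypothetical illustration of the invariant, not Google's actual tooling; the path conventions are assumptions.

```python
# Hypothetical g3doc-style gate: a commit that changes code in a package
# must also change that package's docs, or it is rejected.
def docs_updated(changed_paths):
    """True if every package with code changes also has doc changes."""
    code_pkgs = {p.split("/")[0] for p in changed_paths if p.endswith(".py")}
    doc_pkgs = {p.split("/")[0] for p in changed_paths if p.endswith(".md")}
    return code_pkgs <= doc_pkgs   # code packages must be a subset of doc'd ones

ok_commit = ["billing/ledger.py", "billing/g3doc/index.md"]
bad_commit = ["billing/ledger.py"]

print(docs_updated(ok_commit))    # prints True
print(docs_updated(bad_commit))   # prints False
```

Because the check runs per commit, a rollback of the code automatically carries the matching documentation state with it.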
Stripe, a company famous for possessing a world-class developer experience, open-sourced its internal documentation framework, Markdoc, in 2022 to address the complexities of multi-platform drift.31 Stripe recognized that maintaining separate, hand-written REST API docs, language-specific SDK docs (Python, Ruby, Node), and human-written integration guides across multiple isolated codebases leads to inevitable, cascading drift.33 Markdoc itself is a Markdown-based authoring language whose custom tags let prose embed dynamic, data-driven components; layered on a centralized OpenAPI specification (the Source), Stripe's documentation pipeline functions as a highly sophisticated Lens, generating interactive, language-specific code examples, error message references, and the famous three-column layouts (the Snapshot).32 By centralizing the Source logic, Stripe ensures that a single change in underlying API behavior immediately and automatically cascades to all user-facing documentation artifacts without manual technical writing intervention.
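The one-Source-many-Snapshots pattern can be sketched generically. This is illustrative code for the Lens idea, not Stripe's pipeline; the spec slice, template strings, and URL are all assumptions.

```python
# One OpenAPI-style Source, many language-specific Snapshots.
spec = {  # a tiny slice of an OpenAPI-style description (the Source)
    "paths": {"/v1/charges": {"post": {"operationId": "create_charge"}}}
}

TEMPLATES = {  # one template per target language (the Lens)
    "python": "client.{op}()",
    "ruby": "client.{op}",
    "curl": "curl -X POST https://api.example.com{path}",
}

def render_examples(spec, lang):
    """Project language-specific code samples from the central spec."""
    out = []
    for path, methods in spec["paths"].items():
        for method, info in methods.items():
            out.append(TEMPLATES[lang].format(op=info["operationId"], path=path))
    return out

print(render_examples(spec, "python"))   # prints ['client.create_charge()']
print(render_examples(spec, "curl"))
```

Renaming the operation in the spec regenerates every language's example in one pass; no Snapshot is ever edited by hand.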
GitLab takes an organizational, rather than purely technological, approach to maintaining the Single Source of Truth. GitLab operates a massive, open-source, public-facing company handbook.34 To overcome the "commons problem" and prevent the handbook from becoming a wiki graveyard, GitLab weaves handbook maintenance forcefully into its core organizational culture. The company requires all new employees to make at least two successful merge requests to the handbook during their initial onboarding process, establishing the behavioral habit immediately.34 GitLab operates on a strict paradigm: if a policy, architecture, or process is not documented in the handbook, it effectively does not exist. While this handbook-first approach is highly effective at maintaining a cohesive organizational reality, it relies heavily on rigorous human discipline and continuous cultural enforcement to maintain the integrity of the Source.
| Claim | Source (Author, Title, Year, URL) | Relevance to Source/Lens/Snapshot Model |
|---|---|---|
| Cost of Drift: Documentation decays rapidly; after one year, it is "actively misleading." This drift costs mid-sized teams up to $2M annually in wasted search time. | Developer Experience (DX) Institute, Developer Documentation Maintenance Costs, 2025.23 | Validates the obsolescence of static, uncoupled Snapshots; highlights the economic necessity of automated, dynamic Lenses. |
| Commons Problem: Wikis decay rapidly because the systemic benefit is distributed but the maintenance burden is highly localized, leading to "wiki graveyards." | DX Institute 23; Atlassian Confluence usage reports.27 | Proves that human-maintained Snapshots fail at scale; updates to the Source must automatically and programmatically update the docs. |
| g3doc Integration: Google mandates that documentation lives in the same repository as code, updating, reviewing, and versioning in the exact same commit. | Google Internal Engineering, g3doc documentation.29 | Demonstrates physically coupling the Source and the Snapshot through shared version control to prevent temporal and state drift. |
| Markdoc Framework: Stripe generates REST API docs, SDK guides, and multi-language code samples dynamically from a single OpenAPI specification. | Stripe / Stainless Docs, Markdoc release, 2022.31 | Represents a textbook implementation of the Lens layer transforming a central, logic-based Source into multiple, varied presentation Snapshots. |
| Handbook-First: GitLab enforces a cultural SSoT by requiring all organizational knowledge to be merged into a public repository, starting at employee onboarding. | GitLab, Company Handbook.34 | Illustrates how strict organizational governance and behavioral conditioning are required to maintain the integrity of a human-authored Source. |
3. The Agentic Turn: Knowledge for AI Consumers
The introduction of autonomous and semi-autonomous AI agents into software ecosystems represents the most significant shift in knowledge management since the advent of distributed version control. Agents completely upend the traditional economics, structure, and delivery mechanisms of documentation.
In legacy software engineering, the primary economic cost of documentation was the human labor required to write and manually maintain it. The act of reading documentation was relatively "free" and unbounded. In agentic systems, this dynamic is sharply inverted. As Steve Yegge articulates in his "Software Survival 3.0" framework, AI agents are entirely capable of reading massive amounts of documentation, but doing so costs precious inference tokens and rapidly drains the agent's limited context window.36 If an AI coding agent must read fifty pages of narrative API documentation to understand how to use a tool, the "read tax" (or friction cost) becomes exorbitant. If the agent encounters edge cases, errors, or ambiguity while parsing dense, human-pedagogical documentation, it acts as though it is in a hurry; it will rapidly abandon the authorized tool and attempt to hallucinate workarounds, severely degrading system reliability.36
Therefore, the most successful tools designed for agentic consumption do not require the agent to read extensive documentation at all. Instead, forward-thinking engineers engage in a practice known as "hallucination squatting"—observing what an LLM naturally assumes a system's API looks like based on its pre-trained weights, and deliberately building the tool's interface to match that hallucinated expectation.36 By doing so, the tool's interface perfectly aligns with the agent's inherent knowledge, dropping the required context payload to near zero. In this inverted economy, the Snapshot provided to an agent cannot be a human-readable wiki page; it must be hyper-compressed, strictly formatted, highly deterministic, and stripped of all human-centric narrative.
As Large Language Models have matured, the industry has shifted away from the rudimentary practice of "prompt engineering" toward a more rigorous, systemic discipline known as "Context Engineering".37 Shopify CEO Tobi Lütke defines context engineering as "the art of providing all the context for the task to be plausibly solvable by the LLM".37 Practitioners like Simon Willison note that in modern systems, agent failures are rarely underlying model failures anymore; they are almost exclusively "context failures".37 Context engineering recognizes that context is not a static string of text, but a dynamic, ephemeral system generated entirely on the fly.37 It involves assembling system instructions, user prompts, short-term conversational state, long-term memory, retrieved external data (RAG), and available tool schemas into a singular payload.37 In our architecture, the prompt and the retrieval mechanisms act collectively as the Lens. They function as a derivation recipe, filtering through the vast, noisy Source (vector databases, codebases, APIs) to project a highly targeted, ephemeral Snapshot (the context window) into the LLM's working memory. Willison identifies critical anti-patterns like "Context Poisoning" (where hallucinations enter the context and reinforce themselves) and "Context Distraction" (where the context window grows so large the model forgets its core training instructions), further emphasizing the absolute necessity of precision Lenses.38
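The assembly step can be sketched as a budgeted Lens. This is a minimal illustration of the pattern, not any particular framework; the field labels, word-count token estimate, and budget are assumptions.

```python
# Context engineering as a Lens: assemble instructions, tool schemas, and
# retrieved chunks into one payload under a strict token budget.
def assemble_context(system, task, retrieved, tools, budget_tokens=1000):
    """Greedily pack pre-ranked retrieved chunks under the budget."""
    est = lambda s: len(s.split())          # crude token estimate (assumption)
    payload = [f"SYSTEM: {system}", f"TASK: {task}"]
    payload += [f"TOOL: {name}{sig}" for name, sig in tools.items()]
    used = sum(est(p) for p in payload)
    for chunk in retrieved:                 # assumed ranked by relevance
        if used + est(chunk) > budget_tokens:
            break                           # drop the rest: precision over bulk
        payload.append(f"CONTEXT: {chunk}")
        used += est(chunk)
    return "\n".join(payload)

ctx = assemble_context(
    system="You are a release agent.",
    task="Cut release 2.4.1.",
    retrieved=["Changelog entries since 2.4.0 ...", "Branch policy ..."],
    tools={"git_tag": "(version: str)"},
    budget_tokens=50)
print(ctx.splitlines()[0])   # prints "SYSTEM: You are a release agent."
```

The budget cutoff is the anti-distraction guard: once the window is full, lower-ranked material is simply never projected into the Snapshot.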
When multiple agents collaborate on complex tasks, they require a mechanism to share state without blowing up their individual context windows. Researchers differentiate heavily between private agent memory and "shared-workspace" memory.39 In a shared workspace design, all agents read from and write to a common pool (often termed a World Model or Shared Memory).39 Because a raw shared pool quickly becomes impossibly noisy, systems must establish filtering mechanisms, such as the "Candidate Bus" seen in InteRecAgent, where tools repeatedly read the current candidate set and write back filtered, narrowed candidates.39 In this multi-agent paradigm, the shared workspace acts as the definitive Source, while the filtering algorithms act as specific Lenses generating localized, highly relevant Snapshots for individual agents, preventing prompt length overflow.39
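The Candidate Bus pattern described above can be sketched in a few lines. This is illustrative code for the read-filter-write-back loop, not the InteRecAgent implementation; the item fields are assumptions.

```python
# Each tool reads the shared candidate set and writes back a narrowed one.
def price_filter(candidates, max_price):
    return [c for c in candidates if c["price"] <= max_price]

def genre_filter(candidates, genre):
    return [c for c in candidates if c["genre"] == genre]

bus = [  # the shared workspace (a local Source for all tools/agents)
    {"title": "A", "price": 10, "genre": "rpg"},
    {"title": "B", "price": 60, "genre": "rpg"},
    {"title": "C", "price": 15, "genre": "racing"},
]

bus = price_filter(bus, max_price=30)   # each tool acts as a Lens over the bus
bus = genre_filter(bus, genre="rpg")    # the narrowed bus is the Snapshot

print([c["title"] for c in bus])        # prints ['A']
```

Because every tool consumes the already-narrowed set, no individual agent ever needs the full, noisy pool in its context window.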
A highly practical, real-world implementation of agentic context generation is Steve Yegge's beads CLI tool (bd).41 The beads system acts as a persistent, structured memory tracker built specifically for coding agents like GitHub Copilot and Claude Code.41 Rather than feeding an agent a sprawling Jira board or massive markdown project files, beads utilizes a specific command: bd prime. When an agent executes bd prime, the CLI runs instantly and outputs a perfectly structured, compressed payload of 1-2k tokens containing the exact workflow context, unblocked tasks, and operational rules the agent currently needs.41 The beads CLI is entirely hostile to human use; it contains over 100 subcommands and aliases built explicitly to accommodate agent "desire paths".36 The beads software is a perfect manifestation of a Lens. It sits directly on top of a graph-based issue database (the Source) and generates a minimal, high-signal Snapshot strictly on demand, vastly outperforming the latency and massive token costs associated with heavy Model Context Protocol (MCP) integrations.41
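The shape of such a priming Lens can be sketched as follows. To be clear, this is a hypothetical analogue of what a `bd prime`-style command does, not the beads implementation; the issue schema, rule format, and output layout are all assumptions.

```python
# A hypothetical "prime" Lens: query an issue graph and print only the
# unblocked work plus standing rules, in a compact fixed format.
issues = {  # the graph-based issue database (the Source)
    "bd-1": {"title": "Design schema", "status": "closed", "blocks": ["bd-2"]},
    "bd-2": {"title": "Write migration", "status": "open", "blocks": []},
    "bd-3": {"title": "Add CLI flag", "status": "open", "blocks": []},
}

def prime(issues, rules):
    """Emit a minimal, high-signal payload for the agent (the Snapshot)."""
    blocked = {b for i in issues.values() if i["status"] != "closed"
               for b in i["blocks"]}
    ready = [f"{k}: {v['title']}" for k, v in sorted(issues.items())
             if v["status"] == "open" and k not in blocked]
    return "\n".join(["RULES: " + "; ".join(rules), "READY:"] + ready)

print(prime(issues, ["close issues when merged"]))
```

The payload contains only what the agent can act on right now; closed and blocked work never spends a token.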
Jeremy Daly provides the most exact and profound architectural analogy for agentic knowledge systems: Materialized Views from relational database architecture.44 Daly notes that in commercial, multi-tenant agent systems, "almost nothing accidental survives".44 An agent's state must be completely reconstructible from its own persisted history, not dependent on ephemeral caches. Daly advocates for a "Canonical Event Log" that is append-only, immutable, replayable, and version-aware.44 This log records every single context retrieved, tool called, policy evaluated, and memory promoted. This log represents the ultimate Source. However, querying a massive raw event log is far too slow and computationally expensive for live, reacting agents. Therefore, the system generates "Materialized Views"—cost views, lineage trees, and evaluation harnesses mathematically derived from the event log.44 These materialized views do not redefine or alter the underlying data; they are merely computational projections. In our model, Daly's Canonical Event Log is the Source, the projection logic is the Lens, and the Materialized View is the exact equivalent of the Snapshot.
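A minimal sketch of this pattern, as summarized above: an append-only event log as the Source, with a cost view projected from it. The event and field names are illustrative, not Daly's schema.

```python
# Canonical event log (Source) -> projection (Lens) -> materialized view (Snapshot).
event_log = []  # append-only, immutable, replayable

def record(event_type, **fields):
    """Append one event; history is never edited in place."""
    event_log.append({"type": event_type, "seq": len(event_log), **fields})

def cost_view(log):
    """Projection: total token spend per tool, derived, never hand-maintained."""
    totals = {}
    for e in log:
        if e["type"] == "tool_called":
            totals[e["tool"]] = totals.get(e["tool"], 0) + e["tokens"]
    return totals

record("context_retrieved", source="vector_db", tokens=800)
record("tool_called", tool="search", tokens=120)
record("tool_called", tool="search", tokens=80)
record("tool_called", tool="compile", tokens=300)

print(cost_view(event_log))   # prints {'search': 200, 'compile': 300}
```

Because the view is pure computation over the log, it can be dropped and rebuilt at any time; replaying the same events always yields the same Snapshot.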
Further optimizing this retrieval is the concept of "pyramid summaries," a technique prominently utilized in advanced RAG and agentic systems, heavily popularized by infrastructure companies like strongDM.46 Because context windows are limited and expensive, knowledge must be structured hierarchically. A pyramid summary provides a high-level abstraction at the top of the document structure, allowing an agent to quickly assess relevance. The agent only traverses downward into deeper, more granular documents if its specific task logic requires it. This hierarchical traversal is achieved through specialized Lenses that adjust the resolution and depth of the Snapshot dynamically based on the agent's current cognitive requirement.
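Hierarchical traversal of a pyramid summary can be sketched as follows. This is an illustrative structure, not strongDM's implementation; the tree fields and keyword matching are assumptions.

```python
# Read the cheap one-line summary at every level; descend into the expensive
# body only when the task's keywords demand it.
doc_tree = {
    "summary": "Payments service overview",
    "body": None,
    "children": [
        {"summary": "Refund flow and edge cases",
         "body": "Refunds require an idempotency key...", "children": []},
        {"summary": "Webhook retry policy",
         "body": "Webhooks retry with exponential backoff...", "children": []},
    ],
}

def gather(node, keywords):
    """Variable-resolution Lens: expand only relevant branches."""
    out = [node["summary"]]                      # cheap, always read
    for child in node["children"]:
        if any(k in child["summary"].lower() for k in keywords):
            out += gather(child, keywords)
            if child["body"]:
                out.append(child["body"])        # expensive, read on demand
    return out

print(gather(doc_tree, keywords=["refund"]))
```

For a refund task the webhook branch costs one summary line instead of its whole body; the Snapshot's resolution tracks the agent's need.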
| Claim | Source (Author, Title, Year, URL) | Relevance to Source/Lens/Snapshot Model |
|---|---|---|
| Read/Write Inversion: Agents can read vast documentation, but the inference (token) cost is exceptionally high. Agent tools must minimize "read tax" by matching agent hallucinations. | Steve Yegge, Software Survival 3.0, 2026.36 | Snapshots designed for agents must prioritize token-efficiency and dense logic over human-readable narrative pedagogy. |
| Context Engineering: Providing an LLM with dynamic, properly formatted information is a distinct engineering discipline, completely superseding static prompt engineering. | Tobi Lütke / Simon Willison, Context Engineering, 2024/2025.37 | Confirms that prompts and retrieval systems act as dynamic Lenses projecting highly contextual, ephemeral Snapshots. |
| Shared Agent Workspaces: Multi-agent systems utilize shared memory pools, requiring strict filtering mechanisms to prevent context overflow and systemic noise. | Huang et al., InteRecAgent Candidate Bus, 2025.39 | The shared memory pool serves as a local Source, requiring Lenses to generate isolated, relevant agent Snapshots. |
| CLI-as-Docs (bd prime): The beads CLI dynamically generates a 1-2k token context payload for coding agents, bypassing heavy static documentation and MCP protocols. | Steve Yegge, Beads GitHub Repository, 2025.41 | bd prime acts as an executable Lens that queries raw state and prints a perfect, ephemeral Snapshot for the agent. |
| Materialized Views: Agent state must rely on an immutable Canonical Event Log, from which specific cost, lineage, and audit views are mathematically projected. | Jeremy Daly, Context Engineering for Commercial Agent Systems, 2024.44 | The most accurate database equivalent to the model: Event Log = Source, Projection Logic = Lens, View = Snapshot. |
| Pyramid Summaries: Knowledge must be structured hierarchically so agents can assess relevance at a high level before consuming tokens on granular details. | strongDM / Academic research on Expectancy Theory.46 | Hierarchical structuring acts as a variable-resolution Lens, allowing the Snapshot to expand only when necessary. |
4. Related Models and Frameworks
The Source/Lens/Snapshot architecture does not exist in an academic vacuum. It represents the agentic culmination of several highly formalized enterprise architecture frameworks that have, over decades, sought to separate underlying truth from situational presentation.
The international standard for software and system architecture description is ISO/IEC/IEEE 42010.49 Descended from IEEE 1471-2000 and influenced by earlier enterprise architecture frameworks such as the Zachman Framework, IEEE 42010 provides a rigorous, standardized ontology for managing complex system descriptions.49 The core philosophical tenet of IEEE 42010 is the formal, unbreachable distinction between a Viewpoint and a View.50 A modern system architecture is far too massive and complex to be comprehended through a single diagram or document. Therefore, different stakeholders require different perspectives (e.g., a security perspective, a physical networking perspective, a logical data structure perspective). Under the IEEE standard, a Viewpoint is a formal specification—a precise recipe, rule set, or modeling convention—dictating exactly how a specific perspective should be constructed and what data it is allowed to include.52 A View, conversely, is the resulting physical artifact created by systematically applying the Viewpoint to the underlying system model.50 This maps flawlessly to our architecture. The underlying system model is the Source. The IEEE Viewpoint is the Lens (the established rules for extraction). The IEEE View is the Snapshot (the resulting diagram or context window).
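The Viewpoint/View distinction can be rendered directly in code. This is an illustrative sketch of the idea, not the standard's formal metamodel; the component attributes are assumptions.

```python
# Viewpoint = a rule set (a function); View = the artifact it produces
# when applied to the one underlying system model.
system_model = [  # the architecture description (the Source)
    {"name": "api-gateway", "layer": "edge", "stores_pii": False},
    {"name": "user-db", "layer": "data", "stores_pii": True},
    {"name": "billing", "layer": "service", "stores_pii": True},
]

def security_viewpoint(model):
    """The convention 'show only PII-bearing components'."""
    return [c["name"] for c in model if c["stores_pii"]]

def deployment_viewpoint(model):
    """A second Viewpoint over the same Source, grouping by layer."""
    layers = {}
    for c in model:
        layers.setdefault(c["layer"], []).append(c["name"])
    return layers

security_view = security_viewpoint(system_model)   # the View (Snapshot)
print(security_view)   # prints ['user-db', 'billing']
```

Both stakeholders consult the same Source; only the applied Viewpoint differs, so the two Views can never contradict each other.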
Defined by the Object Management Group (OMG), Model-Driven Architecture (MDA) is a comprehensive approach to software design that strictly separates pure business logic from platform-specific technical implementation.54 MDA introduces three critical structural concepts. First, the Platform-Independent Model (PIM) represents business logic completely devoid of technological constraints (e.g., a pure UML model of a banking system representing accounts and transfers).54 Second, the Platform-Specific Model (PSM) is a model tailored to a specific technology stack (e.g., Java, .NET, or SQL).54 Third, the Model Transformation is the mechanism—often written in specialized languages like QVT (Query/View/Transformation)—that automatically and programmatically compiles the PIM into the PSM.54 In the agentic era, MDA provides a powerful conceptual mapping. The PIM is the Source (pure, untethered logic and knowledge). The Transformation is the Lens. The PSM is the Snapshot (executable code or formatted documentation generated specifically for a particular LLM context window). While MDA historically struggled with human adoption due to the extreme complexity of writing QVT transformations manually, modern LLMs now act as the ultimate dynamic transformation engines, fulfilling the original promise of MDA.
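A toy model transformation makes the PIM/PSM split concrete. This is illustrative only; real MDA uses QVT over UML metamodels, and the entity, type mapping, and emitted artifacts here are assumptions.

```python
# PIM (Source) -> transformations (Lenses) -> PSMs (Snapshots).
pim = {"entity": "Account", "fields": {"id": "int", "balance": "decimal"}}

SQL_TYPES = {"int": "INTEGER", "decimal": "NUMERIC(19,4)"}

def to_sql(pim):
    """One transformation: emit platform-specific DDL."""
    cols = ", ".join(f"{n} {SQL_TYPES[t]}" for n, t in pim["fields"].items())
    return f"CREATE TABLE {pim['entity'].lower()} ({cols});"

def to_python(pim):
    """A second transformation over the same Source."""
    slots = tuple(pim["fields"])
    return f"class {pim['entity']}:\n    __slots__ = {slots!r}"

print(to_sql(pim))
# prints CREATE TABLE account (id INTEGER, balance NUMERIC(19,4));
```

Adding a field to the PIM regenerates both platform-specific artifacts; neither Snapshot is ever edited directly.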
In 2019, Zhamak Dehghani (then at Thoughtworks) introduced the "Data Mesh" paradigm, completely rejecting the monolithic, centralized data lake architectures that had dominated the previous decade.57 Data Mesh is a decentralized, sociotechnical architecture built on four distinct principles, the most vital being "Domain Ownership" and treating "Data as a Product".59 In a Data Mesh architecture, a Data Product is the architectural quantum—the smallest independently deployable unit containing code, data pipelines, and metadata.59 Accountability for data quality strictly shifts upstream to the operational domain that generates it, rather than downstream to a centralized data team.59 Furthermore, access to these decentralized data products is mediated through federated computational governance, often manifested as strict Data Contracts or Policies.59 In our architecture, the Domain Data Product serves as the decentralized Source. The Data Contract or Policy engine acts as the Lens, ensuring that data requested by a consumer (whether a human analyst or an AI agent) is formatted, governed, and scrubbed correctly before being delivered as a trusted Snapshot.
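A data contract acting as a Lens can be sketched in a few lines. The field names, contract shape, and PII handling below are invented for illustration; the mechanism shown is only the core idea that the contract validates and scrubs records from the domain-owned Source before a Snapshot reaches any consumer.

```python
# A hypothetical data contract (the Lens) between a domain-owned data
# product (the Source) and a consumer's Snapshot. Field names are invented.
CONTRACT = {
    "allowed_fields": {"order_id", "amount", "status"},
    "pii_fields": {"customer_email"},   # must never leave the domain
}

def serve_snapshot(records: list, contract: dict) -> list:
    """Validate and project each record per the contract before delivery."""
    out = []
    for rec in records:
        if not contract["allowed_fields"] <= rec.keys():
            raise ValueError(f"record violates contract: {rec}")
        # Projection drops everything the contract does not allow,
        # including PII, before the Snapshot is materialized.
        out.append({k: rec[k] for k in contract["allowed_fields"]})
    return out

raw = [{"order_id": 1, "amount": 9.5, "status": "paid",
        "customer_email": "a@b.c"}]
snapshot = serve_snapshot(raw, CONTRACT)
```

Whether the consumer is a human analyst or a retrieval pipeline feeding an agent's context window, the same governed projection applies, which is what "federated computational governance" amounts to in practice.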
Finally, in the realm of technical communication, IBM contributed the Darwin Information Typing Architecture (DITA) to the OASIS standards body, which ratified it as a standard in 2005.61 DITA revolutionized technical documentation by abandoning monolithic, linear documents in favor of XML-based, topic-based authoring.61 DITA treats content fundamentally as an "interchangeable part in manufacturing".61 Rather than writing a cohesive manual from start to finish, a technical writer authors independent, self-contained "topics" classified as Tasks, Concepts, or References. These topics are entirely devoid of presentation logic. They are then assembled dynamically using "DITA Maps" to publish to various presentation formats simultaneously (e.g., a PDF manual, an HTML website, or augmented reality instructions for a Microsoft HoloLens).62 DITA is the earliest successful, large-scale implementation of Content-as-a-Service (the foundation of modern Headless CMS systems). It perfectly embodies the model: the raw XML topics are the pure Source, the DITA Maps and XSLT stylesheets are the Lenses defining assembly, and the published PDF is the ephemeral Snapshot.
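The topic/map split can be sketched without any XML machinery. The topic IDs, titles, and the plain-text renderer below are invented; the sketch shows only the structural idea that presentation-free topics (Source) plus an ordered map (Lens) yield a deliverable (Snapshot).

```python
# Presentation-free topics: the Source. IDs and content are illustrative.
topics = {
    "install-task": {"type": "task", "title": "Installing",
                     "body": "Run the installer."},
    "limits-ref":   {"type": "reference", "title": "Limits",
                     "body": "Max 10 GB."},
}

# A DITA-map-like structure (the Lens): an ordered list of topic refs.
user_guide_map = ["install-task", "limits-ref"]

def publish(topic_map: list, fmt: str = "text") -> str:
    """Assemble topics into one deliverable (the Snapshot)."""
    if fmt == "text":
        return "\n\n".join(
            f"{topics[t]['title']}\n{topics[t]['body']}" for t in topic_map
        )
    raise NotImplementedError(fmt)   # other renderers would go here

page = publish(user_guide_map)
```

Because the topics carry no layout, a second map could reuse "limits-ref" in an entirely different deliverable without copying it, which is exactly the reuse-without-drift property the table below credits to DITA.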
| Claim | Source (Author, Title, Year, URL) | Relevance to Source/Lens/Snapshot Model |
|---|---|---|
| IEEE 42010 Viewpoints: System architecture strictly separates the "Viewpoint" (the rules/recipe for a perspective) from the "View" (the resulting diagram/artifact). | ISO/IEC/IEEE 42010:2011 standard.49 | A formalized 1:1 mapping of the architecture: Underlying Model = Source, Viewpoint = Lens, View = Snapshot. |
| Model-Driven Architecture: MDA separates pure business logic (PIM) from technology-specific implementation (PSM) via formal, programmatic Transformations. | OMG, Model Driven Architecture Guide.54 | Validates the concept of separating pure knowledge (Source) from specific execution contexts (Snapshots) via transformations (Lenses). |
| Data Mesh: Data is decentralized into "Data Products" owned by specific domains, accessed via strict computational contracts, rather than pooled in monolithic lakes. | Zhamak Dehghani, Thoughtworks, 2020.57 | Provides a modern model for treating the Source layer not as a single monolith, but as a federated network of domain-owned products governed by Lenses. |
| DITA Transclusion: XML-based, topic-based authoring treats content as highly reusable manufacturing components assembled dynamically by maps. | IBM / OASIS, DITA 1.0, 2005.61 | Demonstrates that separating content (Source) from presentation layout (Snapshot) via algorithmic maps (Lenses) achieves massive reuse and prevents drift. |
5. Knowledge Architecture Anti-Patterns
Understanding how to align systems correctly within the agentic era requires a deep understanding of the named failure states that occur when the Source/Lens/Snapshot model is violated, ignored, or improperly implemented. These anti-patterns represent severe organizational and architectural liabilities.
"Documentation Bankruptcy" is a devastating anti-pattern that occurs when an organization realizes its documentation has drifted so far from reality that the cost of untangling and updating it vastly exceeds the cost of deleting it and starting over.65 The term borrows from legal and financial bankruptcy, in which debts—here, immense technical debt in the form of actively misleading knowledge—are wiped clean to allow a fresh start.66 Documentation bankruptcy is the predictable end-state of maintaining manual Snapshots: because the human-authored Snapshot is decoupled from the programmatic Source, drift compounds like interest until the cost of reconciliation collapses the system.
"Tribal Knowledge" refers to tacit, undocumented expertise held entirely within the minds of specific engineers, or buried deep in ephemeral, unstructured Slack threads.5 In the context of our architecture, Tribal Knowledge represents a critical, terminal failure of the Source layer. Because the knowledge has no explicit, durable, machine-readable representation, it is rendered entirely invisible to AI agents and newly hired engineers alike. This forces the organization to rely on oral tradition, severely capping velocity. Conversely, "Cargo Cult Documentation" occurs when an organization forces its engineers to write documentation simply because management dictates "we should," without defining a clear purpose, a specific consumer, or an effective Lens. This results in vanity metrics (e.g., high wiki page counts or generated lines of docs) that provide zero actual utility, masking the lack of a true Source beneath thick layers of useless, decaying Snapshots.23
The absolute antithesis of the DRY principle is WET, an acronym standing for "Write Everything Twice" (or cynically, "We Enjoy Typing").6 WET architectures emerge when business logic, definitions, or system state are duplicated across multiple disparate systems, codebases, and wikis without a canonical Single Source of Truth.6 In human systems, WET causes confusion; in the agentic era, WET systems induce catastrophic "Context Clash".38 When an agent utilizes a Lens (like a RAG retrieval system) to answer a prompt, a WET architecture will return two conflicting Snapshots representing the same concept. Unable to determine which Snapshot represents the true Source, the agent suffers a logic failure or hallucination, rendering the system untrustworthy.
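The Context Clash mechanism can be demonstrated with a toy retrieval over a WET knowledge base. The documents, fact name, and values below are invented; the sketch shows only the failure mode: two unlinked copies of the same fact drift apart, and a retrieval Lens then surfaces both with no canonical Source to prefer.

```python
# A toy WET knowledge base: the same fact duplicated in two silos,
# with one copy having drifted. Names and values are illustrative.
knowledge_base = [
    {"doc": "wiki/billing.md",  "fact": "invoice_due_days", "value": 30},
    {"doc": "runbook/ops.md",   "fact": "invoice_due_days", "value": 45},
]

def retrieve(fact: str) -> list:
    """A retrieval Lens over the (non-singular) source: returns every
    matching snapshot, conflicting or not."""
    return [r for r in knowledge_base if r["fact"] == fact]

def answer(fact: str) -> int:
    hits = retrieve(fact)
    values = {h["value"] for h in hits}
    if len(values) > 1:
        # Context Clash: conflicting snapshots, no canonical Source.
        raise RuntimeError(f"context clash on {fact!r}: {sorted(values)}")
    return values.pop()

# answer("invoice_due_days") raises RuntimeError: the consumer cannot
# tell which copy is true.
```

An agent without this guard would instead pick one value arbitrarily or blend both, which is the hallucination path the paragraph above describes; the only durable fix is collapsing the duplicates into a single Source.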
Finally, "Lava Flow" is a well-known architectural anti-pattern characterizing dead code or outdated documentation that nobody within the organization dares touch, refactor, or delete because no one fully understands its original dependencies or intent.23 In knowledge management, a Lava Flow results in an "Immortal Snapshot." It is a document or diagram that was generated long ago but has entirely lost its connection to its Source. Because engineers fear deleting it—worrying it might contain the only surviving record of a critical system behavior—the obsolete Snapshot fossilizes. It remains forever in the system, clogging search results and silently poisoning the context windows of AI agents attempting retrieval for decades to come.
| Claim | Source (Author, Title, Year, URL) | Relevance to Source/Lens/Snapshot Model |
|---|---|---|
| Documentation Bankruptcy: Declaring documentation completely unrecoverable and starting over due to insurmountable technical debt and state drift. | Industry colloquialism / DX Institute.65 | The ultimate, fatal consequence of decoupling Snapshots from the Source; drift accumulates until system collapse. |
| WET (Write Everything Twice): The inverse of DRY; duplicating knowledge across multiple unlinked silos without a centralized authority. | Java community, circa 2002.6 | Directly causes "Context Clash" for AI agents during RAG retrieval; emphasizes the absolute need for a singular Source. |
| Tribal Knowledge: Keeping essential system knowledge tacit, stored solely in human memory or ephemeral chat applications like Slack. | Strapi, Single Source of Truth.5 | A failure to establish a physical Source layer, rendering knowledge completely invisible to agentic consumers. |
| Lava Flow / Immortal Snapshot: Outdated documentation that fossilizes because no one understands its origin or dares delete it for fear of losing critical data. | DX Institute.23 | An orphaned Snapshot that has lost its connection to a Lens or Source, acting as permanent poison in vector retrieval. |
| Cargo Cult Documentation: Writing documentation solely to satisfy vanity metrics without addressing discoverability, consumer needs, or establishing derivation logic. | DX Institute.23 | Focusing entirely on manually creating useless Snapshots without establishing robust Sources or purposeful Lenses. |
Works cited
- ESTABLISH AND GOVERN SINGLE SOURCE OF TRUTH (SSOT) PROTOCOL - DoH, accessed February 27, 2026, https://www.doh.gov.ae/-/media/4FAE557F01844A6089AB1042F8323ED6.ashx
- Single source of truth - Wikipedia, accessed February 27, 2026, https://en.wikipedia.org/wiki/Single_source_of_truth
- Erp Complete Book Draft | PDF | Enterprise Resource Planning - Scribd, accessed February 27, 2026, https://www.scribd.com/document/888390775/Erp-Complete-Book-Draft
- Single Version of the Truth - Not Optional, accessed February 27, 2026, https://datalere.com/articles/single-version-of-the-truth-not-optional
- What Is a Single Source of Truth and How to Build One for Seamless Data Management, accessed February 27, 2026, https://strapi.io/blog/what-is-single-source-of-truth
- The DRY Principle: Embracing Efficiency in Software Engineering | by Michael Egger, accessed February 27, 2026, https://medium.com/@mesw1/the-dry-principle-embracing-efficiency-in-software-engineering-56e8efa62c07
- Andrew Hunt David Thomas, accessed February 27, 2026, https://picture.iczhiku.com/resource/eetop/sYKdWjSKWHEehVcb.pdf
- Don't repeat yourself - Wikipedia, accessed February 27, 2026, https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
- Orthogonality and the DRY Principle - Artima, accessed February 27, 2026, https://www.artima.com/articles/orthogonality-and-the-dry-principle
- Cocoa: Co-Planning and Co-Execution with AI Agents - arXiv, accessed February 27, 2026, https://arxiv.org/html/2412.10999v4
- Interactive Program Distillation - EECS at Berkeley, accessed February 27, 2026, https://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-48.pdf
- Technical dimensions of programming systems - Tomas Petricek, accessed February 27, 2026, https://tomasp.net/techdims/
- (PDF) Computational Notebooks for AI Education - ResearchGate, accessed February 27, 2026, https://www.researchgate.net/publication/271446724_Computational_Notebooks_for_AI_Education
- The Team of One: AI, Agent Coordination, and the Economic Inversion, accessed February 27, 2026, https://leverageai.com.au/wp-content/media/The_Team_of_One_Why_AI_Enables_Individuals_to_Outpace_Organizations_ebook.html
- Agentic Software Engineering: Foundational Pillars and a Research Roadmap - arXiv, accessed February 27, 2026, https://arxiv.org/html/2509.06216v1
- Docs as Code - Write the Docs, accessed February 27, 2026, https://www.writethedocs.org/guide/docs-as-code.html
- How to Set Up Documentation as Code with Docusaurus and GitHub Actions, accessed February 27, 2026, https://www.freecodecamp.org/news/set-up-docs-as-code-with-docusaurus-and-github-actions/
- Case study: Switching tools to docs-as-code | I'd Rather Be Writing ..., accessed February 27, 2026, https://idratherbewriting.com/learnapidoc/pubapis_switching_to_docs_as_code.html
- 2020-03-30-changing-roles.md - GitHub, accessed February 27, 2026, https://github.com/tomjoht/tomjoht.github.io/blob/main/_posts/2020/3/2020-03-30-changing-roles.md
- Visual Collaboration Tools, accessed February 27, 2026, http://103.203.175.90:81/fdScript/RootOfEBooks/E%20Book%20collection%20-%202024%20-%20D/CSE%20%20IT%20AIDS%20ML/Visual%20Collaboration%20Tools%20for%20teams%20building%20software.pdf
- Living Documentation: Continuous Knowledge Sharing by Design, accessed February 27, 2026, https://api.pageplace.de/preview/DT0400.9780134689425_A37710357/preview-9780134689425_A37710357.pdf
- DDD First 15 Years PDF - Scribd, accessed February 27, 2026, https://www.scribd.com/document/491356487/ddd-first-15-years-pdf
- Developer documentation: How to measure impact and drive ... - DX, accessed February 27, 2026, https://getdx.com/blog/developer-documentation/
- How the DRY Principle in Programming Prevents Duplications in AI-Generated Code, accessed February 27, 2026, https://www.faros.ai/blog/ai-generated-code-and-the-dry-principle
- Integration Complexity Scales Faster Than Business Systems - Stacksync, accessed February 27, 2026, https://www.stacksync.com/blog/integration-complexity-growth
- Capturing and Understanding the Drift Between Design, Implementation, and Documentation - USI, accessed February 27, 2026, https://www.inf.usi.ch/phd/raglianti/publications/Romeo2024a.pdf
- Agile and Holistic Medical Software Development - VTT's Research Information Portal, accessed February 27, 2026, https://cris.vtt.fi/ws/files/75730211/AHMED_final_report.pdf
- The Packaging Risk Nobody Budgets For: Retiring Expertise - Douglas Machine, accessed February 27, 2026, https://www.douglas-machine.com/the-packaging-risk-nobody-budgets-for-retiring-expertise/
- highway/g3doc/faq.md at master · google/highway - GitHub, accessed February 27, 2026, https://github.com/google/highway/blob/master/g3doc/faq.md
- Documentation as Code : r/devops - Reddit, accessed February 27, 2026, https://www.reddit.com/r/devops/comments/mmuk4j/documentation_as_code/
- llms-full.txt - SpecStory, accessed February 27, 2026, https://specstory.com/llms-full.txt
- Stripe's llms.txt has an instructions section. That's a bigger deal than it sounds. - Apideck, accessed February 27, 2026, https://www.apideck.com/blog/stripe-llms-txt-instructions-section
- Stainless Docs Platform is now available in early access, accessed February 27, 2026, https://www.stainless.com/blog/stainless-docs-early-access
- Ask HN: Organizing company knowledge? | Hacker News, accessed February 27, 2026, https://news.ycombinator.com/item?id=16811499
- Understanding barriers and enablers of inter-team knowledge sharing: A case study in a non-profit IT organisation - Aaltodoc, accessed February 27, 2026, https://aaltodoc.aalto.fi/bitstreams/b7b44811-33ae-4d19-88ed-bc70b8d8d7e7/download
- Software Survival 3.0. I spent a lot of time writing software… | by ..., accessed February 27, 2026, https://steve-yegge.medium.com/software-survival-3-0-97a2a6255f7b
- The New Skill in AI is Not Prompting, It's Context Engineering, accessed February 27, 2026, https://www.philschmid.de/context-engineering
- Simon Willison on context-engineering, accessed February 27, 2026, https://simonwillison.net/tags/context-engineering/
- Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey - arXiv, accessed February 27, 2026, https://arxiv.org/html/2602.06052v3
- TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems - arXiv, accessed February 27, 2026, https://arxiv.org/html/2506.04133v4
- beads/docs/INSTALLING.md at main · steveyegge/beads - GitHub, accessed February 27, 2026, https://github.com/steveyegge/beads/blob/main/docs/INSTALLING.md
- beads/docs/COPILOT_INTEGRATION.md at main · steveyegge/beads - GitHub, accessed February 27, 2026, https://github.com/steveyegge/beads/blob/main/docs/COPILOT_INTEGRATION.md
- Explore beads library and usage with AMP, accessed February 27, 2026, https://ampcode.com/threads/T-adc03ba9-db60-49e6-bae9-e5f9749f4312
- Context Engineering for Commercial Agent Systems - Jeremy Daly, accessed February 27, 2026, https://www.jeremydaly.com/context-engineering-for-commercial-agent-systems/
- MATERIALIZED VIEWS, accessed February 27, 2026, https://pages.iai.uni-bonn.de/manthey_rainer/Seminars2018/matView.pdf
- accessed January 1, 1970, https://www.strongdm.com/blog/pyramid-summaries-for-documentation
- Expectancy Theory as the Basis for Activity-Based Costing Systems Implementation by Managers - Academia.edu, accessed February 27, 2026, https://www.academia.edu/32372949/Expectancy_Theory_as_the_Basis_for_Activity_Based_Costing_Systems_Implementation_by_Managers
- Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy‑Aware Generative Agents - arXiv.org, accessed February 27, 2026, https://arxiv.org/html/2512.12856v1
- Expressing Architecture Frameworks Using ISO/IEC 42010 - MIT, accessed February 27, 2026, https://www.mit.edu/~richh/writings/emery-hilliard2009.pdf
- Viewpoints, accessed February 27, 2026, https://local.iteris.com/cvria/html/about/viewpoints.html
- Views and Viewpoints | IASA - BTABoK, accessed February 27, 2026, https://iasa-global.github.io/btabok/views.html
- Architecture documentation with viewpoints: Best practice and normative aspects - IMT AG, accessed February 27, 2026, https://www.imt.ch/en/expert-blog-detail/architecture-documentation-with-viewpoints-en
- ISO/IEC/IEEE 42010: Frequently Asked Questions (FAQ), accessed February 27, 2026, http://www.iso-architecture.org/ieee-1471/faq.html
- Richard Charlesworth Thesis.pdf - Open Research Online, accessed February 27, 2026, https://oro.open.ac.uk/105251/1/Richard%20Charlesworth%20Thesis.pdf
- Flexible Views for View-based Model-driven Development - KIT, accessed February 27, 2026, https://publikationen.bibliothek.kit.edu/1000043437/3288800
- UML 2 Toolkit, accessed February 27, 2026, https://nuleren.be/edocumenten/uml-2-toolkit.pdf
- What is Data Mesh? - Oracle, accessed February 27, 2026, https://www.oracle.com/integration/what-is-data-mesh/
- Data Fabric vs. Data Mesh: 2026 Guide to Modern Data Architecture - Alation, accessed February 27, 2026, https://www.alation.com/blog/data-mesh-vs-data-fabric/
- Data Mesh Principles and Logical Architecture - martinfowler.com, accessed February 27, 2026, https://martinfowler.com/articles/data-mesh-principles.html
- The 4 principles of data mesh | dbt Labs, accessed February 27, 2026, https://www.getdbt.com/blog/the-four-principles-of-data-mesh
- Authoring Content for Reuse: A Study of Methods and Strategies, Past and Present, and Current Implementation in the Technical Co - TTU DSpace Repository, accessed February 27, 2026, https://ttu-ir.tdl.org/server/api/core/bitstreams/3c858e2d-94e2-4c51-832f-b128ec179e7e/content
- Adobe DITAWORLD 2022, accessed February 27, 2026, https://www.adobe.com/content/dam/cc/us/en/products/one-adobe-solution-for-technical-content/customershowcase/ditaworld-2022-summary-booklet.pdf
- on identifying technical debt using bug reports in practice - Diva-Portal.org, accessed February 27, 2026, https://www.diva-portal.org/smash/get/diva2:1752643/FULLTEXT01.pdf
- (PDF) M&S within the model driven architecture - ResearchGate, accessed February 27, 2026, https://www.researchgate.net/publication/228859916_MS_within_the_model_driven_architecture
- Software documentation - Wikipedia, accessed February 27, 2026, https://en.wikipedia.org/wiki/Software_documentation#Anti-patterns
- 900Q0400.txt - epa nepis, accessed February 27, 2026, https://nepis.epa.gov/Exe/ZyNET.exe/900Q0400.txt?ZyActionW=Download&Client=EPA&Index=Prior%20to%201976&Docs=&Query=&Time=&EndTime=&SearchMethod=1&TocRestrict=n&Toc=&TocEntry=&QField=&QFieldYear=&QFieldMonth=&QFieldDay=&UseQField=&IntQFieldOp=0&ExtQFieldOp=0&XmlQuery=&File=D%3A%5CZYFILES%5CINDEX%20DATA%5C70THRU75%5CTXT%5C00000006%5C900Q0400.txt&User=ANONYMOUS&Password=anonymous&SortMethod=h%7C-&MaximumDocuments=1&FuzzyDegree=0&ImageQuality=r75g8/r75g8/x150y150g16/i425&Display=p%7Cf&DefSeekPage=x&SearchBack=ZyActionL&Back=ZyActionS&BackDesc=Results%20page
- For a New Approach to Credit Relations in Modern History | Cairn.info, accessed February 27, 2026, https://shs.cairn.info/journal-annales-2012-4-page-661?lang=en
- Foley AL Chapter 13 Bankruptcy - Attorneys in Mobile, AL - Padgett and Robertson, accessed February 27, 2026, https://www.hermandpadgett.com/foley-al-chapter-13-bankruptcy/