I design and build multi-agent AI operating systems with living memory, local inference, and human-in-the-loop governance. 25+ years of enterprise architecture — now building the agentic infrastructure that compounds knowledge across sessions, agents, and machines. The rare combination of deep enterprise experience and hands-on AI systems building.
Most organizations experimenting with AI agents are running single-agent demos against isolated problems. I design and build systems where multiple specialized agents collaborate under structured governance, building on each other's work through shared memory. I also ship practical AI-powered tools that real teams use every day.
Designed a memory architecture where organizational knowledge manages itself: agents file knowledge chunks as they learn, auto-promoted lessons compile into a living wiki, and stale claims decay. Not passive retrieval — active knowledge metabolism that compounds across every session. Inspired by Karpathy's "Software 2.0" and Cherny's context engineering.
The development OS behind every project here. Forge is a harness engineering framework — it doesn't add AI to an existing process, it replaces the process entirely. Twelve specialized agents (architect, builder, analyst, reviewer, tester, debugger, and more) operate under structured SOPs, a governed backlog, TDD enforcement, and persistent session memory. The loop: a planning agent decomposes work into tickets via the PDT API. Stewart grooms intake to ready with verified AC and an agent assignment. The dashboard auto-generates an agent-scoped prompt at the ready→in-progress transition; one click on Launch copies it to the clipboard, and the agent runs in a fresh session. Submission triggers an automated multi-stage review gate (lint, 700+ test suites, Eval, code review, pre-commit) that auto-closes the ticket on pass or stages a "fix it" prompt via a Relaunch button on fail. When something fails, the fix is never "try harder" — it's "what capability is missing, and how do we make it legible to the agent?"
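The pass/fail flow of that review gate can be sketched in a few lines. This is an illustrative sketch only — `StageResult`, `run_review_gate`, and the return-string convention are assumptions for the example, not Forge's actual API; the stage names mirror the pipeline described above.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StageResult:
    passed: bool
    detail: str = ""

def run_review_gate(ticket_id: str,
                    stages: list[tuple[str, Callable[[], StageResult]]]) -> str:
    """Run each gate stage in order; stop at the first failure."""
    for name, stage in stages:
        result = stage()
        if not result.passed:
            # On failure, stage a "fix it" prompt for the Relaunch button
            # instead of blindly retrying the same session.
            return f"relaunch:{ticket_id}:{name}"
    # Every stage passed: the ticket auto-closes.
    return f"closed:{ticket_id}"
```

The key design point is the early return: a failure names the missing capability (which stage broke, and why) rather than asking the agent to try again harder.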
Documentation →
Applied the Forge framework to Salesforce development workflows with bidirectional Jira integration, Confluence documentation sync, and automated requirements traceability from business need to deployed code.
Designed and built an internal media campaign management system used by the HR People Team. Multi-format content creation (AI-generated audio, avatar video, text), Slack distribution, branded media player with engagement analytics. In active daily use.
A product bet exploring AI-powered compounding learning. Students upload course notes (images, PDFs, docs) and an LLM extracts atomic concepts with prerequisites and cross-course connections into a personal knowledge graph. It builds scoped study guides and ships eight built-in skills (concept extraction, confusion pair detection, exam postmortem, bridge detection). The same knowledge-graph-plus-learning-layer pattern from the enterprise work, applied to student learning.
Designed a framework for synthesizing customer engagement signals across touchpoints into a unified context layer. Identity resolution, CRM architecture, behavioral data, and real-time orchestration powering personalization and support cost reduction.
An autonomous AI operating system where eight specialized agents collaborate through governed PostgreSQL to discover, evaluate, match, and help apply to jobs. Three-layer dedup eliminates 95% of noise before LLM scoring. Two public PyPI libraries (strata-match, strata-harvest). Budget: $10–36/month for full operation.
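The layered ordering of that dedup — cheap exact checks before anything touches an LLM — can be sketched as follows. The specific layers shown (content hash, normalized company+title key, then similarity) are assumptions for illustration; only the "dedup before LLM scoring" ordering comes from the description above.

```python
import hashlib

def dedup(jobs: list[dict]) -> list[dict]:
    """Drop duplicates in three layers of increasing cost, cheapest first."""
    seen_hashes: set[str] = set()
    seen_keys: set[tuple[str, str]] = set()
    kept: list[dict] = []
    for job in jobs:
        # Layer 1: exact content hash catches verbatim reposts.
        h = hashlib.sha256(job["description"].encode()).hexdigest()
        if h in seen_hashes:
            continue
        # Layer 2: normalized company + title catches reworded reposts.
        key = (job["company"].strip().lower(), job["title"].strip().lower())
        if key in seen_keys:
            continue
        # Layer 3 (omitted here): embedding similarity against kept jobs.
        seen_hashes.add(h)
        seen_keys.add(key)
        kept.append(job)
    return kept
```

Because each layer only sees what the cheaper layer passed through, almost nothing survives to the expensive LLM-scoring stage — which is what keeps the monthly budget in the $10–36 range.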
Documentation →
A Mac Mini cluster running schedule-driven local inference across three machines. Gemma 4 31B (near-lossless q8 quantization) handles deep scoring overnight; Qwen3 30B MoE (60–70 tok/s) runs during dev hours — automatic launchd transitions between them. LiteLLM routes Strata's pipeline stages to the right model: fast and balanced stages to Qwen3, deep-score stages to Gemma 4. nomic-embed-text provides always-on 768-dim embeddings. GLM-OCR handles resume parsing. Shadow-mode calibration benchmarks local quality against cloud models continuously. Zero variable inference cost for production workloads.
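The stage-to-model routing described above reduces to a small lookup plus a schedule check. This is a minimal sketch, not LiteLLM's actual config format; the model aliases, the overnight window (22:00–07:00), and the defer behavior are assumptions for the example.

```python
from datetime import datetime

# Stage → model routing, mirroring the description above.
STAGE_MODELS = {
    "fast": "qwen3-30b-moe",
    "balanced": "qwen3-30b-moe",
    "deep-score": "gemma-4-31b-q8",
    "embed": "nomic-embed-text",
}

def pick_model(stage: str, now: datetime) -> str:
    """Route a pipeline stage to a model, respecting the launchd schedule."""
    overnight = now.hour >= 22 or now.hour < 7  # assumed schedule window
    if stage == "deep-score" and not overnight:
        # Deep scoring waits for the overnight Gemma window;
        # dev hours belong to the faster Qwen3 MoE.
        return "defer-to-overnight"
    return STAGE_MODELS[stage]
```

In the real setup the schedule lives in launchd rather than in the router, but the principle is the same: workload shape, not a fixed endpoint, decides which model serves each request.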
Without memory, agency does the same work repeatedly. An agent fleet without shared memory is individually capable but collectively starts over every time. The architecture I designed solves this with three distinct layers, each with its own failure modes and quality signals.
The learning layer is the part most teams skip. It's where raw experience becomes structured understanding: what worked, what didn't, what should be applied next time. Without it, you have storage and retrieval but no compounding.
Speed without governance means fast in the wrong direction. The framework enforces bounded authority (each agent has a narrow, architecturally enforced scope), continuous approval gates, automatic audit trails, and session isolation with conflict detection. Governance isn't a policy layer. It's a first-class architectural concern.
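Bounded authority as an architectural check, rather than a policy document, can be as simple as a scope table consulted before any write. The agent names and path prefixes below are illustrative assumptions, not the framework's real scope definitions.

```python
# Each agent's write scope is data the harness enforces, not guidance
# the agent is asked to follow. Scopes here are hypothetical examples.
AGENT_SCOPES: dict[str, tuple[str, ...]] = {
    "builder": ("src/", "tests/"),
    "reviewer": (),            # read-only: no write scope at all
    "architect": ("docs/adr/",),
}

def authorized(agent: str, path: str) -> bool:
    """An agent may write only inside its architecturally enforced scope."""
    return any(path.startswith(prefix) for prefix in AGENT_SCOPES.get(agent, ()))
```

A check like this is what turns "governance" from a request into a constraint: the reviewer cannot modify code even if its prompt is compromised, because the harness never grants it a write path.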
The system is designed for real enterprise infrastructure: Salesforce with governor limits, Jira with all its workflow complexity, Confluence as a living documentation target. Most agent demonstrations run in isolation. This one operates where the constraints are real and the consequences matter.
This architecture didn't emerge in isolation. It synthesizes ideas from Andrej Karpathy (Software 2.0, LLM OS), Boris Cherny (context engineering as the core discipline), the Program-Aided Language Models line of work (structured reasoning through code), and Demis Hassabis (systems that learn from experience, not just data). The key insight: most agentic systems bolt memory onto agents as an afterthought. We built memory as the foundation and agents as the consumers.
Memory platforms like Mem0, Zep, and LangMem solve recall. Orchestration frameworks like CrewAI and LangGraph solve coordination. Neither solves compounding — the ability for a fleet of agents to get measurably better over time. Our system does: knowledge chunks auto-promote through confidence gates into a living wiki. Stale claims decay. Agents start sessions by reading what the fleet has learned, not by re-researching. The learning layer is the differentiator.
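The promote-and-decay mechanic can be sketched with exponential confidence decay and two thresholds. The thresholds, half-life, and field names below are assumptions for illustration, not the production schema — the point is only that promotion and staleness are both functions of confidence over time.

```python
# Hypothetical tuning constants; the real system's values may differ.
PROMOTE_THRESHOLD = 0.8   # chunks above this graduate into the living wiki
STALE_THRESHOLD = 0.2     # chunks below this decay out
HALF_LIFE_DAYS = 30.0     # unconfirmed claims lose half their weight monthly

def effective_confidence(confidence: float, age_days: float) -> float:
    """Recorded confidence decays exponentially while a claim goes unconfirmed."""
    return confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)

def triage(chunk: dict) -> str:
    """Decide a knowledge chunk's fate from its decayed confidence."""
    c = effective_confidence(chunk["confidence"], chunk["age_days"])
    if c >= PROMOTE_THRESHOLD:
        return "promote"   # compiles into the wiki
    if c < STALE_THRESHOLD:
        return "decay"     # pruned as a stale claim
    return "hold"          # stays in the chunk store awaiting confirmation
```

Re-confirming a claim resets its age, so knowledge the fleet keeps using stays promoted while abandoned claims quietly fall out — retrieval systems alone have no equivalent of this second half.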
The thread through my career: designing technology systems that help organizations understand their customers, make better decisions, and operate more intelligently. The actors have changed over 25 years. The architecture thinking hasn't.
Self-contained, interactive HTML presentations covering the architecture, the research, and the strategic vision. Each one is a complete narrative, not a slide deck.
Strategic pitch for an enterprise-wide knowledge architecture that connects organizational silos through a shared knowledge graph with curated claims.
Strategy
How multi-agent orchestration transforms the Salesforce delivery lifecycle: confidence evaluation, brownfield discovery, and human-agent co-development.
Architecture
Deep technical walk-through of the three-layer memory architecture, the digestive pipeline, and how knowledge compounds across agent sessions.
Strategy
Synthesizing engagement signals across customer touchpoints into a unified context layer. Identity resolution, behavioral data, and AI-powered orchestration.
Research
Comparative analysis of the agentic memory landscape: academic papers, open-source projects, and commercial products evaluated against the architecture. 20+ sources with claim traceability and gap analysis.
Architecture
Inside the harness engineering framework: agent network, SOP enforcement, approval gates, TDD pipeline, session continuity, and the execution loop that runs every ticket from plan to done. The architecture that makes the rest of this possible.
Observations from building multi-agent systems for real enterprise work. No theory. Just what I've learned.
Why a dozen fast agents without shared memory add up to nothing but expensive repetition.
The distinction most teams miss, and the three-layer architecture that solves it.
Approval gates, bounded authority, session isolation, and calibrating the human-in-the-loop.
Why retrieval-augmented generation solves the search problem but not the knowledge problem.
Decades of enterprise complexity aren't just relevant to the agentic stack — they're the part most teams are missing.
Why I built Strata: eight governed agents, three-layer dedup, local Gemma inference, and the architecture decisions that make it run for $10/month.
Deploying quantized models across a LAN cluster with Tailscale mesh, shadow-mode calibration, and the economics of cloud vs. local.
I'm looking for my next role leading AI-native engineering organizations — where the ability to actually build agentic systems, architect for enterprise reality, and ship production-grade AI infrastructure all matter. VP/Head of AI Engineering, AI Strategy, or Agentic Platform leadership.