← Andrew Crenshaw autogenous-synthesis Forge
Development Operating System

autogenous-synthesis Forge

A purpose-built development operating system where specialized agents decompose, implement, test, and validate work end to end, under governance, with continuity at scale.

What it is

autogenous-synthesis Forge is a purpose-built engineering framework that makes agent-driven software delivery reliable, auditable, and continuously improving.

The backlog operates as a live state machine with transactional guarantees. Agents register sessions, claim work, check for file conflicts, execute under structured procedures, generate and validate test suites, and submit for review, without a human orchestrating each step.

The human role: set direction, define acceptance criteria, review output. Everything between is Forge.

AI-assisted vs. AI-first

Most AI-augmented development keeps the human as the message bus: read the ticket, write the prompt, paste the output, update the board. The AI is a faster keyboard. Forge is structured differently: it is the surrounding system that makes agents reliable actors rather than capable tools.

| | Typical AI-augmented workflow | Forge |
|---|---|---|
| Work state | Ticket in task manager; prompt in chat; output pasted back manually | Live state machine: agents claim work, execute, and submit via API; all transitions logged |
| Acceptance criteria | Prose description in ticket body | Structured ISC table; at least one machine-verifiable check required; all-prose tickets rejected by the v2 validator |
| Code quality gate | Linter and test suite; code review is unstructured human judgment, with no formal pass/fail criteria | Linter + quality test suite + blocking code review gate, with an LLM agent review layer added for scope-sensitive changes |
| TDD | Convention; often skipped under time pressure | Mechanically enforced by the verification gate; a submission without a RED to GREEN trace fails before review |
| Context across sessions | Context limit means starting over; prior work is lost or repeated | Structured scratchpad persists the full execution checkpoint; the next session continues from the last completed step |
| Risk routing | Uniform treatment for all changes | risk_class (trivial / standard / sensitive) sets gate strictness, model assignment, and approval path per ticket |
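
The risk routing row can be pictured as a lookup from risk_class to a gate policy. The sketch below is illustrative only: the GatePolicy fields and the values assigned to each class are assumptions, since the actual policy Forge applies is not spelled out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatePolicy:
    # All three fields are hypothetical names chosen for illustration.
    review_bypass_allowed: bool
    llm_review_layer: bool
    human_approval_required: bool


# Assumed policy table; only the three risk_class names come from the text.
POLICIES = {
    "trivial": GatePolicy(True, False, False),
    "standard": GatePolicy(False, False, False),
    "sensitive": GatePolicy(False, True, True),
}


def policy_for(risk_class: str) -> GatePolicy:
    """Return the gate policy for a ticket, rejecting unknown classes."""
    try:
        return POLICIES[risk_class]
    except KeyError:
        raise ValueError(f"unknown risk_class: {risk_class!r}") from None
```

Rejecting unknown classes outright, rather than falling back to a default, keeps a typo in a ticket from silently loosening the gates.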

Four defining properties

Agent-first

Agents serve as primary builders, not developer assistants. Every component (session protocol, backlog schema, SOP library, work packages) is designed to maximize what agents accomplish reliably without hand-holding. Work is legible, bounded, and enforceable.

High throughput

A single engineer runs multiple agent sessions in parallel across unrelated concerns. File-lock arbitration prevents collisions before agents are dispatched. Work that would serialize a solo engineer runs concurrently.
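
File-lock arbitration can be sketched as a set intersection between the files a new session wants and the files already held. This is a minimal in-memory illustration; the function names and the shape of the lock table are assumptions, not Forge's actual API.

```python
def find_conflicts(active_locks: dict[str, set[str]],
                   requested: set[str]) -> dict[str, set[str]]:
    """Map each session that holds a requested file to the overlapping paths."""
    return {
        session_id: files & requested
        for session_id, files in active_locks.items()
        if files & requested
    }


def try_claim(active_locks: dict[str, set[str]], session_id: str,
              requested: set[str]) -> tuple[bool, dict[str, set[str]]]:
    """Grant the lock only when no other session holds any requested file."""
    conflicts = find_conflicts(active_locks, requested)
    if conflicts:
        return False, conflicts
    active_locks[session_id] = set(requested)
    return True, {}
```

Checking before dispatch means a conflicting agent never starts, which is cheaper than detecting the collision at merge time.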

High confidence

Implementation is mechanically test-driven: RED to GREEN to REFACTOR is enforced by the verification gate rather than left as a guideline. Machine-verifiable acceptance criteria (ISC, Ideal State Criteria) make pass/fail binary, and the evaluator auto-resolves routine reviews. No work reaches completion without passing automated validation.

Continuous learning

Agent sessions generate structured scratchpads that persist across context limits. The living-memory system (digestive pipeline, auto-promotion gates, decay) means agents begin sessions knowing what the fleet has already learned. Knowledge compounds across every session, agent, and project.

The backlog: a live state machine

Every backlog item moves through five stages with automated transitions:

| Stage | Count (example) | What happens |
|---|---|---|
| Intake | 15 items | Needs acceptance criteria written |
| Ready | 8 items | AC verified, work package auto-generated |
| In Progress | 1 item | Agent claimed, scratchpad active |
| Review | 2 items | Eval runs first; human only if ambiguous |
| Done | 8 items | Validated, merged, session archived |
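
The five stages above can be sketched as a small state machine in which every transition is checked and logged. The allowed-transition map is an assumption (forward one stage, plus a return from review for rework), and the audit record fields are hypothetical.

```python
from datetime import datetime, timezone

# Assumed transition map; the stage names mirror the five stages above.
ALLOWED = {
    "intake": {"ready"},
    "ready": {"in_progress"},
    "in_progress": {"review"},
    "review": {"done", "in_progress"},  # assumption: review can send work back
    "done": set(),
}


class InvalidTransition(Exception):
    """Raised when a requested stage change is not in the transition map."""


def transition(item: dict, to_stage: str, actor: str, audit_log: list) -> dict:
    """Return a copy of the item in its new stage and append an audit record."""
    current = item["stage"]
    if to_stage not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {to_stage} is not allowed")
    audit_log.append({
        "item": item["id"],
        "from": current,
        "to": to_stage,
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return {**item, "stage": to_stage}
```

A rejected transition raises before anything is logged, so the audit trail only ever contains changes that actually happened.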

All state changes route through a single HTTP API server with atomic writes, 10-retry exponential backoff, and file-lock conflict detection. Every transition is an API call with a full audit trail: no manual drag-and-drop, no ambiguous state.
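
Two of the mechanisms named above, atomic writes and retry with exponential backoff, can be sketched with the standard library. This assumes the backlog is stored as a JSON file and that a "retry" means a repeat after a failed first attempt; the base delay and the choice of OSError as the retryable error are illustrative.

```python
import json
import os
import tempfile
import time


def atomic_write_json(path: str, data: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def with_backoff(operation, retries: int = 10, base_delay: float = 0.05,
                 sleep=time.sleep):
    """Run operation; on OSError wait base_delay * 2**attempt and try again."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except OSError:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

A caller would combine the two, for example `with_backoff(lambda: atomic_write_json("backlog.json", state))`, where the file name is hypothetical.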

Ticket schema v2

Every backlog ticket is validated against a structured schema before it can be claimed. The schema requires structured fields beyond title and description.

Legacy v1 tickets (prose-only description, no structured fields) are read-only. Attempting to claim one returns HTTP 400 LEGACY_V1_BLOCKED.
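
The claim-time check might look like the sketch below. The HTTP 400 status and the LEGACY_V1_BLOCKED code come from the text; the field names (schema_version, isc, verify) and the second error code are assumptions made for illustration.

```python
def validate_claim(ticket: dict) -> tuple[int, str]:
    """Return (http_status, code) for an attempt to claim the ticket."""
    # Assumption: tickets without a schema_version field are treated as v1.
    if ticket.get("schema_version", 1) < 2:
        return 400, "LEGACY_V1_BLOCKED"
    # Assumption: each ISC entry carries a "verify" command when it is machine-verifiable.
    criteria = ticket.get("isc", [])
    if not any(entry.get("verify") for entry in criteria):
        return 400, "NO_MACHINE_VERIFIABLE_CHECK"  # hypothetical code
    return 200, "OK"
```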

Acceptance criteria: machine-verifiable

Every ready-stage item includes an ISC (Ideal State Criteria) table: binary pass/fail conditions with explicit verification methods. No subjective assessments.

AC1: Migrations 029-038 applied cleanly -> verified: alembic current == head
AC2: All 6 new models importable -> verified: pytest -k test_models
AC3: Rollback to 028 succeeds -> verified: alembic downgrade -1
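
One way to make such criteria binary is to attach a command to each one and treat exit code 0 as a pass. The sketch below assumes that convention; the Criterion type and the evaluate function are hypothetical, not Forge's evaluator.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Criterion:
    id: str
    statement: str
    command: list[str]  # argv form; exit code 0 counts as a pass (assumed convention)


def evaluate(criteria: list[Criterion]) -> dict[str, bool]:
    """Run each verification command and record pass/fail by exit code."""
    results = {}
    for criterion in criteria:
        proc = subprocess.run(criterion.command, capture_output=True, text=True)
        results[criterion.id] = proc.returncode == 0
    return results
```

AC2 above would map to `Criterion("AC2", "All 6 new models importable", ["pytest", "-k", "test_models"])`. A check such as AC1 compares two values rather than running a single command, so it would need a small wrapper script that exits non-zero on mismatch.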

Quality pipeline

TDD as infrastructure

All implementation work is test-driven mechanically, not by convention. Any submission without a RED to GREEN to REFACTOR trace fails the verification gate before reaching review. Stub implementations (throw new Error('Not implemented'), empty returns, TODO placeholders) in production files are caught by pre-commit hooks and rejected. If full implementation isn't possible in scope, a new backlog item is created, not a placeholder left in code.
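
A stub check of the kind a pre-commit hook could run is sketched below. It covers the thrown "Not implemented" error and TODO markers named above; the NotImplementedError pattern is an added assumption for Python sources, and empty returns are not detected because they need more than a line-level regex.

```python
import re

STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+Error\(\s*['\"]Not implemented['\"]\s*\)"),
    re.compile(r"\bTODO\b"),
    re.compile(r"raise\s+NotImplementedError"),  # assumption: Python equivalent
]


def find_stubs(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line matching a stub pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STUB_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

A hook would call find_stubs on each staged production file and exit non-zero if any list is non-empty.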

3-tier code review gate

Code review is a blocking gate, not an advisory step. Every submission passes through three tiers.

The bypass.codeReview field exists for trivial tickets only and requires an explicit reason. It is enforced at the schema level, not by convention.
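
The two rules on bypass.codeReview (trivial tickets only, explicit reason required) can be expressed as a validator. The sketch assumes the field is an object with a reason key; the real schema shape is not shown here.

```python
def validate_bypass(ticket: dict) -> list[str]:
    """Return schema errors for the bypass.codeReview field, if it is present."""
    errors = []
    bypass = ticket.get("bypass", {}).get("codeReview")
    if bypass is None:
        return errors  # no bypass requested, nothing to check
    if ticket.get("risk_class") != "trivial":
        errors.append("bypass.codeReview is only allowed for trivial tickets")
    # Assumption: the field is an object carrying a free-text "reason".
    if not str(bypass.get("reason", "")).strip():
        errors.append("bypass.codeReview requires an explicit reason")
    return errors
```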

Human-in-the-loop governance

Speed with bounded authority. Governance is automation-first, not approval-first.

Guardrails are enforced, not suggested

Global guardrails in .agent-data/guardrails/global.md define hard mechanical rules: never write to .env* files, never execute rm -rf, never force-push to main, never commit secrets. These aren't reminders; they're validated by automated test suites running on every change. Agents violating guardrail rules fail the verification gate.
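
A mechanical check for three of these rules could look like the sketch below. It is a simplified illustration: the function name is hypothetical, command parsing is a plain whitespace split, and secret detection is left out because it needs a dedicated scanner.

```python
import fnmatch
import os


def violations(command: str = "", written_paths=()) -> list[str]:
    """Return the guardrails a proposed command or set of file writes would break."""
    found = []
    tokens = command.split()
    # Collect the letters of short flags such as -rf or -r -f (long flags are skipped).
    short_flags = [t for t in tokens[1:] if t.startswith("-") and not t.startswith("--")]
    flag_letters = set("".join(short_flags))
    if tokens[:1] == ["rm"] and {"r", "f"} <= flag_letters:
        found.append("rm -rf")
    forced = "--force" in tokens or "f" in flag_letters
    if tokens[:2] == ["git", "push"] and forced and "main" in tokens:
        found.append("force-push to main")
    for path in written_paths:
        if fnmatch.fnmatchcase(os.path.basename(path), ".env*"):
            found.append(f"write to {path}")
    return found
```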

Sessions, scratchpads, and continuity

No work is lost when context runs out. Every agent session generates a structured scratchpad at .agent-data/scratchpads/active/. The scratchpad tracks the decomposition plan and sub-task status, what each sub-agent returned, whether results met acceptance criteria, and the accumulating synthesis (the actual output).

When agents hit context limits, they write state and stop cleanly. The next invocation reads the scratchpad, identifies the last completed checkpoint, and resumes from there. Zero restart cost. No duplicated work.
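
The write-then-resume cycle can be sketched as follows. The JSON layout and the function names are assumptions; the scratchpad format Forge actually uses is not described here.

```python
import json
from pathlib import Path


def write_checkpoint(path: Path, plan: list[str], completed: list[str],
                     synthesis: str) -> None:
    """Persist the plan, the finished steps, and the output accumulated so far."""
    path.parent.mkdir(parents=True, exist_ok=True)
    state = {"plan": plan, "completed": completed, "synthesis": synthesis}
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def resume(path: Path) -> tuple[list[str], str]:
    """Return the steps still to run, in plan order, and the saved synthesis."""
    state = json.loads(path.read_text(encoding="utf-8"))
    done = set(state["completed"])
    remaining = [step for step in state["plan"] if step not in done]
    return remaining, state["synthesis"]
```

Because the next session only receives the steps that are not yet marked complete, finished work is never repeated.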

Work packages

When Stewart promotes backlog items to ready, Forge auto-generates work packages: self-contained handoff documents containing the problem statement and binary ISC table, exact file paths and modification requirements, quick-start commands to verify the environment, and hard constraints on what not to touch. Work packages bridge "human writes ticket" and "agent starts work": the automated prompt-engineering layer that makes agent output reliable rather than hopeful.
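
Generating a work package amounts to rendering a ticket's structured fields into a fixed handoff layout. The sketch below assumes hypothetical ticket keys (problem, isc, files, quick_start, do_not_touch) and a plain-text layout; neither is taken from Forge itself.

```python
def build_work_package(ticket: dict) -> str:
    """Render a ticket into a self-contained plain-text handoff document."""
    lines = [f"Work package: {ticket['title']}", "", "Problem", ticket["problem"]]
    lines += ["", "ISC"]
    lines += [f"{c['id']}: {c['statement']} -> verified: {c['verify']}"
              for c in ticket["isc"]]
    lines += ["", "Files to modify"] + [f"- {path}" for path in ticket["files"]]
    lines += ["", "Quick start"] + [f"$ {cmd}" for cmd in ticket["quick_start"]]
    lines += ["", "Do not touch"] + [f"- {path}" for path in ticket["do_not_touch"]]
    return "\n".join(lines)
```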

Observability

The system knows what it's doing and can show you.

What changes: the developer's role

"Forge doesn't make you faster. It makes you an architect of systems that develop."

Without Forge

Engineers work sequentially, one task at a time. Context switches are expensive. Tests get deferred. Documentation follows later, usually never. State lives in the engineer's head. When they're not working, nothing is.

With Forge

Multiple agent sessions run in parallel across unrelated concerns. Tests are generated before implementation. Documentation is produced alongside code. State is persisted and audited. The system works whether or not anyone is at a keyboard.

The key capability Forge creates isn't "AI writes my code." It's institutional execution capacity: the ability to run more work, in parallel, with higher confidence, than any individual developer could sustain. The system gets measurably better over time because the memory system compounds what the agent network learns across every session.

Project status

Forge is the framework powering development of Strata (an autonomous job-search operating system) and several other projects in the workspace. It is under active development.

12+ specialist agent skills operational
5 lifecycle stages, automated transitions
150+ automated test suites enforcing gates
80% review load reduction via auto-eval

Recent work