Job searching in 2026 is broken in a specific, structural way. There are hundreds of career pages, dozens of ATS platforms (each with different APIs, or no API at all), and the signal-to-noise ratio is terrible. You either spend hours scrolling job boards, or you spray applications and hope. Neither is a strategy.

I built Strata to solve this for myself — and to prove out a set of architecture ideas I'd been developing in a completely different context.

The architecture bet

Strata isn't a script that scrapes LinkedIn. It's an operating system with eight specialized agents collaborating through a governed PostgreSQL database. Each agent has a specific job, scoped database access, and a clear boundary with every other agent.

Agent   | Job                                          | Human Gate
--------|----------------------------------------------|-------------------------
Profile | Resume → Candidate Intelligence Profile      | Yes
Target  | Manage company list with priority tiers      | Yes
Scraper | Daily career page sweeps via strata-harvest  | No
Match   | Two-stage scoring (vector gate → LLM tiers)  | No
Apply   | Tailored resumes + cover letters             | Yes (never auto-submits)
Skills  | Gap analysis + learning recommendations      | No
Network | Contact intelligence + outreach timing       | Partial
Brand   | Content generation + positioning             | Yes

The governance model isn't optional. The Scraper Agent can't modify your profile. The Profile Agent can't submit applications. Access is enforced at the repository layer through ScopedRepository, not by convention.
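A minimal sketch of what repository-layer enforcement can look like. All names here (ScopedRepository's constructor, ScopeViolation, the table names) are illustrative assumptions, not Strata's actual internals; the point is that the allow-list check lives in the data layer, so no agent can bypass it by convention:

```python
class ScopeViolation(Exception):
    """Raised when an agent tries to write outside its declared scope."""

class ScopedRepository:
    """Hypothetical sketch: each agent gets a repository bound to an
    explicit allow-list of writable tables."""

    def __init__(self, agent_name: str, writable_tables: frozenset):
        self.agent_name = agent_name
        self.writable_tables = writable_tables

    def insert(self, table: str, row: dict) -> None:
        # Enforcement happens here, at the data layer -- not by convention.
        if table not in self.writable_tables:
            raise ScopeViolation(
                f"{self.agent_name} may not write to '{table}'"
            )
        # ... execute the real INSERT against PostgreSQL here ...

# The Scraper Agent can write listings but never the candidate profile.
scraper_repo = ScopedRepository("scraper", frozenset({"job_listings"}))
scraper_repo.insert("job_listings", {"url": "https://example.com/jobs/1"})

try:
    scraper_repo.insert("candidate_profile", {"name": "oops"})
except ScopeViolation as e:
    print(e)
```

Because the scope is fixed at construction time, a misrouted write fails loudly instead of silently corrupting another agent's tables.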

The cost problem (and how we solved it)

The breakthrough insight: you don't need LLM scoring for every listing. Most listings are obvious non-matches. The architecture uses a three-layer dedup funnel that eliminates 95%+ of the noise before any LLM call happens:

Layer                 | Method                     | Cost     | What It Catches
----------------------|----------------------------|----------|------------------------------
L1: URL/ID            | Exact match                | Free     | Same listing scraped twice
L2: Content Hash      | SHA-256 of normalized text | Free     | Same listing, different URL
L3: Vector Similarity | pgvector cosine distance   | ~$0.0001 | Same job reworded or reposted
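The funnel above can be sketched roughly as follows. Function names, data shapes, and the similarity threshold are illustrative assumptions, not Strata's internals, and in production layer 3 would run inside Postgres via pgvector rather than as a Python loop:

```python
import hashlib

def normalize(text: str) -> str:
    # Collapse whitespace and case so cosmetic edits hash identically.
    return " ".join(text.lower().split())

def content_hash(text: str) -> str:
    return hashlib.sha256(normalize(text).encode()).hexdigest()

def cosine_distance(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (na * nb)

def is_duplicate(listing, seen_urls, seen_hashes, seen_embeddings,
                 sim_threshold=0.05):
    # L1: exact URL/ID match -- free.
    if listing["url"] in seen_urls:
        return True
    # L2: SHA-256 of normalized text -- free, catches same text at a new URL.
    h = content_hash(listing["text"])
    if h in seen_hashes:
        return True
    # L3: vector similarity -- cheap, catches reworded repostings.
    for emb in seen_embeddings:
        if cosine_distance(listing["embedding"], emb) < sim_threshold:
            return True
    seen_urls.add(listing["url"])
    seen_hashes.add(h)
    seen_embeddings.append(listing["embedding"])
    return False
```

Ordering matters: the two free layers absorb the bulk of the duplicates, so the (slightly) paid vector comparison only ever sees what survives them.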

After dedup, a vector floor filter eliminates listings that are clearly irrelevant (cosine distance > 0.30 from your profile embedding). Only then does the Match Agent invoke LLM scoring — and even that uses 80/15/5 model routing: easy matches go to GPT-4o-mini (80%), medium to Haiku (15%), hard to Sonnet (5%).
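As a sketch, the floor filter and routing can be combined into one decision function. The 0.30 floor and the 80/15/5 split come from the text above; using distance bands as the routing signal, and the specific band cutoffs, are my assumptions for illustration:

```python
def route(distance: float):
    """Map profile-to-listing cosine distance to a scoring model.
    Returns None when the listing never reaches an LLM at all."""
    if distance > 0.30:
        return None              # vector floor: clearly irrelevant
    if distance < 0.15:
        return "gpt-4o-mini"     # easy match, cheapest model (~80%)
    if distance < 0.25:
        return "claude-haiku"    # medium difficulty (~15%)
    return "claude-sonnet"       # hard call, strongest model (~5%)
```

The shape of the function is the point: cost control is a routing decision made before the expensive call, not a budget cap applied after it.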

The numbers: From ~800-1,200 raw listings discovered daily, the funnel reduces to ~15-25 deep-scored and ~3-5 applications per week. Total LLM cost with optimizations: $7-10/month. Full operating cost including hosting: $10-36/month.

The three-repo strategy

Strata splits into three repositories. The private repo contains the full application — agents, governance, pipelines, dashboard. Two public repos are standalone, stateless libraries published to PyPI:

strata-harvest — Career page scraping and ATS parsing. Point it at any company careers page, get back structured job listings. Auto-detects Greenhouse, Lever, Ashby; falls back to LLM extraction for unknown ATS platforms. pip install strata-harvest.

strata-match — Two-stage vector + LLM job-to-profile matching. Feed it a job listing and a candidate profile, get back a nuanced match score with detailed reasoning. pip install strata-match.

Public libraries return data. The private repo decides what to persist and how to govern it. This separation means anyone can use the scraping and matching engines independently.

What I learned

Governance is the feature, not the constraint. I came into this project with governance patterns from the multi-agent orchestration framework I'd been building. Strata validated them in a completely different language and stack (Python/FastAPI vs TypeScript/Node). The ScopedRepository pattern — where each agent declares which tables it can write to at the class level — is the single best architectural decision in the project. It prevents entire categories of bugs.

The dedup funnel is where the real engineering lives. The LLM calls are the easy part. The hard part is building the filtering pipeline that ensures you only invoke expensive models when it actually matters. Three free/cheap layers before any LLM call is the pattern.

Local inference changes the economics entirely. Running Gemma on a Mac Mini via MLX means the medium-tier deep scoring is effectively free. Shadow-mode calibration (running both cloud and local models, comparing results) lets you validate quality before switching. We're tracking Pearson R correlation and routing-tier agreement to know exactly when local is good enough.
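The two calibration metrics are straightforward to compute once you have paired cloud/local scores for the same listings. This is a generic sketch, assuming scores in [0, 1] and hypothetical tier cutoffs; Strata's actual thresholds aren't shown in the text:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired cloud and local scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def tier(score: float) -> str:
    # Hypothetical routing tiers for illustration.
    return "strong" if score >= 0.8 else "maybe" if score >= 0.5 else "pass"

def tier_agreement(cloud, local) -> float:
    """Fraction of listings both models route to the same tier."""
    hits = sum(tier(c) == tier(l) for c, l in zip(cloud, local))
    return hits / len(cloud)

# Shadow mode: both models score the same listings; only cloud is trusted
# until correlation and agreement stay high over a rolling window.
cloud = [0.9, 0.4, 0.7, 0.2, 0.85]
local = [0.88, 0.45, 0.65, 0.25, 0.9]
r = pearson_r(cloud, local)
agree = tier_agreement(cloud, local)
```

Tier agreement is the stricter gate in practice: two models can correlate well overall while still disagreeing on exactly the borderline listings that routing cares about.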

Build it for yourself first. Strata exists because I needed it. That's the best forcing function for honest architecture. When you're the user, you can't hand-wave over UX problems or defer the hard parts. Every agent governance gap I found, I found because it affected my own job search.

The source for strata-harvest and strata-match is public. The architecture patterns — governed multi-agent systems, cost-conscious LLM pipelines, three-layer dedup — are applicable to any domain where you need agents to collaborate reliably at scale.