Extract
Extract turns documents into atoms. It runs whenever a document is
uploaded or re-uploaded, and re-running on the same document
produces zero new atoms (idempotency by identityKey).
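
The idempotency guarantee can be illustrated with a minimal sketch. It assumes an in-memory store keyed by identityKey; the Atom shape, the AtomStore class, and the key format are hypothetical and stand in for whatever the real persistence layer does.

```typescript
// Hypothetical atom shape; only identityKey is named in the text above.
interface Atom {
  identityKey: string;
  kind: string;
  text: string;
}

// In-memory stand-in for the atom store, keyed by identityKey (assumption).
class AtomStore {
  private atoms = new Map<string, Atom>();

  // Returns true if the atom was newly written, false if the key already existed.
  upsert(atom: Atom): boolean {
    if (this.atoms.has(atom.identityKey)) return false;
    this.atoms.set(atom.identityKey, atom);
    return true;
  }

  get size(): number {
    return this.atoms.size;
  }
}

// Simulates one extraction run and reports how many atoms were new.
function runExtraction(store: AtomStore, extracted: Atom[]): number {
  let created = 0;
  for (const atom of extracted) {
    if (store.upsert(atom)) created += 1;
  }
  return created;
}

const store = new AtomStore();
const extracted: Atom[] = [
  { identityKey: "doc1:clause:1", kind: "clause", text: "Term is 12 months." },
  { identityKey: "doc1:clause:2", kind: "clause", text: "Notice is 30 days." },
];

const firstRun = runExtraction(store, extracted);  // both atoms are new
const secondRun = runExtraction(store, extracted); // same keys, nothing new
```

Because the second run presents the same identityKey values, it writes nothing, which is the behaviour described above for a re-upload of the same document.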
How it works
- Document is uploaded; a documentParse row is created
- The cron worker /api/cron/process-extractions picks it up
- extractAgreementGraph calls the LLM with prompts from
src/lib/substrate/extraction/prompts/
- Each returned atom is validated against the kind's Zod schema in
src/lib/substrate/schemas/, then written via createAtom
- Edges are written via createEdge to record the typed relations
(e.g. defines, references)
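
The validate-then-write steps above can be sketched as follows. This is not the real pipeline: isValidAtom is a hand-written stand-in for the per-kind Zod schemas, ingest stands in for the createAtom and createEdge calls, and dropping edges whose endpoints failed validation is an assumption made for the example.

```typescript
// Edge types taken from the examples in the text; the real set may be larger.
type EdgeType = "defines" | "references";

// Hypothetical shapes for illustration only.
interface RawAtom {
  identityKey: string;
  kind: string;
  text: string;
}

interface Edge {
  from: string; // identityKey of the source atom
  to: string;   // identityKey of the target atom
  type: EdgeType;
}

// Stand-in for a per-kind schema check (the text says Zod schemas do this).
function isValidAtom(value: unknown): value is RawAtom {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.identityKey === "string" &&
    typeof v.kind === "string" &&
    typeof v.text === "string" &&
    v.text.length > 0
  );
}

// Validates atoms first, then keeps only edges between atoms that were written.
function ingest(rawAtoms: unknown[], rawEdges: Edge[]) {
  const atoms = new Map<string, RawAtom>();
  const rejected: unknown[] = [];
  for (const raw of rawAtoms) {
    if (isValidAtom(raw)) atoms.set(raw.identityKey, raw);
    else rejected.push(raw);
  }
  // Assumption: an edge is only useful if both endpoints exist.
  const edges = rawEdges.filter((e) => atoms.has(e.from) && atoms.has(e.to));
  return { atoms, edges, rejected };
}

const result = ingest(
  [
    { identityKey: "a", kind: "term", text: "Supplier" },
    { identityKey: "b", kind: "clause", text: "The Supplier shall deliver." },
    { identityKey: "c", kind: "clause", text: "" }, // fails validation
  ],
  [
    { from: "a", to: "b", type: "defines" },
    { from: "b", to: "c", type: "references" }, // dropped: "c" was rejected
  ],
);
```

Writing atoms before edges keeps every edge pointing at an atom that passed validation.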
Source quotes are mandatory
Every atom carries sourceQuote, sourcePage, sourceCharStart, and
sourceCharEnd. The validator rejects atoms that lack any of them; "I
think this is in there" is not a Nebula atom.
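
A minimal sketch of such a check, assuming the four fields are a string and three integers. The function name provenanceErrors, the non-negative constraint, and the start-before-end rule are assumptions for illustration; the actual validator may enforce different or stricter rules.

```typescript
// The four field names come from the text; their types are assumed.
interface Provenance {
  sourceQuote: string;
  sourcePage: number;
  sourceCharStart: number;
  sourceCharEnd: number;
}

// Returns a list of problems; an empty list means the atom may be written.
function provenanceErrors(atom: Partial<Provenance>): string[] {
  const errors: string[] = [];

  if (typeof atom.sourceQuote !== "string" || atom.sourceQuote.trim() === "") {
    errors.push("sourceQuote is required");
  }

  for (const field of ["sourcePage", "sourceCharStart", "sourceCharEnd"] as const) {
    const value = atom[field];
    if (typeof value !== "number" || !Number.isInteger(value) || value < 0) {
      errors.push(`${field} must be a non-negative integer`);
    }
  }

  // Assumption: the character range must be non-empty.
  if (
    typeof atom.sourceCharStart === "number" &&
    typeof atom.sourceCharEnd === "number" &&
    atom.sourceCharEnd <= atom.sourceCharStart
  ) {
    errors.push("sourceCharEnd must be greater than sourceCharStart");
  }

  return errors;
}

const grounded = provenanceErrors({
  sourceQuote: "The Term is twelve (12) months.",
  sourcePage: 3,
  sourceCharStart: 120,
  sourceCharEnd: 151,
});

const ungrounded = provenanceErrors({
  sourcePage: 3,
  sourceCharStart: 120,
  sourceCharEnd: 151,
});
```

Returning every problem at once, rather than failing on the first, makes it easier to see why an LLM-produced atom was rejected.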
Domain pack effects
Active packs (defence, commonwealth_compliance, infrastructure)
contribute extra prompt instructions, custom edge types, and
post-extraction validators. See /docs/security/residency for the
defence pack's classification atom emission rules.
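
How a pack's three kinds of contribution might be folded into an extraction run can be sketched as below. The DomainPack interface, the applyPacks function, and the contents of the example pack (its prompt line, its edge type, its validator) are all invented for illustration and do not describe the real defence pack.

```typescript
type Atom = Record<string, unknown>;
// A validator returns an error message, or null when the atom passes.
type AtomValidator = (atom: Atom) => string | null;

// Hypothetical pack shape mirroring the three contributions named in the text.
interface DomainPack {
  name: string;
  promptInstructions: string[];
  edgeTypes: string[];
  validators: AtomValidator[];
}

interface ExtractionConfig {
  prompt: string;
  edgeTypes: string[];
  validators: AtomValidator[];
}

// Merges the base configuration with every active pack, in order.
function applyPacks(
  basePrompt: string,
  baseEdgeTypes: string[],
  activePacks: DomainPack[],
): ExtractionConfig {
  const promptParts = [basePrompt];
  const edgeTypes = new Set<string>(baseEdgeTypes);
  const validators: AtomValidator[] = [];

  for (const pack of activePacks) {
    promptParts.push(...pack.promptInstructions);
    for (const type of pack.edgeTypes) edgeTypes.add(type);
    validators.push(...pack.validators);
  }

  return {
    prompt: promptParts.join("\n"),
    edgeTypes: Array.from(edgeTypes),
    validators,
  };
}

// Example pack: the name comes from the text, the contents are invented.
const examplePack: DomainPack = {
  name: "defence",
  promptInstructions: ["Also extract any classification markings."],
  edgeTypes: ["classified_as"],
  validators: [
    (atom) =>
      typeof atom.classification === "string" ? null : "missing classification",
  ],
};

const config = applyPacks(
  "Extract atoms from the agreement.",
  ["defines", "references"],
  [examplePack],
);
```

Collecting edge types in a Set means two packs that declare the same custom edge type do not produce duplicates.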
Cross links
- /docs/concepts/atoms: what an atom is
- /docs/api/atoms: API endpoints over atoms