Principia — Explainer & FAQ

One sentence: Principia is a continuously-updated, source-graded registry of organizational science that doesn't just catalog the research — it synthesizes it into Bayesian priors you can plug straight into your own analysis.

What is it?

Principia is a canonical registry of organizational measurement: the theories, constructs (e.g. work engagement, psychological safety), instruments and survey items that measure them, the citations behind them, the effect sizes between them, and — the part nobody else publishes — the synthesized Bayesian priors over those relationships.

Think of it less as a database and more as a reference work of record — in the lineage of a pharmacopoeia, APA PsycTests, or arXiv: serious, addressable, citable, and alive. It is "two-shaped": a queryable database (REST + MCP, live now) and a book manuscript ("The Principia of Organization Measurement," auto-built from the same registry).

Where it stands today (live numbers move; check npm run vision:scoreboard):

Why is it different?

  THE SHIFT                catalog  ──▶  calculator
  "here's a meta-analysis"      ──▶   Normal(μ, σ) you query + compute with
  a number on a page            ──▶   the number + its k, N, I², and every source
  built once, then frozen       ──▶   continuously expanded + re-synthesized
  what's been published         ──▶   + a living layer from real deployments

Five things, in order of importance:

  1. It exposes the prior layer. Other resources stop at "here's a meta-analysis." Principia takes the pooled evidence for a relationship (e.g. autonomy → work engagement) and synthesizes it into a usable prior distribution such as Normal(0.43, 0.09) that drops directly into Stan / PyMC / brms. Each prior carries its k, N, contributing studies, a credibility interval, heterogeneity (I²) so you can see how consistent the evidence is, and a freshness score. And you can fuse it with your own data: POST /api/v1/posterior combines a published prior with your observed result into an updated posterior (Bayesian normal-normal). Nobody else publishes this. It's the difference between a library and a calculator.
  2. Every row traces to a primary source, and the source is graded. No fabricated numbers. A value is either read from a real results table (with the page/quote in its provenance) or marked unverified. Citations are cross-validated (CrossRef for DOIs; multi-model agreement otherwise; Scite citation-context for evidence verdicts). The grade travels with the data.
  3. It's continuously updated, not batch-then-frozen. Principia runs a loop (expand, verify, enrich, augment) as standing infrastructure. New meta-analyses get pulled in, priors re-synthesize as evidence arrives, psychometrics deepen as instruments accrue citations. A static catalog reads as a frozen idea; Principia is visibly growing.
  4. Breadth across the whole field, not one pet construct. Coverage spans leadership, justice, OCB, burnout/strain, work–family, POS, PsyCap, P-E fit, personality, performance, turnover, cultural values, and more (96 families and climbing), with the gaps explicit rather than hidden.
  5. It can be checked against reality, not just other papers. Published research has a structural blind spot, the file drawer: studies that find nothing often go unpublished, so any synthesis of the published record is built on a selected sample. That's the whole field's problem, not ours alone. But because Principia connects to a toolbox that runs validated measures inside real organizations, it has a path most reference works don't: a stream of primary evidence that can confirm and refine the published priors. (See "The living layer" below.)
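
The normal-normal fusion that POST /api/v1/posterior is described as performing can be sketched in a few lines. This is a minimal illustration of the textbook conjugate update, not Principia's implementation: the function name, its argument names, and the observed result (0.30 with a standard error of 0.10) are assumptions made for the example. Only the Normal(0.43, 0.09) prior comes from the text above.

```python
from math import sqrt

def normal_normal_posterior(prior_mu, prior_sigma, obs_mean, obs_se):
    """Conjugate update: Normal prior combined with a Normal likelihood
    whose standard error is treated as known."""
    prior_prec = 1.0 / prior_sigma ** 2      # precision = 1 / variance
    obs_prec = 1.0 / obs_se ** 2
    post_prec = prior_prec + obs_prec        # precisions add
    # Posterior mean is the precision-weighted average of prior and data.
    post_mu = (prior_prec * prior_mu + obs_prec * obs_mean) / post_prec
    post_sigma = sqrt(1.0 / post_prec)
    return post_mu, post_sigma

# Prior from the text; the observed effect and its SE are hypothetical.
post_mu, post_sigma = normal_normal_posterior(0.43, 0.09, 0.30, 0.10)
print(f"posterior ~ Normal({post_mu:.3f}, {post_sigma:.3f})")
```

The posterior mean lands between the published prior and the local result, and the posterior standard deviation is smaller than either input, which is the sense in which local data "refines" a published prior.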

How does it work?

Three jurisdictions, one contract. Everything federates through a shared schema package (@measurement/core), so no consumer ever inlines its own definitions:

The pipeline, concretely:

  1. Source — watchlists (Scholar/OpenAlex/Scite) and parallel sourcing agents read pooled effect sizes from real meta-analytic tables and write proposals. Agents never write to the store and never fabricate.
  2. Curator-gate (policy D4) — a human promotes a proposal via a CLI (promote-effect-size) that requires a real DOI. Automation proposes; a human commits. This is the integrity backbone.
  3. Synthesize — promoted effect sizes are pooled into a CanonicalPrior with full provenance.
  4. Serve — the prior is queryable at a stable URL (/api/v1/priors/{from}/{predicate}/{to}) and as an MCP tool, with a link back to its contributing studies and citations.
  SOURCE ─────────▶ CURATOR-GATE ─────▶ SYNTHESIZE ──────▶ SERVE
  agents read real   a human promotes    pool the effect    REST /api/v1 +
  meta-analysis      each proposal       sizes into a       MCP — every prior
  tables → write     (needs a real DOI;  CanonicalPrior     links back to its
  proposals only     policy D4)          w/ full provenance studies + citations
  ·never fabricate   ·automation         ·credibility
                      proposes, a         interval + I²
                      human commits
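
To make the Synthesize step concrete, here is a hedged sketch of pooling effect sizes and computing I². The source does not say which pooling model Principia uses; this example assumes a DerSimonian-Laird random-effects model, which is one common choice. The function name and the input values are illustrative and are not taken from the registry.

```python
from math import sqrt

def pool_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling (assumed model).

    Returns (pooled mean, its standard error, I^2 as a fraction in [0, 1]).
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]            # fixed-effect weights
    sum_w = sum(w)
    fixed_mean = sum(wi * y for wi, y in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0       # share of variance from heterogeneity
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    mu = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    sigma = sqrt(1.0 / sum(w_star))
    return mu, sigma, i2

# Illustrative inputs only: three effect sizes with their standard errors.
mu, sigma, i2 = pool_random_effects([0.40, 0.50, 0.45], [0.03, 0.02, 0.04])
print(f"Normal({mu:.3f}, {sigma:.3f}), I2 = {i2:.2f}")
```

A high I² widens the pooled standard error through tau2, which is why a heterogeneous prior should be read with its interval and not only its mean.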

Storage & surface: a JSON store today (bundled into the app), migrating to Postgres/Neon for the standalone deploy. The same store feeds the REST API, the MCP gateway, the reader UI, and the book build.

A prior, end to end (the thing nobody else gives you)

Ask Principia one question and here is the actual answer — not a description of one:

  GET /api/v1/priors/work_engagement/predicts/task_performance

  prior     Normal(μ = 0.49, σ = 0.021)      ← drop straight into Stan / PyMC / brms
  evidence  k = 6 meta-analyses · N = 84,331 people
  spread    I² = 0.89  (heterogeneous — so read the credibility interval, not just μ)
  grade     highly_informative
  sources   Christian, Garza & Slaughter 2011   ρ = .43
            Neuber et al. 2022                   ρ = .48  (k = 179)
            …each links to its citation, its quality grade, and an evidence verdict

That is the whole pitch in one card: a usable distribution, the evidence behind it, how much the studies disagree, and a trail back to every source. A meta-analysis library gives you the paper; Principia gives you the number you can compute with — and the receipts.
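
As a small illustration of "compute with", the sketch below builds the documented path and turns the card's Normal(0.49, 0.021) into an interval using only the Python standard library. The helper name prior_path is hypothetical, and no response format is assumed: the numbers are copied by hand from the card above.

```python
from statistics import NormalDist

def prior_path(src, predicate, dst):
    """Build a path following the documented pattern
    /api/v1/priors/{from}/{predicate}/{to}."""
    return f"/api/v1/priors/{src}/{predicate}/{dst}"

path = prior_path("work_engagement", "predicts", "task_performance")

# mu and sigma copied from the card above.
prior = NormalDist(mu=0.49, sigma=0.021)
low, high = prior.inv_cdf(0.025), prior.inv_cdf(0.975)

print(path)
print(f"central 95% of the prior: [{low:.3f}, {high:.3f}]")
```

This interval describes uncertainty in the pooled mean only. It is not necessarily the credibility interval the registry reports, which also reflects how much the underlying studies disagree.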

What does it enable?

One map for everything you can measure about people at work

Organizations measure their people in three traditions that grew up separately — and Principia is the place they finally sit on one map:

A metric is not measured the way a survey is, so they don't collapse into one table — but they attach to the same construct layer, and that's the point. Because a concept like turnover can be measured both by a survey (intention to quit) and by an operational metric (the actual attrition rate), the relationship network lets the two worlds talk: a leader holding operational data can walk into the construct map and pull in the survey-based and meta-analytic knowledge — and trace an HR program → metric → firm outcome. What you can measure connects to what you care about, and to what moves it. (Direction: measurement-ontology-direction.md.)
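
One way to picture the shared construct layer is as a small directed graph in which a survey measure and an operational metric attach to the same construct. The sketch below is purely illustrative: every node name and edge is hypothetical, and nothing here reflects Principia's actual schema.

```python
from collections import deque

# Hypothetical nodes and edges, for illustration only.
EDGES = {
    "retention_program": ["attrition_rate"],   # HR program -> operational metric
    "attrition_rate": ["turnover"],            # metric attaches to a construct
    "intention_to_quit": ["turnover"],         # survey measure, same construct
    "turnover": ["firm_performance"],          # construct -> firm outcome
}

def trace(start, goal):
    """Breadth-first search for the shortest directed path from start to goal.
    Returns the path as a list of node names, or None if no path exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in EDGES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

program_path = trace("retention_program", "firm_performance")
survey_path = trace("intention_to_quit", "firm_performance")
print(" -> ".join(program_path))
print(" -> ".join(survey_path))
```

Both paths pass through the same construct node, which is the point of the design: the metric and the survey stay in separate tables but meet on one map.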

The living layer — real organizations, not just published research

Most reference works are libraries of what's been published. Principia is built to be more than that, because of how it connects to the People Analytics Toolbox.

When organizations work with us, validated measures get deployed in real workplaces, and the results — reliabilities, response distributions, relationships between things — come back anonymized and aggregated (never raw, never identifiable). Crucially, that evidence enters the record regardless of how it turns out. There's no file drawer: a deployment that finds "nothing interesting" counts exactly as much as one that finds a strong effect. That is the same design that lets clinical-trial registries and large replication projects sidestep publication bias — applied, for the first time, to the measurement of work.

This creates a genuine flywheel, and it's bidirectional:

The honest boundaries matter, and we keep them visible: client organizations aren't a random sample of all organizations, most of this data is observational rather than experimental, and the value compounds as more organizations participate. We frame our role as contributing primary evidence to a problem the whole field shares — not as having the last word.

Benchmarks are the first, most tangible payoff. We can offer reference ranges for approved items starting from the published literature on day one — wide and clearly caveated where the literature is thin — and every deployment makes those ranges tighter and more trustworthy. A benchmark dataset that grows itself, before we've ever sold a standalone benchmarking engagement.

How are we using it in the People Analytics Toolbox?

The toolbox consumes Principia through a dedicated principia-connector spoke — the same pattern it already uses for BLS / O*NET / NAICS.

Shipped:

Planned / in flight:

The division of labor: Principia is the science of record; the toolbox is where that science gets operationalized on real workforce data.

How can it be commercialized?

The architecture is deliberately set up to support tiered access; concrete pricing is still open (SPEC §15), but the surfaces exist:

  1. Tiered API / MCP access — the planned posture is free / researcher / commercial consumer key tiers (per-consumer auth, scopes, and rate limits are already built into /api/v1). Free for browsing and light academic use; paid for commercial volume and write-back.
  2. The priors-as-a-service angle — the prior layer is the unique, defensible asset. Commercial Bayesian/analytics products (in-house data-science teams, survey vendors, HR-tech) can license programmatic prior access rather than rebuilding meta-analytic synthesis themselves.
  3. The book — "The Principia of Organization Measurement" as a paid artifact (print/PDF/EPUB), with the registry as the living companion.
  4. Toolbox differentiation — Principia is a moat for the People Analytics Toolbox: citation-grounded, prior-backed analytics that competitors can't easily replicate. It raises the toolbox's value even if Principia itself were never separately monetized.
  5. Embedded / OEM — other people-analytics or research platforms embedding Principia lookups (white-labeled construct/instrument/prior resolution) under a commercial license.
  6. Editorial / audience — the peopleanalyst.com editorial surface argues the science and drives reach; the registry is the object it points at. Audience → credibility → API/partnership pipeline.
  7. Benchmarks & the evidence network — item-level reference ranges ("is this score typical?") are a wedge people-analytics buyers already pay for, and we can offer them bootstrapped on day one. As more organizations participate, the anonymized, aggregated evidence base becomes a defensible asset no literature-only competitor can replicate — and bidirectional participation (clients contribute aggregate evidence, get the field's outside view in return) is itself the relationship that deepens the moat.

IP posture: the registry surfaces instrument restrictions (public-domain / open / permission-required / proprietary) and never republishes item text that violates licensing — which keeps the commercial surface clean and defensible.
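
A hedged sketch of how such a restriction check might look follows. The four restriction levels come from the text above; the class names, the field names, and the rule that only public-domain and open items may have their wording served are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Restriction(Enum):
    PUBLIC_DOMAIN = "public-domain"
    OPEN = "open"
    PERMISSION_REQUIRED = "permission-required"
    PROPRIETARY = "proprietary"

# Assumption: only these two levels permit republishing item wording.
REPUBLISHABLE = {Restriction.PUBLIC_DOMAIN, Restriction.OPEN}

@dataclass(frozen=True)
class Item:
    item_id: str
    text: str
    restriction: Restriction

def public_view(item):
    """Return a dict that is safe to serve: the restriction is always shown,
    but the item text is withheld unless the restriction allows republishing."""
    allowed = item.restriction in REPUBLISHABLE
    return {
        "item_id": item.item_id,
        "restriction": item.restriction.value,
        "text": item.text if allowed else None,
    }

# Hypothetical items for demonstration.
open_item = public_view(Item("demo-1", "Example wording.", Restriction.OPEN))
locked_item = public_view(Item("demo-2", "Example wording.", Restriction.PROPRIETARY))
print(open_item)
print(locked_item)
```

Keeping the restriction visible even when the text is withheld matches the posture described above: the registry surfaces the licensing status and does not hide the item's existence.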

What's the big vision?

Become the world's most comprehensive abstracted-research library for organizational science — and the canonical source of Bayesian priors used in organizational research and applied people analytics.

Not "an engagement database." The destination is a standard of record for how organizations are measured: a reference whose rows are cited in papers, whose priors are plugged into analyses, whose MCP tools are how AI agents reason about org science, and whose coverage spans the field with the loop visibly keeping it current. Three public faces, by job:

And it grows a living layer alongside the published one: a primary-evidence network where real organizations (in aggregate, with consent) help confirm and refine the field's research — the thing a literature-only reference can never become.

The launch bar is deliberately high: the standalone surface goes fully public only when the expansion loop is observably running — "world's most comprehensive" can't credibly launch as a static catalog.

What's the current work at hand?