Eval Labs Canon

# Release and Validation History

> [!summary]
> This page captures the major Eval Labs hardening steps that made the system ready for real Lucia development.

---

## April 2026 milestone

Eval Labs became usable for active Lucia development after the custom prompt, review, tester identity, environment, CORS, and persistence work landed.

---

## Key product changes

### Custom prompt launcher

Commit:

```text
884f194 evals: add custom prompt launcher and saved suites
```

Added:

- fork landing page
- custom 1–10 prompt launcher
- saved custom suites
- custom run history
- `runSource: custom`
- shared Review Queue reuse

---

### Review completion and navigation

Commit:

```text
3ef6fb5 evals: improve review completion and navigation
```

Added:

- final prompt button changes from **Save & Next** to **Save**
- completion action area
- top-left brand home navigation
- clickable breadcrumbs
- `dist/` removed from Git tracking

---

### Tester identity exports

Commit:

```text
16f53cd evals: attach tester identity to exports
```

Added:

- `TesterIdentity`
- prompt-level `savedBy`
- top-level `exportedBy`
- reviewer identity in CSV/Markdown exports
- Clerk identity normalization

---

### Supabase run item conflict target

Commit:

```text
d51f9b1 evals: fix run item upsert conflict target
```

Changed initial `eval_run_items` upsert target from `id` to `run_id,item_index`.

---

### Supabase row identity reconciliation

Commit:

```text
fd8a366 evals: reconcile run item ids before upsert
```

Fixed primary-key collisions by reusing existing run item row IDs for the same logical `run_id + item_index` slot.
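
The reconciliation step described above can be sketched roughly as follows. This is a minimal illustration, not the code from commit `fd8a366`: the `RunItemRow` shape, the `payload` field, and the helper names `slotKey` and `reconcileRunItemIds` are assumptions made for the example. Only `run_id`, `item_index`, and the idea of reusing existing row IDs come from this page.

```typescript
// Hypothetical row shape; only run_id and item_index are named in the source.
interface RunItemRow {
  id: string;
  run_id: string;
  item_index: number;
  payload?: unknown;
}

// Build a lookup key for the logical slot a row occupies.
function slotKey(row: Pick<RunItemRow, "run_id" | "item_index">): string {
  return `${row.run_id}:${row.item_index}`;
}

// Reuse the id of any existing row that occupies the same slot, so the
// subsequent upsert updates that row instead of writing a second row
// for the same slot under a different id.
function reconcileRunItemIds(
  existing: RunItemRow[],
  incoming: RunItemRow[],
): RunItemRow[] {
  const idBySlot = new Map<string, string>();
  for (const row of existing) {
    idBySlot.set(slotKey(row), row.id);
  }
  return incoming.map((row) => {
    const existingId = idBySlot.get(slotKey(row));
    return existingId === undefined ? row : { ...row, id: existingId };
  });
}

// Example: slot (run-1, 0) already exists as "a1"; slot (run-1, 1) is new.
const existingRows: RunItemRow[] = [
  { id: "a1", run_id: "run-1", item_index: 0 },
];
const incomingRows: RunItemRow[] = [
  { id: "b7", run_id: "run-1", item_index: 0 },
  { id: "b8", run_id: "run-1", item_index: 1 },
];
const reconciledRows = reconcileRunItemIds(existingRows, incomingRows);
```

In this sketch the first incoming row takes over the existing ID `a1`, while the second keeps its own ID because no row occupies that slot yet. The reconciled rows would then go to the upsert that uses `run_id,item_index` as its conflict target.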

---

## Environment hardening

### Netlify

Changed Eval Labs endpoint to:

```text
https://api-dev.hellolucia.ai/admin/operator-focus
```

### Render dev Engine

Updated `ADMIN_ALLOWED_ORIGINS` to include:

```text
https://evaluationlabs.ai
https://www.evaluationlabs.ai
```

---

## Validation outcome

Validated:

- Eval Labs deployed site calls `api-dev`
- Engine returns 200
- custom prompt run succeeds
- Supabase persistence succeeds
- no CORS error
- no `eval_run_items` 409 after latest bundle
- exported identity metadata works

---

## Current status

```text
Eval Labs is ready for active Lucia dev refinement and has passed the AI-reviewed platform readiness gate.
```

Use custom prompt suites as the primary tool for behavior-family refinement.

Do not overclaim this as human Lucia-quality approval.

---

## May 2026 review-layer milestone

Eval Labs gained a full layered review architecture:

- adjudication-ready review schema
- guided Employee Review fields
- suggested review layer
- Human Guidance Evaluation
- Quick Review UX for non-expert reviewers
- review state controls and routing
- adjudication queue filters
- canon-candidate workflow
- JSON, CSV, and Markdown export parity for structured review evidence
- lifecycle finalization
- Supabase `promptRecord` payload persistence
- dirty/completion state preservation
- semantic stepped rating sliders
- native-feeling confidence bar visual design

Current doctrine:

```text
Employee reviewers capture reaction.
Senior adjudication assigns canonical meaning.
```

This should be treated as a major product and Canon milestone, not a cosmetic UI change.

---

## May 2026 product-surface and access milestone

Eval Labs was refined into a more complete internal product surface:

- top app shell owns page identity
- in-page blog-style mastheads were removed from the app
- Custom, Auto-generated, and Controlled Batch Runner surfaces were split
- `/lucia/auto-generated` became the canonical normal generated tester route
- `/lucia/automated` remains a legacy alias
- `/analysis` became the canonical Global Analysis route
- `/experiments` remains a legacy alias
- Single Run Analysis was added at `/analysis/runs/:sessionId`
- Run rows were standardized with two-zone layout and Copy dropdown patterns
- Copy Session ID / Copy Deep Link controls were added across key surfaces
- Global Analysis loading was fixed to show immediately
- role-gated owner/admin/evaluator behavior was added as the initial product gate

Historical access limitation at this milestone:

```text
Backend/RLS enforcement still required before external evaluator rollout.
```

Current role and RLS posture is documented in [[03 - Current System State|Current System State]] and [[08 - Eval Labs Roles and Access Matrix|Eval Labs Roles and Access Matrix]].
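
The canonical and legacy routes listed in this milestone can be summarized as a small alias map. The sketch below is illustrative only: the helper names `toCanonicalRoute` and `singleRunAnalysisPath` are hypothetical, and this page does not say whether the app redirects legacy aliases or renders them in place.

```typescript
// Legacy alias -> canonical route, as listed in this milestone.
const LEGACY_ROUTE_ALIASES = new Map<string, string>([
  ["/lucia/automated", "/lucia/auto-generated"],
  ["/experiments", "/analysis"],
]);

// Hypothetical helper: returns the canonical path for a legacy alias,
// or the input unchanged when it is not a known alias.
function toCanonicalRoute(path: string): string {
  return LEGACY_ROUTE_ALIASES.get(path) ?? path;
}

// Hypothetical helper: builds the Single Run Analysis path for a session,
// following the /analysis/runs/:sessionId pattern.
function singleRunAnalysisPath(sessionId: string): string {
  return `/analysis/runs/${encodeURIComponent(sessionId)}`;
}
```

Keeping the aliases in one table means a deep link copied from a legacy URL can be normalized before it is shared or compared.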

---

## May 2026 AI-reviewed platform readiness gate

Final gate result:

```text
60 / 60 completed runs
3,000 expected prompts
3,000 eval_run_items
3,000 Lucia responses
3,000 reviews
```

Supabase verification result:

```text
ready | 60 | 3000 | 3000 | 3000 | 3000
```

localStorage verification:

```text
sessionCount = 60
persistedLocalFullPayloadSessionCount = 0
persistedLocalHasItemLevelData = false
persistedLocalItemLevelDataSessionCount = 0
ownedSessionCount = 60
otherOwnerSessionCount = 0
ownerlessSessionCount = 0
rawByteSize around 68,815
```

This proves:

- run creation
- Lucia response capture
- review generation
- review persistence
- run finalization
- Run History truth
- Global Analysis truth
- Supabase/UI count agreement
- localStorage compactness
- controlled batch lifecycle
- no visible cross-owner local leak in the tested owner context

This does not prove:

- Lucia is human-approved
- Lucia is ready for real operator use
- employee rollout is complete
- human evaluators agree with AI scoring
- backend/RLS permissions are complete by themselves

Read next: [[04 - AI-Reviewed Platform Readiness Gate|AI-Reviewed Platform Readiness Gate]].