Eval Labs Canon

# Current System State

> [!summary]
> Eval Labs is an implemented, role-based human evaluation platform for Lucia. It supports controlled human onboarding, persisted evidence, owner/admin oversight, and evaluator-safe workflows, while some UX and rollout areas remain in active hardening.

---

## Current platform truth

Status: implemented.

Eval Labs is no longer only a founder or AI-agent testing tool. It is Lucia's role-based human evaluation platform:

- Clerk auth works.
- Clerk public metadata drives frontend role behavior through `eval_labs_role`.
- The Clerk session token includes `eval_labs_role` so Supabase RLS can recognize privileged owner/admin access.
- Supabase RLS protects persisted evidence.
- Real runs must persist to Supabase.
- Owner/admin should see shared persisted Eval Labs evidence.
- Evaluator and tester data remains scoped to their own work except where owner/admin oversight applies.
- Team Review exists as the owner/admin oversight surface.
- Staged hydration loads run summaries first, then recent and deeper evidence, so dashboards can render faster without fake metrics.

---

## Current roles

Status: implemented.

Current roles:

- `owner`
- `admin`
- `evaluator`
- `tester`
- unassigned or missing role

Read the canonical matrix: [[08 - Eval Labs Roles and Access Matrix|Eval Labs Roles and Access Matrix]].

---

## Current test surfaces

Status: implemented.

Current test surfaces:

1. Custom Prompt Test
2. Auto-generated Prompt Test
3. Guest Facing Agent Verification Check
4. Controlled Batch Runner

Tester access is intentionally narrower than evaluator access. Tester is for clean prompt-testing onboarding cohorts. Evaluator is for the full evaluator workbench and evaluator-safe test types.

---

## Oversight and analysis

Status: implemented.

Owner/admin have full platform access, shared persisted evidence, Team Review, Global Analysis, and all test surfaces.

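As a rough illustration of the role model described so far, the sketch below normalizes an `eval_labs_role` value into one of the listed roles and flags the privileged owner/admin case. The names `parseRole` and `isPrivileged`, and the exact-match (case-sensitive) behavior, are assumptions made for illustration, not the platform's actual code. It operates on an already-decoded claims or metadata object and does not call Clerk or Supabase.

```typescript
// Hypothetical types for illustration. Only the claim name `eval_labs_role`
// and the role names themselves come from this document.
type AssignedRole = "owner" | "admin" | "evaluator" | "tester";
type Role = AssignedRole | "unassigned";

const ASSIGNED_ROLES: readonly AssignedRole[] = ["owner", "admin", "evaluator", "tester"];

// Read the role from an already-decoded claims or metadata object.
// A missing, non-string, or unrecognized value is treated as unassigned.
function parseRole(claims: Record<string, unknown>): Role {
  const raw = claims["eval_labs_role"];
  if (typeof raw !== "string") return "unassigned";
  const match = ASSIGNED_ROLES.find((r) => r === raw);
  return match !== undefined ? match : "unassigned";
}

// Owner and admin are the roles with oversight of shared persisted evidence.
function isPrivileged(role: Role): boolean {
  return role === "owner" || role === "admin";
}

console.log(parseRole({ eval_labs_role: "admin" })); // admin
console.log(parseRole({}));                          // unassigned
console.log(isPrivileged(parseRole({ eval_labs_role: "tester" }))); // false
```

Treating unknown values as unassigned keeps a typo in metadata from silently granting access; the canonical matrix remains the source of truth for what each role can reach.
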
Team Review exists for owner/admin oversight of human evaluation work: evidence quality, reviewer activity, review gaps, and escalation readiness.

Global Analysis is owner/admin-only platform-wide evidence inspection. It is not a tester or evaluator onboarding surface.

---

## Human onboarding posture

Status: active hardening.

Eval Labs is ready for controlled human onboarding by role and assignment. Do not describe the platform as broadly production-mature or open-access. Do not describe Lucia as human-approved because the AI-reviewed platform readiness gate passed.

Use:

```text
implemented
active hardening
deferred
future
```

Avoid softer labels that imply more maturity than the source state proves.

---

## Active hardening

These areas are implemented but still being tightened, polished, or verified for rollout:

- evaluator onboarding/workspace polish
- first human cohort instructions
- role-specific route verification
- Clerk-to-Supabase role-claim verification after role or RLS changes
- staged hydration behavior across large evidence sets
- clear tester-vs-evaluator assignment guidance

---

## Deferred

Deferred means intentionally outside the current access model:

- tester access to Verification Check
- tester access to Verification Results
- tester access to Controlled Batch Runner
- tester access to Team Review
- tester access to Global Analysis
- tester access to Registry Diagnostics
- tester access to Behavioral Observatory
- evaluator access to Team Review
- evaluator access to Global Analysis

---

## Future

Future means possible later, not current behavior:

- broader public or external evaluator rollout
- expanded assignment management
- additional owner/admin management tooling
- deeper cohort analytics beyond current oversight surfaces
- more final evaluator UX polish

---

## First human onboarding readiness criteria

Before a first human onboarding cohort starts:

1. Confirm every participant has Clerk auth access.
2. Confirm `eval_labs_role` is set in Clerk public metadata.
3. Confirm the session token carries the role claim used by Supabase RLS.
4. Confirm visible routes match the access matrix.
5. Run a real prompt test and verify Supabase persistence.
6. Confirm owner/admin can see shared persisted evidence where oversight applies.
7. Keep testers limited to Custom Prompt Test and Auto-generated Prompt Test.
8. Give evaluators only assignments that match evaluator-safe surfaces.
9. Name any active-hardening caveats before the work begins.
10. Repeat that AI-reviewed platform readiness is not human Lucia-quality approval.
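
Some of the criteria above can be checked mechanically per participant. The sketch below is one possible way to express criteria 2, 3, and 7 as a preflight function. The `Participant` shape, its field names, and the `preflight` function are hypothetical and assume the role values and visible surfaces have already been collected by hand or by other tooling. The remaining criteria (real run persistence, owner/admin visibility, caveat briefing) need live verification and are not covered here.

```typescript
// Hypothetical record for one cohort participant; field names are illustrative.
interface Participant {
  name: string;
  metadataRole?: string;     // `eval_labs_role` as set in Clerk public metadata
  tokenRole?: string;        // `eval_labs_role` as observed in the session token
  visibleSurfaces: string[]; // surfaces the participant can actually reach
}

// From criterion 7: the only surfaces a tester should see.
const TESTER_SURFACES: string[] = ["Custom Prompt Test", "Auto-generated Prompt Test"];

// Returns a list of human-readable problems; an empty list means the
// mechanical checks passed for this participant.
function preflight(p: Participant): string[] {
  const problems: string[] = [];
  if (!p.metadataRole) {
    problems.push("role missing from public metadata");
  }
  if (p.tokenRole !== p.metadataRole) {
    problems.push("session token role does not match metadata role");
  }
  if (p.metadataRole === "tester") {
    const extra = p.visibleSurfaces.filter((s) => TESTER_SURFACES.indexOf(s) === -1);
    if (extra.length > 0) {
      problems.push("tester can see: " + extra.join(", "));
    }
  }
  return problems;
}

// Example: a tester whose token role drifted and who can reach Team Review.
console.log(
  preflight({
    name: "example-tester",
    metadataRole: "tester",
    tokenRole: "evaluator",
    visibleSurfaces: ["Custom Prompt Test", "Team Review"],
  })
);
```

Returning a list of problems instead of a single boolean makes it easier to name every gap to the cohort owner before work begins.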