Eval Labs Canon

# Team Usage Guidelines

> [!summary]
> Eval Labs is only useful when reviewers are consistent, honest, specific, and operating inside their approved access scope.

---

## Reviewer expectations

Reviewers should be:

- honest
- specific
- consistent
- grounded in the quality bar
- willing to fail polished responses
- careful with emotional signals
- precise in notes

---

## Do

- write clear notes
- identify patterns
- flag truth issues
- mark uncertainty
- compare against the user's actual need
- save reviews before exporting reviewed evidence
- use custom suites for targeted refinement
- use auto-generated runs for broader regression checks only when your role allows it
- keep AI-reviewed platform readiness separate from human Lucia-quality approval
- keep derived diagnostic suggestions separate from saved Behavioral Observatory labels

---

## Do not

- pass weak responses to be nice
- reward fancy wording
- ignore tone failures
- skip notes on borderline responses
- treat one lucky response as proof
- mix too many behavior families into one custom suite
- confuse generated-only exports with reviewed exports
- treat controlled batch results as human approval of Lucia quality
- treat Registry Diagnostics suggestions as saved labels
- treat Behavioral Observatory labels as global Lucia approval
- use owner/admin surfaces from an evaluator role

---

## Review notes

Good notes sound like:

```text
Correct operational priority, but Lucia missed the user's disorientation signal and did not provide containment.
```

Bad notes sound like:

```text
Seems fine.
```

---

## Team standard

If another teammate cannot understand your review note, it is not specific enough.

---

## When to escalate

Escalate a pattern when:

- the same failure appears across 3+ related prompts
- the failure affects trust
- the failure affects distress handling
- the failure causes wrong operational prioritization
- the failure appears after a new deploy

---

## What not to escalate

Do not escalate a single minor wording preference unless it represents a broader pattern.

Eval Labs is for product signal, not personal taste fights.

---

## Updated reviewer guidance

Employees should prioritize speed, honesty, and consistency.

Do:

- use the guided controls
- flag senior review when uncertain
- write short notes only when they add context
- mark reusable learning only when the pattern feels durable

Do not:

- invent new labels
- write long essays
- create taxonomy language
- treat personal taste as product signal
- overthink every prompt

The goal is clean signal, not intellectual performance.

---

## Access rule

Access is role-based by design.

Testers should use only Custom Prompt Test and Auto-generated Prompt Test.

Evaluators should use evaluator-safe test surfaces and their own run/review/history routes.

Owner/admin-only surfaces include Team Review, Global Analysis, Registry Diagnostics, Behavioral Observatory, all-user analytics, cleanup/tools, and future admin/tools.

Do not onboard broader employee workflows until [[03 - Employee Onboarding Gate|Employee Onboarding Gate]] is satisfied.

For the simple surface-by-surface path, read [[04 - Eval Labs Step-by-Step Operator Guide|Eval Labs Step-by-Step Operator Guide]].
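
The access rule above can be read as a default-deny lookup from role to surface. The sketch below is illustrative only and is not how Eval Labs enforces access. The `Role` enum, the `can_access` helper, and the evaluator surface names are hypothetical, because this guide does not enumerate the evaluator-safe surfaces. It also assumes that owner and admin roles may open every surface, which the guide implies but does not state.

```python
from enum import Enum


class Role(Enum):
    TESTER = "tester"
    EVALUATOR = "evaluator"
    OWNER = "owner"
    ADMIN = "admin"


# Surfaces named in the access rule for testers.
TESTER_SURFACES = frozenset({
    "Custom Prompt Test",
    "Auto-generated Prompt Test",
})

# Surfaces the access rule lists as owner/admin-only.
OWNER_ADMIN_ONLY = frozenset({
    "Team Review",
    "Global Analysis",
    "Registry Diagnostics",
    "Behavioral Observatory",
    "all-user analytics",
    "cleanup/tools",
})

# Assumption: placeholder names. The guide only says "evaluator-safe test
# surfaces and their own run/review/history routes".
EVALUATOR_SURFACES = frozenset({"My Runs", "My Reviews", "My History"})


def can_access(role: Role, surface: str) -> bool:
    """Return True if the role may open the surface; deny by default."""
    # Assumption: owner and admin roles can open every surface.
    if role in (Role.OWNER, Role.ADMIN):
        return True
    # Owner/admin-only surfaces are closed to every other role.
    if surface in OWNER_ADMIN_ONLY:
        return False
    if role is Role.TESTER:
        return surface in TESTER_SURFACES
    if role is Role.EVALUATOR:
        return surface in EVALUATOR_SURFACES
    return False
```

Under these assumptions, `can_access(Role.EVALUATOR, "Registry Diagnostics")` is false, which matches the "Do not" item about using owner/admin surfaces from an evaluator role. Unknown surfaces are denied for testers and evaluators, so a newly added admin tool stays closed until someone explicitly opens it.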