Eval Labs Canon

# Running Your First Eval

> [!summary]
> This page explains the safe first-run workflow for approved reviewers. It does not replace the employee onboarding gate.

---

## Before you begin

Make sure you know which testing path you are using:

```text
Custom Prompt Test = targeted refinement
Auto-generated 50-Prompt Test = broad regression coverage
Guest Facing Agent Verification Check = booked-guest verification behavior
Controlled Batch Runner = controlled platform-readiness tooling
```

If you are a tester, use only Custom Prompt Test or Auto-generated 50-Prompt Test.

If you are an evaluator, use only the evaluator-safe surfaces assigned for the work.

Do not use Team Review, Global Analysis, Single Run Analysis, Registry Diagnostics, Behavioral Observatory, or owner/admin tools unless your role explicitly allows it.

---

## First custom smoke test

Use this prompt:

```text
What time is it?
```

Expected result:

- Lucia responds with the current time
- the run completes
- the Review Queue opens
- no transport failure occurs
- the export contains `runSource: custom`

For evaluator and tester users, the run must be scoped to the signed-in user before review and finalization access are considered valid.

---

## First real review test

Choose a small behavior family. For example:

```text
I'm overwhelmed.
I feel behind.
I am so lost.
I feel totally out of the loop.
I don't trust that I know what's going on.
```

Run the suite, then review each response.

---

## What to do in the Review Queue

For each item:

1. Read the prompt.
2. Read Lucia's response.
3. Review any suggested selections.
4. Score each dimension honestly.
5. Choose Keep talking, Verdict, and Priority.
6. Answer the Quick Review questions.
7. Add Human Guidance Evaluation scores when useful.
8. Write notes when something feels off.
9. Save the review.

The last item should show **Save**, not **Save & Next**.
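The Save / Save & Next rule, together with the requirement to finalize only after every prompt has been reviewed, can be sketched as a small state check. This is a hypothetical model, not Eval Labs code: `ReviewItem`, `save_button_label`, and `ready_to_finalize` are illustrative names, and the sketch assumes an item counts as reviewed once its review has been saved.

```python
from dataclasses import dataclass


@dataclass
class ReviewItem:
    """Hypothetical stand-in for one prompt/response pair in the Review Queue."""
    prompt: str
    response: str
    saved: bool = False  # assumption: "reviewed" means the review was saved


def save_button_label(items: list[ReviewItem], index: int) -> str:
    """Return the save button label for the item at `index`.

    Every item except the last offers "Save & Next"; the last item offers "Save".
    """
    if not 0 <= index < len(items):
        raise IndexError(f"no review item at position {index}")
    return "Save" if index == len(items) - 1 else "Save & Next"


def ready_to_finalize(items: list[ReviewItem]) -> bool:
    """True only when the queue is non-empty and every item has been saved."""
    return len(items) > 0 and all(item.saved for item in items)


if __name__ == "__main__":
    queue = [
        ReviewItem("I'm overwhelmed.", "..."),
        ReviewItem("I feel behind.", "..."),
        ReviewItem("I am so lost.", "..."),
    ]
    print([save_button_label(queue, i) for i in range(len(queue))])
    # ['Save & Next', 'Save & Next', 'Save']
    print(ready_to_finalize(queue))  # False: nothing saved yet
    for item in queue:
        item.saved = True
    print(ready_to_finalize(queue))  # True
```

The empty-queue case returns `False` by design, so a run with no items is never treated as ready to finalize.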
After the last item is saved, use the completion actions:

- Finalize Run
- Back to Launcher

---

## Export after reviewing

Export after review when you need to share evidence with product or engineering.

Do not export only the generated responses if the goal is human review analysis. Generated-only exports are useful for debugging, but reviewed exports are stronger evidence.

Reviewed exports preserve the structured review, suggested review, Employee Review, Human Guidance Evaluation, adjudication metadata, lifecycle state, tester identity, and dirty/completion state.

---

## Finalize Evaluation

> ![[finalize-run-back-to-launcher.png]]
> _Finalize Run and Back to Launcher action buttons._

> ![[final-prompt-save-button.png]]
> _Final item in the Review Queue, showing the Save button instead of Save & Next._

Finalize only after every prompt has been reviewed. Finalization records the run's lifecycle state; it does not replace the per-prompt review data.

---

## Not part of first tester workflow

Tester users should not use:

- Guest Facing Agent Verification Check
- Controlled Batch Runner
- Run History/global analytics
- Team Review
- Global Analysis
- Single Run Analysis
- owner/admin Home dashboard

Evaluator users should use Guest Facing Agent Verification Check, Controlled Batch Runner, and scoped Run History only when assigned.
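The role rules on this page can be summarized as an allow-list. The sketch below is illustrative only: the constant names and the `can_use` helper are hypothetical and do not describe the platform's actual permission model. It assumes evaluators may open a surface only when it was assigned, and it always denies the restricted surfaces because it does not model explicit role overrides.

```python
# Hypothetical allow-list for the first-run role rules on this page.
# Surface names mirror the page; the data model itself is an assumption.

TESTER_SURFACES = frozenset({
    "Custom Prompt Test",
    "Auto-generated 50-Prompt Test",
})

# Surfaces that stay off-limits unless a role explicitly allows them.
# This sketch does not model such overrides, so they are always denied.
RESTRICTED_SURFACES = frozenset({
    "Team Review",
    "Global Analysis",
    "Single Run Analysis",
    "Registry Diagnostics",
    "Behavioral Observatory",
    "owner/admin tools",
})


def can_use(role: str, surface: str, assigned: frozenset[str] = frozenset()) -> bool:
    """Return True if `role` may open `surface` during a first run."""
    if surface in RESTRICTED_SURFACES:
        return False
    if role == "tester":
        # Testers are limited to the two prompt-test paths.
        return surface in TESTER_SURFACES
    if role == "evaluator":
        # Evaluators use only the surfaces assigned for the work,
        # e.g. verification, controlled batch, or scoped Run History.
        return surface in assigned
    # Any other role gets nothing in this sketch.
    return False


if __name__ == "__main__":
    print(can_use("tester", "Custom Prompt Test"))  # True
    print(can_use("tester", "Controlled Batch Runner"))  # False
    print(can_use("evaluator", "Controlled Batch Runner"))  # False: not assigned
    print(can_use("evaluator", "Controlled Batch Runner",
                  frozenset({"Controlled Batch Runner"})))  # True
```

Keeping the tester list as an explicit allow-list (rather than a deny-list of the tools above) means any new surface is blocked for testers by default.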