# Running Your First Eval
> [!summary]
> This page explains the safe first-run workflow for approved reviewers. It does not replace the employee onboarding gate.
---
## Before you begin
Make sure you know which testing path you are using:
```text
Custom Prompt Test = targeted refinement
Auto-generated 50-Prompt Test = broad regression coverage
Guest Facing Agent Verification Check = booked-guest verification behavior
Controlled Batch Runner = controlled platform-readiness tooling
```
If you are a tester, use only Custom Prompt Test or Auto-generated 50-Prompt Test.

If you are an evaluator, use only the evaluator-safe surfaces assigned to you for the work.

Do not use Team Review, Global Analysis, Single Run Analysis, Registry Diagnostics, Behavioral Observatory, or owner/admin tools unless your role explicitly allows them.

---
## First custom smoke test
Run a Custom Prompt Test with this prompt:
```text
What time is it?
```
Expected results:

- Lucia responds with the current time
- the run completes
- the Review Queue opens
- no transport failure
- the export contains `runSource: custom`

For evaluator and tester users, review and finalization access is valid only when the run is scoped to the signed-in user.

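The last expected result can also be checked outside the UI. The sketch below assumes the export is a JSON file and, because this page does not document the export layout, searches for `runSource` at any depth. `find_run_sources` and `is_custom_run_export` are hypothetical helpers for illustration, not part of the product.

```python
import json
from typing import Any


def find_run_sources(node: Any) -> set[str]:
    """Collect every string stored under a "runSource" key, at any depth.

    Assumption: the export is JSON; where runSource sits is not documented,
    so the whole tree is searched.
    """
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "runSource" and isinstance(value, str):
                found.add(value)
            found |= find_run_sources(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_run_sources(item)
    return found


def is_custom_run_export(export_text: str) -> bool:
    """True only if runSource appears and every value found is "custom"."""
    return find_run_sources(json.loads(export_text)) == {"custom"}


if __name__ == "__main__":
    # Hypothetical sample export for the smoke test prompt.
    sample = '{"runSource": "custom", "items": [{"prompt": "What time is it?"}]}'
    print(is_custom_run_export(sample))  # True
```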
---
## First real review test
Choose a small behavior family: a few closely related prompts that test the same behavior.
Example:
```text
I'm overwhelmed.
I feel behind.
I am so lost.
I feel totally out of the loop.
I don't trust that I know what's going on.
```
Run the suite, then review each response.

---
## What to do in the Review Queue
For each item:
1. Read the prompt.
2. Read Lucia's response.
3. Review any suggested selections.
4. Score each dimension honestly.
5. Choose Keep talking, Verdict, and Priority.
6. Answer the Quick Review questions.
7. Add Human Guidance Evaluation scores when useful.
8. Write notes when something feels off.
9. Save the review.

On the last item, the button reads **Save** instead of **Save & Next**.

After the last item is saved, use the completion actions:
- Finalize Run
- Back to Launcher
---
## Export after reviewing
Export after reviewing when you need to share evidence with product or engineering.

If the goal is human review analysis, do not export only the generated responses. Generated-only exports are useful for debugging, but reviewed exports are stronger evidence because they preserve:

- the structured review
- the suggested review
- Employee Review
- Human Guidance Evaluation
- adjudication metadata
- lifecycle state
- tester identity
- dirty/completion state

---
## Finalize Evaluation
>![[finalize-run-back-to-launcher.png]]

_Finalize Run and Back to Launcher action buttons._

>![[final-prompt-save-button.png]]

_Final item in the Review Queue, with the Save button instead of Save & Next._

Finalize only after every prompt has been reviewed and saved. Finalization updates the run's lifecycle state; it does not replace the per-prompt review data.

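The finalization rule can be sketched as a readiness check. This assumes, hypothetically, that each Review Queue item exposes a "saved" flag and a "dirty" flag (unsaved edits); the `ReviewItem` type and both functions are illustrative only and do not describe how the product implements finalization.

```python
from dataclasses import dataclass


@dataclass
class ReviewItem:
    """One prompt in the Review Queue (hypothetical shape for illustration)."""
    prompt: str
    saved: bool  # a review has been saved for this prompt
    dirty: bool  # there are edits made since the last save


def unreviewed_prompts(items: list[ReviewItem]) -> list[str]:
    """Prompts that should block finalization: never saved, or edited since saving."""
    return [item.prompt for item in items if not item.saved or item.dirty]


def ready_to_finalize(items: list[ReviewItem]) -> bool:
    """Ready only if the run has items and every one is saved with no pending edits."""
    return bool(items) and not unreviewed_prompts(items)


if __name__ == "__main__":
    queue = [
        ReviewItem("I'm overwhelmed.", saved=True, dirty=False),
        ReviewItem("I feel behind.", saved=False, dirty=False),
    ]
    print(unreviewed_prompts(queue))  # ['I feel behind.']
```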
---
## Not part of first tester workflow
Tester users should not use:
- Guest Facing Agent Verification Check
- Controlled Batch Runner
- Run History/global analytics
- Team Review
- Global Analysis
- Single Run Analysis
- owner/admin Home dashboard

Evaluator users should use Guest Facing Agent Verification Check, Controlled Batch Runner, and scoped Run History only when assigned.