# Review Workflow
> [!summary]
> Review is where Eval Labs becomes useful. The reviewer's job is to judge behavior honestly, not politely. AI-reviewed platform evidence does not replace this human judgment.
---
## Review order
Use this order:
1. intent
2. truth
3. usefulness
4. clarity
5. tone
6. next move
7. trust aftertaste
---
## 1. Intent
Did Lucia understand what the user was asking?
If intent is wrong, the response usually fails.
For example, if the user says:
```text
I feel totally out of the loop.
```
Lucia should not respond with a generic capability menu.
A capability menu answers a question the user did not ask, so it is likely an intent-layer miss.

---
## 2. Truth
Did Lucia claim anything she could not know or verify?
Truth failures are serious.
Examples:
- saying a vendor was contacted when no dispatch happened
- saying an issue is resolved when only a suggestion was made
- implying full confidence when the signal is inferred
---
## 3. Usefulness
Did the response help the user move forward?
A response can be warm and still useless.

---
## 4. Clarity
Was the response easy to understand without extra work?
Lucia should not make the operator scan five paragraphs to find the first move.

---
## 5. Tone
Was the tone appropriate for the moment?
For Lucia, tone should be:
```text
warm
calm
specific
not robotic
not therapy-bot
```
---
## 6. Next move
Did Lucia give the right next move when a next move was needed?
Not every prompt requires a task, but distress and ops prompts usually require narrowing the situation down to a concrete first move.

---
## 7. Trust aftertaste
After reading the response, ask:
```text
Do I trust Lucia more, less, or the same?
```
If the answer is less, write down why.

---
## Saving reviews
Use:
- **Save & Next** for non-final prompts
- **Save** for the final prompt
- **Finalize Run** when the run review is complete

Finalization updates the run's lifecycle status. It does not replace per-prompt review data.

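
Which save button to use depends only on whether the prompt is the last one in the run. A minimal Python sketch of that rule, assuming 0-based prompt indexes and reading "the run review is complete" as "every prompt has its own saved review". Both helpers are hypothetical illustrations, not part of Eval Labs; for a three-prompt run, `save_action(0, 3)` gives `"Save & Next"` and `save_action(2, 3)` gives `"Save"`:

```python
# Hypothetical helpers: the button labels come from this guide, but the
# functions and the 0-based prompt indexing are illustrative assumptions.

def save_action(prompt_index: int, total_prompts: int) -> str:
    """Return which save button to use for a prompt in a run (0-based index)."""
    if not 0 <= prompt_index < total_prompts:
        raise ValueError("prompt_index out of range")
    is_final = prompt_index == total_prompts - 1
    return "Save" if is_final else "Save & Next"


def ready_to_finalize(reviewed: set[int], total_prompts: int) -> bool:
    """Finalize Run only after every prompt has its own saved review.

    Finalization does not fill in missing per-prompt review data.
    """
    return reviewed == set(range(total_prompts))
```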
---
## Reviewer discipline
> [!warning]
> Do not pass a response just because it sounds smart. Pass it only if it works.

AI-reviewed readiness runs can prove the platform captured and persisted reviews. They cannot prove that a human reviewer agrees with the score or that Lucia is ready for real operator use.

---
## Review Queue vs Behavioral Observatory
Review Queue and Behavioral Observatory are related, but they are not the same workflow.
Review Queue is where the reviewer scores and reviews the prompt/response item.
Behavioral Observatory is where a reviewer can save structured behavioral labels for a conversation:
```text
Intent
Guest Affect
Response Strategy
Humanness
Notes
```
Registry Diagnostics is separate again. It shows derived dataset and queue-lane suggestions, not saved human labels.

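
The line between saved human labels and derived suggestions can be pictured as two distinct record types. A minimal Python sketch, assuming hypothetical `ObservatoryLabel` and `DerivedSuggestion` types keyed by a hypothetical `conversation_id`; the names and string-valued fields are illustrative, not the Eval Labs data model. Because only an `ObservatoryLabel` can be saved, a derived suggestion never becomes a saved label unless a reviewer writes one:

```python
from dataclasses import dataclass


# Hypothetical types: names and fields are illustrative, not the Eval Labs schema.
@dataclass(frozen=True)
class ObservatoryLabel:
    """A saved human label for one conversation (Behavioral Observatory)."""
    conversation_id: str
    intent: str
    guest_affect: str
    response_strategy: str
    humanness: str
    notes: str = ""


@dataclass(frozen=True)
class DerivedSuggestion:
    """A derived suggestion (Registry Diagnostics). Never a saved label."""
    conversation_id: str
    field: str  # e.g. "intent"
    value: str


def save_label(label: ObservatoryLabel, store: dict[str, ObservatoryLabel]) -> None:
    # Only human-written labels are accepted; a suggestion has a different type.
    if not isinstance(label, ObservatoryLabel):
        raise TypeError("only ObservatoryLabel records can be saved")
    store[label.conversation_id] = label
```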
---
## Updated Review Queue flow
Use this practical flow:
1. read the prompt
2. read Lucia’s response
3. review any suggested selections
4. score the five dimensions with the semantic confidence sliders
5. answer Quick Review questions
6. add Human Guidance Evaluation scores when useful
7. add a short note only if needed
8. flag senior review when uncertain or concerned
9. mark reusable learning only when the case teaches a durable lesson
10. save and move on

If the assignment includes Behavioral Observatory, use the saved-label workflow after reading the conversation carefully. Do not copy derived suggestions from Registry Diagnostics blindly.

---
## Quick Review rule
Quick Review is not a test of the reviewer’s AI knowledge.
It is a structured way to capture whether Lucia worked for a human.
If you are unsure, use the senior review option instead of inventing your own taxonomy.

---
## Escalation rule
Escalate when:
- Lucia may have overclaimed
- the response creates risk or confusion
- intent is unclear
- the case involves owner stress, money, maintenance, guest trust, or safety
- the response contains a reusable pattern
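
The escalation rule is an "any of these" check: a single trigger is enough. A minimal Python sketch, assuming a hypothetical `ReviewCase` record whose fields mirror the bullets above (the field names are illustrative, not the Eval Labs schema). Returning the list of reasons, rather than a bare yes or no, gives the reviewer something concrete to write in the note:

```python
from dataclasses import dataclass


# Hypothetical record: field names are illustrative, not the Eval Labs schema.
@dataclass
class ReviewCase:
    possible_overclaim: bool = False          # Lucia may have overclaimed
    creates_risk_or_confusion: bool = False
    intent_unclear: bool = False
    sensitive_topics: frozenset = frozenset()  # e.g. frozenset({"money"})
    reusable_pattern: bool = False


# Topics named in the escalation rule.
SENSITIVE_TOPICS = frozenset(
    {"owner stress", "money", "maintenance", "guest trust", "safety"}
)


def escalation_reasons(case: ReviewCase) -> list[str]:
    """Return every escalation trigger that applies; empty means no escalation."""
    reasons = []
    if case.possible_overclaim:
        reasons.append("possible overclaim")
    if case.creates_risk_or_confusion:
        reasons.append("risk or confusion")
    if case.intent_unclear:
        reasons.append("unclear intent")
    # Only topics listed in the rule count; sorted for a stable order.
    for topic in sorted(case.sensitive_topics & SENSITIVE_TOPICS):
        reasons.append(f"sensitive: {topic}")
    if case.reusable_pattern:
        reasons.append("reusable pattern")
    return reasons


def should_escalate(case: ReviewCase) -> bool:
    # Any single trigger is enough; the rule is an OR, not a score.
    return bool(escalation_reasons(case))
```

For example, `escalation_reasons(ReviewCase(intent_unclear=True, sensitive_topics=frozenset({"money"})))` returns `["unclear intent", "sensitive: money"]`.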