# Eval Labs Review Layer Release Notes
> [!summary]
> This page records the May 2026 review-layer evolution: shared run launchers, employee review, suggested selections, Human Guidance Evaluation, adjudication-ready schema, exports, queue filters, lifecycle finalization, and the later platform-readiness split between normal testing and controlled batch gates.
---
## May 2026 review-layer milestone
Eval Labs evolved from a prompt runner into a layered review product.
The key change:
```text
custom or automated run
→ shared Review Queue
→ suggested selections plus human review
→ lifecycle finalization and export
```
---
## Major shipped changes
### Adjudication-ready schema
Added review model support for:
```text
reviewState
luciaPredictedLabels
humanLabels
adjudication
employeeReview
suggestedEmployeeReview
humanGuidanceEval
suggestedHumanGuidanceEval
canonCandidate
reviewLifecycle
```
These fields are preserved through local storage, Supabase payload persistence, dirty-state detection, and exports.
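As a rough illustration of how these fields might sit together on one prompt record, here is a minimal TypeScript sketch. Only the top-level field names come from the list above. The `PromptReviewRecord` name, the loose `Json` value type, and the `isDirty` helper are assumptions for illustration, not the shipped model.

```typescript
// Hypothetical shape: only the field names are taken from the release notes.
// Every value type is deliberately loose because the real types are not documented here.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface PromptReviewRecord {
  reviewState?: Json;
  luciaPredictedLabels?: Json;
  humanLabels?: Json;
  adjudication?: Json;
  employeeReview?: Json;
  suggestedEmployeeReview?: Json;
  humanGuidanceEval?: Json;
  suggestedHumanGuidanceEval?: Json;
  canonCandidate?: Json;
  reviewLifecycle?: Json;
}

const REVIEW_FIELDS = [
  "reviewState",
  "luciaPredictedLabels",
  "humanLabels",
  "adjudication",
  "employeeReview",
  "suggestedEmployeeReview",
  "humanGuidanceEval",
  "suggestedHumanGuidanceEval",
  "canonCandidate",
  "reviewLifecycle",
] as const;

// Illustrative dirty-state check: compares each review field of the saved
// snapshot with the current draft. JSON.stringify is key-order sensitive,
// so this assumes both records are built the same way.
function isDirty(saved: PromptReviewRecord, draft: PromptReviewRecord): boolean {
  return REVIEW_FIELDS.some(
    (field) =>
      JSON.stringify(saved[field] ?? null) !== JSON.stringify(draft[field] ?? null),
  );
}
```

Listing the fields once in `REVIEW_FIELDS` is one way to keep storage, dirty-state detection, and exports looking at the same set of fields.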
---
### Employee Review layer
Added guided employee-review fields:
```text
understoodNeed
rightNextMove
calmingEffect
riskOrConfusion
seniorReview
reusableLearning
```
These replace freeform taxonomy collection for non-expert reviewers.
The app can also suggest Employee Review answers from prompt/response heuristics. The suggestion is shown as a suggested signal; the reviewer still saves the human review.
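The split between suggested signal and saved human review could be sketched as follows. The six field names come from the list above; the string answer type, the `ReviewSlot` wrapper, and the `saveEmployeeReview` function are hypothetical.

```typescript
// Field names come from the release notes; string answers are an assumption.
interface EmployeeReview {
  understoodNeed: string;
  rightNextMove: string;
  calmingEffect: string;
  riskOrConfusion: string;
  seniorReview: string;
  reusableLearning: string;
}

// Hypothetical container that keeps the two layers in separate slots.
interface ReviewSlot {
  suggestedEmployeeReview?: EmployeeReview; // app-generated signal
  employeeReview?: EmployeeReview; // written only when the reviewer saves
}

const EMPLOYEE_FIELDS: (keyof EmployeeReview)[] = [
  "understoodNeed",
  "rightNextMove",
  "calmingEffect",
  "riskOrConfusion",
  "seniorReview",
  "reusableLearning",
];

// The reviewer may start from the suggestion, but the human record is created
// only by this explicit save step, and reviewer answers take precedence.
function saveEmployeeReview(
  slot: ReviewSlot,
  reviewerAnswers: Partial<EmployeeReview>,
): ReviewSlot {
  const base: Partial<EmployeeReview> = slot.suggestedEmployeeReview ?? {};
  const merged: Partial<EmployeeReview> = { ...base, ...reviewerAnswers };
  const missing = EMPLOYEE_FIELDS.filter((key) => !merged[key]);
  if (missing.length > 0) {
    throw new Error(`unanswered fields: ${missing.join(", ")}`);
  }
  // The suggestion stays untouched so exports can still show both layers.
  return { ...slot, employeeReview: merged as EmployeeReview };
}
```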
---
### Suggested review layer
Added app-suggested review values for:
```text
1-10 ratings
keepTalking
pass / refine / fail
priority
Employee Review answers
1-5 Human Guidance Evaluation scores
```
These suggestions come from prompt text, Lucia response text, run status, run errors, and simple response-quality heuristics such as clear next move, calming language, list-heavy output, robotic language, fake empathy, and overclaiming.
They are not canonical truth.
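To make the idea of response-quality heuristics concrete, here is a minimal sketch of a pass / refine / fail suggestion. The cue patterns, the 50% list threshold, and the `suggestReview` name are invented for illustration and are not the shipped Eval Labs rules.

```typescript
// Hypothetical sketch: patterns and thresholds are placeholders,
// not the heuristics Eval Labs actually uses.
type Verdict = "pass" | "refine" | "fail";

interface SuggestedReview {
  verdict: Verdict;
  signals: string[];
  source: "app-suggested"; // marks the value as a suggestion, not a human label
}

function suggestReview(response: string, runError?: string): SuggestedReview {
  // A run error or empty response is suggested as a fail outright.
  if (runError || response.trim() === "") {
    return {
      verdict: "fail",
      signals: ["run error or empty response"],
      source: "app-suggested",
    };
  }

  const signals: string[] = [];
  const lines = response.split("\n");
  const bulletLines = lines.filter((line) => /^\s*([-*•]|\d+\.)\s/.test(line)).length;

  if (bulletLines / lines.length > 0.5) signals.push("list-heavy output");
  if (/\bas an ai\b/i.test(response)) signals.push("robotic language");
  if (/\b(guarantee|definitely will|always works)\b/i.test(response)) {
    signals.push("overclaiming");
  }
  if (!/\b(next step|you can|try)\b/i.test(response)) {
    signals.push("no clear next move");
  }

  // Any concern downgrades the suggestion to "refine"; the reviewer decides.
  const verdict: Verdict = signals.length === 0 ? "pass" : "refine";
  return { verdict, signals, source: "app-suggested" };
}
```

Returning the triggered `signals` alongside the verdict lets the reviewer see why a value was suggested before accepting or overriding it.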
---
### Review Queue UX
The Review Queue now favors guided employee judgment:
- single-column Quick Review flow
- numbered question cards
- separate selection boxes
- suggested selections
- reduced freeform text burden
- senior-review routing
- canon-candidate routing
- Human Guidance Evaluation
- Save / Save & Next / Save & next flagged flows
- search and workflow filters
- JSON, CSV, and Markdown export controls
- finalization after all prompts are reviewed
---
### Semantic confidence sliders
The “How did Lucia do?” scoring section moved from 1–10 button rows to stepped semantic confidence sliders.
The final design direction:
```text
low score → muted concern
mid score → soft uncertainty
high score → restrained confidence
```
The sliders should feel like native OS controls: calm, premium, tactile, and low-friction.
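One way to read the design direction is as a mapping from score to visual tone. The sketch below assumes the sliders keep a 1 to 10 integer scale and uses invented cut-offs (1 to 3 low, 4 to 7 mid, 8 to 10 high); the tone names and `toneForScore` are hypothetical.

```typescript
// Hypothetical tone tokens derived from the low / mid / high wording above.
type SliderTone = "muted-concern" | "soft-uncertainty" | "restrained-confidence";

// Assumes a 1-10 integer scale; the band boundaries are illustrative only.
function toneForScore(score: number): SliderTone {
  if (!Number.isInteger(score) || score < 1 || score > 10) {
    throw new RangeError(`score must be an integer from 1 to 10, got ${score}`);
  }
  if (score <= 3) return "muted-concern";
  if (score <= 7) return "soft-uncertainty";
  return "restrained-confidence";
}
```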
---
### Adjudication queue filters
Added workflow queue filters for:
```text
Needs final call
Canon candidates
```
This lets senior reviewers focus on the cases that still need a final call or that are flagged as canon candidates.
This release supports adjudication routing, metadata, and exports. It does not depend on a separate senior-adjudication editing screen.
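A minimal sketch of how the two filters might be applied to queue items. The filter labels come from the list above; the `QueueItem` shape and its boolean flags are assumptions about how routing metadata could be exposed.

```typescript
// Hypothetical queue item: the flags stand in for whatever routing
// metadata the real records carry.
interface QueueItem {
  id: string;
  needsFinalCall: boolean;
  canonCandidate: boolean;
}

type WorkflowFilter = "Needs final call" | "Canon candidates";

// One predicate per filter label keeps the filter list and its logic together.
const WORKFLOW_FILTERS: Record<WorkflowFilter, (item: QueueItem) => boolean> = {
  "Needs final call": (item) => item.needsFinalCall,
  "Canon candidates": (item) => item.canonCandidate,
};

function applyWorkflowFilter(items: QueueItem[], filter: WorkflowFilter): QueueItem[] {
  return items.filter(WORKFLOW_FILTERS[filter]);
}
```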
---
### Exports
JSON, CSV, and Markdown exports now preserve structured review, suggested review, Employee Review, Human Guidance Evaluation, adjudication metadata, lifecycle state, tester identity, and prompt dirty/completion state.
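Structured review fields are nested, so a CSV export has to flatten them somehow. The sketch below shows one possible approach: nested values are serialized as JSON inside a quoted cell. The `toCsv` and `csvCell` helpers and the column names in the usage example are hypothetical, not the app's export code.

```typescript
// Serialize one value as a CSV cell. Objects and arrays become JSON text.
// Cells containing quotes, commas, or line breaks are quoted, with inner
// quotes doubled (the usual RFC 4180 convention).
function csvCell(value: unknown): string {
  let text: string;
  if (value === null || value === undefined) {
    text = "";
  } else if (typeof value === "object") {
    text = JSON.stringify(value);
  } else {
    text = String(value);
  }
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV document from rows, emitting only the requested columns.
function toCsv(rows: Record<string, unknown>[], columns: string[]): string {
  const header = columns.map(csvCell).join(",");
  const body = rows.map((row) => columns.map((column) => csvCell(row[column])).join(","));
  return [header, ...body].join("\r\n");
}

// Usage with illustrative column names:
const csv = toCsv(
  [{ promptId: "p1", employeeReview: { understoodNeed: "yes" } }],
  ["promptId", "employeeReview"],
);
```

Keeping the column list explicit makes it harder for a new review field to be dropped from an export without anyone noticing.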
---
### Supabase persistence
Supabase persistence now stores run lifecycle metadata on `eval_runs`, embeds the full case and prompt review record in `eval_run_items.payload`, and writes `eval_item_reviews` rows for review persistence.
Hydration prefers the embedded `eval_run_items.payload.promptRecord` over fragile review-table reads.
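The hydration preference can be sketched as a simple fallback chain. The table and payload names follow the text above; the row interfaces, the `review` column on the review row, and `hydratePromptRecord` are assumptions, and no Supabase client calls are shown.

```typescript
type PromptRecord = Record<string, unknown>;

// Assumed row shapes: only `payload.promptRecord` is named in the release notes.
interface RunItemRow {
  payload?: { promptRecord?: PromptRecord } | null;
}

interface ItemReviewRow {
  review?: PromptRecord | null; // hypothetical column on eval_item_reviews
}

interface HydratedRecord {
  record: PromptRecord;
  source: "payload" | "review-table" | "empty";
}

// Prefer the record embedded in eval_run_items.payload; fall back to the
// review-table row only when the embedded record is missing.
function hydratePromptRecord(item: RunItemRow, reviewRow?: ItemReviewRow): HydratedRecord {
  const embedded = item.payload?.promptRecord;
  if (embedded) return { record: embedded, source: "payload" };

  const fallback = reviewRow?.review;
  if (fallback) return { record: fallback, source: "review-table" };

  return { record: {}, source: "empty" };
}
```

Returning the `source` alongside the record makes it easy to log how often the fallback path is taken.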
---
## Current doctrine impact
This release established a new Eval Labs principle:
```text
The app may suggest.
The reviewer must decide.
Senior meaning stays separate from employee signal.
```
Future product work should preserve this separation between app suggestion, reviewer decision, and senior meaning.
---
## Product surface refinement
After the review-layer release, Eval Labs was refined into a clearer product surface:
- top app shell owns page identity
- in-page blog-style mastheads were removed from the app
- Custom Prompt Test, Auto-generated Prompt Test, and Controlled Batch Runner are separate surfaces
- Controlled Batch Runner is controlled readiness tooling; current access is owner/admin/evaluator, not tester
- Auto-generated Prompt Test remains the normal 50-prompt generated tester
- Run History rows use a standardized two-zone layout
- copy controls use Copy Session ID / Copy Deep Link patterns across key surfaces
- Single Run Analysis gives read-only run-level evidence outside the Review Queue
---
## Readiness doctrine added
The AI-reviewed platform readiness gate passed after 60 completed runs and 3,000 reviewed prompts.
This extends the review-layer doctrine:
```text
The app may suggest.
The reviewer must decide.
AI-reviewed platform readiness is not human Lucia-quality approval.
```
Protect this distinction in future release notes and onboarding language.