# Data Model and Export Contract
> [!summary]
> Eval Labs data is designed to preserve prompt context, run source, Lucia response, layered review judgment, lifecycle state, and tester identity.
---
## Session metadata
A session export includes metadata such as:
```text
id
title
mode
runSource
category
subcategory
templateKey
promptCount
status
createdAt
updatedAt
adminBranch
engineBranch
runFailureType
runFailureReason
runFailureAt
reviewLifecycle
remoteRunId
ownerUserId
ownerScopeVersion
localPayloadState
```
The important field for the custom launcher is:
```text
runSource: custom
```
Custom and Auto-generated launchers both create run sessions that flow into the same Review Queue. The session `mode` remains the run mechanics; `runSource` distinguishes whether the run came from the Custom launcher or the Auto-generated launcher.
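A minimal TypeScript sketch of how a consumer might tell launchers apart, assuming string-typed metadata fields. Only `runSource: custom` is documented above, so any other `runSource` value and the helper names `isCustomLauncherRun` and `countByRunSource` are hypothetical.

```typescript
// Hypothetical subset of the session metadata listed above.
// Field types are assumptions; only runSource "custom" is documented.
interface SessionMetadataSubset {
  id: string;
  mode: string; // run mechanics, independent of which launcher started the run
  runSource: string; // "custom" for the Custom launcher; other values assumed
  status: string;
}

// Custom-launcher runs are identified by runSource, not by mode.
function isCustomLauncherRun(meta: SessionMetadataSubset): boolean {
  return meta.runSource === "custom";
}

// Both launchers feed the same Review Queue, so grouping by runSource is a
// reporting concern rather than a routing one.
function countByRunSource(sessions: SessionMetadataSubset[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const s of sessions) {
    counts[s.runSource] = (counts[s.runSource] ?? 0) + 1;
  }
  return counts;
}
```

Keeping `mode` and `runSource` as separate fields means a report can split runs by launcher without changing how runs execute.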
Controlled batch readiness runs also rely on the same run/session lifecycle, but their product meaning is different: they are platform readiness evidence, not normal evaluator-facing tests.
Registry Diagnostics reads existing session/run/review evidence and derives dataset membership and review-lane suggestions from it. Those suggestions are diagnostic output, not saved labels.
Behavioral Observatory labels are separate persisted records when saved through the Behavioral Observatory flow.
---
## Cases
Each prompt becomes a case.
A case contains:
```text
id
sessionId
orderIndex
sourceType
title
promptText
promptLocked
luciaResponse
runStatus
runError
category
subcategory
templateKey
createdAt
updatedAt
```
Order matters. The exported `orderIndex` must remain stable.
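A minimal sketch of an order check, assuming `orderIndex` is 0-based and gap-free as in the example export; that assumption and the name `orderedCases` are hypothetical.

```typescript
// Hypothetical minimal case shape: only the fields needed for ordering.
interface CaseRef {
  id: string;
  orderIndex: number;
}

// Returns cases sorted by orderIndex and throws if the indices are not a
// 0..n-1 sequence, since duplicates or gaps would make the order ambiguous.
function orderedCases(cases: CaseRef[]): CaseRef[] {
  const sorted = [...cases].sort((a, b) => a.orderIndex - b.orderIndex);
  sorted.forEach((c, i) => {
    if (c.orderIndex !== i) {
      throw new Error(`case ${c.id} has orderIndex ${c.orderIndex}, expected ${i}`);
    }
  });
  return sorted;
}
```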
---
## Prompt results
Prompt results include:
- draft review state, including reviewer input and suggestions
- saved review state
- saved timestamp
- saved-by tester identity
- completion/dirty state derived from saved vs draft review
A generated-but-unreviewed item may have null ratings and `savedBy: null`.
That is expected.
A saved review should include `savedBy`.
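One way the completion and dirty state could be derived from saved versus draft review, sketched in TypeScript. The comparison by JSON serialization and the function names are assumptions, not the app's documented logic.

```typescript
// Identity fields as stored by Eval Labs.
interface TesterIdentity {
  clerkUserId: string;
  email: string;
  name: string;
}

// Hypothetical prompt result shape; review bodies are left as loose records.
interface PromptResult {
  draft: Record<string, unknown>;
  saved: Record<string, unknown> | null;
  savedAt: string | null;
  savedBy: TesterIdentity | null;
}

// Assumption: "dirty" means the draft differs from the saved review, compared
// via JSON serialization (key-order sensitive, kept simple for the sketch).
function isDirty(result: PromptResult): boolean {
  if (result.saved === null) {
    // An untouched empty draft on an unreviewed item is not dirty.
    return Object.keys(result.draft).length > 0;
  }
  return JSON.stringify(result.draft) !== JSON.stringify(result.saved);
}

// Complete once a review has been saved with an author and a timestamp.
function isComplete(result: PromptResult): boolean {
  return result.saved !== null && result.savedBy !== null && result.savedAt !== null;
}
```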
---
## `exportedBy` vs `savedBy`
`exportedBy` identifies who exported the file.
`savedBy` identifies who reviewed/saved the individual prompt.
This distinction matters because one person may export a run that another person reviewed.
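A small sketch of reading both identities from an export, assuming `savedBy` sits on each prompt result as in the example below. The interface names and helpers are hypothetical.

```typescript
// Identity shape matching the three stored fields.
interface Identity {
  clerkUserId: string;
  email: string;
  name: string;
}

// Hypothetical minimal export shape: who exported, plus per-prompt savedBy.
interface ExportAuthorship {
  exportedBy: Identity;
  promptResults: Record<string, { savedBy: Identity | null }>;
}

// Distinct reviewers (by clerkUserId) who saved prompts in this file.
// The exporter is not assumed to be one of them.
function reviewerIds(exp: ExportAuthorship): string[] {
  const ids = new Set<string>();
  for (const caseId of Object.keys(exp.promptResults)) {
    const by = exp.promptResults[caseId].savedBy;
    if (by !== null) ids.add(by.clerkUserId);
  }
  return Array.from(ids).sort();
}

function exporterReviewedAnything(exp: ExportAuthorship): boolean {
  return reviewerIds(exp).includes(exp.exportedBy.clerkUserId);
}
```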
---
## Tester identity fields
Eval Labs stores only limited identity fields:
```text
clerkUserId
email
name
```
No unnecessary Clerk metadata should be stored.
Role gating reads Clerk public metadata through `eval_labs_role`, but role metadata is product access state, not review authorship.
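A sketch of keeping identity minimal and role separate. `ClerkUserLike` is a hypothetical, pre-flattened view of a user, not the real Clerk SDK type; only the `eval_labs_role` key comes from the text above.

```typescript
// Hypothetical minimal view of a Clerk user as received by the app.
// Real Clerk user objects carry many more fields; only these are read here.
interface ClerkUserLike {
  id: string;
  primaryEmail: string; // assumed already resolved to a single address
  fullName: string;
  publicMetadata: Record<string, unknown>;
}

interface TesterIdentity {
  clerkUserId: string;
  email: string;
  name: string;
}

// Copies only the three stored identity fields; everything else is dropped.
function toTesterIdentity(user: ClerkUserLike): TesterIdentity {
  return { clerkUserId: user.id, email: user.primaryEmail, name: user.fullName };
}

// Role gating reads eval_labs_role from public metadata. The role is access
// state and is never copied into review authorship.
function evalLabsRole(user: ClerkUserLike): string | null {
  const role = user.publicMetadata["eval_labs_role"];
  return typeof role === "string" ? role : null;
}
```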
---
## Nulls in exports
Some nulls are normal.
Expected nulls include:
```text
runFailureType: null
runFailureReason: null
runFailureAt: null
savedBy: null when not reviewed yet
ratings: null when not scored yet
```
Do not treat every null as a bug.
Treat nulls as suspicious only when the workflow step should have populated them.
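A minimal sketch of that rule for prompt results, assuming "populated by the workflow" means a saved review must carry `savedBy` and `savedAt`. The shape and the name `suspiciousNulls` are hypothetical.

```typescript
// Hypothetical prompt-result shape; review body left loose.
interface PromptResultNulls {
  saved: Record<string, unknown> | null;
  savedAt: string | null;
  savedBy: { clerkUserId: string } | null;
}

// Returns null fields that a completed save step should have filled.
// Nulls on an unreviewed item (saved === null) are expected and not reported.
function suspiciousNulls(r: PromptResultNulls): string[] {
  const issues: string[] = [];
  if (r.saved !== null) {
    if (r.savedBy === null) issues.push("savedBy");
    if (r.savedAt === null) issues.push("savedAt");
  }
  return issues;
}
```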
---
## Export Options and Example
>![[export-controls.png]]
_Export controls, which support multiple data formats._
```json
{
  "format": "lucia-eval-lab-session/v0.3",
  "exportedAt": "2026-04-29T19:33:10.000Z",
  "exportedBy": {
    "clerkUserId": "user_3D2BItLYUO1uqJOqzlZTvHZNgsF",
    "email": "[email protected]",
    "name": "Aviv Hadar"
  },
  "session": {
    "metadata": {
      "id": "session-example",
      "runSource": "custom",
      "status": "ready",
      "reviewLifecycle": {
        "status": "in_review",
        "finalizedAt": null,
        "finalizedBy": null
      }
    },
    "caseOrder": ["case-001"],
    "cases": {
      "case-001": {
        "orderIndex": 0,
        "promptText": "I'm spinning a little. Tell me what to do first so I can breathe again.",
        "luciaResponse": "Take a breath. This feels heavier than it is. Nothing critical is slipping beyond the first move.",
        "runStatus": "success"
      }
    },
    "promptResults": {
      "case-001": {
        "draft": {},
        "saved": null,
        "savedAt": null,
        "savedBy": null
      }
    }
  }
}
```
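A sketch of a consistency pass over an export like the one above, assuming the v0.3 layout shown. These cross-checks and the name `exportProblems` are hypothetical, not a documented validator.

```typescript
// Hypothetical minimal typing of the example export (v0.3 format).
interface SessionExport {
  format: string;
  exportedAt: string;
  exportedBy: { clerkUserId: string; email: string; name: string };
  session: {
    metadata: { id: string; runSource: string; status: string };
    caseOrder: string[];
    cases: Record<string, { orderIndex: number; promptText: string }>;
    promptResults: Record<string, { savedBy: unknown }>;
  };
}

// Checks that caseOrder, cases, and promptResults agree with each other.
function exportProblems(exp: SessionExport): string[] {
  const problems: string[] = [];
  const { caseOrder, cases, promptResults } = exp.session;
  caseOrder.forEach((id, i) => {
    const c = cases[id];
    if (!c) problems.push(`caseOrder[${i}] ${id} has no case`);
    else if (c.orderIndex !== i) problems.push(`${id} orderIndex ${c.orderIndex} != ${i}`);
    if (!(id in promptResults)) problems.push(`${id} has no promptResult`);
  });
  return problems;
}
```

An empty result means every id in `caseOrder` has a matching case at the expected position and a prompt-result entry.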
---
## Review-layer fields
Prompt reviews now support these additional fields:
```text
ratings
suggestedRatings
keepTalking
suggestedKeepTalking
status
suggestedStatus
priority
suggestedPriority
reviewState
luciaPredictedLabels
humanLabels
adjudication
employeeReview
suggestedEmployeeReview
humanGuidanceEval
suggestedHumanGuidanceEval
canonCandidate
```
The suggested fields are product suggestions, not final reviewer judgment. They are generated from prompt/response/run-status heuristics and remain separate from the reviewer-saved values.
Suggested fields may feed derived context in Registry Diagnostics or Behavioral Observatory, but they do not become persisted Behavioral Observatory labels unless a reviewer saves a label in the Behavioral Observatory surface.
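A sketch showing how saved and suggested values can be compared without merging them. The string value types and the example values in usage are assumptions; only the field names come from the list above.

```typescript
// Reviewer-saved and app-suggested values for two fields from the list above.
// Value types are assumptions.
interface StatusAndPriority {
  status: string | null;
  suggestedStatus: string | null;
  priority: string | null;
  suggestedPriority: string | null;
}

// Reports which reviewer values differ from the app suggestion. Suggestions
// are never copied into reviewer fields; an unset reviewer value stays null
// and is not counted as an override.
function overriddenSuggestions(r: StatusAndPriority): string[] {
  const out: string[] = [];
  if (r.status !== null && r.status !== r.suggestedStatus) out.push("status");
  if (r.priority !== null && r.priority !== r.suggestedPriority) out.push("priority");
  return out;
}
```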
---
## Employee Review object
Employee Review captures guided non-expert signal:
```text
understoodNeed
rightNextMove
calmingEffect
riskOrConfusion
seniorReview
reusableLearning
```
These fields are intentionally simple and should remain employee-friendly.
---
## Human Guidance Evaluation object
Human Guidance Evaluation captures a review layer scored on a 1-5 scale, plus free-text notes:
```text
emotionalValidation
cognitiveUnderstanding
actionability
toneAppropriateness
authenticity
notes
```
Warmth and intelligence are not separate export fields. They are expressed through the current scoring dimensions and guidance fields: `tone`, `calming`, `naturalness`, `trust`, `usefulness`, `cognitiveUnderstanding`, `actionability`, and `authenticity`.
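A minimal validation sketch, assuming each dimension is an integer from 1 to 5, or `null` while unscored. The null handling and the name `invalidHgeScores` are assumptions.

```typescript
// Hypothetical typing of the Human Guidance Evaluation object.
interface HumanGuidanceEval {
  emotionalValidation: number | null;
  cognitiveUnderstanding: number | null;
  actionability: number | null;
  toneAppropriateness: number | null;
  authenticity: number | null;
  notes: string;
}

const HGE_DIMENSIONS = [
  "emotionalValidation",
  "cognitiveUnderstanding",
  "actionability",
  "toneAppropriateness",
  "authenticity",
] as const;

// Returns dimensions whose value is set but outside the 1-5 integer scale.
function invalidHgeScores(e: HumanGuidanceEval): string[] {
  return HGE_DIMENSIONS.filter((k) => {
    const v = e[k];
    return v !== null && !(Number.isInteger(v) && v >= 1 && v <= 5);
  });
}
```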
---
## Adjudication object
Adjudication records the final senior-review decision, when one exists in the review record:
```text
finalLabels
reason
adjudicator
adjudicatedAt
```
Final labels may include:
```text
guestIntent
followThroughRequired
actionType
emotionalRead
ownerStressLevel
```
---
## Review lifecycle object
Run lifecycle finalization is stored at the session level:
```text
status: in_review | ready_to_finalize | finalized
finalizedAt
finalizedBy
```
Finalization does not replace per-prompt review data. It marks the run lifecycle after all prompts are reviewed.
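A sketch of the lifecycle transitions, assuming a run becomes `ready_to_finalize` once every prompt has a saved review and that `finalizedBy` holds a Clerk user id. Both assumptions and the helper names are hypothetical.

```typescript
type LifecycleStatus = "in_review" | "ready_to_finalize" | "finalized";

interface ReviewLifecycle {
  status: LifecycleStatus;
  finalizedAt: string | null;
  finalizedBy: string | null; // assumption: Clerk user id of the finalizer
}

// Assumed derivation from review progress; finalized runs stay finalized.
function deriveStatus(current: LifecycleStatus, reviewed: number, total: number): LifecycleStatus {
  if (current === "finalized") return current;
  return total > 0 && reviewed === total ? "ready_to_finalize" : "in_review";
}

// Finalization stamps the session-level lifecycle only; per-prompt review
// data is not touched here.
function finalize(lc: ReviewLifecycle, by: string, at: Date): ReviewLifecycle {
  if (lc.status !== "ready_to_finalize") {
    throw new Error(`cannot finalize from ${lc.status}`);
  }
  return { status: "finalized", finalizedAt: at.toISOString(), finalizedBy: by };
}
```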
---
## Supabase persistence contract
The Supabase persistence layer stores the run and item contract across three tables, `eval_runs`, `eval_run_items`, and `eval_item_reviews`:
```text
eval_runs.metadata.reviewLifecycle
eval_runs.metadata.metadata
eval_run_items.payload.case
eval_run_items.payload.promptRecord
eval_item_reviews
```
`eval_run_items.payload.promptRecord` embeds the full prompt review record, including saved/draft state, suggested review, employee review, Human Guidance Evaluation, adjudication metadata, canon candidate signal, tester identity, and dirty/completion state.
`eval_item_reviews` is still written for review rows, but hydration prefers the embedded `eval_run_items.payload.promptRecord` instead of relying on fragile review-table reads.
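A sketch of the hydration preference using plain row objects. The row shapes, the `review` column name, and the fallback to `eval_item_reviews` when the embedded record is missing are all assumptions; the text above only states that the embedded record is preferred.

```typescript
// Hypothetical row shapes; real column types may differ.
interface RunItemRow {
  id: string;
  payload: { case: unknown; promptRecord?: Record<string, unknown> | null };
}

interface ItemReviewRow {
  run_item_id: string;
  review: Record<string, unknown>; // assumed column holding the review body
}

type HydrationSource = "embedded" | "review_table" | "none";

// Prefer eval_run_items.payload.promptRecord; fall back to the review row
// only when the embedded record is absent (fallback is an assumption).
function hydratePromptRecord(
  item: RunItemRow,
  reviewsByItem: Record<string, ItemReviewRow | undefined>
): { source: HydrationSource; record: Record<string, unknown> | null } {
  const embedded = item.payload.promptRecord;
  if (embedded) return { source: "embedded", record: embedded };
  const row = reviewsByItem[item.id];
  if (row) return { source: "review_table", record: row.review };
  return { source: "none", record: null };
}
```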
Behavioral Observatory labels are stored separately:
```text
public.eval_behavioral_labels
```
This table stores first-class Behavioral Observatory labels:
```text
run_id
run_item_id
owner_user_id
reviewer_user_id
intent
guest_affect
response_strategy
humanness
notes
status
payload
created_at
updated_at
```
The key distinction:
```text
eval_item_reviews = Review Queue review evidence
eval_behavioral_labels = Behavioral Observatory label evidence
```
One saved Behavioral Observatory label exists per reviewer per run item.
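An in-memory sketch of the one-label-per-reviewer-per-item rule, keyed on `(run_item_id, reviewer_user_id)`. In Postgres this would likely correspond to a unique constraint on those two columns plus an upsert; that mapping, the row subset, and the class name are assumptions.

```typescript
// Hypothetical subset of an eval_behavioral_labels row.
interface BehavioralLabel {
  run_item_id: string;
  reviewer_user_id: string;
  intent: string | null;
  status: string;
  updated_at: string;
}

// Saving again for the same (run_item_id, reviewer_user_id) pair replaces
// the earlier label instead of adding a second one.
class BehavioralLabelStore {
  private labels = new Map<string, BehavioralLabel>();

  private key(l: Pick<BehavioralLabel, "run_item_id" | "reviewer_user_id">): string {
    return `${l.run_item_id}::${l.reviewer_user_id}`;
  }

  save(label: BehavioralLabel): void {
    this.labels.set(this.key(label), label);
  }

  forItem(runItemId: string): BehavioralLabel[] {
    return Array.from(this.labels.values()).filter((l) => l.run_item_id === runItemId);
  }
}
```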
Current persisted run evidence is scoped by the signed-in Clerk user and the role claim available to Supabase RLS. Owner/admin can inspect shared persisted evidence where privileged RLS allows it. Evaluator and tester data remains scoped to their own work except where owner/admin oversight applies.
Current readiness verification checks counts across:
```text
public.eval_runs
public.eval_run_items
public.eval_item_reviews
```
For the 60-run readiness gate, the final verified result was:
```text
ready | 60 | 3000 | 3000 | 3000 | 3000
```
Meaning:
- 60 ready runs
- 3,000 expected prompts
- 3,000 run items
- 3,000 non-empty Lucia responses
- 3,000 reviews for the tested reviewer id
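A sketch of checking the readiness row programmatically. The field names are hypothetical labels for the six columns described in the list above, and the pipe-delimited parsing assumes the exact row format shown.

```typescript
// Hypothetical labels for the six columns of the verified result row.
interface ReadinessCounts {
  status: string;
  runs: number;
  expectedPrompts: number;
  runItems: number;
  nonEmptyResponses: number;
  reviewsForReviewer: number;
}

// Parses a row like "ready | 60 | 3000 | 3000 | 3000 | 3000".
function parseReadinessRow(row: string): ReadinessCounts {
  const parts = row.split("|").map((p) => p.trim());
  if (parts.length !== 6) throw new Error(`expected 6 columns, got ${parts.length}`);
  const [status, ...nums] = parts;
  const [runs, expectedPrompts, runItems, nonEmptyResponses, reviewsForReviewer] = nums.map(Number);
  return { status, runs, expectedPrompts, runItems, nonEmptyResponses, reviewsForReviewer };
}

// The gate passes when every per-item count matches the expected prompt total.
function readinessGatePasses(c: ReadinessCounts, expectedRuns: number): boolean {
  return (
    c.status === "ready" &&
    c.runs === expectedRuns &&
    c.runItems === c.expectedPrompts &&
    c.nonEmptyResponses === c.expectedPrompts &&
    c.reviewsForReviewer === c.expectedPrompts
  );
}
```

For the verified row, 3,000 expected prompts across 60 runs works out to 50 prompts per run.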
---
## localStorage compaction contract
Completed cloud-backed runs should not leave full item-level payloads persisted in localStorage.
The platform-readiness diagnostic target is:
```text
persistedLocalFullPayloadSessionCount = 0
persistedLocalHasItemLevelData = false
persistedLocalItemLevelDataSessionCount = 0
ownedSessionCount = expected run count
otherOwnerSessionCount = 0
ownerlessSessionCount = 0
```
The final verified 60-run diagnostic was:
```text
sessionCount = 60
persistedLocalFullPayloadSessionCount = 0
persistedLocalHasItemLevelData = false
persistedLocalItemLevelDataSessionCount = 0
ownedSessionCount = 60
otherOwnerSessionCount = 0
ownerlessSessionCount = 0
rawByteSize ≈ 68,815
```
This supports platform readiness and client compactness. It does not prove backend authorization is complete.
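A sketch that compares a diagnostic against the target listed above. The interface mirrors the documented field names; the function name `compactionFailures` is hypothetical.

```typescript
// Compaction diagnostic fields listed above.
interface LocalStorageDiagnostic {
  sessionCount: number;
  persistedLocalFullPayloadSessionCount: number;
  persistedLocalHasItemLevelData: boolean;
  persistedLocalItemLevelDataSessionCount: number;
  ownedSessionCount: number;
  otherOwnerSessionCount: number;
  ownerlessSessionCount: number;
}

// Returns fields that miss the platform-readiness target. This checks client
// compactness only; it says nothing about backend authorization.
function compactionFailures(d: LocalStorageDiagnostic, expectedRunCount: number): string[] {
  const failures: string[] = [];
  if (d.persistedLocalFullPayloadSessionCount !== 0) failures.push("persistedLocalFullPayloadSessionCount");
  if (d.persistedLocalHasItemLevelData) failures.push("persistedLocalHasItemLevelData");
  if (d.persistedLocalItemLevelDataSessionCount !== 0) failures.push("persistedLocalItemLevelDataSessionCount");
  if (d.ownedSessionCount !== expectedRunCount) failures.push("ownedSessionCount");
  if (d.otherOwnerSessionCount !== 0) failures.push("otherOwnerSessionCount");
  if (d.ownerlessSessionCount !== 0) failures.push("ownerlessSessionCount");
  return failures;
}
```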
---
## Export rule
Exports should preserve the full review contract:
```text
employeeReview = what the reviewer experienced
suggestedEmployeeReview = what the app suggested
humanGuidanceEval = structured 1-5 human guidance
suggestedHumanGuidanceEval = app-suggested human guidance
adjudication = final senior-review metadata when present
reviewLifecycle = whether the run is still in review or finalized
```
Do not collapse these into one field.
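A sketch of a pre-export check that the contract fields survive as separate keys. Where each key lives is simplified here (assumption): `reviewLifecycle` is read from session metadata, the rest from a per-prompt review record. `adjudication` is left out because it is only present when it exists. Helper names are hypothetical.

```typescript
// Per-prompt contract keys that must stay separate in the export.
const PROMPT_CONTRACT_KEYS = [
  "employeeReview",
  "suggestedEmployeeReview",
  "humanGuidanceEval",
  "suggestedHumanGuidanceEval",
] as const;

// Returns contract keys missing from a prompt review record. A key whose
// value is null still counts as present; only an absent key is a collapse.
function missingContractKeys(review: Record<string, unknown>): string[] {
  return PROMPT_CONTRACT_KEYS.filter((k) => !(k in review));
}

// reviewLifecycle is session-level, so it is checked on session metadata.
function lifecycleMissing(metadata: Record<string, unknown>): boolean {
  return !("reviewLifecycle" in metadata);
}
```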