Eval Labs Canon

# Behavioral Observatory

> [!summary]
> Behavioral Observatory is the first-class Eval Labs surface for reviewing conversations and saving structured behavioral labels. Derived suggestions can help the reviewer start, but only saved labels count as Behavioral Observatory label data.

---

## Status

```text
Behavioral Observatory surface: implemented
Behavioral label controls: implemented
Behavioral label persistence: persisted when Supabase table is applied and save succeeds
Derived context: derived
Saved Behavioral Observatory label: persisted
Entry-level employee rollout: future / access-dependent
Evaluator-reviewing-owner-run workflow: deferred
```

Canonical route:

```text
/behavioral-observatory
```

Current access intent is owner/admin. Broader employee use requires approved access and security decisions.

---

## Plain-English definition

Behavioral Observatory is a first-class Eval Labs product surface for reviewing conversations and saving structured behavioral labels.

It answers:

- What was the human trying to do?
- How did the human feel?
- What response strategy did Lucia use?
- How human did the response feel?
- What notes should be preserved as behavioral evidence?

---

## What it is

Behavioral Observatory is:

- a conversation review surface
- a structured behavioral labeling workflow
- a place to compare the human message and Lucia's response
- a place to save reviewer intent, affect, strategy, humanness, and notes
- a persisted evidence layer when Supabase confirms the save

This is the surface where saved Behavioral Observatory labels become real behavioral data.

---

## What it is not

Behavioral Observatory is not:

- Registry Diagnostics
- a dataset membership debugger
- a queue-routing model debugger
- a replacement for Review Queue scoring
- a claim that Lucia is globally human-approved
- a guarantee that future evaluator workflows are already supported

Behavioral Observatory labels are specific to the reviewed run item and reviewer.

---

## Difference from Registry Diagnostics

Registry Diagnostics shows derived classification suggestions from existing Eval Labs data.

Behavioral Observatory lets a reviewer save structured behavioral labels.

Use this rule:

```text
Registry Diagnostics explains what the classification model thinks.
Behavioral Observatory records what the reviewer intentionally saved.
```

---

## Derived suggestion

A derived suggestion comes from existing run/review fields.

It may prefill or suggest:

- intent
- guest affect
- response strategy
- humanness

Derived suggestions are useful starting points. They are not final human judgment. They are not saved Behavioral Observatory labels.

---

## Saved label

A saved label is a Behavioral Observatory label saved by a reviewer.

Saved labels:

- are stored in Supabase
- reload after refresh when persistence is available
- count as real Behavioral Observatory label data
- drive persisted Behavioral Observatory distributions and trends
- should be treated as intentional behavioral evidence

---

## Label fields

### Intent

Intent describes what the human was trying to do.

Current values:

- Booking Help
- Check-In
- Checkout
- Billing
- Noise
- Room Issue
- Concierge
- Other

Use `Other` only when the conversation does not fit the listed categories.

### Guest Affect

Guest Affect describes the human's emotional state.

Current values:

- Neutral
- Mildly Upset
- Upset
- Grateful

Do not over-dramatize affect. Mark the smallest truthful emotional read.

### Response Strategy

Response Strategy describes what Lucia did as her main response move.

Current values:

- Acknowledge
- Apology
- Offer
- Escalation

Choose the dominant strategy, not every strategy present in the text.

### Humanness

Humanness is a 1-7 judgment of how human the response felt.

Current anchors:

```text
1 = Template
4 = Functional
7 = Warm + Specific
```

Do not use humanness as a general pass/fail score. A response can feel warm and still fail truth or usefulness.

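
The four structured fields above can be sketched as closed value sets plus a range check. This is a minimal illustration, not the Eval Labs implementation: the class names `Intent`, `GuestAffect`, `ResponseStrategy`, and `StructuredLabel` are hypothetical, and only the listed values and the 1-7 range come from this page.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    BOOKING_HELP = "Booking Help"
    CHECK_IN = "Check-In"
    CHECKOUT = "Checkout"
    BILLING = "Billing"
    NOISE = "Noise"
    ROOM_ISSUE = "Room Issue"
    CONCIERGE = "Concierge"
    OTHER = "Other"


class GuestAffect(Enum):
    NEUTRAL = "Neutral"
    MILDLY_UPSET = "Mildly Upset"
    UPSET = "Upset"
    GRATEFUL = "Grateful"


class ResponseStrategy(Enum):
    ACKNOWLEDGE = "Acknowledge"
    APOLOGY = "Apology"
    OFFER = "Offer"
    ESCALATION = "Escalation"


@dataclass(frozen=True)
class StructuredLabel:
    """Hypothetical container for the four structured fields."""

    intent: Intent
    guest_affect: GuestAffect
    response_strategy: ResponseStrategy  # the single dominant strategy
    humanness: int  # 1 = Template, 4 = Functional, 7 = Warm + Specific

    def __post_init__(self) -> None:
        # Reject bools explicitly, since bool is a subclass of int in Python.
        if isinstance(self.humanness, bool) or not isinstance(self.humanness, int):
            raise TypeError("humanness must be an integer")
        if not 1 <= self.humanness <= 7:
            raise ValueError("humanness must be between 1 and 7")


label = StructuredLabel(
    intent=Intent("Check-In"),
    guest_affect=GuestAffect("Neutral"),
    response_strategy=ResponseStrategy("Offer"),
    humanness=4,
)
```

Looking up a value that is not in a list, such as `Intent("Parking")`, raises `ValueError`, which mirrors the rule that reviewers should not invent new label categories.
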
### Notes

Notes preserve short behavioral evidence.

Notes should explain the judgment when the structured fields alone are not enough.

---

## Good notes

Good notes are short, specific, and evidence-based.

Examples:

```text
Good: Guest sounds anxious about check-in; Lucia gave one clear access-code next step.
Good: Apology is appropriate, but no operational next move was offered.
Good: Warm and specific, but implies the team already acted when only a suggestion exists.
```

Good notes name the behavior and the reason it matters.

---

## Bad notes

Bad notes are vague, personal, or not evidence-based.

Examples:

```text
Bad: Sounds good.
Bad: I like this one.
Bad: Weird vibe.
Bad: Make it more AI.
```

Bad notes create noise. If the structured fields already tell the story, leave notes brief or empty.

---

## What happens when a label is saved

When a reviewer saves a Behavioral Observatory label:

1. Eval Labs confirms the selected run item can be tied to a persisted run and run item.
2. The label is written to `public.eval_behavioral_labels`.
3. The label is keyed to the run, run item, owner user, and reviewer user.
4. The label status is saved unless another supported status is explicitly used.
5. The UI can reload the saved label from Supabase.
6. Persisted Behavioral Observatory analytics can use the saved label.

If the save fails, the label should not be treated as persisted.

---

## Saved / unsaved / error states

Use these states plainly:

- `derived`: suggestion only; nothing has been saved as a Behavioral Observatory label
- `unsaved`: reviewer has changed fields but has not saved them
- `saving`: save is in progress
- `saved`: Supabase confirmed the label
- `error`: save or load failed; do not count it as persisted

---

## Step-by-step usage

1. Open `/behavioral-observatory` if your role and assignment allow it.
2. Select a conversation from the labeling queue.
3. Read the Human message.
4. Read Lucia's response.
5. Notice whether the current values are derived suggestions or a saved label.
6. Set Intent.
7. Set Guest Affect.
8. Set Response Strategy.
9. Set Humanness from 1 to 7.
10. Add a short note only if it preserves useful behavioral evidence.
11. Save the label.
12. Confirm the saved state before treating it as persisted evidence.

---

## Entry-level employee rule

Entry-level employees should use Behavioral Observatory only inside an approved assignment.

They should:

- read before clicking
- keep labels literal
- use the smallest truthful affect
- choose the dominant response strategy
- write short notes
- ask when uncertain

They should not:

- invent new label categories
- treat derived suggestions as truth
- use Registry Diagnostics as a label workflow
- claim a saved label means Lucia is globally approved

---

## Canon rule

```text
Derived context helps the reviewer start.
Saved Behavioral Observatory labels are intentional behavioral evidence.
```
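
The canon rule and the five label states described on this page can be sketched as a small filter that counts only confirmed saves. This is an illustrative sketch under stated assumptions: `LabelState`, `counts_as_persisted`, `state_after_save_attempt`, and the sample item IDs are hypothetical names, not part of the Eval Labs codebase. Only the state names and their meanings come from this page.

```python
from enum import Enum


class LabelState(Enum):
    DERIVED = "derived"  # suggestion only; nothing saved
    UNSAVED = "unsaved"  # reviewer edited fields but has not saved
    SAVING = "saving"    # save is in progress
    SAVED = "saved"      # Supabase confirmed the label
    ERROR = "error"      # save or load failed


def counts_as_persisted(state: LabelState) -> bool:
    """Only a confirmed save counts as Behavioral Observatory label data."""
    return state is LabelState.SAVED


def state_after_save_attempt(confirmed: bool) -> LabelState:
    """Resolve an in-progress save once the backend has answered."""
    return LabelState.SAVED if confirmed else LabelState.ERROR


# Hypothetical run items paired with their current label state.
items = [
    ("item-1", LabelState.DERIVED),
    ("item-2", LabelState.SAVED),
    ("item-3", LabelState.ERROR),
    ("item-4", LabelState.UNSAVED),
]

# Keep only the items whose labels may feed persisted analytics.
persisted_ids = [item_id for item_id, state in items if counts_as_persisted(state)]
```

Treating `saving` as not yet persisted keeps in-flight saves out of distributions and trends until the confirmation arrives.
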