# Scoring Dimensions
> [!summary]
> Eval Labs scores Lucia across dimensions that matter to both operational quality and emotional containment.
---
## Current dimensions
Eval Labs currently captures these rating dimensions:
```text
tone
usefulness
calming
naturalness
trust
```
It also captures these review fields:
```text
keepTalking
suggestedKeepTalking
status
suggestedStatus
priority
suggestedPriority
feltOff
owner
```
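One way to picture these fields is as a single review record. The sketch below is illustrative only: the source lists field names but not their types or allowed values, so the class name `ReviewRecord`, every type annotation, the `missing_dimensions` helper, and the example values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The five rating dimensions listed above.
RATING_DIMENSIONS = ("tone", "usefulness", "calming", "naturalness", "trust")


@dataclass
class ReviewRecord:
    """Hypothetical container for one review. All field types are assumed."""

    # Reviewer ratings keyed by dimension name.
    ratings: Dict[str, int] = field(default_factory=dict)
    # Each reviewer-set field sits next to its app-suggested counterpart.
    keepTalking: Optional[bool] = None
    suggestedKeepTalking: Optional[bool] = None
    status: Optional[str] = None
    suggestedStatus: Optional[str] = None
    priority: Optional[str] = None
    suggestedPriority: Optional[str] = None
    feltOff: str = ""
    owner: Optional[str] = None

    def missing_dimensions(self) -> List[str]:
        """Return the rating dimensions that have not been scored yet."""
        return [d for d in RATING_DIMENSIONS if d not in self.ratings]


# Example values are made up for illustration.
record = ReviewRecord(ratings={"tone": 8, "trust": 6}, owner="reviewer-a")
print(record.missing_dimensions())
```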
---
## Tone
Score whether the language fits the moment.
Strong tone is:
- warm
- clear
- direct
- composed
- human
Weak tone is:
- cold
- robotic
- mushy
- fake cheerful
- corporate sludge
---
## Usefulness
Score whether the response helped the user act or understand.
A useful response reduces work.
An unhelpful response creates new work.
---
## Calming
Score whether the response reduces pressure.
Calming does not mean soft.
Calming means the user feels more oriented after reading it.
---
## Naturalness
Score whether the response sounds the way a real, trusted operator would speak.
Natural does not mean casual fluff.
Natural means the phrasing feels human and appropriate.
---
## Trust
Score whether the response increases or preserves confidence in Lucia.
Trust is damaged by:
- overclaiming
- vague certainty
- missing obvious context
- wrong tone
- false reassurance
- capability menus in emotional moments
---
## Keep talking
This field answers one question:
```text
Would a user keep talking to Lucia after this response?
```
Use this honestly.
If a response makes Lucia feel like a wall, mark it down.
---
## Felt off
Use this field for specific notes.
Good:
```text
Lucia detected operational stress but responded with a generic capability redirect instead of containment.
```
Bad:
```text
Weird.
```
---
## Semantic confidence sliders
The five scoring dimensions use stepped 1–10 semantic sliders.
The slider is not decoration. It is part of the evaluation interface.
A low score should feel like concern.
A middle score should feel mixed or uncertain.
A high score should feel confident.
This reduces the amount of mental translation required from reviewers.
The app may show suggested 1–10 values before the reviewer chooses a score. A visible suggestion is only a suggestion: nothing becomes the saved score until the reviewer accepts or overrides it and saves the review.
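The low, middle, and high readings and the difference between a suggested and a saved score can be sketched as below. This is a minimal illustration, not the app's implementation. The band cut-offs (1–3, 4–7, 8–10), the band labels, and both function names are assumptions, since the source only says that low, middle, and high scores should feel different.

```python
from typing import Optional


def semantic_label(score: int) -> str:
    """Map a stepped 1-10 score to a semantic band (cut-offs are assumed)."""
    if not 1 <= score <= 10:
        raise ValueError("score must be an integer from 1 to 10")
    if score <= 3:
        return "concern"
    if score <= 7:
        return "mixed"
    return "confident"


def resolve_saved_score(
    suggested: Optional[int],
    override: Optional[int],
    accepted: bool,
    saved: bool,
) -> Optional[int]:
    """Return the score that counts as saved, or None if nothing is saved."""
    if not saved:
        return None  # nothing is stored until the reviewer saves
    if override is not None:
        return override  # the reviewer's own value wins over the suggestion
    if accepted:
        return suggested  # the reviewer explicitly accepted the suggestion
    return None  # the suggestion was shown but never accepted


print(semantic_label(2))
print(resolve_saved_score(suggested=7, override=None, accepted=False, saved=True))
print(resolve_saved_score(suggested=7, override=4, accepted=False, saved=True))
```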
---
## Human Guidance Evaluation
Eval Labs also captures Human Guidance Evaluation scores on a 1–5 scale, plus a `notes` field:
```text
emotionalValidation
cognitiveUnderstanding
actionability
toneAppropriateness
authenticity
notes
```
The Review Queue can show suggested 1–5 guidance scores.
The displayed guidance state uses the mean of the guidance scores and treats any single score of 2 or below as a hard-fail signal.
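The mean and hard-fail rule can be sketched as follows. The source states only that the displayed state uses the mean and that any score of 2 or below is a hard-fail signal. The function name, the shape of the returned dictionary, the range check, and the choice to leave `notes` out of the calculation are assumptions.

```python
from statistics import mean
from typing import Dict

# The scored guidance fields; `notes` is excluded here (an assumption).
GUIDANCE_FIELDS = (
    "emotionalValidation",
    "cognitiveUnderstanding",
    "actionability",
    "toneAppropriateness",
    "authenticity",
)
HARD_FAIL_THRESHOLD = 2  # "any score of 2 or below"


def guidance_state(scores: Dict[str, int]) -> dict:
    """Summarize 1-5 guidance scores as a mean plus a hard-fail flag."""
    values = [scores[name] for name in GUIDANCE_FIELDS]
    if any(not 1 <= v <= 5 for v in values):
        raise ValueError("guidance scores must be between 1 and 5")
    return {
        "mean": mean(values),
        # A single low score trips the flag even when the mean looks healthy.
        "hardFail": any(v <= HARD_FAIL_THRESHOLD for v in values),
    }


# Made-up example: a high mean that still carries a hard-fail signal.
example = {
    "emotionalValidation": 4,
    "cognitiveUnderstanding": 5,
    "actionability": 2,
    "toneAppropriateness": 4,
    "authenticity": 5,
}
state = guidance_state(example)
print(state["mean"], state["hardFail"])
```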
Warmth and intelligence are not separate dimensions. In the current product, they are expressed through `tone`, `calming`, `naturalness`, `trust`, `usefulness`, `cognitiveUnderstanding`, `actionability`, and `authenticity`.
---
## Quick Review fields
In addition to scoring dimensions, Eval Labs captures:
```text
Did Lucia understand what was needed?
Did Lucia give the right next move?
Did Lucia make the situation feel calmer?
Did anything feel risky, confusing, or wrong?
Should a senior reviewer look at this?
Could this teach Lucia something reusable?
```
These fields are not replacements for senior adjudication. They are the employee signal layer.
Suggested Quick Review selections are allowed. They should reduce reviewer burden, not replace reviewer judgment.