Eval Labs Canon

# Relationship to OpenAI Evals

> [!summary]
> Eval Labs may borrow ideas from OpenAI-style eval frameworks, but it remains the Lucia-native evaluation product.

---

## Position

Eval Labs should not be replaced by a generic LLM eval framework.

Lucia's most important qualities require human judgment and product-specific review.

---

## What external eval frameworks are good for

External eval frameworks can help with:

- structured datasets
- automated graders
- model comparisons
- JSONL exports
- benchmark-style checks
- repeatable scoring pipelines

---

## What they do not solve for Lucia

They do not automatically answer:

- Did Lucia reduce overwhelm?
- Did Lucia choose the right emotional posture?
- Did Lucia avoid overclaiming?
- Did Lucia preserve trust?
- Did Lucia sound like Lucia?
- Did Lucia reduce operator scanning burden?

---

## Future direction

Eval Labs may eventually export OpenAI-compatible eval datasets.

Potential mapping:

```text
Custom Prompt Suite → dataset
Lucia response      → model output
Human ratings       → labels
Review notes        → qualitative evidence
Run metadata        → provenance
```

---

## Principle

Eval Labs is the source of truth.

OpenAI eval concepts can become adapters.

Do not invert that relationship.
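
As an illustration of the potential mapping under "Future direction", here is a minimal sketch of a one-way export adapter. Everything in it is an assumption made for illustration: the `ReviewedResponse` and `EvalRun` classes, their fields, and the output keys are hypothetical, and the record shape is not the schema of OpenAI Evals or any other framework. It only shows Eval Labs data flowing outward into a JSONL file, with human ratings and review notes carried along rather than replaced by automated grades.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class ReviewedResponse:
    """One Lucia response plus its human review (hypothetical shape)."""
    prompt: str
    response: str
    ratings: dict[str, int]   # e.g. {"trust": 4}; rating names are assumptions
    review_notes: str = ""


@dataclass
class EvalRun:
    """A single run of a Custom Prompt Suite (hypothetical shape)."""
    suite_name: str
    run_id: str
    items: list[ReviewedResponse] = field(default_factory=list)


def to_jsonl(run: EvalRun) -> str:
    """Flatten a run into one JSON object per line, following the mapping."""
    lines = []
    for item in run.items:
        record = {
            "dataset": run.suite_name,                  # Custom Prompt Suite -> dataset
            "input": item.prompt,
            "model_output": item.response,              # Lucia response -> model output
            "labels": item.ratings,                     # Human ratings -> labels
            "qualitative_evidence": item.review_notes,  # Review notes -> qualitative evidence
            "provenance": {"run_id": run.run_id},       # Run metadata -> provenance
        }
        # ensure_ascii=False keeps non-ASCII review text readable in the export
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    example = EvalRun(
        suite_name="example-suite",
        run_id="run-001",
        items=[
            ReviewedResponse(
                prompt="Summarize my week.",
                response="Here are the three things that matter most.",
                ratings={"reduced_overwhelm": 4, "preserved_trust": 5},
                review_notes="Calm tone; no overclaiming.",
            )
        ],
    )
    print(to_jsonl(example))
```

The adapter reads from Eval Labs records and writes an export. Nothing flows back in, which keeps Eval Labs as the source of truth. A framework-specific adapter would translate these records into whatever format that framework actually requires.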