LLM Eval / Design Ops / Dashboard
2025 / Experience Evaluation Lead / Design Ops
LLM Experience Evaluation
A measurable evaluation framework for subjective AI experience quality.
-42% Review cycle
+55% Issue discovery
8 Eval dimensions
PDF
Evaluation framework PDF
Scoring dimensions, sample structure, and review process.
Images
Dashboard screenshots
Issue distribution, version comparison, and trend tracking.
Prototype
Review workflow prototype
Cross-functional review and annotation paths.
01
Background
The team lacked a shared language for judging whether an AI experience was good, which made prioritization across product, design, and ML difficult.
02
My role
- Defined evaluation dimensions and scoring rules.
- Designed evaluation dashboards, issue distribution, and version comparison flows.
- Facilitated cross-functional evaluation workshops.
03
Design process
- Clustered historical issues into actionable experience dimensions.
- Connected model output, user intent, and product feedback into evaluation samples.
- Calibrated scoring consistency with real use cases.
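The steps above can be illustrated with a minimal sketch. It is not the team's actual schema or tooling. It assumes a 1-5 rating scale and a simple max-minus-min spread as the consistency check. Every name in it (`EvalSample`, `dimension_spread`, `needs_calibration`, the `helpfulness` and `tone` dimensions, the reviewer labels) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EvalSample:
    """One evaluation sample; all field names are hypothetical."""
    user_intent: str
    model_output: str
    product_feedback: str
    # scores[reviewer][dimension] -> rating on an assumed 1-5 scale
    scores: dict[str, dict[str, int]] = field(default_factory=dict)


def dimension_spread(sample: EvalSample) -> dict[str, int]:
    """Return the max-minus-min rating per dimension across reviewers."""
    by_dimension: dict[str, list[int]] = {}
    for reviewer_scores in sample.scores.values():
        for dimension, rating in reviewer_scores.items():
            by_dimension.setdefault(dimension, []).append(rating)
    return {dim: max(r) - min(r) for dim, r in by_dimension.items()}


def needs_calibration(sample: EvalSample, max_spread: int = 1) -> list[str]:
    """List dimensions where reviewer disagreement exceeds max_spread."""
    spreads = dimension_spread(sample)
    return sorted(dim for dim, spread in spreads.items() if spread > max_spread)


# Illustrative data only; not taken from the project.
sample = EvalSample(
    user_intent="Summarize a meeting transcript",
    model_output="(model response text)",
    product_feedback="User reported missing action items",
    scores={
        "design": {"helpfulness": 4, "tone": 2},
        "ml": {"helpfulness": 4, "tone": 5},
        "product": {"helpfulness": 3, "tone": 4},
    },
)
print(needs_calibration(sample))  # ['tone']
```

In this sketch, a dimension flagged by `needs_calibration` would be a candidate for discussion in a calibration session, since reviewers from different functions read the same sample differently.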
04
Outcome
- Created an LLM experience scorecard, dashboard, and review workflow.
- Enabled ML, product, and design to discuss quality using a shared set of metrics.
05
Results
- Design review cycles shortened by 42%.
- Pre-launch issue discovery improved by 55%.
- Reused across five AI product lines.