Product Designer
LLM Eval / Design Ops / Dashboard

2025 / Experience Evaluation Lead / Design Ops

LLM Experience Evaluation

A measurable evaluation framework for subjective AI experience quality.

-42% Review cycle
+55% Issue discovery
8 Eval dimensions
PDF

Evaluation framework PDF

Scoring dimensions, sample structure, and review process.

Images

Dashboard screenshots

Issue distribution, version comparison, and trend tracking.

Prototype

Review workflow prototype

Cross-functional review and annotation paths.

01

Background

The team lacked a shared language for judging whether an AI experience was good, which made prioritization across product, design, and ML difficult.

02

My role

  • Defined evaluation dimensions and scoring rules.
  • Designed evaluation dashboards, issue distribution, and version comparison flows.
  • Facilitated cross-functional evaluation workshops.
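
As an illustration of what dimensions and scoring rules can look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for the example: the 1 to 5 scale, the dimension names `helpfulness` and `tone`, and the helper functions `dimension_means` and `compare_versions`. The project's actual eight dimensions and rules are described in the framework PDF, not here.

```python
from statistics import mean

# Assumed scoring range; the real scale is not stated in this write-up.
SCALE = (1, 5)


def dimension_means(reviews: list[dict[str, int]]) -> dict[str, float]:
    """Average each dimension's score across a list of reviews."""
    lo, hi = SCALE
    by_dim: dict[str, list[int]] = {}
    for review in reviews:
        for dim, score in review.items():
            # Scoring rule: reject anything outside the agreed scale.
            if not lo <= score <= hi:
                raise ValueError(f"{dim} score {score} is outside {lo}-{hi}")
            by_dim.setdefault(dim, []).append(score)
    return {dim: mean(vals) for dim, vals in by_dim.items()}


def compare_versions(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Per-dimension change between two versions (shared dimensions only)."""
    return {dim: new[dim] - old[dim] for dim in old if dim in new}


# Hypothetical reviews for two model versions.
v1 = dimension_means([{"helpfulness": 3, "tone": 4}, {"helpfulness": 5, "tone": 4}])
v2 = dimension_means([{"helpfulness": 5, "tone": 3}])
delta = compare_versions(v1, v2)
```

A positive value in `delta` means the newer version scored higher on that dimension, which is the kind of signal a version comparison view can surface.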

03

Design process

  • Clustered historical issues into actionable experience dimensions.
  • Connected model output, user intent, and product feedback into evaluation samples.
  • Calibrated scoring consistency with real use cases.
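
The sample structure and the calibration step could be sketched roughly as follows. This is not the project's implementation: the `EvalSample` field names are hypothetical, and Cohen's kappa is swapped in here as one common way to measure agreement between two reviewers. The write-up does not say which consistency measure was actually used.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EvalSample:
    """One evaluation sample; all field names are illustrative."""
    sample_id: str
    user_intent: str       # what the user was trying to do
    model_output: str      # what the model returned
    product_feedback: str  # e.g. a support note tied to this interaction
    scores: dict[str, int] = field(default_factory=dict)  # reviewer -> score


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Chance-corrected agreement between two reviewers' scores."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(a)
    # Share of samples where both reviewers gave the same score.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each reviewer's score frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical score throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two reviewers on the same four samples.
kappa = cohens_kappa([1, 2, 3, 3], [1, 2, 3, 2])
```

Values near 1 suggest the reviewers apply the scoring rules the same way; values near 0 suggest agreement is no better than chance and the rules need another calibration pass.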

04

Outcome

  • Created an LLM experience scorecard, dashboard, and review workflow.
  • Enabled ML, product, and design to discuss quality through the same metrics.

05

Results

  • Design review cycles shortened by 42%.
  • Pre-launch issue discovery improved by 55%.
  • Reused across five AI product lines.