6 components

Prompts / Eval

Prompt engineering and offline evaluation: saved prompts, variable forms, eval runs, datasets, and A/B comparison.

Saved-prompt card with name, description, version, model, variable count, and tags.

Auto-form for `{{variables}}` extracted from a prompt template, with live preview.
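
As a rough illustration of how such a form might derive its fields, here is a minimal sketch. It assumes placeholders are plain identifiers written as `{{name}}`, optionally padded with spaces. The names `extract_variables` and `render_preview` are hypothetical and are not taken from the component's actual API.

```python
import re

# Assumed placeholder syntax: {{name}}, with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def extract_variables(template: str) -> list[str]:
    """Return unique variable names in order of first appearance."""
    seen: dict[str, None] = {}
    for match in PLACEHOLDER.finditer(template):
        seen.setdefault(match.group(1), None)
    return list(seen)


def render_preview(template: str, values: dict[str, str]) -> str:
    """Substitute filled-in values and leave unfilled placeholders visible."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


template = "Summarize {{document}} in {{ tone }} tone. Source: {{document}}"
fields = extract_variables(template)          # one form field per unique name
preview = render_preview(template, {"tone": "neutral"})
```

Leaving unfilled placeholders untouched in the preview is one possible design choice; it lets the user see at a glance which fields still need a value.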

Single eval metric with delta vs baseline and an optional inline sparkline.
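
The delta shown next to a metric could be computed along these lines. This is a sketch under assumptions: the `MetricDelta` type, the `metric_delta` helper, and the `higher_is_better` flag are hypothetical, and the component may define its delta differently (for example, absolute only).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricDelta:
    absolute: float            # value - baseline
    relative: Optional[float]  # fraction of the baseline; None if baseline is 0
    improved: bool             # direction depends on the metric


def metric_delta(value: float, baseline: float,
                 higher_is_better: bool = True) -> MetricDelta:
    """Compare a metric value against its baseline."""
    absolute = value - baseline
    relative = absolute / abs(baseline) if baseline != 0 else None
    improved = absolute > 0 if higher_is_better else absolute < 0
    return MetricDelta(absolute, relative, improved)


# Accuracy should go up; latency (in ms) should go down.
accuracy = metric_delta(0.84, 0.80)
latency = metric_delta(420.0, 500.0, higher_is_better=False)
```

Carrying the direction explicitly matters for coloring: a negative delta on latency is an improvement, while a negative delta on accuracy is a regression.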

Eval-run summary — status, prompt, model, dataset, progress, headline metrics.

Stacked table for an eval dataset — input / expected / actual / score with bucket filters.
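
Bucket filtering over such rows might look like the sketch below. The bucket names (`pass`, `partial`, `fail`), the thresholds, and the assumption that scores lie in [0, 1] are all illustrative; the source does not say how buckets are defined.

```python
from dataclasses import dataclass


@dataclass
class EvalRow:
    input: str
    expected: str
    actual: str
    score: float  # assumed to lie in [0, 1]


def bucket_of(score: float, pass_at: float = 0.8,
              fail_below: float = 0.5) -> str:
    """Map a score to a bucket name using hypothetical thresholds."""
    if score >= pass_at:
        return "pass"
    if score < fail_below:
        return "fail"
    return "partial"


def filter_rows(rows: list[EvalRow], bucket: str) -> list[EvalRow]:
    """Keep only the rows whose score falls in the given bucket."""
    return [row for row in rows if bucket_of(row.score) == bucket]


rows = [
    EvalRow("2+2", "4", "4", 1.0),
    EvalRow("capital of France", "Paris", "Paris, France", 0.7),
    EvalRow("3*3", "9", "6", 0.0),
]
failed = filter_rows(rows, "fail")
```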

Side-by-side comparison of two prompts (or two models) with metric deltas and sample outputs.