ABCompare
Side-by-side comparison of two prompts (or two models) with metric deltas and sample outputs.
Preview
Input
Summarise PR #42 — refactor: split chat barrel into per-component subpaths.
A: summarise-pr v2.1 (baseline, model: claude-sonnet-4-5)
B: summarise-pr v2.2 (challenger, model: claude-sonnet-4-5)
Metric comparison
- accuracy: 0.78 → 0.842 (Δ 0.062, B better)
- latency p95: 240 ms → 175 ms (Δ 65.0, B better)
- cost: $0.014 → $0.0124 (Δ 0.0016, B better)
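In this preview B wins every row: higher accuracy, lower p95 latency, and lower cost. The per-row decision can be sketched as follows, assuming the winner is determined by each metric's goodDirection field (shown under Usage) in the obvious way. metricWinner is a hypothetical name for illustration, not a nyxis-ui export.

```typescript
type GoodDirection = 'up' | 'down';
type Winner = 'a' | 'b' | 'tie';

// Hypothetical helper: picks the winning side of one metric row.
// 'up' means a higher value is better (e.g. accuracy);
// 'down' means a lower value is better (e.g. latency, cost).
function metricWinner(a: number, b: number, goodDirection: GoodDirection): Winner {
  if (a === b) return 'tie';
  const bHigher = b > a;
  // B wins if it moved in the good direction relative to A.
  const bBetter = goodDirection === 'up' ? bHigher : !bHigher;
  return bBetter ? 'b' : 'a';
}
```

Applied to the preview rows, metricWinner(0.78, 0.842, 'up'), metricWinner(240, 175, 'down') and metricWinner(0.014, 0.0124, 'down') all pick 'b'.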
Sample output
A:
- Split chat barrel
- Per-component subpaths
- Tree-shaking improvement (slight)
B:
- Split chat barrel
- Per-component subpaths
- Better tree-shaking, smaller bundles
Installation
pnpm add nyxis-ui
Usage
import { ABCompare, type ABCompareSide } from 'nyxis-ui';
const a: ABCompareSide = {
label: 'summarise-pr v2.1',
modelId: 'claude-sonnet-4-5',
metrics: [
{ name: 'accuracy', value: 0.78, goodDirection: 'up', precision: 3 },
{ name: 'latency p95', value: 240, unit: 'ms', goodDirection: 'down' },
],
sample: '- bullet 1\n- bullet 2',
};
const b: ABCompareSide = {
label: 'summarise-pr v2.2',
modelId: 'claude-sonnet-4-5',
metrics: [
{ name: 'accuracy', value: 0.842, goodDirection: 'up', precision: 3 },
{ name: 'latency p95', value: 175, unit: 'ms', goodDirection: 'down' },
],
sample: '- bullet 1\n- bullet 2 (sharper)',
};
<ABCompare input="Summarise PR #42…" a={a} b={b} />;
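In practice each side usually comes from an eval run rather than hand-written literals. The sketch below maps a run record onto the same fields the Usage example sets. It makes three assumptions: the EvalRun record and the toSide helper are hypothetical, the local SideShape interface is only a partial mirror of ABCompareSide (a real project would import the type from 'nyxis-ui'), and the '$' unit for cost is inferred from how the preview renders that row.

```typescript
// Partial mirror of the ABCompareSide fields used in the Usage example.
// Assumption: in a real project, import ABCompareSide from 'nyxis-ui' instead.
interface SideMetric {
  name: string;
  value: number;
  unit?: string;
  goodDirection: 'up' | 'down';
  precision?: number;
}

interface SideShape {
  label: string;
  modelId: string;
  metrics: SideMetric[];
  sample: string;
}

// Hypothetical eval-run record; field names are illustrative only.
interface EvalRun {
  promptName: string;
  promptVersion: string;
  modelId: string;
  accuracy: number;
  latencyP95Ms: number;
  costUsd: number;
  sampleOutput: string;
}

// Maps one run onto the side shape, one metric per measured quantity.
function toSide(run: EvalRun): SideShape {
  return {
    label: `${run.promptName} ${run.promptVersion}`,
    modelId: run.modelId,
    metrics: [
      { name: 'accuracy', value: run.accuracy, goodDirection: 'up', precision: 3 },
      { name: 'latency p95', value: run.latencyP95Ms, unit: 'ms', goodDirection: 'down' },
      // Assumed unit: the preview renders cost values with a '$' suffix.
      { name: 'cost', value: run.costUsd, unit: '$', goodDirection: 'down' },
    ],
    sample: run.sampleOutput,
  };
}
```

Keeping the mapping in one function means both sides always list the same metrics in the same order, which keeps the comparison table rows aligned.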
Anatomy
- An optional input block at the top: the shared test case both sides ran on.
- Two side panels (A in the primary tone, B in violet), each showing a label, sublabel, and model.
- A metric comparison table: each metric row shows A → B (delta), with the tone indicating which side won.
- Sample outputs side by side at the bottom.
baselineSide controls which side the deltas are computed against (default: 'a').
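A minimal sketch of what "computed against" could mean, assuming the delta is the non-baseline value minus the baseline value and is rounded for display. Under that assumption, switching baselineSide flips the sign of each delta but not its magnitude. deltaAgainstBaseline and its precision parameter are hypothetical and not part of the nyxis-ui API.

```typescript
type Side = 'a' | 'b';

// Hypothetical sketch: delta of the non-baseline side relative to the baseline side.
function deltaAgainstBaseline(
  aValue: number,
  bValue: number,
  baselineSide: Side = 'a', // mirrors the documented default
  precision = 4, // assumed display rounding, not a documented prop
): number {
  const raw = baselineSide === 'a' ? bValue - aValue : aValue - bValue;
  // Round away floating-point noise (0.842 - 0.78 is not exactly 0.062 in binary).
  return Number(raw.toFixed(precision));
}
```

With the default baseline, the latency row gives deltaAgainstBaseline(240, 175) = -65. With baselineSide set to 'b', the same row gives 65.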