Model Benchmark

Official Pro

Compare LLMs on your specific use cases, with cost analysis

15 installs

What it does

Runs the same standardized benchmark suite against multiple LLM providers, using your own test cases rather than generic benchmarks.

Why you need it

Generic benchmarks don't tell you which model is best for your use case. This skill tests models against your actual prompts with your actual data.

Key capabilities

- Custom test suites from your real prompts
- Latency, cost, accuracy, and token efficiency metrics
- Side-by-side output comparison
- CSV and interactive HTML reports
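To make the metrics above concrete, here is a minimal sketch of how per-model accuracy, latency, cost, and token usage could be aggregated from recorded runs and written out as CSV. It is not the skill's actual implementation or API: the RunResult fields, the model names, the exact-match accuracy rule, and the per-1K-token prices are all illustrative assumptions.

```python
import csv
import io
from dataclasses import dataclass
from statistics import mean

# Hypothetical per-1K-token prices in USD (assumed values, not real provider pricing).
PRICES_PER_1K = {
    "model-a": {"input": 0.001, "output": 0.002},
    "model-b": {"input": 0.01, "output": 0.03},
}


@dataclass
class RunResult:
    """One model's recorded response to one test case (hypothetical schema)."""
    model: str
    case_id: str
    latency_s: float
    input_tokens: int
    output_tokens: int
    output: str
    expected: str


def cost_usd(r: RunResult) -> float:
    """Cost of a single run from its token counts and the assumed price table."""
    price = PRICES_PER_1K[r.model]
    return (r.input_tokens / 1000) * price["input"] + (r.output_tokens / 1000) * price["output"]


def summarize(results: list[RunResult]) -> list[dict]:
    """Group runs by model and compute one summary row per model."""
    by_model: dict[str, list[RunResult]] = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)

    rows = []
    for model, runs in by_model.items():
        rows.append({
            "model": model,
            "cases": len(runs),
            # Accuracy here is plain exact match; real suites may need a looser scorer.
            "accuracy": mean(
                1.0 if r.output.strip() == r.expected.strip() else 0.0 for r in runs
            ),
            "mean_latency_s": mean(r.latency_s for r in runs),
            "total_cost_usd": sum(cost_usd(r) for r in runs),
            "mean_output_tokens": mean(r.output_tokens for r in runs),
        })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Render the summary rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    demo = [
        RunResult("model-a", "case-1", 0.5, 1000, 500, "yes", "yes"),
        RunResult("model-a", "case-2", 1.5, 2000, 1000, "no", "yes"),
        RunResult("model-b", "case-1", 2.0, 1000, 500, "yes", "yes"),
    ]
    print(to_csv(summarize(demo)))
```

Exact-match scoring is the simplest possible accuracy measure and is used here only to keep the sketch short; free-form outputs would typically need a rubric or similarity-based scorer instead.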

Latest: v1.0.0

Initial release

Apr 13, 2026

Related Skills

Pro

AI Code Review

Deep code review agent — catches bugs, security issues, and anti-patterns

No ratings yet

42 installs

#ai #code-review #security +1

Free

Prompt Engineer

Build, test, and optimize prompts with version control

Meeting Notes Agent

Extract action items, decisions, and summaries from meeting transcripts

No ratings yet

22 installs

#meetings #notes #action-items +2

infrastructure

Free

Terraform Planner

Terraform plan analysis with cost estimation and risk scoring

No ratings yet

18 installs

#terraform #iac #cost +1
