MCP tool selection · measured against named rivals

Does Claude pick your tool, or a rival’s?

ToolRank measures your MCP server’s pick-rate against rivals on Claude, finds the description smells losing you calls, and proposes rewrites verified to lift it.

Scan my server · Find my server

Free loss card first — no signup to look.

Live leaderboard · Claude Opus 4.x · refreshed daily

Payments · Claude Opus 4.x · live

#  server                              pick-rate  Δ    status
1  Stripe (stripe/agent-toolkit)       72%        +4   GOOD
2  lemonpay (indie-dev/lemonpay-mcp)   58%        +9   GOOD
3  Square (square/square-mcp)          49%        −3   GOOD
4  PayPal (paypal/agent-toolkit)       41%        −31  LOSES
5  Adyen (adyen/adyen-mcp)             38%        −6   LOSES

See the full leaderboard →

10 servers ranked · 2 categories · Claude Opus 4.x · refreshed daily

Your dashboard shows the calls you won. Never the ones you lost.

You ship the MCP server, watch usage climb, and assume the description is fine. But 97.1% of real tool descriptions carry at least one defect that misroutes selection, and 56% never state what the tool is for. The agent picks the rival. You see a flat usage chart and never learn why.

97.1% of 856 real tools carry a description smell (arXiv 2602.14878)

56% never state their purpose

1 in 6 DIY rewrites make pick-rate worse, with no signal that they did

What $49 actually runs.

01 · Real tasks. We generate 20–25 realistic jobs in your category and present your tool beside its named rivals.

02 · Every pick recorded. Claude chooses. We log selections per task, which gives your pick-rate (tasks won ÷ tasks run) and exactly which tasks you lost.

03 · Verified rewrites. We propose fixes, re-simulate each one, and report the measured before/after lift. Regressions get rejected, not shipped.

You get a shareable report: task-by-task breakdown, the full smell audit, and rewrites with numbers you can trust.
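As a rough illustration of step 02, pick-rate is simply the share of simulated tasks on which your server was the one selected. The sketch below is not ToolRank's implementation; the function names, the task strings, and the log format (a mapping from task to selected server) are all assumptions made for the example.

```python
def pick_rate(selections, server):
    """Share of tasks (0.0 to 1.0) on which `server` was the selected tool."""
    if not selections:
        raise ValueError("no tasks recorded")
    wins = sum(1 for picked in selections.values() if picked == server)
    return wins / len(selections)


def lost_tasks(selections, server):
    """Map each task `server` lost to the rival that was picked instead."""
    return {task: picked for task, picked in selections.items() if picked != server}


# Hypothetical selection log: task description -> server the model picked.
selections = {
    "refund a charge": "stripe/agent-toolkit",
    "create a subscription": "paypal/agent-toolkit",
    "issue an invoice": "stripe/agent-toolkit",
    "capture a payment": "paypal/agent-toolkit",
}

rate = pick_rate(selections, "paypal/agent-toolkit")
print(f"pick-rate: {rate:.0%}")  # pick-rate: 50%
print(lost_tasks(selections, "paypal/agent-toolkit"))
```

A real scan runs 20–25 tasks rather than four, but the arithmetic is the same.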

Know your pick-rate

One number tells you what share of category tasks route to you on Claude.

pick-rate

41% vs rivals avg 72%

41% of category tasks → you

Beat your rival

See the head-to-head: tasks you won, tasks a named rival took.

head-to-head

vs stripe/agent-toolkit

7/25 tasks won · 18 lost

Verified rewrites

Before → after on the real description, with the measured lift attached.

create_payment · No stated purpose

− Creates a payment.

+ Initiates a payment intent for one-time or recurring charges…

41% → 55% · +14 lift · verified
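The "verified" label amounts to a gate: a rewrite is kept only if its re-simulated pick-rate beats the baseline. The sketch below shows one way such a gate could work. It is an assumption-laden example, not ToolRank's pipeline: the function names are invented, and the second candidate description and its 38% result are hypothetical values added to show a regression being rejected.

```python
def lift_points(before, after):
    """Lift in percentage points between two pick-rates given as fractions."""
    return round((after - before) * 100)


def verify_rewrites(baseline, candidates):
    """Keep only rewrites whose re-simulated pick-rate beats the baseline.

    `candidates` is a list of (description, measured_pick_rate) pairs.
    Returns (description, lift) pairs, best lift first.
    """
    accepted = []
    for description, measured in candidates:
        lift = lift_points(baseline, measured)
        if lift > 0:  # regressions and no-ops are rejected, not shipped
            accepted.append((description, lift))
    return sorted(accepted, key=lambda pair: pair[1], reverse=True)


baseline = 0.41  # pick-rate of the original "Creates a payment."
candidates = [
    ("Initiates a payment intent for one-time or recurring charges…", 0.55),
    ("Makes payments happen.", 0.38),  # hypothetical rewrite that regresses
]

for description, lift in verify_rewrites(baseline, candidates):
    print(f"+{lift} lift: {description}")
```

Only the first candidate survives, with the +14 lift shown in the card above; the regressing rewrite is dropped silently here, though a report would presumably list it as rejected.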

The evidence

“Unmeasured description rewrites regress performance 1 in 6.”

arXiv 2602.14878, 856 tools across 103 real MCP servers

97.1% of tool descriptions carry at least one quality smell; 56% never state the tool’s purpose — the field agents weight most.

paypal/agent-toolkit · 41% · −31

loses to stripe/agent-toolkit · 72%

top smell: No stated purpose

See why →

Simple, honest pricing

Run a scan →

Free

The public leaderboard and your loss card.

What you measure

rank

top finding

pick-rate

findings depth

rewrite lift

Browse leaderboard

Scan

$49 one-time

One competitive scan, full report.

What you measure

pick-rate

all findings

rewrite lift

rival gap

tool coverage

Executes shell commands on the host

Runs sandboxed shell commands and returns stdout

Scan my server

Monitor

$39/mo

Every model release reshuffles the board. Stay on it.

What you measure

pick-rate

drift vs prev

release-by-release

alert threshold

history depth

Start monitoring

white-hat rewrites only · public correction policy · cancel anytime

Pay once to fix it. Subscribe to keep it fixed as models change.

Questions, answered

Find your row. See what you’re losing.

The leaderboard is public and free. Your pick-rate and two findings cost nothing. The full scan costs $49.

See the leaderboard · Scan my server →