How It Works

The rating system, fairness measures, and quality metrics explained

The Rating System

How we calculate rankings from your votes.

Glicko-2: Three Numbers Per Source

Rating
The skill estimate. Starts at 1500. Higher is better.
Deviation
How uncertain the rating is. Lower means we are more confident.
Volatility
How much the rating is expected to fluctuate over time. Low volatility means consistent performance.
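As a rough illustration, the three numbers can be kept together in one record per source. This is a sketch, not WikiArena's actual code: the class name `SourceRating` is hypothetical, and the starting deviation (350) and volatility (0.06) are assumptions based on defaults commonly suggested for Glicko-2. Only the 1500 starting rating comes from this page. The constant 173.7178 is the scale factor Glicko-2 uses to convert between the displayed numbers and its internal scale.

```python
from dataclasses import dataclass
from typing import Tuple

GLICKO2_SCALE = 173.7178  # converts display-scale numbers to Glicko-2's internal scale


@dataclass
class SourceRating:
    """Hypothetical per-source record holding the three Glicko-2 numbers."""
    rating: float = 1500.0     # skill estimate; starting value stated on this page
    deviation: float = 350.0   # assumed default for a source with no votes yet
    volatility: float = 0.06   # assumed default

    def to_internal(self) -> Tuple[float, float]:
        """Return (mu, phi): rating and deviation on the internal scale."""
        mu = (self.rating - 1500.0) / GLICKO2_SCALE
        phi = self.deviation / GLICKO2_SCALE
        return mu, phi


new_source = SourceRating()
mu, phi = new_source.to_internal()
print(f"mu={mu:.4f}, phi={phi:.4f}")
```

A brand-new source sits exactly at the centre of the internal scale (mu of 0) with a large phi, which reflects that nothing is known about it yet.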
What is Glicko-2 and why do you use it?
Glicko-2 is a rating system created by Professor Mark Glickman at Boston University. It powers the rankings on sites like Lichess (online chess) and is used by gaming platforms worldwide. We chose it over simpler alternatives because it does more than count wins: it also tracks how confident we are in each rating. A source with 500 comparisons has a much more reliable rating than one with just 10.
How is this different from just counting wins?
Think about it this way: if Wikipedia has won 3 out of 3 comparisons, is it really better than Britannica at 450 out of 500? Simple win rates would say yes (100% versus 90%), but three votes are far too few to support that conclusion. Glicko-2 solves three big problems:
• Sample size matters. More data = more confidence. We track this as "Rating Deviation", a number that shrinks as we get more votes.
• Who you beat matters. Winning against a top-rated source is worth more than winning against a low-rated one.
• Time matters. If a source gets no new votes for a while, our confidence in its rating gradually decreases, so fresh votes carry more weight when they arrive.
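One way to see why sample size matters is to read a rating together with its deviation. A common Glicko convention treats the rating plus or minus two deviations as a rough 95% interval. The sketch below applies that convention; the function name and the example numbers are invented for illustration and are not real WikiArena ratings.

```python
from typing import Tuple


def rating_interval(rating: float, deviation: float) -> Tuple[float, float]:
    """Rough 95% interval: the rating plus or minus two deviations."""
    return (rating - 2 * deviation, rating + 2 * deviation)


# Hypothetical values: a handful of votes leaves a wide deviation,
# hundreds of votes leave a narrow one.
few_votes = rating_interval(1650.0, 290.0)
many_votes = rating_interval(1640.0, 45.0)

print(few_votes)   # (1070.0, 2230.0)
print(many_votes)  # (1550.0, 1730.0)
```

Although the first source has the slightly higher rating, its interval is so wide that it could plausibly be well below average, while the second source is almost certainly above average.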
What do the rating numbers actually mean?
Every source starts at 1500. Here's a rough guide:
• 1500 = Average (the starting point)
• 1550-1600 = Somewhat above average
• 1600-1700 = Clearly strong
• 1700+ = Dominant
The "Rating Deviation" number tells you how much to trust the rating. Under 100 is pretty solid. Over 200 means we're still figuring it out.
For the math nerds: when both ratings are well established, a 200-point gap means the higher-rated source is expected to win about 76% of the time, and a 400-point gap means about 91% of the time.
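The win percentages for a given rating gap follow from the Glicko-2 expected-score formula. Here is a minimal sketch of that formula; the function names are illustrative rather than taken from WikiArena's code. The opponent's deviation defaults to 0, which corresponds to a fully established rating, and a larger deviation pulls the expectation back toward 50%.

```python
import math

GLICKO2_SCALE = 173.7178  # converts display-scale numbers to Glicko-2's internal scale


def g(phi: float) -> float:
    """Glicko-2 weighting factor: discounts opponents with uncertain ratings."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(rating: float, opp_rating: float, opp_deviation: float = 0.0) -> float:
    """Probability that `rating` beats `opp_rating` under the Glicko-2 model."""
    mu = (rating - 1500.0) / GLICKO2_SCALE
    mu_opp = (opp_rating - 1500.0) / GLICKO2_SCALE
    phi_opp = opp_deviation / GLICKO2_SCALE
    return 1.0 / (1.0 + math.exp(-g(phi_opp) * (mu - mu_opp)))


print(f"{expected_score(1700, 1500):.3f}")       # 200-point gap, certain opponent
print(f"{expected_score(1900, 1500):.3f}")       # 400-point gap, certain opponent
print(f"{expected_score(1700, 1500, 350):.3f}")  # 200-point gap, very uncertain opponent
```

With the opponent's deviation at 0, this reduces to the familiar Elo curve, where every 400 points corresponds to 10-to-1 odds.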
Can people cheat or manipulate the ratings?
We've thought about this. Here's why it's hard to game:
• You can't pick favorites. The comparisons are blind: you don't know which source is which until after you vote.
• Spam voting has diminishing returns. The first 50 votes shape a rating far more than votes 500-550, so flooding the system with votes doesn't help much.
• Patterns get noticed. If someone always votes the same way or votes suspiciously fast, that data can be flagged.
• Time works against manipulation. Ratings naturally become less certain over time if there aren't new votes, so a rating inflated by a past campaign is trusted less and gets corrected faster once honest votes arrive.
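The diminishing returns of spam voting can be illustrated with the deviation part of the Glicko-2 update. The sketch below is deliberately simplified and is not WikiArena's implementation: it treats every vote as its own rating period, keeps volatility fixed at an assumed 0.06 instead of re-estimating it, and assumes each opponent is evenly matched (expected score 0.5) with an assumed deviation of 50. Under those assumptions it reports how far a single win would move the rating at each point in a source's history.

```python
import math

GLICKO2_SCALE = 173.7178  # converts display-scale numbers to Glicko-2's internal scale


def g(phi: float) -> float:
    """Glicko-2 weighting factor for an opponent with deviation phi."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def win_steps(num_votes, start_deviation=350.0, opponent_deviation=50.0, volatility=0.06):
    """Rating points gained by a win at vote 1, 2, ..., num_votes (simplified model)."""
    phi = start_deviation / GLICKO2_SCALE
    g_opp = g(opponent_deviation / GLICKO2_SCALE)
    expected = 0.5  # assumption: every comparison is evenly matched
    information = g_opp ** 2 * expected * (1.0 - expected)  # 1/v in Glicko-2 notation
    steps = []
    for _ in range(num_votes):
        # Uncertainty grows slightly between periods, then shrinks with the new vote.
        phi_star_sq = phi ** 2 + volatility ** 2
        phi_sq = 1.0 / (1.0 / phi_star_sq + information)
        # A win moves the rating by phi'^2 * g * (1 - expected), in internal units.
        steps.append(phi_sq * g_opp * (1.0 - expected) * GLICKO2_SCALE)
        phi = math.sqrt(phi_sq)
    return steps


steps = win_steps(500)
for n in (1, 50, 500):
    print(f"vote {n}: a win moves the rating by about {steps[n - 1]:.0f} points")
```

In this toy model the first win is worth well over a hundred points, while a win after hundreds of votes is worth only a small fraction of that, which is why piling on extra votes buys little.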

WikiArena
Compare knowledge sources with Glicko-2 ratings