Attending yesterday's Halo 3 launch event at the Silicon Valley Microsoft campus -- and the large Halo 3 tournament we helped moderate -- got me thinking about player ranking and matching systems. Without a well-designed ranking and matching system in place, you'll get horribly mismatched games, where one team demolishes the other by a margin of 3x or more. This is counterproductive for both teams.
An ideal system would always match players of approximately equal skill against each other. How do you know when you've achieved that goal? You look at the data. The record of wins, losses, and ties for each player provides the answer: if the matching and ranking system is working properly, every player or team will eventually plateau at a skill level where they are winning about 50% of the time.
There are dozens of possible ranking and rating systems we could use. Christopher Allen and Shannon Appelcline's post Collective Choice: Competitive Ranking Systems explores many of them. But perhaps the most famous ranking system is the Elo rating system, which originated in the chess world. It's a simple statistical model used to objectively rate player skill levels:
Here's how Elo works:
- A new player is assigned a default rating, say 1200.
- Two players compete, with one of three results: win, loss, tie.
- The two players' ratings are fed into an algorithm along with the end state of the game. A new rating for each player is returned.
Let's say two players, both rated 1200, play a game. Player 1 wins. Player 1 will now be rated 1205 and player 2 will be rated 1195.
Players that win a lot will achieve higher ratings. But the higher rated player also starts to see diminishing returns for defeating low ranked players. In order to increase his rank, he must defeat other higher ranked players. If a high ranked player loses to a low ranked player, he loses much more of his rating than he'd gain if he won the match.
Over time the game players will end up being rated based on their Elo skill level rather than other factors.
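To make the arithmetic concrete, here is a minimal sketch of an Elo update in Python. It assumes the commonly used logistic expected-score formula and a K-value of 10, chosen only because it reproduces the 1205/1195 example above; the function names are illustrative, not taken from any particular implementation.

```python
def expected_score(rating_a, rating_b):
    """Expected score (between 0 and 1) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=10):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a tie, 0.0 for a loss.
    k=10 is an assumption picked to match the example in the text.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total number of points is conserved.
    return rating_a + delta, rating_b - delta


# Two evenly rated players: the winner gains exactly what the loser gives up.
print(update_ratings(1200, 1200, 1.0))

# A 1600 player beating a 1200 player gains less than one point...
print(update_ratings(1600, 1200, 1.0))

# ...but loses about nine points if the 1200 player pulls off the upset.
print(update_ratings(1600, 1200, 0.0))
```

The asymmetry in the last two calls is the "diminishing returns" effect: the update is proportional to how surprising the result was.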
It's simpler than it seems. Let me put this in language us geeks can understand: you gain more XP from slaying a massive, mighty dragon than you do from beating up on a common wharf rat.
As you level up, it's entirely possible that you may become so very powerful that you'll have to move on from those massive dragons to even more intimidating opponents. Seems obvious enough to those of us who understand the statistical conventions of Dungeons and Dragons.
The Elo system, like the Dungeons and Dragons system, is proven to work. Although it was originally designed to rank chess players, it's used throughout the online and offline gaming worlds to rank and match players, in everything from Pokemon to Scrabble to World of Warcraft.
When it comes to matching players online, Blizzard Entertainment is arguably one of the most experienced companies on the planet. They pioneered online ranking systems back in 1996 with Diablo and battle.net, and extended that lead through Starcraft, Diablo II, Warcraft III, and the juggernaut that is World of Warcraft. Blizzard has collectively matched and ranked millions of players-- if anyone knows how to get this right, it's Blizzard.
Warcraft III is an excellent case study in player ranking and matching. Despite Blizzard's prior experience, and even though it uses the proven Elo ranking system, Warcraft III had a lot of problems matching players -- problems that took years for Blizzard to work out. I myself was an avid Warcraft III ranked player for about a year, so I've experienced this firsthand. Warcraft III's automatic matching system was radically overhauled with Patch 1.15 in 2004, a full two years after the game was released. If you scrutinize the Blizzard support FAQs, you can read between the lines to see where the exploits and pathologies are:
1. Until a player has played a certain number of games, their performance will be highly variable, and accurately matching them is difficult.
The skill of players with low-level accounts can vary radically, from a player who has just begun playing multiplayer games of Warcraft III to a highly experienced player who has played hundreds of games but has just created a new account. Novice players would all too often find themselves in unfair matches against very skilled opponents. In an attempt to better match players of similar skill together into games, Battle.net's matchmaking system no longer uses a player's level as the only determining factor when creating games.
2. Unless players are consistently playing games, their rating should decay over time.
Players may lose XP if they do not play a minimum number of games per week, as listed on Chart 1. Failure to play the minimum game requirement during a week results in an XP penalty. For each game you don't play below the amount required for your level, you incur one loss [to an opponent of the same skill level].
3. Players cannot cherry-pick opponents or environments to improve their ranking. They must play randomly chosen opponents on neutral ground.
Battle.net's ladder system has been revamped for Warcraft III. This new system promotes active competition and maintains the ladder's integrity. The anonymous matchmaking system (AMM) prevents win trading and ensures that high-level players face opponents of similar skill. The AMM anonymously matches players based on skill level, and randomly selects maps based on the players' preferences. Players can choose their race, but most other game options are preset by the designers, resulting in a higher level of play.
If you're not into anonymous play, arranged team games are possible, but they are isolated on a special ladder of their own. This is done to prevent any kind of player collusion from tainting the main ladder results.
You can find many of the very same dangers echoed in the "practical issues" section of the Elo rating Wikipedia entry. Chess games and real-time strategy games are completely different things, but they have strikingly similar ranking and matching pathologies.
One particular weakness of the Elo system is the overly simplified assumption that the performance of every player will follow a normal distribution. This is a more severe problem than you might think, as it disproportionately affects new and beginning players. As Blizzard noted above, "Novice players would all too often find themselves in unfair matches against very skilled opponents." For many games, a large proportion of your audience will be novices. If these novice players experience several bad mismatches, they may never come back, and pretty soon you won't have a community of gamers to match anyone against.
Most modern implementations of the Elo system make adjustments to rectify this glaring flaw:
Subsequent statistical tests have shown that chess performance is almost certainly not normally distributed. Weaker players have significantly greater winning chances than Elo's model predicts. Therefore, both the USCF and FIDE have switched to formulas based on the logistic distribution. However, in deference to Elo's contribution, both organizations are still commonly said to use "the Elo system".
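As a rough illustration of the difference, the sketch below compares the two win-probability curves. It assumes the classic formulation in which each player's performance is normally distributed with a standard deviation of 200 rating points (so the difference between two players has a standard deviation of 200·√2), and the usual base-10 logistic curve with a 400-point scale. Both parameter choices are assumptions for the sake of the example.

```python
import math


def win_prob_normal(diff, sigma=200.0):
    """Win probability under the normal model, for a rating difference diff.

    Assumes each player's performance has standard deviation sigma, so the
    difference of two performances has standard deviation sigma * sqrt(2).
    """
    z = diff / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_prob_logistic(diff):
    """Win probability under the logistic model with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


# Compare the weaker player's chances at several rating deficits.
for deficit in (100, 200, 400, 600):
    normal = win_prob_normal(-deficit)
    logistic = win_prob_logistic(-deficit)
    print(f"{deficit:>4} points behind: normal={normal:.4f} logistic={logistic:.4f}")
```

Under these assumptions the two curves are close for small rating gaps, but the logistic curve gives a heavily outrated player a noticeably better chance, which is the direction of the correction described in the quote.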
Perhaps the most modern variant of Elo is Microsoft's TrueSkill ranking system used on its Xbox Live online gaming network. The TrueSkill system has better provisions for scoring team games, whereas Elo is based on the single player per game model. But TrueSkill's main innovation is incorporating the uncertainty of a player's rating deeply into the mathematical model.
Rather than assuming a single fixed skill for each player, the system characterises its belief using a bell-curve belief distribution (also referred to as Gaussian) which is uniquely described by its mean μ (mu) ("peak point") and standard deviation σ (sigma) ("spread"). An exemplary belief is shown in the figure.
[Figure: a Gaussian skill belief distribution with mean μ and standard deviation σ]
Note that the area under the skill belief distribution curve within a certain range corresponds to the belief that the player's skill will lie in that range. For example, the green area in the figure on the right is the belief that the player's skill is within level 15 and 20. As the system learns more about a player’s skill, σ has the tendency to become smaller, more tightly bracketing that player’s skill. Another way of thinking about the μ and σ values is to consider them as the "average player skill belief" and the "uncertainty" associated with that assessment of their skill.
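The "area under the curve" idea can be sketched in a few lines of Python. This is not TrueSkill itself, only the Gaussian belief it is built on; the μ and σ values below are made-up numbers for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu, spread sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def belief_in_range(mu, sigma, low, high):
    """Belief (probability) that the player's skill lies between low and high."""
    return normal_cdf(high, mu, sigma) - normal_cdf(low, mu, sigma)


# Hypothetical player whose belief is centered on 17.5.
# Early on the system is unsure (large sigma); later it is more confident.
uncertain = belief_in_range(mu=17.5, sigma=5.0, low=15, high=20)
confident = belief_in_range(mu=17.5, sigma=2.0, low=15, high=20)

print(f"belief in [15, 20] with sigma=5: {uncertain:.2f}")
print(f"belief in [15, 20] with sigma=2: {confident:.2f}")
```

As σ shrinks, more of the belief falls inside the same range, which is what "more tightly bracketing that player's skill" means in practice.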
It's more complex math than the relatively simple classic Elo ranking system, but TrueSkill should result in more accurate ranking and matching. Remember, that's why we're doing this: to achieve the best possible gameplay experience for all players, not just the elite players who have hundreds of games under their belt.
Our goal is to match players of all skill levels to their counterparts, and to let players reliably rise in rank until they reach a natural plateau of 50% win/loss ratio. We don't want blowouts. We want nail-biting, edge-of-your-seat cliffhangers every time, games that go down to the wire. We will know we've succeeded when each player feels like every game was the equivalent of slaying a massive, mighty dragon-- and not beating up on some puny wharf rats.
Posted by Jeff Atwood
I do not believe that most players want to achieve a 50% ratio of win/loss. Generalising from my own experience, I guess that most players want to win against an opponent that challenges all their skills.
Christoph on September 25, 2007 04:01 PM
FIRST!
You know, the thrill wasn't quite what I imagined it to be...
Ideally (to me at least), all games should be the equivalent of beating up on puny wharf rats. When I play games, I want it to reinforce my impression that I am an invincible god.
Gwyn on September 25, 2007 04:04 PM
Very interesting read!!!
Thank you for this sweet article.
It is very hard to achieve a perfect level of difficulty, especially in a game like Halo with a small number of players.
However, in games like Battlefield 2/142 you can have up to 64 players which makes every game challenging in some way but dead easy in another.
One thing I disagree is with rule #2, that not playing the 'req'd amount of games/week will damage your points. That is just dumb. Well they obviously 'want' you to become addicted but I've gone through that more than enough times so it forces me to not care about my stats or to become addicted, in ways, to the game.
Cheers.
Jaan on September 25, 2007 04:07 PM
> One thing I disagree is with rule #2, that not playing the 'req'd amount of games/week will damage your points.
Do you understand why this rule exists, though? If we didn't have it, someone could achieve a high ranking, and then protect that rank by barely playing at all. Such a player would only play when s/he identified opponents or situations that were extremely favorable matchups.
You have to force people to play a certain amount whether they want to or not. It doesn't have to be a lot, but some.
This is further explained in the Wikipedia entry on Elo:
--
Some of the clash of agendas between game activity, and rating concerns is also seen on many servers online which have implemented the Elo system. For example, the higher rated players, being much more selective in who they play, results often in those players lurking around, just waiting for "overvalued" opponents to try and challenge. Such players because of rating concerns, may feel discouraged of course from playing any significantly lower rated players again for rating concerns. And so, this is one possible anti-activity/ anti-social aspect of the Elo rating system which needs to be understood. The agenda of points scoring can interfere with playing with abandon, and just for fun.
--
Great post. This reminds me of the old trick on BNet of people yanking their physical internet connection if they thought they were going to lose, so that their stats wouldn't be affected.
There were still some jokes about the BNet matching system though.
http://www.penny-arcade.com/comic/2002/07/26
You say that obliterating players who are much lower than you is generally boring, and yet a significant portion of MMORPG players do nothing but that. Of course, there will always be jerks in video games, but it really does reduce the fun factor.
EVE Online does have a mechanism to combat this kind of trouble. Generally the idea is that the lower you are, the more you should stay in "Secure" space. If someone attacks you in "Secure" space then the cops show up and convince them of the error of their ways.
kettch on September 25, 2007 04:18 PM
Try to get your friends invited to a TrueSkill game. Not gonna happen. They have to be randomized. Anyway, TrueSkill isn't the end-all. In Gears I was always thoroughly owned in probably 9/10 of the games I tried online. Shouldn't that never happen? Even if I suck I shouldn't do that bad, should I? That and listening to people talk about how drunk/stoned they were drove me away.
Hey Now Jeff!
Who wants to beat up a wharf rat anyway? Halo 3 is everywhere, it seems. Chess ratings are interesting also: say a beginner @ 1000ish & a master @ 2400ish. This post makes me think more about how they are calculated.
Coding Horror Fan,
Catto
An even win/loss ratio sounds like a good goal on paper, but there are a lot of players that are going to want more/less of a challenge than those odds provide. If the system can accurately predict the probability of winning or losing a match, why not allow the player to select a desired difficulty and have the system attempt to match it?
A. Fountain on September 25, 2007 04:22 PM
fyi - Most Xbox games don't show your TrueSkill ranking for the game. However, Settlers of Catan on Xbox Live does have a screen for it.
I was amazed that it has so few 'levels', but with the explanation above I now see that it must be showing mu and sigma is still hidden.
Steve Steiner on September 25, 2007 04:34 PM
> why not allow the player to select a desired difficulty and have the system attempt to match it?
That's a good point-- if you're always playing people of the *exact* same skill level, it will take forever to move up (or down) through the ranks.
To compensate for this, many implementations of Elo use a higher "K-value" for new players, which means they can win (or lose) more points for each game and arrive at their proper skill rating sooner. FIDE uses the following ranges:
K = 25 for a player new to the rating list until he has completed events with a total of at least 30 games.
K = 15 as long as a player's rating remains under 2400.
K = 10 once a player's published rating has reached 2400, and he has also completed events with a total of at least 30 games. Thereafter it remains permanently at 10.
This way, new players can see rapid changes in their rank, which is more satisfying.
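As a sketch, the three ranges above could be expressed like this in Python. The function name and parameters are hypothetical, and the `has_reached_2400` flag is one way to model the "thereafter it remains permanently at 10" clause.

```python
def k_factor(games_played, rating, has_reached_2400=False):
    """Pick a K-value following the three ranges quoted above.

    games_played: total rated games the player has completed.
    rating: the player's current published rating.
    has_reached_2400: True if the player's published rating ever hit 2400
        after completing 30 games (K then stays at 10 permanently).
    """
    if games_played < 30:
        return 25  # new player: ratings move quickly
    if has_reached_2400 or rating >= 2400:
        return 10  # established top player: ratings move slowly
    return 15      # everyone else


print(k_factor(games_played=5, rating=1500))
print(k_factor(games_played=100, rating=1500))
print(k_factor(games_played=100, rating=2450))
```

The K-value simply scales each rating update, so a new player's rating moves 2.5 times as far per game as an established 2400+ player's.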
Jeff Atwood on September 25, 2007 04:35 PM
For this skill system to work, you'll need each console to have a breathalyzer attached, and an appropriate multiplier...
Ghettoimp on September 25, 2007 04:49 PM
Anyone else heard about the rather serious bug in Excel 2007? Might be worth a column Jeff.
Kieran on September 25, 2007 05:09 PM
>> One thing I disagree is with rule #2, that not playing the 'req'd
>> amount of games/week will damage your points.
> Do you understand why this rule exists, though?
I played on a system like that. But not often enough, so I was constantly underrated. It's a broken system. You can make a rating system that is an estimation of skill level, or a system of rewards. They don't mix.
I like what you wrote about Microsoft's TrueSkill. It sounds a lot like the Glicko system by Mark Glickman. If a player doesn't play very often, it should raise his RD, not lower his rating. If there is a public ranking of players, then after some period of inactivity, a player should simply disappear from it -- not be punished by being underrated.
(And to tell the truth, I actually took a certain perverse pleasure in being underrated.)
Anyway, thank you for posting about Elo! Maybe now more games will consider it. I've always wished that Quake had it (count every kill as a win), instead of dealing with skills imbalance by making a shallower game (simpler maps, weapons balance, etc.)
Another strategy for matching skills is handicapping, but to do that well you need a good rating system first.
It's a shame, though, that most games that are enlightened enough to have an unbroken Elo-style rating system don't bother to calculate your win expectancies for you.
Hey Jeff, I'm an avid Magic player, even competed on the Pro Tour and they use a form of the Elo system there. What's neat though in their case is the K-value. The K-value is effectively the highest number of ratings points that you can win/lose in a match in this tournament.
Various types of tournaments are given different K-values which does a couple of things. Small weekly or entry-level tourneys will be 8K, larger monthly events will be 16K and qualifiers and quarterly events will be 32K.
Highly-ranked players can attend small events with much weaker players and only risk a small amount of their ratings points. Likewise, beginners tend to attend small events to start, so they only lose small chunks. Bigger tourneys allow the beginners to have "breakout" events where a good winning-streak will give you a massive ratings boost. And highly-ranked players can actually make rating points against other highly-ranked players during the big events.
The biggest limitation tends to be at the high levels of a regional ladder. If I'm ranked say 3rd in my Province/State, there may be only a dozen local people that are really "worth playing". So I have to start going to national or Pro Tour events to meet up with evenly matched players. There are also different formats of play, so if some format of play is unpopular in your area then you can quickly end up at the top of the regional chain but nowhere in competition nationally. At some point, the better players have to attend national events and gain ratings points so that they can "bring them back" to the regional level.
They deal with Problem #2 (decay) by simply removing those players who are "no longer active" (the period is a little long: 1 year), but the concept is there: if you don't play, we just dump you from the ladder. If you start playing again, you come back in at your old number.
However, I really like the concept of having a sigma and a deviation built-in to the model. Magic suffers from having Hard Ratings numbers. So if a player gets an invite for achieving a 1900 rating as of 3 weeks from now, then players have been known to not play any ranked tourneys for 3 weeks to ensure that they don't dip below 1900 and lose their bonus.
But this is actually a failing in the difference between chess and Magic. Chess does not involve luck and Magic does. Chess players don't have to worry about some beginner "getting lucky"; but with Magic, some days, you're just going to lose.
And this is where some type of sigma would be nice. Even if it were some type of "momentum buffer" so that you didn't get raked. I've gone 7-1 on the day and lost rating points. I mean clearly everyone was ranked well beneath me, but even an average player can "get lucky" and beat an excellent player (which even he admitted was the case), so at some point the system should have a check for "got unlucky" or "got lucky" versus "this player is under/over-rated".
It's nice to know that someone has a sigma/distributed approach, I'd love to see Magic take this on.
Gates VP on September 25, 2007 05:48 PM
Video games are positive reinforcement systems. Players want to be told how great they are, all the time. You have to have progress, score, bonuses, wins, captures, or a higher position on the ranking ladder. 50% win rates are less fulfilling than the long string of wins that came before it to reach one's current ranking.
There are significant problems with high-rated players sniping "overvalued" opponents, but that's minimized by randomly selected opponents. So instead, highly rated players will often get secondary accounts for "having fun". They maintain a lower rating on those other accounts so they can have matches that they are much more likely to win (and therefore enjoy). This practice is so common that it has its own jargon, "smurfing". A top rated player's alternate account is his smurf.
If it's at all feasible to challenge specific players, it allows a sinister way of hacking the system. A highly rated player can stay dominant by using their smurf to challenge an equally highly rated competitor. A loss doesn't matter, because it's on the low-rated smurf account, but a win significantly damages the competitor's rating because it was a lower-rated account beating a higher rated one.
So random matchups are crucial to minimize the negatives of smurfing. But it can't be purely random. There's simply no point in matching a top ten player against someone who just started playing. So the matching systems limit the range of ratings within which it will find an opponent.
Again, this has a negative side effect. Unfortunately, the highest rated players become a rarefied breed. They're far off on the right edge of the bell curve, and aren't as likely to be online at the same time. A random matching system that limits the match range effectively allows the top players to arrange matches to their benefit.
Each time the gaming population perceives unfairness in the system, their opinion of the meaning of the ratings goes down, and they reject it.
Bob Whiteman on September 25, 2007 05:52 PM
Great Post as usual Jeff and the first one I've felt like commenting on - I like A.Fountain's idea of allowing a user to select a difficulty level but with one caveat - the difficulty level matchup shouldn't lead to someone who selected an easy difficulty playing a lower skilled player who did not select a hard difficulty. I like the idea that if a novice player is really an advanced player but doesn't yet have stats on their side that they can challenge some higher ranked players and rise accordingly (and similarly pull down the rankings of the higher ranked players who have lost to a lower ranked player) rather than sit around grinding wharf rats up the ladder.
Graphain on September 25, 2007 06:02 PM
Why have people lose rank if they can't play all the time? This is punishing to the casual player. Take a look at Guild Wars, they have a system working that allows casual and hardcore gamers to play together.
Akira on September 25, 2007 06:26 PM
Akira, imagine in the chess world if you could never challenge the best GrandMaster in the world (I'm not really sure on chess terms so bear with me, cringe if you will) - they would remain the undefeated champion not because they were necessarily better than everyone else but simply because they beat everyone at one stage and then refused to play anymore. The idea of decaying rank is that in the online world players can't always be available for challenges - the system instead allows players to play when they choose provided they play regularly to prove they are still worthy of their rank.
Graphain on September 25, 2007 06:59 PM
I don't think decaying rating is due to any of the things described above. It's to keep players playing. Note that Supreme Commander (RTSG) doesn't have ratings decay, but MMOs do. They need to keep players hooked, keep them paying that monthly subscription fee.
Bob Whiteman on September 25, 2007 07:04 PM
"Conan, what is best in life?"
"To crush your enemies, to see them driven before you, and to hear the lamentations of the women."
Jarrod on September 25, 2007 07:07 PM
Bob you are probably right but I was trying not to be cynical :P
If we imagine an ideal ranking system though I still think it would incorporate some method to stop players "maintaining through stagnation".
Ideally it would also address the issue of "smurfing", possibly by analysing what differentiates these players from others with similar rankings and matching accordingly - for instance in an MMO they might have much greater gear ("twinking": http://en.wikipedia.org/wiki/Twinking), perhaps higher percentages of headshots, or just a faster rate of change of ranking (i.e. climbing the ladder faster).
I'm curious to see what others consider to be fundamental elements of a "perfect" ranking system.
Graphain on September 25, 2007 07:17 PM
To me, an enjoyable game (online or off) is one where I am _nearly_ beaten, but manage to pull through. Of course, this kind of game is impossible to get every time with a ranking system, but a good one should make these kinds of games a more common occurrence.
Bernard on September 25, 2007 09:55 PM
I agree. I know people I play with, myself included, don't actually mind losing a close game because we can often see where we could have pulled it off and learn from our mistakes which I think is something Jeff highlighted.
Graphain on September 25, 2007 10:16 PM
I've always wanted some sort of ranking for games when finding servers. Something like the range of their skill. That way you can find a server that's most appropriate for you. As this post outlines, there are some difficulties though.
[ICR] on September 25, 2007 11:52 PM
I just wanted to add that all the math stuff seems to be pretty basic probability theory. I also imagine the rankings are based on Bayes' theorem, and derivatives of it.
So if anyone is interested in understanding more, then they should look around the net for some basic explanation of probability theory. The basics will also help you understand other topics such as spam filtering.
I often play go online (an Asian game equivalent to chess for us) on the KGS server.
You can find an explanation about their ranking system here :
http://www.gokgs.com/help/rank.html
http://www.gokgs.com/help/rmath.html
Basically, they use a system similar to the ELO system, but they improved it by taking previous games into account when calculating your rank. This way, if you play against a strong player who just created a new account, this player will rapidly climb the ladder, and your loss against him will be considered as a loss against a strong player.
It also avoids losing all your rank when you are having a bad day, since your previous games are there to counterbalance your bad performances.
GrandMoche on September 26, 2007 02:12 AM
I agree completely with the ideal of a 50% win/loss ratio. Stomping all over people in an unfair match is about as much fun as being stomped on in an unfair match.
Any matching system also needs to take environmental factors into consideration - there's no point matching equally skilled players if the network introduces a massive advantage to one side. I found Gears of War particularly bad for this (I'm in the UK).
Of course, Gears online is horribly broken anyway as someone usually quits early (often within a few seconds of starting) which unbalances the game entirely.
Also, I'm mainly interested in the team-play aspects of online games. I like a good blast, but it's nice to meet people too. Anonymous matching rips the soul right out of Xbox online gaming as you rarely meet the same player twice. I agree that there should be a separate 'team' ranking system, but would like to see equal effort spent on ensuring it works. Gears ignored it completely :-(
Halo 3 out today over here (yay!). I'm hoping it still has the 'party' online mode that Halo 2 had, and a ranking system to go with it.
robaker on September 26, 2007 02:16 AM
I dislike those rating systems. I am a casual player, don't ever play MMO's or RTS's or FPS's. I like simple browser-based flash/java games that can be played in 10 minutes.
I loved "Dice Wars" since the first moment I tried it. It just matches my gamer profile. And, like almost everyone else, I thought it would be a great idea to turn that game into multiplayer. And then "K-Dice" appeared. I celebrated it and started playing. After less than a week, I was fed up with that game and returned to dice wars.
The difference was: I could play dice wars for fun.
K-dice has an ELO-based ranking system, which at first sounds like a good idea. The problem was, it changed the game into a completely different one. I was not playing a simple strategy game of stacking dice and conquering territories (which was the goal of Dice Wars). I was playing a game of scoring points in some ranking system, and the actual dice games were the means to achieve those points. I wouldn't just click "start" and play any more, because every game counted. I started considering the chances to win a given game before clicking "accept". I was really pissed off when I started a game in a bad position which would mean losing like 500 points in a row.
I was not having fun any more, not the sort of fun I have when I solve a game of minesweeper.
The point I am trying to make is that including a ranking system transforms the game into a different one. Not necessarily a bad one, but not the game I was looking for at the first moment. I am a casual player, so I want games that I can play in 5 minutes, not during a month of effort and time investment.
Reminds me of what Joel Spolsky says about rewarding developers with monetary prizes, paying them for amount of lines written or bugs solved. They change their focus from "developing good software" to "maximizing monetary input".
Jan on September 26, 2007 02:54 AM
I used to love Gears, before Christmas 2006 when every 12 year old on the planet got a copy. The game went from a tactical squad game where team members spoke to each other and coordinated their actions to a curse filled screeching baby voiced cacophony.
Your next blog should be on how children are ruining gaming...
Kai Tain on September 26, 2007 03:05 AM
the bayesian approach seems to be one that will
converge quite quickly to optimal values for
everyone playing. team play could be handled
as multivariate gaussians, with the covariance
built up out of the individual player gaussians
on the fly. if teams wanted to be ranked and
play together frequently, they could have their
own 1-d gaussian that followed them around.
on decay: decay is critical. one way that decay
works well is along with a gaussian description
of rank -- the uncertainty just creeps up over
time, although the center doesn't need to move.
one thing that rating systems really ought to do
better: display. either decide that people don't
want to see the tedious details of how their ranking
is calculated, or lay it all out for them.
as the dice guy mentioned earlier, knowing your own
ranking can make you consistently worried about losing
ranking points. i think that a good ranking system
might just pair players together according to how much
risk they wanted to take (i.e. how closely they'd like
the expected result to be a coinflip), but not give them
any (displayed) number value whatsoever.
s.
steve uurtamo on September 26, 2007 04:41 AM
"I used to love Gears, before Christmas 2006 when every 12 year old on the planet got a copy. The game went from a tactical squad game where team members spoke to each other and coordinated their actions to a curse filled screeching baby voiced cacophony.
Your next blog should be on how children are ruining gaming..."
Kai Tain on September 26, 2007 03:05 AM
Preach on, Brother! It's because of those obnoxious 12 year olds, that my friends and I don't play most games online anymore. They've completely ruined the online gaming experience. So to deal with it, I simply don't go there anymore. Now, we play system linked games or play in LAN parties where we control who can play. Don't invite the 12 year olds, and the game is still fun.
Spencer on September 26, 2007 05:57 AM
I appreciate having my rank slowly decay with inactivity for my own personal well being. If I jumped back into Halo 2 online (not like anybody will be playing it in a month) I would get creamed. Since I stopped playing just under two years ago, people who are "evenly matched" based on where I was ranked when I left are no longer even matches. My skill level is probably half (or less) of the peak ranking I achieved.
John C on September 26, 2007 06:24 AM
As Alan mentions above (and is also mentioned in the cited article "Collective Choice: Competitive Ranking Systems"), the TrueSkill rating system is based on the Glicko rating system. The innovation of incorporating uncertainty into the rating system did not come from Microsoft with TrueSkill. It was already present in the Glicko system and has been used for years on the Free Internet Chess Server.
anon on September 26, 2007 06:33 AM
Yes, John C above just voiced what I was going to say. Rank decay is not necessarily about keeping players playing -- it can also reflect a decay in actual skill. In ranking systems that estimate skill level, that is fundamental -- one presumes that, lacking practice, a player will typically see his performance decay. I'd like to see both uncertainty increase *and* the center slowly drop, however.
I'm a big fan of ranking systems and look forward to seeing improvements in this area. The configurable difficulty option mentioned above also seems like a great idea -- I like playing games where the odds aren't in my favor.
Cheers!
Shade on September 26, 2007 06:46 AM
FIBS (http://fibs.org/) uses something like this.
Mike on September 26, 2007 06:48 AM
One problem with your D&D analogy is that you don't lose XP in D&D unless something powerful drains some from you. AFAIK there's no way to lose XP from a botched encounter with a lesser creature like a kobold.
Mike on September 26, 2007 06:51 AM
To me, there should be a difference between "Difficulty Ranking" and "Player Ranking." Yes, I like being able to play people who are approximately equal in skill (where the difficulty ranking comes into play), but I ultimately don't care how I rank as a player (i.e. How many games I've played, the number of kills/wins/losses, etc).
Does anyone here actually remember playing Diablo II online? After raising your character to Nightmare difficulty, going back through and playing on easy didn't lower your character's level...you just didn't get any experience for whooping up on sewer rats. Which is where the distinction between your difficulty ranking and your (publicly viewable) player ranking would come in handy...
"Wow. You just totally pwned me. You're a level 65 Pally? Sweet. And you've been at that level for...(checks stats)...three years. Although you've played over 5000 matches. On easy.
Get a real hobby, bro."
James on September 26, 2007 06:51 AM
"In order to increase his rank, he must defeat other higher ranked players. If a high ranked player loses to a low ranked player, he loses much more of his rating *THAN* he'd gain if he won the match."
If you don't know when to use them: http://www.sparknotes.com/writing/style/topic_172.html
Grammar Police on September 26, 2007 07:38 AM
Back in my Unreal Tournament days, there wasn't a ranking system or anything on the servers, but most of the good server farms had Beginner/Intermediate/Advanced next to each game room for you to hop into. That way the newbies or the extremely rusty could work their way into the system, and when you felt bold enough, you walked into a "gladiator" arena with the experts.
I don't have XBox Live, but does anybody know if they have an "open" room as well? There are definitely two audiences to cater to: the non-competitive gamers who just want a good match, and the competitive folks who want a legitimate way to say that they're the best.
Sean Patterson on September 26, 2007 08:35 AM
Grammar Police, that section is quoted from someone else. That's why it's in a blockquote tag.
http://jobemakar.blogspot.com/2007/05/why-elo-rating-system-rocks.html
So you might want to post a comment on that blog, then.
> AFAIK there's no way to lose XP from a botched encounter with a lesser creature like a kobold
I was thinking that myself while driving in to work this morning. It's close, but not exactly the same. The monsters aren't your peers, they are disposable non-player characters. They can't gain or lose rank because they have no persistence.
> The point I am trying to make is that including a ranking system transforms the game into a different one.
Sure. All this ranking and matching talk assumes that players want to play competitively against each other. There are plenty of other valid forms of play-- cooperative, solo, or unranked.
Jeff Atwood on September 26, 2007 10:13 AM
Shouldn't a player's skill level be tested to determine a level of expertise? It seems natural to me that any type of game would have certain skills that the player would have to master to gain a rating.
This is similar to a fitness test. How many pushups can you do? How fast can you cover a mile? Could games build in tests for players to determine a skill level? Probably. These could be administered to any player to determine a skill level for the player. It's a similar thing with handicapping in golf.
I think you could do the same for any game if some thought went into what kind(s) of tests you would put the players through.
Jon Raynor on September 26, 2007 11:17 AM
I think implementing rating decay to mirror a player's skill decay from not playing rests on flawed reasoning. A player who doesn't play ranked matches may be playing unranked matches, or playing ranked matches on another account. Effects like getting out of practice should be naturally reflected in the resulting wins and losses, not anticipated by the scoring system.
Bob Whiteman on September 26, 2007 11:20 AM
I'm surprised no one's mentioned the automatic part of the BCS (ranking of college American football teams) and Google's web page ranking (how Google [and everyone else] orders search results).
Google rank is an example of a random walk probability model, and you can do either point or full Bayesian estimates (with or without priors, which amount to damping in the graph algorithms). Bayesian algorithms (typically sampling) tend to be more robust than point estimates (typically EM or gradient descent), but can be more expensive to compute. The general class of algorithms was developed in social network analysis.
To reduce game rankings to Google rank, represent a player A as a web page that has one link to every player B who has beaten player A (count multiple defeats as multiple links). This produces a graph where the link weight from player A to B is the number of times A was beaten by B. Then the Google rank of a player is their rank.
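The reduction described above can be sketched as a small power iteration. This is only an illustration under stated assumptions: the `rank_players` name, the `(winner, loser)` input format, the 0.85 damping factor, and the even redistribution for undefeated players are all choices made here, not part of the comment or of Google's actual implementation.

```python
from collections import defaultdict


def rank_players(results, damping=0.85, iterations=100):
    """PageRank-style scores from a list of (winner, loser) pairs."""
    players = sorted({p for match in results for p in match})
    # Each defeat is a "link" from the loser to the winner;
    # repeated defeats add weight to the same link.
    links = defaultdict(lambda: defaultdict(int))
    for winner, loser in results:
        links[loser][winner] += 1

    n = len(players)
    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in players}
        for p in players:
            out = links.get(p)
            if out:
                total = sum(out.values())
                for winner, count in out.items():
                    new[winner] += damping * score[p] * count / total
            else:
                # Undefeated player: no outgoing links, so spread evenly.
                for q in players:
                    new[q] += damping * score[p] / n
        score = new
    return score
```

For example, `rank_players([("a", "b"), ("a", "c"), ("b", "c")])` scores the undefeated player "a" highest and the winless player "c" lowest.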
There's a good explanation of both the probability model and algorithms behind these kinds of rankings on the Wikipedia:
http://en.wikipedia.org/wiki/PageRank
http://en.wikipedia.org/wiki/Social_network
Bob Carpenter on September 26, 2007 11:27 AM
>Attending yesterday's Halo 3 launch...
Don't you think it's appropriate that the Halo 3 website be designed and developed using Silverlight?
Jeff W. on September 26, 2007 11:37 PM
I started playing chess on FICS this month and read the Wikipedia entry on Elo ratings a couple of weeks ago. Funny you mentioned it.
The problem with the rating system is of course that people tend to take these ratings seriously. As if they have some intrinsic meaning. So, they choose players to improve their rating.
This morning, I was musing over your blog post, and thought: why not do a binary search for your opponent? Suppose the lowest-rated player has 800 points, and the highest has 2200 points. Now, an entry-level player should play against a player close to (2200+800)/2 = 1500 points. If he wins, he should play against (2200+1500)/2 = 1850. If he loses, he should play against (800+1500)/2 = 1150. The lower and upper bounds should be determined in subsequent rounds, by the highest-rated player he loses to and the lowest-rated player he beats.
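The arithmetic above can be sketched in a few lines. This is a simplified reading of the idea, assuming each win raises the lower bound to the opponent just beaten and each loss lowers the upper bound to the opponent just lost to; `opponent_schedule` is a hypothetical helper name.

```python
def opponent_schedule(outcomes, low=800.0, high=2200.0):
    """Return the opponent rating picked for each game, plus the final bounds.

    outcomes is a sequence of booleans: True for a win, False for a loss.
    """
    schedule = []
    for won in outcomes:
        target = (low + high) / 2  # midpoint of the current search interval
        schedule.append(target)
        if won:
            low = target   # player is at least this strong
        else:
            high = target  # player is weaker than this opponent
    return schedule, (low, high)
```

With the numbers from the comment, a first-game win gives `opponent_schedule([True, False])[0] == [1500.0, 1850.0]`, and a first-game loss gives `opponent_schedule([False, True])[0] == [1500.0, 1150.0]`. Real results are noisy, so a practical version would probably not treat a single game as a hard bound.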
My 0,02€
The Dungeons and Dragons analogy also has a role-playing side:
I agree with the idea that it's fun to beat monsters when you only narrowly win. So I would learn to advance faster if I could choose the monsters I want to take on. I like the learning and the freedom of choice. If I was tired, I could match up against easier monsters but still make some progress. But if the game chose the monsters for me, I would be forced to take on those, even if I could do better, or even lose.
In the real world there are no matchmaking systems. For a role-playing game, a matching system would be OK only for arranged battle arenas. Otherwise a skillful player could slay all the others in the wilderness. Preventing that would require that skill doesn't matter too much compared to the player's character level.
And what is wrong with imbalance? Certain creatures are more powerful than others, but by teaming up you can beat stronger characters. Why not let the player be a human, a hobbit, or a dragon?
Don on September 27, 2007 07:51 AM
I used to play a tactical realism shooter and some of the most fun I had (and the times I learned the most) was when I was effectively cannon fodder for the best players. (No, I'm not a masochist, I actually became one of the best players after a time -- because I kept playing with, and learning from, the best).
It seems to me that a lot of this ranking crap would specifically prevent a motivated newbie from learning from the best, and thereby short-circuiting the tedious learning process.
And sometimes it's fun to steamroll a bunch of lesser players, especially when they loudly extol their skills. :^)
I'm with Sean Patterson -- allow for the choice of skill matching, but don't require it.
Of course, my favourite game of all time was a community mod by and for the lunatic fringe and I can't stand most commercial games, so I'm sure the unwashed masses actually LIKE this ranking...
yipyip on September 27, 2007 10:29 AM
Interestingly, different cultures prefer to play games in different ways.
For example the Japanese love the thrill of completing a game, even if it means the gameplay is fairly easy so that they can achieve that goal.
Any system would have to take this into account.
Fleejay on October 3, 2007 11:46 AM
Someone mentioned the KGS ranking system here. I think it's poor, and that the rating systems on other Go servers, such as IGS or Cyberoro, are better.
Alex
Wow. I wish that I knew how to play this game - it's so cool! My older sister doesn't like it, though. She calls it the "game of marital discord" because her husband absolutely LOVES the game and goes out to play it with some friends on Fridays. He doesn't get back until really late, or so I'm told. Ta Ta!
Kim on February 28, 2008 09:31 AM
u all r fucking asshole
brandon on May 3, 2008 07:49 AM
Content (c) 2008 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.