Trials, Triumphs & Trivialities #96:
In Sickness & In Health
November 14, 2002 - I've been sick for almost five days now, ever since a co-worker brought his Black Plague to last Thursday's strategy game night at my house. It's been annoying for all the normal reasons sickness is: inability to breathe; lack of sleep; onset of weird hallucinations. Actually, the weird hallucinations are kind of fun sometimes.
But, even more, it's been very frustrating because I haven't felt up to fixing some bugs that cropped up in diplomacy code that I recently introduced to Hegemony. They're not huge bugs, granted, but they're annoying, and I know that every day that I'm too muzzy-headed to put on my programmer's hat and pick through the code, those bugs will appear a few more times, and the Hegemony players will be inconvenienced by them.
And that really points to an idea at the core of our online gaming medium, one that we need to constantly be aware of and always plan for: The games are always up and the players are always there. It doesn't matter if I'm sick or on vacation or even if it's Christmas. If you're producing a title for the New World Paradigm of online games, you need to make sure you understand what 24x7 really means, and be prepared for it.
This week, in my still somewhat muzzy-headed state, I want to talk a little about what that means, as it relates to hardware, administrators, coders, and even players.
Always Available Hardware
Being able to provide a game that will (hopefully) be played 24x7 starts off with the hardware. Computers, power supplies, hard drives, and all the rest usually aren't thought about that much by players, but without a good, stable machine with plenty of backup parts, their game could be down for hours or days in case of failure.
Over in fields that actually pay for 24x7 reliability, it's all measured in "9"s, counting the 9s after the decimal point. A computer that is 99% reliable has 0 "9"s, a computer that is 99.9% reliable has 1 "9", etc. Visa and similar financial concerns that have critically important hardware try to hold their reliability to 4 "9"s: 99.9999% uptime.
It's interesting to list out exactly what this means:
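As a rough sketch of that arithmetic in Python, here is the downtime each level allows per year. It follows this column's convention of counting only the 9s after the decimal point and assumes a 365-day year; the function name `downtime_hours_per_year` is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime per year for an uptime of 99.<nines x '9'>%.

    Uses this column's convention: only 9s after the decimal point count,
    so 0 nines is 99%, 1 nine is 99.9%, and so on.
    """
    unavailable_fraction = 10.0 ** -(nines + 2)
    return HOURS_PER_YEAR * unavailable_fraction


for nines in range(5):
    hours = downtime_hours_per_year(nines)
    uptime = "99" + ("." + "9" * nines if nines else "")
    print(f'{nines} "9"s ({uptime}%): {hours:.4f} hours/year, or {hours * 60:.1f} minutes')
```

At 0 "9"s that is 87.6 hours of downtime a year; at 4 "9"s it is roughly half a minute.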
Unfortunately, when you produce an online game you'll soon learn that a sizeable percentage of your customers will expect 100% reliability (infinite "9"s). Yet, they won't be willing to pay the (literally) millions of dollars that a Visa puts out for the somewhat lower four-9s level of reliability.
Skotos would like to offer no more than two hours of downtime, on average, per month: one hour for scheduled upgrades and one hour for unscheduled problems. That's two hours out of 720 hours in the month, or about 99.7% reliability, between zero and one "9". (In actuality, we've traditionally had more like three unscheduled hours of downtime per month, with a lot of the problems stemming from network issues, which would bring us closer to 99.4%.) Our own credit card processor has about an hour of downtime for scheduled maintenance every other month, and maybe an hour of unscheduled downtime every year. Pretty good: call it 7 hours of downtime out of 8760 hours in the year, or 99.92% uptime, just over 1 "9".
The point? Without expending millions of dollars in staff time, super-reliable hardware, and plentiful swaps you'll never be able to actually approach 100% reliability, and no game company will ever be on the economic scale where a multimillion-dollar expenditure makes sense. So, let your players know what your abilities are in providing reliable hardware, and what they can actually expect for their subscription fee.
While on the topic, I should mention that there are ways that you can increase the reliability of your hardware. Here are the most effective measures that we've taken at Skotos:
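The percentages above can be checked with a few lines of Python. This is only a sketch: the helper name `uptime_percent` is invented here, a month is taken as 30 days, and the "actual" figure assumes one scheduled hour plus the three unscheduled hours mentioned.

```python
def uptime_percent(downtime_hours: float, total_hours: float) -> float:
    """Percentage of the period during which the service was up."""
    return 100.0 * (1.0 - downtime_hours / total_hours)


HOURS_PER_MONTH = 30 * 24  # 720, assuming a 30-day month
HOURS_PER_YEAR = 365 * 24  # 8760

print(f"Target, 2 hours/month:        {uptime_percent(2, HOURS_PER_MONTH):.2f}%")
print(f"Actual, 1 + 3 hours/month:    {uptime_percent(4, HOURS_PER_MONTH):.2f}%")
print(f"Card processor, 7 hours/year: {uptime_percent(7, HOURS_PER_YEAR):.2f}%")
```

The three lines come out at 99.72%, 99.44%, and 99.92% respectively.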
There's a coda to the whole issue of hardware reliability. In the world of a global Internet, much of it will be beyond your control. Our most annoying problems, and the ones most visible to our players, haven't been related to downed machines, but rather to unreliable network connections. We've expended huge amounts of time working with ISPs in an attempt to reduce latency and improve reliability, but often it's not even our ISP at fault, but rather their ISP, or their ISP's ISP, or a player's ISP, or a random "peer" connection. Talk about "beyond your control". All in all, this is simply one more reason to clearly set player expectations in regard to what type of reliability they can reasonably expect.
Always Available Administrators
If you've read this far, you're probably thinking, "Always available machines... that makes sense." After all, it's hardware. It doesn't have any feelings. It doesn't mind if it has to miss Thanksgiving or if it's on call over the Christmas holidays.
People, on the other hand, aren't as easygoing. Nonetheless, if you're going to have an online game, and it's going to be available 24x7, you need to figure out some way to keep players entertained all the time, and that usually means administration (including plotting, customer support, and any number of other tasks). Fortunately this problem can be approached in a number of different ways.
Rotate Your Administration. If you're trying to build your entertainment entirely around administrators, then you're absolutely going to have to rotate the schedules of your administrators. That could mean lots of different shifts. If you're choosing volunteers from all over the world, it more likely means lots of different time zones.
Empower Your Players. More effectively, you can give some of your players the ability to entertain through limited administrative powers. Not only does this increase the number of people entertaining, but in a global environment like the Internet it increases the likelihood that someone will always be around.
Provide Other Entertainment. However, even in a social game like Castle Marrach, it's insane to think that you'll always have administrators or empowered players on hand to keep things moving. Thus, you need to provide at least some other systems to help keep that momentum going. These could be actual achievement systems, like the skill system in The Eternal City, or they could simply be "backdrop" systems intended to encourage roleplaying and storytelling, like the dueling or chess systems in Castle Marrach.
Clearly Denote Your Exceptions. With all that said, it's also helpful to let players know when administrators are expected to be around (going back to the topic of setting expectations, above). For example, we've always kept Thanksgiving Day and Christmas Day open to Skotos staff as holidays (the only two Skotos holidays). We've thus let our players know that staff will be less available during those times.
For more information on empowering players see Trials, Triumphs & Trivialities #16, Guiding Lights, Trials, Triumphs & Trivialities #43, The Power of the Medium: People, and Trials, Triumphs & Trivialities #67, Creativity & The Online Gamer. For more information on building teams of administrators see Trials, Triumphs & Trivialities #30, The Team's The Thing.
Always Available Coders
At some point, hopefully before you release your game, your code base should settle down to a point where it's stable and bugs that could ruin the entire player experience won't crop up suddenly. Actually, you'll never quite achieve that ideal, but hopefully you'll approach it within a few "9"s.
At that point problems with coder reliability will be of the sort I experienced with Hegemony this last week. A coder will upgrade a system and some minutes, hours, days, or weeks later your pristine game will suddenly spring a "leak", and emergency coder repair will be needed very quickly. Since coders, like administrators, are human beings, you won't always be able to guarantee their availability; thus it's best to follow this maxim:
Make Any Code Easy to Roll Back. In other words, if something really doesn't work out, make it easy to go back to the old, previously working code base. Sometimes you won't be able to: I couldn't with my Hegemony upgrade, because it required changing the data storage mechanism of the entire diplomacy system, and there was no existing level of data abstraction. But whenever you can make a rollback easy, you should. Because then a coder can spend a minute or ten to revert code, and later tackle the big problem at his leisure.
For another instance of us not quite following this rule, see "More on Programmers and Vacations", the final section of Trials, Triumphs & Trivialities #2, Keeping Up with the Joneses.
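One minimal way to picture the rollback maxim, sketched in Python: keep every previously live version around so that reverting is a single step. The class name `ReleaseHistory` and the version labels are hypothetical, and this is not how Skotos or Hegemony actually deploy code.

```python
class ReleaseHistory:
    """Tracks which code version is live and keeps earlier ones for rollback."""

    def __init__(self, initial_version: str) -> None:
        self._versions = [initial_version]  # oldest first, live version last

    @property
    def live(self) -> str:
        """Name of the version currently serving players."""
        return self._versions[-1]

    def deploy(self, version: str) -> None:
        """Make a new version live while keeping the old one available."""
        self._versions.append(version)

    def rollback(self) -> str:
        """Revert to the previously live version and return its name."""
        if len(self._versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._versions.pop()
        return self.live


history = ReleaseHistory("diplomacy-v1")  # hypothetical version labels
history.deploy("diplomacy-v2")            # the upgrade that turns out to be buggy
history.rollback()                        # quick revert; debug v2 at leisure
print(history.live)                       # diplomacy-v1
```

A scheme like this only helps when the old and new code can read the same stored data. An upgrade that changes the data storage format, as the Hegemony one did, would also need some way to convert the data back.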
Always Available Players
Before closing the book on 24x7 games it's worthwhile to note that players won't be available 24x7 any more than administrators, coders, or even hardware really can be. This probably doesn't matter too much in most achievement-based games, but in social games players might be more crucial to society or to plots. (I actually discussed this particular issue in depth two years ago; consider this a synopsis.)
This problem is hard to solve entirely, but the following helps:
Explain Missing Players. Don't force players to constantly explain absences from critical events. Instead, have some default explanations for absences built into your backstory. In a previous column I wrote that I wished we'd offered the explanation in Castle Marrach that time worked differently for different players, and thus any missing player could be explained by chrono-difference. In a Castle of the Newly Awakened, just saying missing people were "asleep" would have worked well too. Other explanations (visits to nearby realms, hunts, journeys, whatever) would work equally well in different games.
Don't Make Individual Players Critical Paths. When you're designing plots you also need to make sure that players aren't single points of failure. If the Mystical Goobaz is necessary to finish the Plot of the Goobaz and Thingbob, and you give a single player the Goobaz just before he leaves the game, then you're in trouble. There are lots of possibilities to get around this: make sure any plot has multiple paths at any nodal point; make sure that critical elements can be regenerated if they're not used within a certain time; or don't allow critical objects to be logged out of the game ("The Goobaz falls to the ground as Johndoe disappears."). Whichever you prefer; your mileage may vary.
For more information on vanishing players see Trials, Triumphs & Trivialities #9, The Puzzle of the Purloined Players.
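The last of the workarounds above, not letting critical objects be logged out of the game, might look something like this Python sketch. The `Item`, `Room`, and `Player` classes and the `log_out` function are invented for illustration and assume a very simple inventory model.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    plot_critical: bool = False


@dataclass
class Room:
    items: list = field(default_factory=list)


@dataclass
class Player:
    name: str
    inventory: list = field(default_factory=list)


def log_out(player: Player, room: Room) -> list:
    """Leave plot-critical items behind in the room; return messages to show."""
    messages = []
    kept = []
    for item in player.inventory:
        if item.plot_critical:
            room.items.append(item)  # the item stays in the game world
            messages.append(
                f"The {item.name} falls to the ground as {player.name} disappears."
            )
        else:
            kept.append(item)  # ordinary items leave with the player
    player.inventory = kept
    return messages


hall = Room()
johndoe = Player("Johndoe", [Item("Goobaz", plot_critical=True), Item("lantern")])
for line in log_out(johndoe, hall):
    print(line)
```

Here the Goobaz ends up on the floor of the room for someone else to pick up, while the lantern logs out with Johndoe.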
Sometimes Available Columnists
I think the whole topic of always available games summarizes to this: you need to make allowances for the fact that players will be trying to play your game 24x7. Do what you can to support this, but at the same time let your players know what the real limitations are.
And that's it this week from this sometimes available columnist. Next week I plan to continue my occasional "Brief History" of Skotos and let you know what changes the past year has wrought. And after that it's Thanksgiving... which this columnist plans to take off this year.
I'll see you in 7.