Trials, Triumphs & Trivialities #193:

Managing User Creativity, Part Two

by Shannon Appelcline

As I've discussed previously in this column, user creativity is a great force which can allow the creation of new content at a level that staff couldn't possibly manage. However, user creativity also leads to issues of management, because it can be somewhat chaotic, somewhat unfocused, and somewhat uneven.

In my last column I offered three real examples of how we've tried to manage user creativity at Skotos and RPGnet. At My.Skotos we tightly constrained it. For Hegemony maps, we depended upon our players' precise technical aptitude. For RPGnet reviews we instead offered very freeform input, and though the reviews are fine, the technical information entered with them tends to be uneven.

Recently at RPGnet we rolled out a new piece of software, The Gaming Index, which is intended to be a huge catalog of every tabletop roleplaying product ever produced (and as of this writing already contains about 2300 entries). Users can freely enter new games and correct old games via simple web forms, and that's allowed for remarkable growth in the period of just a month or so.

As with the RPGnet reviews, the Index allows for very freeform entry of information. However, not only have we learned some lessons since our last user creativity management project, but there have also been some technological advances. As a result I think we've put together a management system much superior to any of our previous attempts. It's still a work in progress, mind you, but it's looking like a good work thus far, and so this week I'm going to share some tricks and tips for running this type of creative management project.

The Technical Side

A lot of the technical side of this sort of project is pretty obvious. We have a front-end built in PHP, though that could as easily be Perl, ASP, or whatever other language someone was interested in using. When we received RPGnet a couple of years ago, parts of it (largely rewritten now) were written in PHP, and we decided to maintain that as a standard.

The back-end uses a MySQL database. Though MySQL has gotten some flak over the years for efficiency, machine power has largely outstripped those concerns. MySQL is also at the heart of the huge RPGnet forums (approximately #100 on the Internet based on size), and it's unlikely that the Index will get to that size any time soon even with freeform user input of games, ratings, and ownership. So, as long as we maintain some level of efficiency in our table construction and use we'll be fine. (Though that question of efficiency is actually a real one. Just neglecting to mark a column that's used to join tables as an index cost us something like a 100x slowdown in one lookup before we figured it out.)
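The index-on-a-join-column point can be sketched in a few lines. This is only an illustration, written in Python against an in-memory SQLite database rather than the Index's PHP and MySQL; the table names, column names, and index name are all invented for the example. On the SQLite builds I'm aware of, the query plan changes from scanning the games table to searching it through the index once the index exists, but the exact plan text varies by version.

```python
import sqlite3


def build_demo_db(n_publishers=200, games_per_publisher=50):
    """Create a throwaway in-memory database with two joinable tables.

    The schema is hypothetical; it is not the Gaming Index's real schema.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE publishers (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE games ("
        "id INTEGER PRIMARY KEY, title TEXT, publisher_id INTEGER)"
    )
    db.executemany(
        "INSERT INTO publishers (id, name) VALUES (?, ?)",
        [(p, f"Publisher {p}") for p in range(n_publishers)],
    )
    db.executemany(
        "INSERT INTO games (title, publisher_id) VALUES (?, ?)",
        [
            (f"Game {p}-{g}", p)
            for p in range(n_publishers)
            for g in range(games_per_publisher)
        ],
    )
    return db


# games.publisher_id is the column used to join the two tables.
JOIN_QUERY = """
    SELECT g.title
    FROM publishers AS p
    JOIN games AS g ON g.publisher_id = p.id
    WHERE p.name = ?
    ORDER BY g.id
"""


def titles_for(db, publisher_name):
    """Run the join and return the matching game titles."""
    return [row[0] for row in db.execute(JOIN_QUERY, (publisher_name,))]


def query_plan(db, publisher_name):
    """Return SQLite's human-readable plan steps for the join."""
    rows = db.execute(
        "EXPLAIN QUERY PLAN " + JOIN_QUERY, (publisher_name,)
    ).fetchall()
    return [row[-1] for row in rows]  # the last column is the description


def add_join_index(db):
    """Mark the join column as an index (the step that is easy to forget)."""
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_games_publisher "
        "ON games (publisher_id)"
    )
```

Calling query_plan before and after add_join_index shows what the database intends to do in each case; the rows returned by titles_for are the same either way, only the amount of work changes.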

The heart of the Index as it relates to user creativity is that users can enter new games into the database. This means that they describe the games very specifically. This includes entering publisher, genre, background setting, year of publication, game system, a summary, and lots more.

Some of the info entered can only be (reasonably) entered in one possible way (e.g., year), while other info is entirely freeform (e.g., summary). However, three other types of information can cause troubles in this type of user entry system.

The first is set information that can have slight variability. I talked about this in the last article. Some people might enter the exact same info in different ways (e.g., the company "Chaosium Inc." could be entered as "Chaosium", "Chaosium, Inc.", or "Chaosium Inc").

The second is set information that has intrinsic variability. This most commonly comes up in names, where the person most commonly known as "Marc W. Miller" also has credits under "Marc Miller" and "Marc William Miller". Likewise, "Chaosium Inc." was previously known as "The Chaosium".

The third is entirely freeform information. Genre is a good example of this. "Science Fiction" might be a fair standard, but what about "Science Fantasy"? Or "Space Opera" or "Hard Science"? This information could literally have thousands of possible and correct entries.

When you're trying to tie information together, these differences can cause problems, because you can't reasonably search for all the misspellings or similar but alternate spellings at once. So, you need some ways to massage the data, preferably in a largely automated method, to ensure that it is reasonably consistent.
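For the first kind of problem, slight variability, one simple automated massage is to compare entries on a normalized key. The following is a minimal sketch in Python, not the Index's own code; the function names are hypothetical, and the rule it applies (drop punctuation, ignore case, collapse spaces) is just one plausible choice.

```python
import string
from collections import defaultdict

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize_key(text):
    """Reduce an entry to a comparison key.

    Punctuation is dropped, case is ignored, and runs of whitespace
    collapse to a single space.
    """
    stripped = text.translate(_PUNCTUATION).lower()
    return " ".join(stripped.split())


def group_variants(entries):
    """Group entries that share a key; return only keys with 2+ spellings."""
    groups = defaultdict(set)
    for entry in entries:
        groups[normalize_key(entry)].add(entry)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

With this rule "Chaosium Inc.", "Chaosium, Inc.", and "Chaosium Inc" all share the key "chaosium inc". Plain "Chaosium" does not, and neither do intrinsic variants like "The Chaosium"; catching those takes a rule that a person has to supply.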

There are a few ways to do this, but first I'm going to talk about the technical methods.

AJAX: One of the great expansions of web design in the last few years is AJAX. It connects up your web page with JavaScript and XML such that a web browser can talk to a web server live and then make minor updates to a web page being browsed without having to reload the whole page. This allows for a lot more interactivity between the browser and server, of a sort that web designers weren't previously willing to attempt (because it would have required reload after reload).

The downside of AJAX is that it's an unholy mess. To be precise, in the Gaming Index I have PHP which talks to JavaScript which talks to XML which talks to PHP which does MySQL lookups. Bleh. Thanks to the wonders of open source on the Internet (that's more user creativity for you) I have access to a nice PHP package called XAJAX which takes care of all the JS and XML creation for me, but it has still set debugging back thirty years; I might as well be using punchcards for all the feedback I get on errors, some of which are almost totally invisible because they're occurring back in the JavaScript or even in the PHP-to-JS conversion. I say again, bleh. I've spent an hour debugging a ":" that was supposed to be a ";". If the functionality weren't so useful, I'd drop it like a box of dead rats.

In any case, the biggest problem with managing user data entry of this type has to be the freeform information I mentioned like genre (and also "background setting" and "game system", which are a little less freeform than "genre", but not a lot).

To manage these in the Gaming Index, I did two things. First, this freeform information is always selected from pull-down menus (on which, more in a second) with an option to enter new text if you don't like the options you're presented with. Second, they use AJAX. In particular, the first things you're asked to enter are genre & setting. If you want, you can do a quick AJAX lookup of your game system, and it'll then tell you the most frequent genres and settings for that game system, ordering them by usage. It even auto-sets the drop-down menus to the most common settings for you.
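The server side of that lookup amounts to counting what is already in the database. Here is a rough sketch of the idea in Python; the real thing is PHP behind an AJAX call, and the record layout and function name here are assumptions made for the example.

```python
from collections import Counter


def suggest_for_system(games, system, limit=5):
    """Rank the genres and settings already used with one game system.

    `games` is a list of dicts with "system", "genre", and "setting"
    keys (a hypothetical layout). The most used values come first, and
    the top one of each is offered as the pre-selected default.
    """
    genres = Counter()
    settings = Counter()
    for game in games:
        if game["system"] == system:
            genres[game["genre"]] += 1
            settings[game["setting"]] += 1

    ranked_genres = [name for name, _ in genres.most_common(limit)]
    ranked_settings = [name for name, _ in settings.most_common(limit)]
    return {
        "genres": ranked_genres,
        "settings": ranked_settings,
        # None means "nothing known yet, leave the menu alone".
        "default_genre": ranked_genres[0] if ranked_genres else None,
        "default_setting": ranked_settings[0] if ranked_settings else None,
    }
```

The web page would then fill its drop-down menus from the two ranked lists and select the defaults, leaving the user free to override them.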

I think AJAX's most important utility for any data management project is to mold data input in this exact manner. I'm going to do more of it as the project matures. (Making good use of the data already in the database, as we do, is also pretty crucial.)

Drop Downs v. Free Entry: One question that arises on the data input side of any creative management project is whether to use drop-down menus or freeform text entry boxes.

Generally, I don't like drop-downs. They break the flow of data input and for a project with many different variables they can get ungodly big. However, they're crucial in the case of freeform input. So my general philosophy is to use them as little as possible, but that does mean using them for the more freeform possibilities.

However, for the set information, even if it does have some variability of entry, I let people type things in freeform. This improves the data entry experience (as users don't have to try and find an author from a list that would currently be around 2300 different people) but it makes the data dirtier because there are more possibilities. For that, other management solutions are required.

Searches, Aliases, and Peers

In balancing user experience against data purity, we've come down on the side of user experience when possible. But that means that more data needs to be fixed. To do this you need to first find the errors, then resolve them.

Searches: We do not yet have any systematic way to find discrepancies in our user data entry. It's all administrator approved, so nothing terrible should get in, but on the other hand an administrator might not notice that, say, both "Douglas Niles" and "Doug Niles" are going into the database as authors.

(Note: not a made-up example. I just found 1 Doug in the database to 14 Douglases. They were all entered correctly from the books as far as I can tell too, putting the discrepancy one step back, at the publisher's doorstep.)

One of the important tools that we're using thus far to find this sort of problem is a simple search. We have pretty robust searches built into the Index, so that an end user can (for example) do a search for every Wizards of the Coast book ever produced (153 in our database thus far, which is a long ways from complete), then sort that listing by year published or game system or whatever.

This makes it easy for an administrator (or, down the line a bit, a line editor) to look at a broad set of games by publisher, setting, or game system, and then to see if everything matches up as it should. For example you can look at all the Vampire: The Masquerade games and see if some were published by "White Wolf", some by "White Wolf Game Studio", etc. (Fixing it is the next step.)

A single administrator can't possibly cover an entire database of this size, but the next step in managing this type of user creativity is lining up local experts, each of whom has good knowledge of a certain area, and then letting them watch over their categories.

The searches I talk about are a pretty rudimentary way to discover this type of issue. Smart programs that scoured the database looking for discrepancies that appear "wrong" would be better, but figuring out what's right and what's wrong could be quite a puzzle on its own.
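One modest version of such a program would flag pairs of names that are nearly, but not exactly, the same and leave the judgment to a person. The sketch below uses Python's standard difflib module; it is an illustration only, and the function names and the 0.85 threshold are arbitrary assumptions, not anything the Index does.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Score two strings from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(names, threshold=0.85):
    """List pairs of distinct names that look suspiciously alike.

    Every pair is compared, so the work grows with the square of the
    number of names; fine for an occasional offline report, not for
    something run on every page load.
    """
    unique = sorted(set(names))
    pairs = []
    for a, b in combinations(unique, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs
```

Fed a list containing both "Douglas Niles" and "Doug Niles", this reports the pair for review. It cannot tell which spelling is right, or whether two similar names are really two different people, which is exactly the puzzle mentioned above.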

Aliases: After you find a problem the next step is correcting it. The Index's most basic tool for that is an alias, which is totally invisible to users but accessible to administrators (and down the line, line editors). The basic idea of an alias is "for type of information A, datum B is the same as datum C".

It's intended for situations where two different data entries should really be the same thing. For example, "Marc Miller", "Marc W. Miller", and "Marc William Miller" are really all the same person, and there's not much purpose in differentiating between them. So one's chosen as the "default" which is typically the most commonly used variant, the most correct variant, or the most recent variant (as appropriate). And an alias is then entered for each of the alternatives.

An alias does two things. First, whenever new information of type A is entered, then datum B is converted into datum C. Second, whenever a user does a search for information type A, a search of datum B is converted into datum C. This means that the database slowly gets cleaner. As it becomes obvious that people incorrectly enter datum B as info, then it gets fixed, and all future data becomes correct and our search functionality magically improves too.
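Those two behaviors can be sketched with a small lookup table. This is a minimal Python illustration under stated assumptions: the class, the function names, and the record layout are invented, matching is exact (case and punctuation matter), and the Index's real implementation in PHP and MySQL may look nothing like it.

```python
class AliasTable:
    """Maps (info_type, variant) to the chosen default value."""

    def __init__(self):
        self._aliases = {}

    def add(self, info_type, variant, default):
        """Declare: for this info type, `variant` is the same as `default`."""
        self._aliases[(info_type, variant)] = default

    def resolve(self, info_type, value):
        """Return the default for an aliased value; others pass through."""
        return self._aliases.get((info_type, value), value)


def store_entry(records, aliases, entry):
    """First behavior: convert aliased values as new data is entered.

    `entry` is a dict of info_type -> value (a hypothetical layout).
    """
    cleaned = {
        info_type: aliases.resolve(info_type, value)
        for info_type, value in entry.items()
    }
    records.append(cleaned)
    return cleaned


def search(records, aliases, info_type, value):
    """Second behavior: convert the search term before matching."""
    wanted = aliases.resolve(info_type, value)
    return [r for r in records if r.get(info_type) == wanted]
```

Because the table is keyed by information type as well as value, an alias for one type leaves the same text alone when it appears under another type.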

These aliases can also be used for the more freeform information entries, if there's something being entered consistently wrong. For a while we had people entering "Star Trek" as a genre. An alias now turns "Star Trek" as a genre into "Science Fiction".

Technically, there's some loss of info here. If we were trying to produce an exact replica of data as it appears in the games, this would be a failure, because "Marc W. Miller" might really appear in one book and "Marc Miller" in another. However what we're trying to preserve is actually the core of the information, rather than its facade, and for that purpose an alias is fine.

Peers: There are other situations where two pieces of information are equivalent, but they really do have different names. For myself, "Shannon Appelcline" and "Shannon Appel" are two genuinely different names, not just a choice of whether to use a middle initial or not. Likewise there are some game books by "Lynn Willis" and some by "L.N. Isinwyll". The latter is a pseudonym, but they're the same person. In terms of game systems, the game called "Hero Wars" later became "HeroQuest". Or, alternatively, the game called "Traveller" became "Megatraveller" then "Traveller: The New Era", then "Traveller" (again). (My earlier example of "Chaosium Inc." and "The Chaosium" technically fits into this category too, though in that case I decided to just alias them: any data entry project is ultimately about choices.)

However, for the purposes of these real and meaningful differences, a different type of data massaging system was needed. To resolve that in the database we created "peers" which essentially say "for type of information A, datum B is the same as datum C, but they're distinct enough that they should be kept separate". Peers don't change the actual data, but do display them together in searches.
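The difference from an alias is that nothing stored gets rewritten; only the search widens. A minimal sketch of that behavior in Python follows, with hypothetical class and function names and a made-up record layout (the Index itself is PHP and MySQL).

```python
class PeerGroups:
    """Tracks values that are equivalent but kept under their own names."""

    def __init__(self):
        # (info_type, value) -> the set of all values in its peer group
        self._group = {}

    def link(self, info_type, a, b):
        """Declare a and b peers, merging any groups they already had."""
        merged = self.peers(info_type, a) | self.peers(info_type, b)
        for value in merged:
            self._group[(info_type, value)] = merged

    def peers(self, info_type, value):
        """Return the value together with all of its peers (a copy)."""
        return set(self._group.get((info_type, value), {value}))


def search(records, peers, info_type, value):
    """Match the value or any of its peers; stored data is left untouched."""
    wanted = peers.peers(info_type, value)
    return [r for r in records if r.get(info_type) in wanted]
```

Linking "Traveller" to "Megatraveller" and then "Megatraveller" to "Traveller: The New Era" puts all three in one group, so a search on any of them shows books for the whole line while each book keeps the name it was entered under.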

Hierarchies: A closely related system is the hierarchy, which says that "for information type A, datum B is a super/subset of datum C". This doesn't show up for the authors who have been my prime examples thus far, but it does get used in game systems (where "Call of Cthulhu" is a "BRP" system and "Vampire: The Masquerade" is a "Storyteller" system) and in companies (where "Black Dog" is an imprint of "White Wolf").
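A hierarchy can be sketched as a child-to-parent map that searches walk upward. The Python below is only an illustration; the class, its methods, and the record layout are assumptions, and it allows each value a single parent, which the real system may not require.

```python
class Hierarchy:
    """Records that one value is a subset of another, per info type."""

    def __init__(self):
        # (info_type, child) -> parent
        self._parent = {}

    def set_parent(self, info_type, child, parent):
        """Declare: for this info type, `child` belongs under `parent`."""
        self._parent[(info_type, child)] = parent

    def ancestors(self, info_type, value):
        """Walk upward from a value, nearest parent first."""
        found = []
        current = self._parent.get((info_type, value))
        # The membership check stops the walk if someone enters a loop.
        while current is not None and current not in found:
            found.append(current)
            current = self._parent.get((info_type, current))
        return found

    def is_within(self, info_type, value, group):
        """True if value is the group itself or sits somewhere beneath it."""
        return value == group or group in self.ancestors(info_type, value)


def search_within(records, hierarchy, info_type, group):
    """Find records whose value falls anywhere under the given group."""
    return [
        r for r in records
        if info_type in r and hierarchy.is_within(info_type, r[info_type], group)
    ]
```

With "Call of Cthulhu" placed under "BRP", a search for "BRP" systems picks up Call of Cthulhu books, while a search for "Call of Cthulhu" does not pick up other BRP games.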

These last two elements, peers and hierarchies, aren't quite user-input problems such as aliases are meant to correct, but are instead data problems, which may not be an issue in all sorts of user creativity management.

Problems to Date

As I said, I think our Gaming Index system is cleaner and produces better freeform data than anything we've done to date. However, it also has some issues. Here are some that we've encountered and continue to encounter:

Database Structures: For any user-input data to work, you have to have a database that adequately supports it. As we had real data input into the Gaming Index we started to find where it was inadequate. The biggest early change that we made was to break up the Setting into Genre and Setting. As it was originally put together, various users would randomly enter either real setting names (e.g., "Middle Earth") or general genres (e.g., "Heroic Fantasy"). As soon as we provided our users with better and clearer data input categories, the quality of the data jumped considerably.

We're also running into at least one problem where the database is simply inadequate to store the data as it really exists. We've tried to create a rational structure for games by allowing multiple editions of a product to be grouped under that specific product. But it turns out that we didn't allow for enough differentiation between those editions. Authors might change, or users might want to rate different editions differently, but the database only supports such changes on a per-product basis. We are planning to expand to support these, but the issue is how to do so without making regular data entry more complex.

Right now we have 2301 different books, and 2497 different editions, meaning that the question of editions is only relevant for less than 8% of the entries in the database (perhaps considerably less, because if a game has two editions, it may well have five or ten). We need that complexity to ensure that those 8% of the entries are most correct, but if we're not careful we'll degrade the data quality of the other 92% of the books because of confusion that the new possibilities create. We'll probably resolve it by allowing for ways that certain data can be set per edition, but only making that available to "advanced" users (who declare themselves such) and by only allowing it to be set when a book ends up with multiple editions.
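For anyone who wants to check the arithmetic, here is a small worked sketch in Python (the function name is made up). It rests on one assumption: every book has at least one edition, so each edition beyond the number of books is an "extra" belonging to some multi-edition book. The extras come to a little under 8% of all editions. Counted against books instead, at most about 8.5% of books can have more than one edition, and fewer if some books have several.

```python
def edition_shares(books, editions):
    """Work out how much of the catalog the edition question can touch.

    Assumes every book has at least one edition, so `editions - books`
    counts the editions beyond each book's first.
    """
    extra = editions - books
    return {
        "extra_editions": extra,
        # Extra editions as a fraction of all edition entries.
        "share_of_editions": extra / editions,
        # Upper bound on multi-edition books: reached only if every
        # such book has exactly two editions.
        "max_share_of_books": extra / books,
    }


shares = edition_shares(books=2301, editions=2497)
```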

Database Input: Between AJAX, drop-down menus, and aliases, our data input has been pretty clean. The only problem that regularly turns up is that some users don't realize that they can add new backgrounds, systems, and genres to the database just by choosing "Other" and then typing something in the nearby text box labeled "Other:".

It would seem pretty obvious, but it's clearly not. People choose "Other", but then don't type anything. I think it has to do with the drop-down menus, which have a sort of forbidding and permanent look, like they're the sort of thing a normal user couldn't change just by entering data.

The solution is really easy. We should be checking our data a little better as it's input. If someone selected "Other" and didn't enter anything into the text box, they should get bounced back to the page with a warning. That type of rapid feedback is almost as good as AJAX.
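That check is only a few lines wherever it lives. Here is a sketch of it in Python rather than the Index's PHP; the function name, the returned pair, and the warning text are all invented for the example.

```python
OTHER = "Other"


def validate_choice(field, selected, other_text):
    """Check one drop-down and its companion "Other:" text box.

    Returns (value, error). Exactly one of the two is None: either the
    value to store, or the warning to show when bouncing the user back.
    """
    if selected != OTHER:
        return selected, None

    typed = (other_text or "").strip()
    if not typed:
        warning = (
            f'You chose "Other" for {field} but left the "Other:" box '
            "empty. Please type the new value there."
        )
        return None, warning
    return typed, None
```

The form handler would run this for genre, background, and system, and redisplay the page with the warnings if any came back.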

Pseudo Aliases: Even with all the work with the alias, peer, and hierarchy systems, there's still some more work to be done with data input. For example, take my earlier example of "Traveller". Technically those four systems are called "Traveller 1st edition", "Megatraveller 1st edition", "Traveller: The New Era 1st edition", and "Traveller 4th edition". If you do the math you see that "Megatraveller" is really "Traveller 2nd edition" and "The New Era" is really "Traveller 3rd edition". However, "Megatraveller" and "The New Era" are what users know them as. It'd be really nice if we could treat these systems by their correct names internally, but display them by their common names externally. This is probably possible by a variant of the alias system, where for information type A, datum B changes to datum C but continues to be displayed as datum B. However, this would have implications throughout the system, so I haven't looked into it more deeply.
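As a thought experiment only, since nothing like this exists in the Index yet, the variant could be a two-way mapping: common name to internal name on the way in, and back again on the way out. The Python sketch below uses invented class and method names and ignores the system-wide implications mentioned above.

```python
class DisplayAliases:
    """Stores values under an internal name, shows them under a common one."""

    def __init__(self):
        self._internal = {}  # (info_type, common name) -> internal name
        self._display = {}   # (info_type, internal name) -> common name

    def add(self, info_type, common, internal):
        """Declare that `common` is stored as `internal` but shown as itself."""
        self._internal[(info_type, common)] = internal
        self._display[(info_type, internal)] = common

    def to_internal(self, info_type, value):
        """Convert what the user typed into the name kept in the database."""
        return self._internal.get((info_type, value), value)

    def to_display(self, info_type, value):
        """Convert a stored name back into the one users recognize."""
        return self._display.get((info_type, value), value)
```

So "Megatraveller" could be stored as "Traveller 2nd edition" and still appear as "Megatraveller" on every page, with values that have no such mapping passing through unchanged in both directions.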


I'm aware of a couple of other gaming indexes on the Internet, but they each depend upon the input or OK of a single person, and so are ultimately constrained. In the RPGnet Gaming Index we've instead tried to create a system where user creativity can create and slowly improve the index. I think we've done a good first cut of it, through lessons learned from years of other, similar projects, and that it's just going to improve as time goes on.
