Notes from the Dawn of Time #10:

The Stages of Parsing

by Richard Bartle
January 16, 2002

Last time, I described how the parsing process could be partitioned into 5 stages:

  1. Command line input.
  2. Pre-processing.
  3. Tokenisation.
  4. Parsing.
  5. Binding.

Before I go into these in detail, it’s interesting to consider where exactly each step is performed. Typically, modern MUDs will consist of a server and a client. Each user has a client, but there’s only one server (or cluster of servers) per game instance. It therefore makes sense to do as much work at the client end as possible, because this will reduce the workload of the (shared) server.

Basic command line input can clearly go in the client. Even telnet will do backspacing, although, as we’ll shortly find out, there are many more things that can be done to make players’ lives a little easier.

Pre-processing could also go into a client, assuming it was one built specifically for a particular MUD (or for a particular set of MUDs sharing a protocol). For tokenisation to join it, however, would require the downloading of an up-to-date list of all current entries in the game’s vocabulary. Actually, this can be achieved relatively painlessly — 30,000 words is only 150K or so uncompressed, and from then onwards updates would consist of very small patches only. Furthermore, tokenised input is much shorter than untokenised input, so players would use less bandwidth on the uplink if they had a tokenising client than if they didn’t.
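
To put rough numbers on the bandwidth point, here is a minimal Python sketch of what a tokenising client might send. The tiny vocabulary, the token numbering and the two-bytes-per-token wire format are all assumptions made up for the example, not a description of any real MUD protocol.

```python
import struct

# Hypothetical vocabulary: a real client would download the full list
# from the server once, then apply small patches as it changes.
VOCABULARY = ["get", "drop", "chrysanthemum", "sword", "north"]
TOKEN_IDS = {word: number for number, word in enumerate(VOCABULARY)}


def tokenise(line):
    """Turn a command line into a list of token numbers.

    Raises KeyError for a word that is not in the vocabulary.
    """
    return [TOKEN_IDS[word] for word in line.lower().split()]


def encode(tokens):
    """Pack each token as an unsigned 16-bit integer (assumed wire format).

    Two bytes allow 65,536 distinct tokens, which covers a
    30,000-word vocabulary.
    """
    return struct.pack(f">{len(tokens)}H", *tokens)


line = "get chrysanthemum"
packet = encode(tokenise(line))
print(len(line.encode("ascii")), "bytes untokenised")
print(len(packet), "bytes tokenised")
```

Under these assumptions the 17-character command goes up the line as 4 bytes. The 150K figure for the vocabulary itself works out at roughly five bytes per word.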

With the tokenisation code already in the client, adding the necessary programming for the parse phase isn’t a problem (although updating it might be, if it needed to be changed). Binding, however, can never occur in the client because it relies on real-time game data that is known only to the server.

Tokenisation Problems

You have to write the tokenisation and parsing code anyway, so why not put it into a client instead of the server? Why should the server have to parse the commands of every player when each could do it comparatively easily on their own machines?

Well, there are three reasons. The first is that MUDs requiring a specific client have an uphill battle persuading people to use that client. How do you ever get newbies if they can’t just pop in to have a look around? The server has to be able to understand both parsed input from the client and unparsed input from vanilla telnet. It’s possible, but a little bleah!

The second reason is that if you give players the vocabulary then they’ll know what’s in it. It passes them information that they might not have been privy to before, which could easily give them an advantage in the game. It’s not that they can issue commands from a hacked client that they couldn’t from telnet; rather, it’s that they can see what commands might make sense when a regular player would have to hunt out clues. Every time the coders make a change to the game, a quick comparison of vocabularies will rapidly inform people exactly what has been added and probably suggest exactly what they should do with it. It takes away all the mystery!

The third reason is that the server has to perform a sanity check on everything the client sends it anyway, just in case the data has been tampered with. For short sentences (i.e. most of them), this is going to take just as long as a parse would.
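
A sketch of the kind of check I mean, in Python. The limits and the function name are hypothetical. The thing to notice is that the check has to look at every token the client sent, which is about as much work as looking at every word of a short typed sentence.

```python
VOCABULARY_SIZE = 30_000    # figure borrowed from the example above
MAX_TOKENS_PER_LINE = 32    # hypothetical limit on command length


def is_sane(tokens):
    """Return True if a client-supplied token list is safe to pass on.

    Every token must be an integer that names a real vocabulary entry,
    and the list must be neither empty nor implausibly long.
    """
    if not isinstance(tokens, list):
        return False
    if not 0 < len(tokens) <= MAX_TOKENS_PER_LINE:
        return False
    return all(
        isinstance(token, int)
        and not isinstance(token, bool)
        and 0 <= token < VOCABULARY_SIZE
        for token in tokens
    )
```

A check like this only shows that the tokens are well formed. Whether the command makes sense in the game is still a matter for the parse and binding stages.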

I think the fact that bespoke clients aren’t all that easy to write might have something to do with it, too...

Command Line Input

The first step to successful parsing is command line input. This is pretty much the same for a MUD as it is for any command line interface, from the MS-DOS prompt to a Unix shell. People type stuff in, mess about with it until they’re satisfied, then press return to transmit it to the server.

Command line input doesn’t need to know anything about the game at all. It isn’t remotely concerned with what is transmitted, so long as it meets some common, basic standard. For most MUDs, this standard is that the line is in plain ASCII and terminated by some special character (usually ASCII 10 or 13, i.e. line feed or carriage return, or both). That’s what makes it a "line".
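
As an illustration, here is one way a server might chop an incoming byte stream into lines, written in Python. The class name and its interface are assumptions for the sketch. It treats line feed (ASCII 10), carriage return (ASCII 13), or a carriage return followed by a line feed as a single terminator, and it copes with a line arriving in several pieces.

```python
LINE_FEED = 10
CARRIAGE_RETURN = 13


class LineAssembler:
    """Collects raw bytes and hands back complete lines of text."""

    def __init__(self):
        self._buffer = bytearray()
        self._skip_lf = False  # True just after a CR, so CR LF counts once

    def feed(self, data):
        """Add received bytes; return the list of lines they completed."""
        lines = []
        for byte in data:
            if self._skip_lf and byte == LINE_FEED:
                # Second half of a CR LF pair: already handled by the CR.
                self._skip_lf = False
                continue
            self._skip_lf = False
            if byte in (LINE_FEED, CARRIAGE_RETURN):
                lines.append(self._buffer.decode("ascii", errors="replace"))
                self._buffer.clear()
                self._skip_lf = byte == CARRIAGE_RETURN
            else:
                self._buffer.append(byte)
        return lines


assembler = LineAssembler()
print(assembler.feed(b"get sw"))        # nothing complete yet
print(assembler.feed(b"ord\r\nnor"))    # completes "get sword"
print(assembler.feed(b"th\n"))          # completes "north"
```

Note that this sketch puts no limit on how much it will buffer, so on its own it would be wide open to flooding.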

These days, command line interfaces are more of a topic for programmers of clients than programmers of MUDs. So long as a MUD’s pre-processor receives a stream of ASCII characters, it doesn’t care whether they were generated by typing, hot keys, clicking a button, triggers or a bot (which is just as well, as it can’t tell anyway). Telnet is sufficient to play most MUDs, and even that can handle backspacing by itself; there’s thus no real pressure to program the server to do it.

Listing what command line features players want is therefore pretty much the same as listing what today’s popular clients provide. Nevertheless, here are some of the more useful of them (for input — although clients also handle output, it’s input that’s the issue here):

  • Backspace character/word/line
  • Cursor back character/word/line
  • Cursor forward character/word/line
  • Single-key repeat-last-line
  • Copy from command history
  • Insert/overwrite toggle
  • Cut and paste
  • Hot keys
  • Key rebinding
  • Parameterized macros
  • Automated (timed or triggered) commands

That’s quite a lot, and it merely scratches the surface of what’s possible. More interesting, perhaps, is what isn’t possible in the client, i.e. those aspects of command line input processing which must be done at the server end (if done at all). There are three main cases:

  • Word completion. You type GET CHRY$, it expands the CHRY$ into CHRYSANTHEMUM. The client needs to know the vocabulary to be able to do this (although players could be allowed to provide their own, local vocabulary to fake something similar).
  • Line too long. No matter how large you make your input buffer, someone’s going to flood it. You can’t rely on always getting input in convenient 1-80 character chunks.
  • Too many lines. Programs can generate input quicker than servers can handle it. If lots of lines from an individual player await processing but that player continues to transmit more, they must be stopped.
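
The three cases above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the sample vocabulary, the two limits, the names, and the choice to treat an ambiguous prefix as a failure. What the server does once it refuses a line (drop it, warn the player, disconnect them) is a policy decision the sketch leaves open.

```python
from collections import deque

VOCABULARY = ["chrysanthemum", "chair", "get", "drop"]  # hypothetical
MAX_LINE_LENGTH = 256     # hypothetical limit, in characters
MAX_PENDING_LINES = 20    # hypothetical limit per player


def complete(word, vocabulary=VOCABULARY):
    """Expand a word ending in '$' to the one vocabulary entry it prefixes.

    Returns None if no entry, or more than one entry, starts with the
    prefix. Words without a trailing '$' come back lower-cased.
    """
    word = word.lower()
    if not word.endswith("$"):
        return word
    prefix = word[:-1]
    matches = [entry for entry in vocabulary if entry.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


class PlayerInput:
    """Per-player queue of lines awaiting processing."""

    def __init__(self):
        self.pending = deque()

    def accept(self, line):
        """Queue a line; return False if it had to be refused."""
        if len(line) > MAX_LINE_LENGTH:
            return False   # line too long
        if len(self.pending) >= MAX_PENDING_LINES:
            return False   # too many lines already waiting
        self.pending.append(line)
        return True


print(complete("CHRY$"))   # unique prefix
print(complete("CH$"))     # ambiguous: matches two entries
```

Here CHRY$ expands to the one entry it matches, while CH$ matches both "chrysanthemum" and "chair" and so comes back as None.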

I think that’s enough on command line input. It’s not exactly riveting stuff, but it has to be done. Next article, I’ll switch to the only marginally more interesting topic of what happens to a line of input once the tokenisation phase gets hold of it.
