Introduction to Parsing
by Richard Bartle
In the context of MUDs, parsing is the process by which players input is converted into action.
Well, OK, there is a second use talking to mobiles but thats a separate topic. Im going to be talking about command parsing: how your typing KILL OGRE eventually gets to initiate a fight between your character and some nearby ogre.
Natural Language Understanding is a very large and difficult field of Artificial Intelligence. It isnt made any easier by the fact that people and computers are good at different things. People excel at bringing all kinds of knowledge to bear in understanding an utterance, and all at the same time; computers do well at following fixed grammatical rules.
MUDs have to run on computers. They must parse input as quickly as possible, and do so without the aid of parallel processors. This will doubtless change as AI techniques advance, but it hasnt changed for the past 15 years and might not for the next 15 either. Under these conditions, how can a simple MUD programmer hope to make a reasonable job of getting their game to accept anything other than the most basic of sentences?
The Problem Space
Fortunately, MUDs have a number of things going for them (at least if the mother tongue of their players is English) that general-purpose Natural Language Understanding systems dont.
The only other type of input likely to be presented to a MUD is the question. In English, some of these can be can be mapped with minor differences into the same format as commands (WHERE IS THE SWORD), but there can be deeper problems for queries such as IS THE SWORD IN THE SCABBARD. I wont be going into these in detail, but suffice to say theyre doable.
Even given the above constraints, however, MUDs still dont have a hope of being able to make sense of every command given to them they can never understand all of English (or whatever). What they can do is understand enough of a subset of the language that players can express their desire for action without having to think too hard about it.
In other words, the parser accepts commands in a language which, although not actually English, is enough like it that it doesnt give players a problem. The more this language resembles English, the more an English-speaker will be able to use it without being inconvenienced.
Although human beings can and do perform the entire parsing process in real time on-the-fly, computer games dont get to do that (not just yet!). Instead, the overall parsing process is partitioned into a number of sub-processes (passes), generally performed sequentially in order:
(Execution follows binding).Command line input delivers an ordered set of ASCII characters constructed at the whim of the player. Basically, people type things, they backspace, retype, delete some more, type some more, and so on until they hit the return key. The entire line (which may equate to several commands) is then made available for pre-processing. Pre-processing tidies up what the player typed so that it fits some standard form. This normally involves compacting white space, converting everything to same-case, performing any macro expansions and stripping anything thats out of range (e.g. control characters). It makes the step that follows run a lot faster. Tokenisation takes the now standard-form input line and splits it into individual units (symbols). These units will typically concern integers, punctuation, quoted strings and words (separated from each other by white space). For the integers and strings, tokenisation creates a record to represent them; for words or punctuation, it looks them up in a table and extracts a predefined record from there. These records represent "tokens" semantic entities with potentially several meanings, waiting to be attributed to particular parts of speech. The system reports an "unknown word" error if it cant find a definition for a word (i.e. cant tokenise it).
Pre-processing and tokenisation together comprise the lexical analyser module of the system. It sometimes happens that you need to run them in tandem, rather than in sequence. This is particularly the case if you have verbs that imply that everything following them ought to be a string (e.g. some players prefer SAY HI THERE PEEPS! to SAY "HI THERE PEEPS!").Parsing, although also applied to the entire process, properly refers only to the specific task of taking a list of tokens and from it creating a meaningful data structure. To do this, it follows a grammar a set of recursive rules that defines all possible valid input sentences. If the parse fails, the input could not be understood and the player who typed it should be given some indication of where the problem lies; if it succeeds, the parsers output is a series of individual commands each of which is represented as a tree of elements (in MUD2s case, from the hierarchy that I described in my opening articles to this series).
The final step is binding. This takes the parsed data structure and applies it to the players situation in the game world to generate actual commands that can be executed. In many ways, its the most challenging part of the whole process. If a player types HIT ELF and there are 20 elves in the game, which one does the player mean? The one in the same room. What if there are several in the same room? What if the input was SUMMON ELF, or some other spell that could apply to any elf wherever it was? Or SUMMON DARK ELF? Its the binders job to decide all this. Once it has done so, parsing is formally over and the resulting world-changing functions are invoked: [SUMMON ELF21]() or whatever.
Note that although Im talking about parsing primarily in the context of text MUDs, the same principles apply to graphical MUDs. The interface differs and the text generated by the interface differs (itll be in the more rigid format of a packet rather than a stream of freeform ASCII characters) but the general process is pretty much the same: read data; clean up data; formalise data; structure data; map data to code; execute code.
OK, well thats given you a brief overview of the architecture of a parser. Next time, Ill begin explaining how the individual components work.