topnav

Series Info...Notes from the Dawn of Time #22:

A Complete Example

by Richard Bartle
July 3, 2002

Here’s an example of the complete parsing process (as described in my previous 13 articles) in action.

Phase 1 - Command Line Input

The user types a sequence of characters. Let’s go with something that players might actually type, rather than something that illustrates the finer points of parsing:

dr t x diamons^Hds in box

(English translation: DROP ALL TREASURE EXCLUDING DIAMONDS IN THE BOX).

Phase 2 – Pre-processing

The input line is normalised. White space is removed, control characters are processed and letters are converted to same-case.

DR T X DIAMONDS IN BOX

Phase 3 – Tokenisation

The pre-processed line is split into words, which are looked up in the vocabulary. If a word doesn’t have a vocabulary entry, this is reported and the parsing process stops.

With a vocabulary that includes the following:

Word PoS Atom Comment
<dr verb drop>
<t noun treasure_plural> T default to plural
<t adjective treasure_adj for TREASURE MAP
<t verb get> short for TAKE
<x preposition but> short for EXCEPT
<diamonds noun diamond_plural>
<in preposition in>
<in adverb in>
<box noun box>
<box verb hit>

the tokenisation that results would be:

Word DR T X DIAMONDS IN BOX
Number 1 2 3 4 5 6
PoS verb noun
adjective
verb
prep noun prep
adverb
noun
verb

Phase 4 – Parsing

There’s an ambiguity in the input sentence, depending on how its prepositions are read. It could mean:

  • DR (T X DIAMONDS IN BOX)
  • DR (T) X (DIAMONDS IN BOX)
  • DR (T X DIAMONDS) IN (BOX)

How this is resolved is, in theory, entirely up to the parser – so long as it does whatever it does consistently. However, the first alternative is bad because it means DROP can’t ever take 2 parameters, and the second is bad because players almost invariably prefer to specify the subject (ie. first parameter) rather than the object (ie. second parameter). Trust me: even if you have to hack your parser to get the third possibility, do so! In MUD2, I implement it by having two types of preposition: those that connect noun phrases and those that connect noun groups. IN is one of the former and X is one of the latter, which together enforce the third reading of the command. This is how it’s done in the following trace:

                                                            current:
                                                            DR
+ parse([r_input])
| + parse([r_sentence, r_inpa])
| | + parse([r_command, r_snta, r_inpa])
| | | + parse([r_command1, r_snta, r_inpa])
| | | | + parse([r_cmd1a, r_snta, r_inpa])
| | | | | + current(verb)
| | | | | - current(verb) =true                             T
| | | | | + parse([r_noun_phrase, r_cmd1b, r_snta, r_inpa])
| | | | | | + parse([r_noun_group, r_npa, r_cmd1b, ... ])
| | | | | | | + current(article)
| | | | | | | - current(article) =false             
| | | | | | | + parse([r_nga, r_npa, r_cmd1b, ...])
| | | | | | | | + current(integer)
| | | | | | | | - current(integer) =false
| | | | | | | | + parse([r_ngb, r_npa, r_cmd1b, ...])
| | | | | | | | | + current(adjective)
| | | | | | | | | - current(adjective) =false
| | | | | | | | | + parse([r_ngc, r_npa, r_cmd1b, ...])
| | | | | | | | | | + current(noun)
| | | | | | | | | | - current(noun) =true                   X
| | | | | | | | | | + parse([r_npa, r_cmd1b, ...])
| | | | | | | | | | | + current(ng_prep)
| | | | | | | | | | | - current(ng_prep) =true              DIAMONDS
| | | | | | | | | | | + parse([r_noun_group, r_npa, r_cmd1b, ... ])
...
| | | | | | | | | | | | + current(noun)
| | | | | | | | | | | | - current(noun) =true               IN
| | | | | | | | | | | | + parse([r_npa, r_cmd1c, ...])
| | | | | | | | | | | | | + current(ng_prep)
| | | | | | | | | | | | | - current(ng_prep) =false
| | | | | | | | | | | | | + parse([r_cmd1d, ...])
| | | | | | | | | | | | | | + parse([r_cmd1e, ...])
| | | | | | | | | | | | | | | + current(np_prep)
| | | | | | | | | | | | | | | - current(np_prep) =true      BOX
| | | | | | | | | | | | | | | + parse([r_noun_phrase, r_cmd1f, ...])
...                                                         EOL
| | | | | | | | | | | | | | -parse([r_cmd1e, ...])
| | | | | | | | | | | | | - parse([r_cmd1d, ...])
| | | | | | | | | | | | - parse([r_npa, r_cmd1c, ...])
...
| | | - parse([r_command1, r_snta, r_inpa]) =true
| | - parse([r_command, r_snta, r_inpa]) =true
| - parse([r_sentence, r_inpa]) =true
- parse([r_input]) =true

Phase 5 – Binding

From the parse phase, we get

[drop, [in],
    [   [treasure_plural,
            [but, [diamond_plural]]
        ],
        [box, []]
    ]
]

First, the binder identifies the verb by applying each verb modifier to it. If we’ve defined

{ in drop }: insert

then the verb becomes INSERT.

Next, the binder has to bind the nouns, in the context of INSERT. For TREASURE_PLURAL, this means all the objects of class TREASURE that the player’s character is carrying. Let’s say that this gives

[ruby1, ruby2, diamond0, diamond2, ring7]

This set has to be modified by the BUT clause. That clause requires that DIAMOND_PLURAL be bound in the context of INSERT, which gives

[diamond0, diamond2]

We can now apply

but([ruby1, ruby2, diamond0, diamond2, ring7], [diamond0, diamond2])

to get [ruby1, ruby2, ring7] for the set of objects that satisfy the first noun phrase.

For the second noun phrase, we have to bind BOX in the context of being the second parameter to DROP. Let’s say it finds [box2, box5].

So finally the binder has the following information:

Function: insert
First parameter list: [ruby1, ruby2, ring7]
Second parameter list: [box2, box5]

It therefore makes 3 function calls:

insert(ruby1, box2)
insert(ruby2, box2)
insert(ring7, box2)

Phase 6 – Execution

Each execution inserts one object into box2 and generates the appropriate messages. The player reads the messages and decides what to do next. The whole process can then begin again.

For us, however, this marks the end of this series of articles on parsing – at last!

Next column, I’ll start on a fresh topic: mobile AI.

Recent Discussions on Notes from the Dawn of Time:

jump new