Notes from the Dawn of Time

Notes from the Dawn of Time #12:

Grammar

by Richard Bartle
February 13, 2002

OK, so where are we?

We’re looking at MUD parsers. So far, we’ve examined the first three stages of the process — command line input, pre-processing and tokenisation — and have now finally arrived at the actual parsing phase itself. This is where you wish you’d paid attention in your English lessons at school...

Ah, yes, that is "English lessons". Parsers are language-specific, and unfortunately so am I. Although the basic ideas I’ll be describing translate into most other languages, the actual machinery is going to vary in each instance, perhaps quite radically. You might say "The red windmill" in English, but in French it’s "Le moulin rouge" — the adjective (rouge/red) comes after the noun (moulin/windmill). Indeed, in French you can’t directly state something like, "The teacher is not here" at all unless you know whether the teacher is male or female.

I’m sticking with English, anyway.

Basic English Grammar

A lot of what I’m about to discuss is predicated on an understanding of how English works at a grammatical level. In particular, it relies heavily on an understanding of parts of speech. Because many native English speakers are a little fuzzy in this area, here’s a brief explanation of the concepts involved:

Noun
- A word that names a thing, quality, state etc. A proper noun denotes a single person or place and is always capitalized in English.
- Examples: ball, tree, Joe, hunger, impudence, harmonization, redness.
Verb
- A word stating that a thing undergoes or undertakes an action, or exists in a state.
- Examples: kick, climb, announce, write, ask, harmonize, redden, be.
Preposition
- A word indicating a relationship of time, place, case etc.
- Examples: with, to, by, except, at, on.
Adjective
- A word used to qualify, describe or define a noun.
- Examples: heavy, shiny, kickable, biggest, my, harmonized, red.
Adverb
- A word that modifies a verb, adjective or another adverb.
- Examples: quickly, heavily, happily, very, now, never, reddeningly.
Article
- The words "the", "a" and "an" (which would otherwise be considered adjectives).
Pronoun
- A word used in place of a noun.
- Examples: it, him, her, them, that, this, which, some, any.
Conjunction
- A word that joins sentences or parts of sentences.
- Examples: and, or, neither, but, then, although, because.

There are more parts of speech than these, and each may be subdivided further (e.g. nouns can be singular or plural). The ones above, however, are those of primary importance to the MUD parsing process. Here’s an example of a full sentence broken down into parts of speech, to illustrate:

TAKE/verb THE/article GREEN/adjective APPLE/noun FROM/preposition THE/article BOX/noun THEN/conjunction HIT/verb IT/pronoun WITH/preposition MY/adjective SWORD/noun.
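
As a rough illustration, the tagged sentence above could be held in a program as a list of (word, part of speech) pairs. The sketch below is in Python, chosen purely for illustration; the names PartOfSpeech, read_tagged and TAGGED are hypothetical and not taken from any particular MUD codebase.

```python
from enum import Enum


class PartOfSpeech(Enum):
    """The eight parts of speech listed above (hypothetical naming)."""
    NOUN = "noun"
    VERB = "verb"
    PREPOSITION = "preposition"
    ADJECTIVE = "adjective"
    ADVERB = "adverb"
    ARTICLE = "article"
    PRONOUN = "pronoun"
    CONJUNCTION = "conjunction"


def read_tagged(text):
    """Turn 'TAKE/verb THE/article ...' into [(word, PartOfSpeech), ...]."""
    pairs = []
    for item in text.split():
        # Drop the sentence-final full stop, then split WORD/tag at the slash.
        word, _, tag = item.rstrip(".").partition("/")
        pairs.append((word.lower(), PartOfSpeech(tag)))
    return pairs


TAGGED = ("TAKE/verb THE/article GREEN/adjective APPLE/noun FROM/preposition "
          "THE/article BOX/noun THEN/conjunction HIT/verb IT/pronoun "
          "WITH/preposition MY/adjective SWORD/noun.")

sentence = read_tagged(TAGGED)
for word, pos in sentence:
    print(f"{word:6} {pos.value}")
```

This is only the end result a parser is aiming for: every word paired with exactly one part of speech. It says nothing about how that single tag gets chosen.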

There are a few other terms I’ll need, but I’ll define them as I introduce them.

Ambiguity

Tempting though it is to "read off" the parts of speech you’re looking for from the flat list of tokens, it’s not generally possible. You can’t simply scan through for the verb, then scan through again for the first noun (if there is one), then scan through a third time for the second noun (ditto). There are several reasons for this, but the main one is ambiguity.

Many words in English can take on the role of several parts of speech without changing their spelling. Although the word "kick" is a verb in "kick the door", it’s a noun in "give the door a kick". In "give this to her", "this" is a pronoun; in "give this shield to her" it’s an adjective. Most annoyingly, many nouns can be used as adjectives, as in "get the gold" versus "get the gold watch" or "congratulate the king" versus "congratulate the king dwarf". Some words have completely unrelated double meanings, e.g. "wind" as a noun is a weather phenomenon but as a verb it’s a rotational movement.

In other words, you don’t know at the tokenisation stage which part of speech a word is taking on; you only know which parts it could take on, and there are usually several alternatives. A command thus has to be broken down methodically to discover which of its possible parts of speech each token actually does take on. This is the job of the parser, working to a grammar.
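
To make the problem concrete, here is a minimal Python sketch. It assumes a tiny, made-up vocabulary (LEXICON) in which each word maps to every part of speech it could take on; the helper names candidates and naive_first are likewise hypothetical. It shows what the tokeniser can honestly report, and how a simple "scan for the first noun" goes wrong.

```python
# Hypothetical lexicon: each word maps to every part of speech it COULD be.
LEXICON = {
    "get": {"verb"},
    "give": {"verb"},
    "kick": {"verb", "noun"},
    "the": {"article"},
    "a": {"article"},
    "door": {"noun"},
    "gold": {"noun", "adjective"},
    "watch": {"noun", "verb"},
    "this": {"pronoun", "adjective"},
}


def candidates(tokens):
    """Pair each token with the set of parts of speech it might be."""
    return [(t, LEXICON.get(t, set())) for t in tokens]


def naive_first(tokens, part):
    """The tempting approach: return the first token that could be `part`."""
    for t in tokens:
        if part in LEXICON.get(t, set()):
            return t
    return None


tokens = "get the gold watch".split()

for word, parts in candidates(tokens):
    print(word, sorted(parts))

# The scan picks "gold", although the player meant the watch.
print(naive_first(tokens, "noun"))
```

The scan isn't wrong about "gold" being a possible noun; it simply has no way of knowing that here it is acting as an adjective. Deciding that needs the surrounding words, which is exactly what a grammar supplies.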

Warning: the parser can only attack syntactic ambiguity. There is a much more formidable problem, that of semantic ambiguity. Semantic ambiguity can only be resolved (if indeed it can be resolved at all) by the application of something that people have but computers don’t: common sense. To demonstrate, suppose you were to enter a room and see two men, one of whom had a sneer and the other of whom didn’t. You might take a dislike to the former and HIT THE MAN WITH THE SNEER. Syntactically, though, this is identical to HIT THE MAN WITH THE SWORD. If the man wasn’t carrying a sword but you were, then you’d want to use the sword to hit the man; if the man was carrying a sword but you weren’t, you’d want to hit him (implied: with your hand).

Unless you allow people to use parentheses (which is an option), you can’t get your parser to disambiguate between <verb> (<noun> <preposition> <noun>) and <verb> (<noun>) <preposition> (<noun>). Either you have to do a lot of finicky context programming that may still not work (as in the case where both you and the man have swords) or you should choose one option and stick with it. I recommend the latter; at least that way the players know what to expect when they type a sentence containing a preposition, instead of having to second-guess what the parser thinks they might mean.
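
The two readings can be sketched in Python as follows. This is an illustration under stated assumptions, not a real parser: it assumes the articles have already been stripped and that the input is exactly verb, noun, preposition, noun. The names Command, attach_to_verb, attach_to_noun and POLICY are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    verb: str
    obj: str                          # the thing acted upon
    prep: Optional[str] = None        # preposition linking to the second noun
    indirect: Optional[str] = None    # e.g. the instrument used


def attach_to_verb(verb, noun1, prep, noun2):
    """Reading <verb> (<noun>) <preposition> (<noun>): hit the man, using the sword."""
    return Command(verb, noun1, prep, noun2)


def attach_to_noun(verb, noun1, prep, noun2):
    """Reading <verb> (<noun> <preposition> <noun>): hit the man who has the sword."""
    return Command(verb, f"{noun1} {prep} {noun2}")


# Choose one reading and stick with it, so players know what to expect.
POLICY = attach_to_verb


def parse(words):
    """Apply the fixed policy to a four-word command."""
    verb, noun1, prep, noun2 = words
    return POLICY(verb, noun1, prep, noun2)


print(parse("hit man with sword".split()))
print(parse("hit man with sneer".split()))
```

With the policy fixed at attach_to_verb, HIT THE MAN WITH THE SNEER is treated as an attempt to use a sneer as a weapon. That is wrong from a common-sense point of view, but it is at least predictably wrong, and the game can respond accordingly.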

Next article, I’ll explain the basic mechanism by which the parser proceeds.
