Grammar
by Richard Bartle

OK, so where are we? We're looking at MUD parsers. So far, we've examined the first three stages of the process (command line input, pre-processing and tokenisation) and have now finally arrived at the actual parsing phase itself. This is where you wish you'd paid attention in your English lessons at school...

Ah, yes, that is "English lessons". Parsers are language-specific, and unfortunately so am I. Although the basic ideas I'll be describing translate into most other languages, the actual machinery is going to vary in each instance, perhaps quite radically. You might say "The red windmill" in English, but in French it's "Le moulin rouge": the adjective (rouge/red) comes after the noun (moulin/windmill). Indeed, in French you can't directly state something like "The teacher is not here" at all unless you know whether the teacher is male or female. I'm sticking with English, anyway.

Basic English Grammar

A lot of what I'm about to discuss is predicated on an understanding of how English works at a grammatical level. In particular, it relies heavily on an understanding of parts of speech. Because many native English speakers are a little fuzzy in this area, here's a brief explanation of the concepts involved:
There are many more of these, and they may be subdivided further (e.g. nouns can be plural or singular). The above are the ones that are of primary importance to the MUD parsing process, however.

Here's an example of a full sentence broken down into parts of speech, to illustrate:

TAKE/verb THE/article GREEN/adjective APPLE/noun FROM/preposition THE/article BOX/noun THEN/conjunction HIT/verb IT/pronoun WITH/preposition MY/adjective SWORD/noun.

There are a few other terms I'll need, but I'll define them as I introduce them.

Ambiguity

Tempting though it is to "read off" the parts of speech you're looking for from the flat list of tokens, it's not generally possible. You can't simply scan through for the verb, then scan through again for the first noun (if there is one), then scan through a third time for the second noun (ditto). There are several reasons for this, but the main one is ambiguity.

Many words in English can take on the role of several parts of speech without changing their spelling. Although the word "kick" is a verb in "kick the door", it's a noun in "give the door a kick". In "give this to her", "this" is a pronoun; in "give this shield to her" it's an adjective. Most annoyingly, many nouns can be used as adjectives, as in "get the gold" versus "get the gold watch" or "congratulate the king" versus "congratulate the king dwarf". Some words have completely unrelated double meanings, e.g. "wind" as a noun is a weather phenomenon but as a verb it's a rotational movement.

In other words, you don't know at the tokenisation stage what part of speech a word is taking on; you only know what parts it could take on, and there are usually several alternatives. A command thus has to be broken down methodically to discover which of the possible parts of speech that a token can take it actually does take. This is the job of the parser, working to a grammar.

Warning: the parser can only attack syntactic ambiguity.
There is a much more formidable problem, that of semantic ambiguity. Semantic ambiguity can only be resolved (if indeed it can be resolved at all) by the application of something that people have but computers don't: common sense.

To demonstrate, suppose you were to enter a room and see two men, one of whom had a sneer and the other of whom didn't. You might take a dislike to the former and HIT THE MAN WITH THE SNEER. Syntactically, though, this is identical to HIT THE MAN WITH THE SWORD. If the man wasn't carrying a sword but you were, then you'd want to use the sword to hit the man; if the man was carrying a sword but you weren't, you'd want to hit him (implied: with your hand).

Unless you allow people to use parentheses (which is an option), you can't get your parser to disambiguate between <verb> (<noun> <preposition> <noun>) and <verb> (<noun>) <preposition> (<noun>). Either you have to do a lot of finicky context programming that may still not work (as in the case where both you and the man have swords) or you should choose one option and stick with it. I recommend the latter; at least that way the players know what to expect when they type a sentence containing a preposition, instead of having to second-guess what the parser thinks they might mean.

Next article, I'll explain the basic mechanism by which the parser proceeds.
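
To make the two kinds of ambiguity a little more concrete, here is a minimal sketch in Python. It is not the parser this series goes on to describe; the vocabulary, the function names (candidates, attach_to_verb, attach_to_noun, parse) and the dictionary layout are all assumptions made up for illustration. The first part shows that after tokenisation you only know which parts of speech a word could take. The second part shows the two bracketings of HIT THE MAN WITH THE SWORD and what "choose one option and stick with it" might look like.

```python
# Hypothetical vocabulary: each word maps to every part of speech it COULD take.
LEXICON = {
    "hit": {"verb", "noun"},
    "kick": {"verb", "noun"},
    "get": {"verb"},
    "the": {"article"},
    "gold": {"noun", "adjective"},
    "watch": {"noun", "verb"},
    "man": {"noun"},
    "sword": {"noun"},
    "sneer": {"noun", "verb"},
    "with": {"preposition"},
}


def candidates(command):
    """Pair each word with its possible parts of speech (empty set if unknown)."""
    return [(word, LEXICON.get(word, set())) for word in command.lower().split()]


def attach_to_verb(verb, obj, prep, other):
    """<verb> (<noun>) <preposition> (<noun>): the second noun is the instrument."""
    return {"verb": verb, "object": obj, "instrument": other}


def attach_to_noun(verb, obj, prep, other):
    """<verb> (<noun> <preposition> <noun>): the second noun describes the object."""
    return {"verb": verb, "object": (obj, prep, other), "instrument": None}


def parse(command, policy=attach_to_verb):
    """Toy parse of '<verb> <noun> <preposition> <noun>' commands only.

    Articles are dropped, and the chosen policy decides the bracketing.
    No context is consulted, so the result is predictable for the player.
    Raises ValueError if the command is not exactly that shape.
    """
    words = [w for w in command.lower().split()
             if "article" not in LEXICON.get(w, set())]
    verb, obj, prep, other = words
    return policy(verb, obj, prep, other)


if __name__ == "__main__":
    for word, parts in candidates("get the gold watch"):
        print(word, sorted(parts))
    print(parse("HIT THE MAN WITH THE SWORD"))
    print(parse("HIT THE MAN WITH THE SNEER"))
```

Under the fixed attach_to_verb policy, both commands come out with the second noun as the instrument, so HIT THE MAN WITH THE SNEER is read as using a sneer to hit the man. That is the price of consistency: the parser is sometimes wrong about what was meant, but it is wrong in a way players can learn to anticipate.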