I've been very coy thus far concerning token stream management in my explanation of how to write an all-singing, all-dancing MUD parser. It's not that I have anything to hide, just that if I'd shown it earlier then it would have got in the way of the more important points that were being explained. The thing is, it gets easier to implement with full backtracking, so describing it prematurely would have introduced complexities that were going to disappear anyway.
I've described how the token stream can be viewed as an array that contains tokens and the parts of speech that they can take. It doesn't have to be complete before parse() is called; a just-in-time approach works fine (indeed, it's necessary when you get tokens that can change the meaning of subsequent input, e.g. enquoting verbs).
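To make the just-in-time idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the TokenStream class, its get() method, the (kind, text) token tuples and the choice of SAY and SHOUT as enquoting verbs are not taken from any real MUD. The point is only that a token is produced when the parser first asks for it, so an enquoting verb can change how the rest of the line is read.

```python
# Hypothetical enquoting verbs: whatever follows one is taken as a raw string.
ENQUOTING_VERBS = {"SAY", "SHOUT"}


class TokenStream:
    """Tokenises an input line lazily, one token per request."""

    def __init__(self, line):
        self.rest = line.strip()   # text not yet tokenised
        self.tokens = []           # tokens produced so far, as (kind, text)

    def _produce(self):
        """Turn the next piece of self.rest into a token."""
        last = self.tokens[-1] if self.tokens else None
        if last is not None and last[0] == "word" and last[1] in ENQUOTING_VERBS:
            # The previous token changes the meaning of what follows:
            # swallow the remainder of the line as one string token.
            token = ("string", self.rest)
            self.rest = ""
        else:
            parts = self.rest.split(None, 1)
            token = ("word", parts[0].upper())
            self.rest = parts[1] if len(parts) > 1 else ""
        self.tokens.append(token)

    def get(self, index):
        """Return token number `index`, tokenising on demand; None at the end."""
        while len(self.tokens) <= index and self.rest:
            self._produce()
        return self.tokens[index] if index < len(self.tokens) else None


stream = TokenStream("say hello there, world")
first = stream.get(0)    # only "say" has been tokenised at this point
second = stream.get(1)   # the rest of the line arrives as a single string
```

Because tokens are kept in a list once produced, backtracking to an earlier index costs nothing: the parser simply asks for that index again.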
I've also shown that when the parser successfully checks for the presence of a token it's looking for, it will advance the current token pointer to the next token in the array. What I haven't shown is how to return to where you were when you have to backtrack.
Fortunately, it's not difficult.
Here's the fragment of the parse() function I gave at the end of my previous article, with the token management stuff added in:
case r_ngb:
    if current(adjective) then
    begin
        advance()
        if parse(rule_list) then
            return true
        retreat()
    end
    if parse(r_ngc . tail_of(rule_list)) then
        return true
    else
        return false
In other words, every time you advance() along the current token stream, you need to match it with a retreat() if the parse from there on fails.
That wasn't hard, was it? You can do it implicitly by passing an extra parameter to parse() if you prefer, to index the current token.
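As a rough illustration of that implicit variant, here is a small Python sketch. It is not the parse() function from this series: the VOCAB table, is_a() and parse_noun_group() are hypothetical names, and the grammar is cut down to "any number of adjectives, then a noun". The position is passed as a parameter, so advancing is just pos + 1 and retreating happens by itself when a recursive call returns.

```python
# Hypothetical vocabulary: each word maps to the parts of speech it can take.
VOCAB = {
    "RUSTY": {"adjective"},
    "LIGHT": {"adjective", "noun", "verb"},
    "SWORD": {"noun"},
}


def is_a(tokens, pos, part):
    """True if there is a token at pos and it can act as the given part of speech."""
    return pos < len(tokens) and part in VOCAB.get(tokens[pos], set())


def parse_noun_group(tokens, pos):
    """Match adjective* noun starting at pos.

    Returns the index just past the match, or None if there is no match.
    """
    if is_a(tokens, pos, "adjective"):
        end = parse_noun_group(tokens, pos + 1)   # the "advance" is pos + 1
        if end is not None:
            return end
        # No explicit retreat: pos in this call was never changed.
    if is_a(tokens, pos, "noun"):
        return pos + 1
    return None


whole = parse_noun_group(["RUSTY", "LIGHT", "SWORD"], 0)   # 3
backtracked = parse_noun_group(["RUSTY", "LIGHT"], 0)      # 2
```

In the second call LIGHT is first tried as an adjective, which fails because nothing follows it, and is then accepted as a noun. That retry is the backtracking step, and it needed no bookkeeping beyond the parameter.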
If you're using something generated by a compiler-compiler, this token management stuff will normally be included for you. It may involve saving and restoring the current token index, though, as when you get to a failure point it's not always immediately obvious how many tokens you've skipped over to get there.
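Saving and restoring the index can be sketched as follows, again in Python and again with invented names (TokenCursor, mark(), reset() and match_sequence() are assumptions, not the interface of any particular parser generator). The saved index is taken before anything is consumed, so on failure there is no need to count how many tokens were skipped.

```python
class TokenCursor:
    """A token list plus a current position that can be saved and restored."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0

    def current(self):
        """The token at the current position, or None past the end."""
        return self.tokens[self.index] if self.index < len(self.tokens) else None

    def advance(self):
        self.index += 1

    def mark(self):
        """Remember where we are."""
        return self.index

    def reset(self, saved):
        """Go back to a remembered position, however far we have moved since."""
        self.index = saved


def match_sequence(cursor, expected):
    """Consume all the expected words in order, or consume nothing at all."""
    saved = cursor.mark()
    for word in expected:
        if cursor.current() != word:
            cursor.reset(saved)   # one restore undoes every advance so far
            return False
        cursor.advance()
    return True


cursor = TokenCursor(["PICK", "UP", "SWORD"])
failed = match_sequence(cursor, ["PICK", "UP", "AXE"])   # cursor is back at 0
matched = match_sequence(cursor, ["PICK", "UP"])         # cursor is now at 2
```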
OK, that's enough about tokens...
Interlude
At this point, it's worthwhile mentioning that players will continually type things that your parser doesn't understand. The golden rule is: log everything. Don't be shy; you can afford the disc space!
The contents of the resulting log file will generally fall into one of the following categories:
- Typos
  People meant to type SWORD and they typed SWROD. If there are some that crop up all the time, make them into synonyms. MUD2 has CAOL as a synonym for COAL.
- Misspellings
  People think there's a word SEPERATE. Treat as typos.
- Unknown real words
  The player referred to FURNITURE rather than CHAIR, or tried to DISENTANGLE something rather than UNKNOT it. You should allow such alternatives, although this may involve your adding something to the game itself rather than just to the vocabulary. For example, if a room description refers to a window but the game doesn't have the concept of windows in it, you'd have to add the concept before you could let people refer to it.
- Misused real words
  So many people think the verb LOSE is spelled LOOSE that you decide to make LOOSE a synonym of LOSE, as well as retaining its proper adjectival meaning.
- Unquoted strings
  This is where people type a freeform message but forget to put the leading SAY or whatever. Alternatively, they might have an IRC channel or similar accessory open and believe they're typing in a window for that when the focus is actually on their MUD client.
- Ex-vocabulary words
  These are things that were once in the vocabulary but are no longer present, e.g. the name of someone who has recently quit.
- Interrupted lines
  These are usually caused by players who abandon sentences mid-way through by hitting return, rather than by erasing the line like a tidy person would. You can ignore them.
- Misparses
  There are situations where a player is clearly trying to do something sensible but your parser doesn't understand it. If this happens frequently, you may have to hack the parser (or the grammar) to get it to accept the alternate form. MUD2 remaps GIVE THE MAN THE SWORD as GIVE THE SWORD TO THE MAN, for example.
- Experiments
  Some players experiment with parsing to see where the boundaries lie. You don't have to make the parser understand their arcane ramblings because they'll only continue to experiment until they find the new boundaries anyway.
- Messages
  Players sometimes type in messages that they know won't parse, just to see if you keep and read logs...
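The synonym fixes for typos and misused words can live entirely in the vocabulary. The Python sketch below assumes a vocabulary that maps each spelling to a list of (canonical word, part of speech) readings. That layout and the lookup() helper are invented for illustration and are not how MUD2 stores its vocabulary.

```python
# Hypothetical vocabulary: spelling -> list of (canonical word, part of speech).
VOCABULARY = {
    "COAL": [("COAL", "noun")],
    "CAOL": [("COAL", "noun")],       # frequent typo kept as a synonym
    "LOSE": [("LOSE", "verb")],
    "LOOSE": [("LOSE", "verb"),       # common misuse, treated as LOSE
              ("LOOSE", "adjective")],  # proper adjectival meaning retained
}


def lookup(word):
    """Return every reading of a word; an empty list means 'not in vocabulary'."""
    return VOCABULARY.get(word.upper(), [])


typo = lookup("caol")       # [("COAL", "noun")]
misused = lookup("loose")   # both the verb and the adjective readings
unknown = lookup("swrod")   # [] : a candidate for the failure log
```

A word with two readings, like LOOSE here, is just a token that can take more than one part of speech, so the backtracking parser tries each reading in turn without any special handling.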
By regularly consulting your parser failure log file, you can see what's causing your players problems and you are thus better informed as to how you could make life easier for them (not that players ever consider that you'd want to make life easier for them...). Logs of the mistakes made by newbies are particularly valuable, as the more the parser gets in newbies' way the less likely they are to stay.
End of interlude. Next time, we're back to looking at the parsing process and asking what we do with what it produces.
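One way to make consulting the log less of a chore is to tally which unknown words turn up most often. The sketch below is a Python illustration under assumptions: the FailureLog class and its methods are hypothetical, and it keeps lines in memory where a real game would append them to a file on disc.

```python
from collections import Counter


class FailureLog:
    """Keeps every input line the parser rejected, with who typed it."""

    def __init__(self):
        self.entries = []   # (player, line) pairs; a real log would go to disc

    def record(self, player, line):
        """Call this whenever a line fails to parse."""
        self.entries.append((player, line))

    def most_common_unknown_words(self, known_words, n=10):
        """Count the words in failed lines that are not in the vocabulary."""
        counts = Counter()
        for _player, line in self.entries:
            for word in line.upper().split():
                if word not in known_words:
                    counts[word] += 1
        return counts.most_common(n)


log = FailureLog()
log.record("newbie1", "get swrod")
log.record("newbie2", "kill troll with swrod")
log.record("newbie1", "drop caol")

known = {"GET", "KILL", "TROLL", "WITH", "DROP"}
top = log.most_common_unknown_words(known, n=1)   # [("SWROD", 2)]
```

Anything near the top of that list is a candidate for a synonym. Keeping the player name alongside each line also makes it possible to pick out the newbies' mistakes separately.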