Notes from the Dawn of Time

Notes from the Dawn of Time #17:

Token Stream Management

by Richard Bartle
April 24, 2002

I’ve been very coy thus far concerning token stream management in my explanation of how to write an all-singing, all-dancing MUD parser. It’s not that I have anything to hide, just that if I’d shown it earlier then it would have got in the way of the more important points that were being explained. The thing is, it gets easier to implement with full backtracking, so describing it prematurely would have introduced complexities that were going to disappear anyway.

I’ve described how the token stream can be viewed as an array that contains tokens and the parts of speech that they can take. It doesn’t have to be complete before parse() is called – a just-in-time approach works fine (indeed, it’s necessary for when you get tokens that can change the meaning of subsequent input, e.g. enquoting verbs).
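A just-in-time token stream of this kind might be sketched in Python as follows. This is only an illustration under stated assumptions: the `VOCAB` table, the `quoting_verb` and `string` part-of-speech labels and the `LazyTokenStream` class are all hypothetical names, not anything taken from MUD2. The point it shows is that a token is only built when the parser first asks for it, so an enquoting verb can change how the rest of the line is tokenised.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, Set[str]]  # (word, parts of speech it can take)

# Hypothetical vocabulary, for illustration only.
VOCAB = {
    "say": {"verb", "quoting_verb"},
    "take": {"verb"},
    "the": {"article"},
    "sword": {"noun"},
}


class LazyTokenStream:
    """Token array that is filled in only as far as the parser has looked."""

    def __init__(self, line: str) -> None:
        self._rest = line.strip()      # input not yet turned into tokens
        self._tokens: List[Token] = []

    def get(self, index: int) -> Optional[Token]:
        """Return the token at index, tokenising more input if needed."""
        while len(self._tokens) <= index and self._rest:
            self._tokenise_one()
        return self._tokens[index] if index < len(self._tokens) else None

    def _tokenise_one(self) -> None:
        word, _, rest = self._rest.partition(" ")
        self._rest = rest.strip()
        parts = VOCAB.get(word.lower(), {"unknown"})
        self._tokens.append((word.lower(), parts))
        if "quoting_verb" in parts and self._rest:
            # An enquoting verb swallows the remainder of the line
            # as a single string token.
            self._tokens.append((self._rest, {"string"}))
            self._rest = ""
```

With this sketch, `LazyTokenStream("say Hello there")` yields the verb followed by one string token, whereas `LazyTokenStream("take the sword")` yields three ordinary word tokens, none of them built until asked for.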

I’ve also shown that when the parser successfully checks for the presence of a token it’s looking for, it will advance the “current” token pointer to the next token in the array. What I haven’t shown is how to return to where you were when you have to backtrack.

Fortunately, it’s not difficult.

Here’s the fragment of the parse() function I gave at the end of my previous article, with the token management stuff added in:

	case r_ngb:
		if current(adjective) then
		begin
			advance()
			if parse(rule_list) then
				return true
			retreat()
		end
		if parse(r_ngc . tail_of(rule_list)) then
			return true
		else
			return false

In other words, every time you advance() along the current token stream, you need to reflect it by a retreat() if the parse from thereon fails.

That wasn’t hard, was it? You can do it implicitly by passing an extra parameter to parse() if you prefer, to index the “current” token.
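The extra-parameter variant could look something like the Python sketch below. It is a simplified stand-in rather than the parser from the earlier articles: the rule names `r_ngb` and `r_ngc` come from the fragment above, but what `r_ngc` does here (expect a single noun) and the rule that an empty rule list succeeds only at the end of the input are assumptions made so the example is self-contained. Because each call receives its own `pos`, backtracking happens simply by returning; there is no explicit retreat().

```python
from typing import List, Sequence, Set, Tuple

Token = Tuple[str, Set[str]]  # (word, parts of speech it can take)


def current(tokens: Sequence[Token], pos: int, part: str) -> bool:
    """True if the token at pos exists and can be the given part of speech."""
    return pos < len(tokens) and part in tokens[pos][1]


def parse(rules: List[str], tokens: Sequence[Token], pos: int) -> bool:
    """Try to match rules against tokens, starting at index pos."""
    if not rules:
        # Assumption: a parse succeeds only if every token was consumed.
        return pos == len(tokens)
    head, tail = rules[0], rules[1:]
    if head == "r_ngb":
        if current(tokens, pos, "adjective"):
            # "advance()" is just pos + 1 in the recursive call.
            if parse(rules, tokens, pos + 1):
                return True
            # Implicit retreat: pos in this call was never changed.
        return parse(["r_ngc"] + tail, tokens, pos)
    if head == "r_ngc":
        # Assumed for illustration: r_ngc expects one noun.
        if current(tokens, pos, "noun"):
            return parse(tail, tokens, pos + 1)
        return False
    raise ValueError("unknown rule: " + head)
```

For example, if both ORANGE and LIGHT can be an adjective or a noun, `parse(["r_ngb"], tokens, 0)` first tries treating both as adjectives, fails to find a noun, falls back and succeeds with LIGHT as the noun.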

If you’re using something generated by a compiler-compiler, this token management stuff will normally be included for you. It may involve saving and restoring the current token index, though, as when you get to a failure point it’s not always immediately obvious how many tokens you’ve skipped over to get there.
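Saving and restoring the index can be sketched in Python like this. The `TokenCursor` class and its `mark()` and `reset()` methods are hypothetical names chosen for the example, not the API of any particular compiler-compiler. The idea is that the caller remembers where it started, so it doesn’t need to know how many tokens were skipped before the failure.

```python
from typing import List, Sequence, Set, Tuple

Token = Tuple[str, Set[str]]  # (word, parts of speech it can take)


class TokenCursor:
    """A token array plus a "current" index that can be saved and restored."""

    def __init__(self, tokens: Sequence[Token]) -> None:
        self.tokens: List[Token] = list(tokens)
        self.index = 0

    def current(self, part: str) -> bool:
        return (self.index < len(self.tokens)
                and part in self.tokens[self.index][1])

    def advance(self) -> None:
        self.index += 1

    def mark(self) -> int:
        """Remember the current position."""
        return self.index

    def reset(self, saved: int) -> None:
        """Go back to a remembered position in one step."""
        self.index = saved


def noun_group(cursor: TokenCursor) -> bool:
    """Match any number of adjectives followed by a noun."""
    saved = cursor.mark()
    while cursor.current("adjective"):
        cursor.advance()          # may skip any number of tokens
    if cursor.current("noun"):
        cursor.advance()
        return True
    cursor.reset(saved)           # undo every advance() at once
    return False
```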

OK, that’s enough about tokens...

Interlude

At this point, it’s worth mentioning that players will continually type things that your parser doesn’t understand. The golden rule is: log everything. Don’t be shy – you can afford the disc space!

The contents of the resulting log file will generally fall into one of the following categories:

Typos
- People meant to type SWORD and they typed SWROD. If there are some that crop up all the time, make them into synonyms. MUD2 has CAOL as a synonym for COAL.
Misspellings
- People think there’s a word SEPERATE. Treat as typos.
Unknown real words
- The player referred to FURNITURE rather than CHAIR, or tried to DISENTANGLE something rather than UNKNOT it. You should allow such alternatives, although this may involve your adding something to the game itself rather than just to the vocabulary. For example, if a room description refers to a window but the game doesn’t have the concept of “windows” in it, you’d have to add the concept before you could let people refer to it.
Misused real words
- So many people think the verb LOSE is spelled LOOSE that you decide to make LOOSE a synonym of LOSE, as well as retaining its proper adjectival meaning.
Unquoted strings
- This is where people type a freeform message but forget to put the leading SAY or whatever. Alternatively, they might have an IRC channel or similar accessory open and believe they’re typing in a window for that when the focus is actually on their MUD client.
Ex-vocabulary words
- These are things that were once in the vocabulary but are no longer present, e.g., the name of someone who has recently quit.
Interrupted lines
- These are usually caused by players who abandon sentences mid-way through by hitting return, rather than by erasing the line like a tidy person would. You can ignore them.
Misparses
- There are situations where a player is clearly trying to do something sensible but your parser doesn’t understand it. If this happens frequently, you may have to hack the parser (or the grammar) to get it to accept the alternate form. MUD2 remaps GIVE THE MAN THE SWORD as GIVE THE SWORD TO THE MAN, for example.
Experiments
- Some players experiment with parsing to see where the boundaries lie. You don’t have to make the parser understand their arcane ramblings because they’ll only continue to experiment until they find the new boundaries anyway.
Messages
- Players sometimes type in messages that they know won’t parse, just to see if you keep and read logs...
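Two of the fixes above, typos as synonyms and a misused word that keeps its proper meaning, can be sketched as vocabulary entries in Python. The table layout and the `add_synonym()` and `lookup()` helpers are assumptions made for illustration; only the CAOL and LOOSE examples come from the list above. Each typed word maps to every (canonical word, part of speech) pair it could stand for, which is the information the token stream needs.

```python
from typing import Dict, List, Tuple

Meaning = Tuple[str, str]  # (canonical word, part of speech)

# Hypothetical starting vocabulary.
VOCAB: Dict[str, List[Meaning]] = {
    "coal": [("coal", "noun")],
    "lose": [("lose", "verb")],
    "loose": [("loose", "adjective")],
}


def add_synonym(alias: str, target: str) -> None:
    """Let alias also stand for everything target means.

    Any meanings alias already had are kept.
    """
    meanings = VOCAB.setdefault(alias, [])
    for meaning in VOCAB[target]:
        if meaning not in meanings:
            meanings.append(meaning)


add_synonym("caol", "coal")    # a typo that crops up all the time
add_synonym("loose", "lose")   # misused word, adjectival sense retained


def lookup(word: str) -> List[Meaning]:
    """Return every meaning of a typed word, or [] if it is unknown."""
    return VOCAB.get(word.lower(), [])
```

Here `lookup("CAOL")` gives the noun COAL, while `lookup("loose")` gives both the adjective LOOSE and the verb LOSE, leaving the parser to decide which fits.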

By regularly consulting your parser failure log file, you can see what’s causing your players problems and you are thus better informed as to how you could make life easier for them (not that players ever consider that you’d want to make life easier for them...). Logs of the mistakes made by newbies are particularly valuable, as the more the parser gets in newbies’ way the less likely they are to stay.
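A minimal sketch of such a failure log in Python, using the standard `logging` and `collections` modules, might look like this. The logger name, the tab-separated record layout and the two helper functions are assumptions for the example; where the log is written (file, database, wherever) is left to whatever handler you attach.

```python
import logging
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical logger name; attach a FileHandler or similar elsewhere.
failure_log = logging.getLogger("mud.parser.failures")


def log_failure(player: str, line: str) -> None:
    """Record one input line that the parser could not understand."""
    failure_log.info("%s\t%s", player, line)


def most_common_failures(lines: Iterable[str],
                         top: int = 10) -> List[Tuple[str, int]]:
    """Count failed lines, ignoring case and spacing, most frequent first."""
    counts = Counter(" ".join(line.lower().split()) for line in lines)
    return counts.most_common(top)
```

Running `most_common_failures()` over the logged lines every so often gives a rough list of which failures crop up all the time and are therefore worth fixing first.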

End of interlude. Next time, we’re back to looking at the parsing process and asking what we do with what it produces.
