
Notes from the Dawn of Time #24:

An Introduction to Planning

by Richard Bartle
July 31, 2002

The model I described last time for a mobile AI wasn’t actually for a mobile AI; it was for a generic AI. When programming mobiles, we can take advantage of a number of useful MUD-specific features:

  • We can use the “real” game world as the world model, because we have direct, up-to-date access to it.
  • Only free-form input (eg. speech) needs to be parsed; everything else can be tagged in advance. Because we (the game engine) generate the messages ourselves, we can give precise meanings without going through English. The mobile doesn’t have to parse “The elephant is advancing towards you menacingly”, as we can simply tell it [attacking, elephant].
  • There’s no real need to learn, because mobiles don’t live all that long anyway.
  • The planner can act as the effectors. Whereas in humans our brains somehow need to tell our muscles to contract in our calves and thighs in just the right way for us to walk, in MUDs we can just call the [east] or whatever command (see the sketch after this list).
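
As a concrete illustration of the last two points, here’s a minimal sketch in Python; the Game class and the dictionary-based mobile are invented for the example, not part of any real MUD engine:

    class Game:
        def command(self, mobile_id, verb, *args):
            # Stand-in for running a normal game command on the mobile's behalf.
            print("mobile", mobile_id, ":", verb, *args)

    def notify(mobile, event):
        # The engine hands over a pre-tagged event such as ["attacking", "elephant"];
        # there is no English for the mobile to parse.
        mobile["sensory_buffer"].append(event)

    def perform(game, mobile, action):
        # A "physical" plan step is nothing more than a call into the game engine.
        game.command(mobile["id"], action[0], *action[1:])

    pixie = {"id": 17, "sensory_buffer": [], "plan": []}
    notify(pixie, ["attacking", "elephant"])   # tagged input instead of prose
    perform(Game(), pixie, ["east"])           # the planner acting as effector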

This gives a much more program-like model:

[Diagram: game world feeds a sensory buffer; the emotional system and planner operate on the plan, which calls back into the game]

Input from the game world is inserted pre-parsed into a sensory buffer. When it’s the mobile’s “turn” to act, its emotional system looks at the buffer. If there’s anything in there that requires action, it interrupts by putting an action at the head of the plan, perhaps something like [handle, event]; otherwise, it checks the planner isn’t looping, and gives it control if not. The planner takes the first action from the plan. If this is a “physical” one, it calls the appropriate game function. If not, it’ll be a mental action (eg. “figure out how to open the door” rather than “get key from bag”) that changes the plan rather than the game world. The planner reads the state of the world directly from the game, instead of from some model of it, although the procedure is moderated so as not to involve cheating (eg. if you’re invisible, it still can’t see you).
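
In code, one “tick” of this loop might look something like the following sketch. The hooks needs_action, is_looping and expand are hypothetical stand-ins for the emotional system’s rules, the loop check and the planner proper, and game is any object with a command(mobile_id, verb, ...) method, like the Game stand-in above.

    def needs_action(mobile, event):
        # Illustrative rule only: react to attacks and to hunger, ignore the rest.
        return event[0] in ("attacking", "hungry")

    def is_looping(mobile):
        return False    # a real check would compare recent plan states

    def expand(mobile, action, game):
        return []       # the planner's job; see the goal-expansion sketch later

    def tick(mobile, game):
        plan, buffer = mobile["plan"], mobile["sensory_buffer"]

        # Emotional system: look at the sensory buffer first.
        while buffer:
            event = buffer.pop(0)
            if needs_action(mobile, event):
                plan.insert(0, ["handle", event])   # interrupt at the head of the plan
                return

        # Otherwise, make sure the planner isn't looping before giving it control.
        if is_looping(mobile) or not plan:
            return

        action = plan.pop(0)                        # planner takes the first action
        if action[0] in ("goal", "handle"):
            plan[:0] = expand(mobile, action, game) # mental action: changes the plan
        else:
            game.command(mobile["id"], *action)     # physical action: call the game function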

The plan

The core of the system is the plan. This is a basic data structure – a linear list of actions that the mobile intends to perform in sequence. The emotional system is predefined code specifying the mobile’s general behavioural responses, over which it has no control; the planner generates code on the fly in response to situational needs, which it stores as the plan. I should point out that there are such things as non-linear plans, by the way, but they can be implemented linearly and are more complex than we’re going to need anyway.
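
Concretely, a plan might just be a list like this (Python again, purely illustrative); mental entries get rewritten by the planner, anything else goes straight to the game:

    plan = [
        ["goal", ["holding", "food"]],   # mental: still needs expanding
        ["eat", "food"],                 # physical: maps onto a game command
    ]

    def is_mental(action):
        return action[0] in ("goal", "handle")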

OK, I guess I’d better give an example of how this works...

Suppose a pixie is minding his own business when he suddenly becomes hungry. His sensory buffer has [hungry] inserted into it. Next tick, his emotional system checks this out. It can either dismiss it or do something about it. This being a happy-go-lucky pixie rather than a lean, mean goblin, it is predisposed to do something about it. It inserts an action at the head of the plan, say [goal, [not, hungry]].
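
Here’s a sketch of that predisposition, with an invented disposition table; a goblin’s table would simply map [hungry] to nothing (or to something meaner).

    # Hypothetical dispositions for this particular pixie.
    DISPOSITIONS = {
        "hungry": ["goal", ["not", "hungry"]],   # do something about it
        "drizzle": None,                         # dismiss: not worth acting on
    }

    def emotional_response(mobile, event):
        response = DISPOSITIONS.get(event[0])
        if response is not None:
            mobile["plan"].insert(0, response)   # goal goes at the head of the plan

    pixie = {"plan": []}
    emotional_response(pixie, ["hungry"])
    print(pixie["plan"])    # [['goal', ['not', 'hungry']]]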

The planner comes along and sees the goal. It looks through all the actions it knows to see if any of them result in the pixie’s being not hungry. Some can be dismissed because they have side effects vetoed by the emotional system (eg. committing suicide), but the obvious one to take is to eat something. The action [eat, food] has a precondition of [holding, food]. The planner therefore changes the plan to be: [[goal, [holding, food]], [eat, food]].
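
A sketch of that goal-expansion step follows; the action table, the veto flag and the function name are all invented for illustration.

    # Each known action lists its effects, its preconditions, and whether the
    # emotional system vetoes it outright.
    ACTIONS = [
        {"do": ["eat", "food"],
         "effects": [["not", "hungry"]],
         "preconds": [["holding", "food"]],
         "vetoed": False},
        {"do": ["commit", "suicide"],
         "effects": [["not", "hungry"]],
         "preconds": [],
         "vetoed": True},   # side effect the emotional system won't allow
    ]

    def expand_goal(goal):
        # Replace [goal, X] with goals for the preconditions of some action
        # that achieves X, followed by the action itself.
        for action in ACTIONS:
            if goal in action["effects"] and not action["vetoed"]:
                return [["goal", p] for p in action["preconds"]] + [action["do"]]
        return []   # no known way of achieving the goal

    print(expand_goal(["not", "hungry"]))
    # [['goal', ['holding', 'food']], ['eat', 'food']]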

Next tick, the emotional system checks the sensory input buffer (finding nothing new), checks out the plan (it’s not stuck) and passes control to the planner. The planner looks at the first element in the plan: [goal, [holding, food]]. It considers all the actions it knows that could cause it to be holding some food, and discovers that it has to pick the food up.

The process continues. The planner finds that the pixie has to be in the same location as some food in order to pick it up

[[goal, [same, [location, me], [location, food]]], [get, food], [eat, food]]

then it thinks of some food it knows

[[goal, [same, [location, me], [location, peach2]]], [get, peach2], [eat, peach2]]

then it figures out how to reach the food

[[goto, kitchen], [get, peach2], [eat, peach2]]

and so on. It chips away at the first element of the plan, until it eventually gets to something it can just do. The pixie’s plan at this stage will be something like:

[[west], [north], [get, peach2], [eat, peach2]].

He can then execute this plan by performing each action in turn, which will eventually satisfy his original goal of not being hungry.
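
For completeness, here’s the execution end of it, with the same illustrative Game stand-in as before:

    class Game:
        def command(self, mobile_id, verb, *args):
            print("mobile", mobile_id, ":", verb, *args)

    pixie = {"id": 17,
             "plan": [["west"], ["north"], ["get", "peach2"], ["eat", "peach2"]]}
    game = Game()

    # One physical action per tick, straight to the game engine.
    while pixie["plan"]:
        game.command(pixie["id"], *pixie["plan"].pop(0))
    # mobile 17 : west
    # mobile 17 : north
    # mobile 17 : get peach2
    # mobile 17 : eat peach2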

That was a very simple yet somewhat tiresome example. Fortunately, we can get computers to do it all for us.

Next time, I’ll begin explaining what use this is.
