
# 1 The Action Selection problem

By Action Selection we do not mean the low-level problem of choice of action in pursuit of a single coherent goal. Rather we mean the higher-level problem of choice between conflicting and heterogeneous goals. These goals are pursued in parallel. They may sometimes combine to achieve larger-scale goals, but in general they simply interfere with each other. They may not have any terminating conditions.

These two problems have often been confused in the literature; this thesis shall argue in favour of carefully separating them. All systems solve the first problem in some form. The second has been given far less thought.

Action Selection is emerging as a central problem in the simulation of whole creatures (as opposed to the solution of isolated uninterrupted tasks). It is clear that many interesting systems possess multiple parallel and conflicting goals, among which attention must endlessly switch. Animals are the prime example of such systems.

Typically, the action selection models proposed in ethology are not detailed enough to specify an algorithmic implementation (see [Tyrrell, 1993] for a survey, and for many difficulties in translating the conceptual models into computational ones). The action selection models that do lend themselves to algorithmic implementation (e.g. see [Brooks, 1991, Rosenblatt, 1995, Blumberg, 1994, Sahota, 1994, Aylett, 1995]) then typically require a considerable design effort. In the literature, one sees formulas taking weighted sums of various quantities in an attempt to estimate the utility of actions. There is much hand-coding and tuning of parameters (e.g. see [Tyrrell, 1993, §9]) until the designer is satisfied that the formulas deliver utility estimates that are fair.
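
To make this concrete, the following is a minimal Python sketch of such a hand-coded formula. Every quantity and weight in it is invented purely for illustration:

```python
def utility_of_eating(hunger, distance_to_food, predator_nearby):
    # Hypothetical hand-coded utility estimate for an "eat" action.
    # The designer must tune w1, w2, w3 by hand until the ranking
    # of actions "looks right".
    w1, w2, w3 = 1.0, -0.3, -5.0   # hand-tuned weights
    return (w1 * hunger
            + w2 * distance_to_food
            + w3 * predator_nearby)
```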

In fact, there may be a way that these utility values can come for free. Learning methods that automatically assign values to actions are common in the field of Reinforcement Learning (RL), or learning from rewards. Reinforcement Learning propagates numeric rewards into behavior patterns. The rewards may be objective, external value judgements, or just internally generated numbers. This thesis compares a number of different methods of further propagating these numbers to solve the action selection problem.
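
For instance, in tabular Q-learning (one standard RL method; RL itself is introduced properly in the chapters that follow), values for state-action pairs are built up automatically from rewards alone. The following is a minimal Python sketch, with hypothetical states, actions and environment:

```python
from collections import defaultdict

GAMMA = 0.9   # discount factor
ALPHA = 0.1   # learning rate

# Q[(state, action)] -> estimated long-term value, built from rewards alone
Q = defaultdict(float)

def update(state, action, reward, next_state, actions):
    # Adjust Q(x,a) once towards the sample r + gamma * max_b Q(y,b)
    best_next = max(Q[(next_state, b)] for b in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Hypothetical usage: a reward of 1.0 for eating while hungry
update("hungry", "eat", 1.0, "full", ["eat", "flee"])
```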

The low-level problem of pursuing a single goal can be solved by straightforward RL, which assumes such a single goal. For the high-level problem of choice between conflicting goals we try various methods exploiting the low-level RL numbers. The methods range from centralised and cooperative to decentralised and selfish.

Instead of presenting yet another domain-specific program, this thesis will be searching for general principles across domains. It will be trying to categorize all the different ways in which the action selection of behaviors can be organized automatically.

# 1.1 Terminology

Terminology in this area is a problem, and one I am not necessarily going to solve here. These are the crucial terms I use, with their explanations:

agent
- the mind or part of the mind. A piece of semi-autonomous software. The usage is from Minsky's Society of Mind [Minsky, 1986], in which multiple agents may inhabit the same body. Agents have more autonomy than traditional modules or procedures. They are autonomous actors in their own right, not waiting on some higher level to call them. They are not necessarily structured in any hierarchy and are not even necessarily cooperative. In the model I introduce in §5, an agent will control the entire body on its own if allowed. It is generally frustrated by the presence of other agents. Baum also uses agent in his Economy of Mind [Baum, 1996].

Behavior (e.g. [Sahota, 1994, Mataric, 1994]) is probably the term in most common use, but is often used for modules that are not autonomous actors. For somewhat-autonomous, somewhat-competing modules within a single body, [Brooks, 1986] uses layer (though [Brooks, 1994] also uses process), [Blumberg, 1994] uses activity (as does the ethologist McFarland), [Tyrrell, 1993] uses system and subsystem (after the ethologist Baerends) and [Selfridge and Neisser, 1960] use demon. Other names, from psychology, include simpleton and homunculus (of the non-intelligent kind).

The word agent is still not altogether satisfactory, since it is being claimed by other fields. Currently, multi-agent learning implies multiple bodies (e.g. [Mataric, 1994, Tan, 1993, Grefenstette, 1992]). Symbolic AI agents imply communication, where agents negotiate among themselves (this is not altogether against the meaning of the term here). And in the long term, agent may come to mean Internet alter egos acting as an "agency" on behalf of the user.

creature
- the body. The animat [Wilson, 1990] or artificial animal. In the House Robot problem of §4, the house robot. In the Ant World problem of §7, the ant.

The usage is from Brooks [Brooks, 1991]. The creature may be inhabited by a single monolithic agent (a unitary mind) or a family of agents (a decentralised mind).
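
To make the distinction concrete, here is a purely illustrative Python sketch of a decentralised mind: several agents inhabit one creature, each bids for control of the body, and the strongest bid wins. The agents, bids and actions are hypothetical; this is not the specific model of §5.

```python
class Agent:
    """One semi-autonomous agent. If allowed, it would control the
    whole body on its own; here it can only bid for control."""

    def __init__(self, name, urgency):
        self.name = name
        self.urgency = urgency

    def propose(self, state):
        # Hypothetical bid: strength scales with this agent's urgency
        # and the relevant drive level in the current state.
        return self.urgency * state.get(self.name, 0.0), f"act-for-{self.name}"

def select_action(agents, state):
    # Each agent bids; the creature obeys the strongest bidder,
    # frustrating all the others.
    bids = [(agent.propose(state), agent) for agent in agents]
    (strength, action), winner = max(bids, key=lambda b: b[0][0])
    return action, winner.name

creature = [Agent("hunger", 1.0), Agent("fear", 2.0)]
print(select_action(creature, {"hunger": 0.6, "fear": 0.4}))
# -> ('act-for-fear', 'fear'): fear's bid of 0.8 beats hunger's 0.6
```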

# 1.2 Notation

Throughout this thesis, the notation:

$$D \rightsquigarrow d$$

means we adjust the estimate D once in the direction of the sample d. For example, if we store D explicitly we may update:

$$D := (1 - \alpha) D + \alpha d$$

where $\alpha$ is the learning rate. See appendix §A to understand how sampling with a learning rate works. Alternatively, if we are using a neural network to return an estimate D, we may backpropagate the error $(D - d)$.

The notation:

$$D \to d$$

means that over many such updates the estimate D converges to the expected value of the samples d.
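
As a quick illustration of how such updates behave, the following Python sketch applies the explicit update repeatedly; the sample distribution is arbitrary, chosen only for illustration, and D settles near the expected value of d:

```python
import random

alpha = 0.01   # learning rate
D = 0.0        # the estimate

for _ in range(100_000):
    d = random.gauss(5.0, 2.0)       # noisy samples with expected value 5.0
    D = (1 - alpha) * D + alpha * d  # adjust D once towards d

print(D)  # hovers near 5.0, within noise proportional to alpha
```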

# 1.3 A brief guide to this dissertation

This dissertation can be divided into the following parts:

• Chapters 1-13 - the Action Selection problem, the introduction of Reinforcement Learning and our mathematical framework, the test problem, tests of current strategies, and the invention and testing of more decentralised strategies.
• Chapter 14 - summary of the four schools of decentralised Action Selection methods and their empirical results.
• Chapter 15 - survey of a wide variety of related work, translating it into our mathematical framework and comparing it to our methods.
• Chapter 16 - the conclusions of this thesis: what we have learnt.
• Chapter 17 - survey of possible further work.
• Chapter 18 - how these models might scale up into a general decentralised, nested, non-hierarchical model of mind.

The reader familiar with RL may wish to skip Chapters 2 and 3.

First we introduce Reinforcement Learning.
