How do we actually code the state-space?
In particular how do we encode the Q(x,a) data structure?
We want a data structure that can take as input
(i.e. be indexed by) two *vectors* x and a,
both members of finite sets,
and return a real number output Q(x,a).

The basic idea is that because x and a are members of finite sets, we can give each of them a unique ID number, and so we can give the (x,a) combination a unique ID too. Then we just have an array of some size N of real numbers, indexed by this ID.

x = (x_{1}, x_{2}, .. x_{n})

where each x_{i} takes one of some finite set of values
v_{0}, v_{1}, ... v_{m}
(in fact, the set of values, and the number of them,
could be different for each dimension i).

e.g. x = (x_{1}, .. x_{10})
where for example x_{7} takes values -100, 0, or 100.

To help us enumerate the states
so we can construct an ID number,
we simply constrain the set of values
to run from 0 to m
(we can translate to/from the real values when we use them).
So above we say x_{7} takes values 0,1,2.

Even with large finite, we may need generalisation.

To reduce the statespace size, we may need to be careful with our definition of state. Note that we can easily define too many possibilities, so that a huge chunk of the logical state space may not actually exist, e.g. all blank in chess board, or all kings. Could even have 90 percent of the lookup table unused.

Often we mightConsider how we could enumerate the finite set of states defined as follows:

x = (x_{1}, x_{2}, x_{3})

where x_{1} takes values 0,1,

x_{2} takes values 0,1,2,

and x_{3} takes values 0,1.

We can sum this up by just having a set of variables
saying how many values each dimension takes:

c_{1}=2, c_{2}=3, c_{3}=2.

We can enumerate as follows:

x = (0,0,0) id = 0 0 0 1 1 0 1 0 2 0 1 1 3 0 2 0 4 0 2 1 5 1 0 0 6 1 0 1 7 1 1 0 8 1 1 1 9 1 2 0 10 1 2 1 11Note that it is

Consider the last state (1,2,1).

The "1" means we have done the "0"'s in the first column,
i.e. we have done 1.c_{2}.c_{3},

plus the "2" means that within the "1"s
we have already done an additional 2.c_{3},

plus the last "1" means we have done an additional 1.

So id(1,2,1) = 1.c_{2}.c_{3}
+ 2.c_{3} + 1

In general, we can enumerate a state as follows:

int id ( state x )

{

return x_{1}c_{2}c_{3}+ x_{2}c_{3}+ x_{3}

}

Where
x = (x_{1}, x_{2}, .. x_{n}),

id(x) = x_{1}c_{2}..c_{n}
+ x_{2}c_{3}..c_{n}
+ ...
+ x_{(n-1)}c_{n}
+ x_{n}

The number of possible states is:

nostates = c_{1}c_{2}...c_{n}

The state IDs run from: 0 .. (nostates-1)

( 0, 0, ..., 0 )

Show using the rule above that its ID is:

0

( c

Show using the rule above that its ID is:

(c

Similar scheme. Action IDs run from: 0 .. (noactions-1)

(x,a) pairs can be enumerated:

(0,0) id = 0 + 0 (0,1) 0 + 1 ... ... (0,noactions-1) 0 + noactions-1 (1,0) 1.noactions + 0 ... ... (1,noactions-1) 1.noactions + noactions-1 (2,0) 2.noactions + 0 ... ... (nostates-1,noactions-1)The ID of an (x,a) pair is:

(id(x) * noactions) + id(a)

The number of (x,a) combinations is:

nostates*noactions

IDs run from:

0 .. (nostates*noactions)-1

So our function to access the State Space is:

real StateSpace :: at ( vector x, vector a )

{

int id = (id(x) * noactions) + id(a)

return ActualArray [ id ]

}

where ActualArray is simply an ordinary 1-dimensional array of real numbers:

real ActualArray [ nostates * noactions ]Then we have one of these data structures to hold the Q-values:

StateSpace Q

and we can just use it like:

Q.at(x,a) = ((1-ALPHA) * Q.at(x,a)) + (ALPHA * new-estimate)

In the sample code you will see that I enumerated Actions x States rather than States x Actions as above.

- This is basically a 1-1 (no collisions) hash function.
- Support for Multidimensional arrays may help, if it exists in your language.