Object = state x.
e.g. x = state of robot and its environment.
Features describing the object = values of each dimension defining the point
x = (x_{1}, ..., x_{n}).
e.g. x_{i} = infra-red sensor reading to front of robot.
Feature space = State space. May be multi-dimensional, where each dimension takes continuous values. x is point in this space.
Classes to assign object into = Action a to take when state x is seen.
e.g. Move left, move right, stop.
Note classes are normally a small finite set.
Actions are likewise often a small, finite (discrete) set,
but may be a continuous, infinite set
(e.g. a real-number output: move at some angle).
Agent or actor (robot, program) learns to map x to a.
Does each x map to a unique a, or to multiple a's?
Can multiple x map to same a?
Is whole space covered? Does each x map to some a? Can we return an a for a new x, never seen before?
Does each action a map to some state x?
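The questions above can be made concrete with a small sketch (all names and sensor values here are invented for illustration, not from the notes): a deterministic policy stored as a lookup table over seen states, generalised to a never-seen x by nearest neighbour, so that the whole space is covered and every x maps to some a.

```python
# Minimal sketch (hypothetical example): a deterministic policy as a
# lookup table over seen states, extended to unseen states by nearest
# neighbour so every point in the state space maps to some action.

def nearest_action(x, table):
    """Return the action of the seen state closest to x (squared Euclidean)."""
    def dist(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    best = min(table, key=lambda s: dist(s, x))
    return table[best]

# Seen states (e.g. infra-red readings to front/left of robot) -> actions.
policy = {
    (0.9, 0.1): "move left",   # obstacle ahead
    (0.1, 0.9): "move right",  # obstacle to the left
    (0.1, 0.1): "stop",        # nothing near
}

# Multiple x may map to the same a; a new, never-seen x still gets an action.
print(nearest_action((0.85, 0.2), policy))  # -> move left
```

Note the design choice: nearest neighbour is just one way to answer "can we return an a for a new x?"; any generaliser (decision tree, neural network) plays the same role.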
From the start we will allow our world
to be probabilistic rather than necessarily
deterministic.
i.e. In state x, you take action a.
Sometimes this leads to state y.
Sometimes it leads to state z.
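A sketch of such a probabilistic world (the states and transition probabilities here are invented for illustration): taking action a in state x sometimes yields y, sometimes z.

```python
import random

# Hypothetical transition probabilities: in state "x", action "a" leads
# to state "y" 70% of the time and to state "z" 30% of the time.
TRANSITIONS = {("x", "a"): [("y", 0.7), ("z", 0.3)]}

def next_state(state, action, rng=random):
    """Sample a successor state from the transition distribution."""
    outcomes = TRANSITIONS[(state, action)]
    r = rng.random()
    cumulative = 0.0
    for successor, p in outcomes:
        cumulative += p
        if r < cumulative:
            return successor
    return outcomes[-1][0]  # guard against float rounding

# Repeating the same (x, a) gives different successors.
random.seed(0)
print([next_state("x", "a") for _ in range(5)])
```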
Our action taker, instead of linking each x
to a single a,
may say instead something like:
"In state x,
take action a with probability 0.9,
action b with probability 0.1."
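Such a stochastic action taker can be sketched directly (a hypothetical implementation of exactly the rule quoted above):

```python
import random

def stochastic_policy(x, rng=random):
    """In state x, take action 'a' with probability 0.9, 'b' with 0.1."""
    return "a" if rng.random() < 0.9 else "b"

# Over many visits to the same state, the empirical frequencies
# approach the policy's probabilities.
random.seed(1)
counts = {"a": 0, "b": 0}
for _ in range(10000):
    counts[stochastic_policy("x")] += 1
print(counts)  # roughly 9000 'a' and 1000 'b'
```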
Consider:
A monkey controls a robotic arm with its brainwaves.