Dr. Mark Humphrys

School of Computing. Dublin City University.


Learning rate that does not start at 1

Recall convergence conditions.

The typical $\alpha$ goes from 1 down to 0, but note that if the conditions hold, then for any $t$, $\sum_{i=t}^{\infty} \alpha_i = \infty$ and $\sum_{i=t}^{\infty} \alpha_i^2 < \infty$, so $\alpha$ may start anywhere along the sequence. That is, $\alpha$ may take successive values $\frac{1}{t}, \frac{1}{t+1}, \frac{1}{t+2}, \ldots$
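As a rough numerical illustration of the two conditions, the sketch below computes partial sums of the tail of the sequence, assuming the usual choice of α = 1/i. The function name `tail_sums` and the starting point t = 100 are made up for this example. The first sum should keep growing (roughly like a logarithm) while the second should stay below 1/(t-1).

```python
def tail_sums(t, n):
    """Partial sums of alpha_i = 1/i and of alpha_i**2 for i = t..n."""
    s1 = sum(1.0 / i for i in range(t, n + 1))
    s2 = sum(1.0 / (i * i) for i in range(t, n + 1))
    return s1, s2


if __name__ == "__main__":
    t = 100  # arbitrary starting point along the sequence
    for n in (10**3, 10**4, 10**5):
        s1, s2 = tail_sums(t, n)
        # s1 keeps growing with n; s2 stays bounded by 1/(t-1)
        print(n, s1, s2, 1.0 / (t - 1))
```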

Q-learning will forget bad samples at the start

To "forget" old experience, you could reset α = 1.
But in fact you don't have to:
α may start anywhere along the sequence and the conditions for convergence are still satisfied. So you can just keep learning, and the old experience is eventually wiped out.

e.g. Say the world changes from MDP1 to MDP2 after time t. Just keep going with Q-learning and it will learn the optimal policy for MDP2 (eventually) and forget what it learnt for MDP1 (eventually). No need to change anything.

Q-learning automatically adapts if world/problem changes.
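A minimal sketch of this, under assumptions that are not in the notes: a toy world with 2 states and 2 actions where action a moves to state a, MDP1 rewards arriving in state 0 and MDP2 rewards arriving in state 1, a discount of 0.5, purely random exploration, and α = 1/(number of visits to that state-action pair). The counts are never reset when the world changes.

```python
import random

GAMMA = 0.5  # discount factor (assumed for this toy example)


def reward(mdp, next_state):
    """Hypothetical worlds: MDP1 rewards arriving in state 0, MDP2 state 1."""
    goal = 0 if mdp == 1 else 1
    return 1.0 if next_state == goal else 0.0


def greedy(Q):
    """Greedy action in each state under the current Q-values."""
    return [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]


def run(steps_mdp1=500, steps_mdp2=50_000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]   # Q[state][action]
    visits = [[0, 0], [0, 0]]      # per-pair visit counts, never reset
    state = 0
    policy_after_mdp1 = None
    for step in range(steps_mdp1 + steps_mdp2):
        mdp = 1 if step < steps_mdp1 else 2   # the world changes here
        action = rng.randrange(2)             # explore uniformly at random
        next_state = action                   # action a moves to state a
        r = reward(mdp, next_state)
        visits[state][action] += 1
        alpha = 1.0 / visits[state][action]   # keeps declining across the change
        target = r + GAMMA * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
        if step == steps_mdp1 - 1:
            policy_after_mdp1 = greedy(Q)
    return policy_after_mdp1, greedy(Q)


if __name__ == "__main__":
    before, after = run()
    print("greedy policy at end of MDP1:", before)
    print("greedy policy at end of MDP2:", after)
```

The second phase is made much longer than the first on purpose: with α continuing from where it left off, the old MDP1 samples are only diluted, not discarded, so forgetting takes time proportional to how much was learnt before.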

Starting α at 1/t

Recall our running average.

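Let $d_t, d_{t+1}, d_{t+2}, \ldots$ be samples of a stationary random variable $d$ with expected value $E(d)$. Repeat:

$$D := (1-\alpha) D + \alpha d_i$$

where $\alpha$ takes successive values $\frac{1}{t}, \frac{1}{t+1}, \frac{1}{t+2}, \ldots$ Then $D \to E(d)$.

Proof: Write $D_{init}$ for the initial value of $D$. D's updates go:

$$D := \frac{1}{t} d_t + \frac{t-1}{t} D_{init}$$

$$D := \frac{1}{t+1} d_{t+1} + \frac{t}{t+1} \left( \frac{1}{t} d_t + \frac{t-1}{t} D_{init} \right) = \frac{1}{t+1} \left( d_t + d_{t+1} + (t-1) D_{init} \right)$$

$$\vdots$$

$$D := \frac{1}{n} \left( d_t + \cdots + d_n + (t-1) D_{init} \right)$$

As $n \to \infty$:

$$\frac{1}{n} \left( d_t + \cdots + d_n \right) \to E(d) \quad \text{and} \quad \frac{t-1}{n} D_{init} \to 0$$

that is, $D \to E(d)$. $\Box$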

One way of looking at this is to consider $D_{init}$, the initial value of $D$, as the average of all samples before time $t$, samples which are now irrelevant for some reason. We can consider them as samples $f_1, \ldots, f_{t-1}$ from a different distribution $f$:

$$D = \frac{1}{n} \left( f_1 + \cdots + f_{t-1} + d_t + \cdots + d_n \right) \to E(d)$$

as $n \to \infty$.

$$\frac{1}{n} \left( d_t + \cdots + d_n \right) = \frac{n-t+1}{n} \cdot \frac{1}{n-t+1} \left( d_t + \cdots + d_n \right) \to 1 \cdot E(d)$$
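The running average started at α = 1/t can be checked numerically. The sketch below is only an illustration: the function names, the Gaussian samples with mean 5, and the deliberately bad initial value of 100 are all assumptions for the example. It compares the incremental update against the closed form ((t-1) × initial value + sum of samples) / n.

```python
import random


def running_average(samples, t, d_init):
    """Apply D := (1 - alpha) D + alpha d with alpha = 1/t, 1/(t+1), ..."""
    D = d_init
    for k, d in enumerate(samples, start=t):
        alpha = 1.0 / k
        D = (1 - alpha) * D + alpha * d
    return D


def closed_form(samples, t, d_init):
    """((t-1) * d_init + d_t + ... + d_n) / n, where n is the last index."""
    n = t + len(samples) - 1
    return ((t - 1) * d_init + sum(samples)) / n


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.gauss(5.0, 1.0) for _ in range(100_000)]  # E(d) = 5
    # Start at alpha = 1/50 with a badly wrong initial value.
    print(running_average(samples, t=50, d_init=100.0))
    print(closed_form(samples, t=50, d_init=100.0))
```

Both printed values should agree up to floating-point rounding and sit close to 5, despite the initial value of 100.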

Initial bias

If we start at α = 1/t, then the initial Q-values bias our Q-values for some time.
And since any real experiment runs for only a finite time, the bias may still be there after learning.
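To get a feel for how slowly this bias fades, here is a small sketch for the simpler running-average case rather than full Q-learning (where the samples themselves depend on other Q-values). It assumes the updates use α = 1/t, ..., 1/n, so the expected value of the estimate after step n is off by (t-1)/n times the initial error. The function name and the numbers are hypothetical.

```python
def expected_bias(t, n, d_init, mean):
    """Expected error E[D_n] - E(d) after updates with alpha = 1/t, ..., 1/n."""
    return (t - 1) / n * (d_init - mean)


if __name__ == "__main__":
    # Initial value 10, true mean 0, learning starts at alpha = 1/100.
    for n in (100, 1_000, 10_000, 100_000):
        print(n, expected_bias(t=100, n=n, d_init=10.0, mean=0.0))
```

The bias shrinks only like 1/n, so a large t (a small starting α) combined with a poor initial value can leave a visible error after a run of any practical length.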

Consider being "born" with Q-values already filled in (i.e. in DNA) and then start learning:

Not-quite Lamarckism in nature

