Dr. Mark Humphrys

School of Computing. Dublin City University.




B Bounds

B.1 Bounds with a learning rate α

Let D be updated by:

$$D := (1-\alpha) D + \alpha d$$

where $d$ is bounded by $d_{max}$, $d_{min}$, and the initial value of $\alpha = 1$. Then:

Theorem B.1: $d_{min} \le D \le d_{max}$, that is, $D_{max} = d_{max}$ and $D_{min} = d_{min}$.

Proof: The highest $D$ can be is if it is always updated with $d_{max}$:

$$D := (1-\alpha) D + \alpha d_{max}$$

so $D_{max} = d_{max}$. Similarly $D_{min} = d_{min}$. $\Box$

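Note that this holds only if 0 ≤ α ≤ 1: each update is then a weighted average of the old D and the new d, so it cannot leave the interval that contains them both.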
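As a quick numerical check of the bound in B.1, the following minimal Python sketch applies the update repeatedly to random samples. The schedule alpha = 1/t, the uniform sampling of d, and the names `update` and `run` are assumptions made for illustration only. With alpha = 1 on the first step, the arbitrary initial D is simply overwritten.

```python
import random


def update(D, d, alpha):
    """One step of D := (1 - alpha) * D + alpha * d."""
    return (1.0 - alpha) * D + alpha * d


def run(d_min, d_max, steps=1000, seed=0):
    """Apply the update `steps` times and return every value D took."""
    rng = random.Random(seed)
    D = 1e9  # arbitrary start; overwritten on step 1 because alpha = 1 there
    history = []
    for t in range(1, steps + 1):
        alpha = 1.0 / t  # assumed schedule: alpha = 1 first, then decaying
        d = rng.uniform(d_min, d_max)  # sample bounded by d_min and d_max
        D = update(D, d, alpha)
        history.append(D)
    return history


if __name__ == "__main__":
    values = run(d_min=-2.0, d_max=5.0)
    print(min(values), max(values))  # both should lie within [-2, 5]
```

Any other schedule with 0 ≤ α ≤ 1 and α = 1 on the first step should behave the same way; a value of α outside that range can push D past the bounds.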



B.2 Bounds of Q-values

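Theorem B.2: $Q_{min} = \frac{r_{min}}{1-\gamma}$, $Q_{max} = \frac{r_{max}}{1-\gamma}$.

Proof: In the discrete case, Q is updated by:

$$Q(x,a) := (1-\alpha) Q(x,a) + \alpha \left( r + \gamma \max_{b} Q(y,b) \right)$$

so by Theorem B.1:

$$Q_{max} = r_{max} + \gamma Q_{max} \quad \Rightarrow \quad Q_{max} = \frac{r_{max}}{1-\gamma}$$

This can also be viewed in terms of temporal discounting:

$$Q_{max} = r_{max} + \gamma r_{max} + \gamma^2 r_{max} + \cdots = \frac{r_{max}}{1-\gamma}$$

Similarly:

$$Q_{min} = \frac{r_{min}}{1-\gamma}$$

$\Box$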

For example, if tex2html_wrap_inline6480 , then tex2html_wrap_inline9342 . And (assuming $r_{max} > 0$) as $\gamma \to 1$, $Q_{max} \to \infty$.

Note that since $r_{min} < r_{max}$, it follows that $Q_{min} < Q_{max}$.
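The two views of the Q bound in B.2 (fixed point of the update, and sum of discounted rewards) can be compared numerically. This is a minimal sketch: the function names and the values r_max = 1 and γ = 0.9 are illustrative assumptions, not values taken from the text, and it assumes 0 ≤ γ < 1.

```python
def q_bound(r, gamma):
    """Closed form r / (1 - gamma), valid for 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return r / (1.0 - gamma)


def discounted_sum(r, gamma, n_terms):
    """Partial sum r + gamma*r + ... + gamma**(n_terms - 1) * r."""
    total = 0.0
    weight = 1.0
    for _ in range(n_terms):
        total += weight * r
        weight *= gamma
    return total


def iterate_fixed_point(r, gamma, n_iters, q0=0.0):
    """Repeat q := r + gamma * q, as if the extreme reward were always received."""
    q = q0
    for _ in range(n_iters):
        q = r + gamma * q
    return q


if __name__ == "__main__":
    r_max, gamma = 1.0, 0.9  # illustrative values only
    print(q_bound(r_max, gamma))
    print(discounted_sum(r_max, gamma, 500))
    print(iterate_fixed_point(r_max, gamma, 500))
```

All three numbers should agree closely, and rerunning with γ nearer to 1 shows the bound growing without limit when r_max > 0.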




B.3 Bounds of W-values

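Theorem B.3: $W_{min} = Q_{min} - Q_{max}$, $W_{max} = Q_{max} - Q_{min}$.

Proof: In the discrete case, W is updated by:

$$W_i(x) := (1-\alpha) W_i(x) + \alpha \left( Q_i(x,a_i) - \left( r_i + \gamma \max_{b} Q_i(y,b) \right) \right)$$

so by Theorem B.1:

$$W_{max} = Q_{max} - (r_{min} + \gamma Q_{min}) = Q_{max} - Q_{min}$$

by Theorem B.2.

Similarly:

$$W_{min} = Q_{min} - (r_{max} + \gamma Q_{max}) = Q_{min} - Q_{max}$$

$\Box$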
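The appeal to Theorem B.2 in this proof can be spelled out. This is a sketch under two assumptions: that the quantity fed into the W update has the form $Q_i(x,a_i) - (r_i + \gamma \max_b Q_i(y,b))$, and that $0 \le \gamma < 1$. The Q bounds $Q_{min} = r_{min}/(1-\gamma)$ and $Q_{max} = r_{max}/(1-\gamma)$ rearrange to $r_{min} = (1-\gamma) Q_{min}$ and $r_{max} = (1-\gamma) Q_{max}$, so:

```latex
\begin{align*}
r_{min} + \gamma Q_{min} &= (1-\gamma) Q_{min} + \gamma Q_{min} = Q_{min} \\
r_{max} + \gamma Q_{max} &= (1-\gamma) Q_{max} + \gamma Q_{max} = Q_{max}
\end{align*}
```

The largest possible value of the update quantity is therefore $Q_{max} - Q_{min}$, and the smallest is $Q_{min} - Q_{max}$.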

Note that since $Q_{min} < Q_{max}$, it follows that $W_{min} < 0 < W_{max}$.
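The W bounds can also be checked by brute force. The sketch below assumes the update quantity is Q - (r + γ Q'), with Q and Q' each ranging over [Q_min, Q_max] and r over [r_min, r_max]. Because that expression is linear in each variable, its extremes occur at the corners of those ranges. The function names and the sample numbers are illustrative assumptions.

```python
from itertools import product


def q_bounds(r_min, r_max, gamma):
    """Q bounds r_min / (1 - gamma) and r_max / (1 - gamma), for 0 <= gamma < 1."""
    return r_min / (1.0 - gamma), r_max / (1.0 - gamma)


def w_bounds(r_min, r_max, gamma):
    """Closed-form W bounds: (Q_min - Q_max, Q_max - Q_min)."""
    q_min, q_max = q_bounds(r_min, r_max, gamma)
    return q_min - q_max, q_max - q_min


def extreme_targets(r_min, r_max, gamma):
    """Evaluate Q - (r + gamma * Q') at every corner and return (min, max)."""
    q_min, q_max = q_bounds(r_min, r_max, gamma)
    corners = [
        q - (r + gamma * q_next)
        for q, r, q_next in product((q_min, q_max), (r_min, r_max), (q_min, q_max))
    ]
    return min(corners), max(corners)


if __name__ == "__main__":
    # Illustrative values only.
    print(w_bounds(-1.0, 2.0, 0.5))
    print(extreme_targets(-1.0, 2.0, 0.5))
```

The two printed pairs should match, and the lower bound is the negative of the upper bound.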





