Peer Review - Thiemen Mussche #4

@ThiemenMus

I love the snow!

For the TD-update: new Q(S, A) = Q(S, A) + R(S, A) + Max Q'(S', A') - Q(S, A).
There is a term Q(S, A) and a term -Q(S, A); don't they cancel each other out? Why does this work?
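
To make the question concrete: in the standard Q-learning formulation (which may not be exactly what the article intends), the temporal-difference term is scaled by a learning rate alpha and the next-state value by a discount factor gamma, so the two Q(S, A) terms only cancel completely when alpha equals 1. A minimal sketch, where the function name and all numbers are made up for illustration:

```python
def td_update(q_sa, reward, max_q_next, alpha, gamma):
    """Standard Q-learning update for one (state, action) pair."""
    # Temporal difference: how far the current estimate is from the new target.
    td_error = reward + gamma * max_q_next - q_sa
    # Move the old estimate a fraction alpha of the way towards the target.
    return q_sa + alpha * td_error


# Hypothetical values, chosen only for illustration.
q_sa, reward, max_q_next, gamma = 10.0, 1.0, 20.0, 0.75

# alpha = 1: the old Q(S, A) cancels and the estimate jumps straight to the target.
full_step = td_update(q_sa, reward, max_q_next, alpha=1.0, gamma=gamma)

# alpha = 0.5: the old Q(S, A) does not cancel; the estimate moves halfway.
partial_step = td_update(q_sa, reward, max_q_next, alpha=0.5, gamma=gamma)

print(full_step, partial_step)
```

If the article's formula is meant to be this rule with alpha left out, stating that explicitly would answer the question.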

It isn't clear to me what a Q-table should look like and why it works; maybe you could add an example?
The same goes for the alpha parameter in the TD-update rule: I'm not sure what it is supposed to do.
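
Something along these lines is the kind of example I have in mind: a Q-table as a grid with one row per state and one column per action. This is only a sketch under my own assumptions; the three locations and the values are placeholders, not taken from the article's map.

```python
# Hypothetical locations; the article's own map has more of them.
locations = ["L1", "L2", "L3"]

# Q-table: one row per current state, one column per action
# (here an action is "move to location X"). All values start at zero.
q_table = {state: {action: 0.0 for action in locations} for state in locations}

# Made-up values, as if learned during training.
q_table["L1"]["L2"] = 5.0
q_table["L1"]["L3"] = 2.0


def best_action(table, state):
    """Return the action with the highest Q-value in the given state's row."""
    row = table[state]
    return max(row, key=row.get)


print(best_action(q_table, "L1"))
```

Reading off the largest value in a row gives the action the agent currently believes is best in that state.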

Why do we assign '999' to the transition from L6 to L6, and not to the transition from any neighbouring state into L6?

For the implementation, you could enter the bits of code you explain as strings so they don't throw errors.

"Since we do not know the exact number of iterations the robot will take in order to find out the optimal route, we will simply loop the next set of processes until the next location is not equal to the ending location." In the code, however, you use a while loop that runs until the next location is the end location, which contradicts that statement.
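
As far as I can tell, the loop behaves like the sketch below. The names and the lookup table are my own placeholders, not the article's code:

```python
# Hypothetical greedy policy: the best next location from each location.
next_step = {"L1": "L2", "L2": "L3", "L3": "L3"}


def get_route(start_location, end_location):
    """Follow the policy from start to end and return the visited locations."""
    route = [start_location]
    next_location = start_location
    # Loops WHILE the next location differs from the end location,
    # which means it runs UNTIL the two are equal.
    while next_location != end_location:
        next_location = next_step[next_location]
        route.append(next_location)
    return route


print(get_route("L1", "L3"))
```

So the text would be accurate if it said "while the next location is not equal to the ending location" or "until the next location is equal to the ending location".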

Good luck!
Thiemen
