Tour de Heuristics: Joint Equilibrium Search (JESP)
May-2014
The final stop on this heuristics tour, and the last stop in our overview of Cooperative Decision Making, is Joint Equilibrium Search. The technique starts with a preset horizon-T policy for each agent, then cycles through the agents, letting each one in turn adjust its policy to the best response while every other agent's policy is held fixed. The cycle repeats until all of the policy trees have stabilized.
The Algorithm
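    import copy

    def joint_equilibrium_search(policy):
        # get_all_agents, calculate_expected_values, and get_best_response
        # are helpers assumed to be defined elsewhere.
        curr_policy = policy
        while True:
            # Snapshot the policies so the comparison below is not aliased.
            prev_policy = copy.deepcopy(curr_policy)
            for agent in get_all_agents():
                calculate_expected_values(curr_policy)
                curr_policy[agent] = get_best_response(curr_policy, agent)
            # Stop once a full pass over the agents changes nothing.
            if prev_policy == curr_policy:
                return curr_policy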
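To make the loop concrete, here is a minimal runnable sketch on a toy problem. It is not a full JESP implementation: it assumes a one-shot, two-agent cooperative game, so each "policy" is just a single action index, and the shared payoff table `PAYOFF` and the helper names are made up for illustration.

```python
# Hypothetical shared payoff table for a one-shot, two-agent cooperative game.
# PAYOFF[a0][a1] is the team reward when agent 0 plays a0 and agent 1 plays a1.
# The table is square, so both agents have the same number of actions.
PAYOFF = [
    [5, 0, 0],
    [0, 8, 0],
    [0, 0, 10],
]


def team_value(joint):
    """Team reward for a joint action [a0, a1]."""
    return PAYOFF[joint[0]][joint[1]]


def best_response(joint, agent):
    """Best action for `agent` with the other agent's action held fixed.

    Only a strict improvement replaces the current action, so the team
    value never decreases and the outer loop must terminate.
    """
    best_action = joint[agent]
    best_value = team_value(joint)
    for action in range(len(PAYOFF)):
        candidate = list(joint)
        candidate[agent] = action
        value = team_value(candidate)
        if value > best_value:
            best_action, best_value = action, value
    return best_action


def alternating_best_response(initial):
    """Cycle through the agents until a full pass changes nothing."""
    curr = list(initial)
    while True:
        prev = list(curr)  # copy, so the comparison is not aliased
        for agent in range(len(curr)):
            curr[agent] = best_response(curr, agent)
        if curr == prev:
            return curr


result = alternating_best_response([0, 1])
print(result, team_value(result))
```

Starting from `[0, 1]`, agent 0 switches to action 1 to match agent 1, and after that neither agent can improve on its own, so the loop stops.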
Well, That’s Nifty
It seems like a slam dunk, but there are a few gotchas. From an optimality standpoint, the result is only locally optimal: the search stops as soon as no single agent can improve by changing its own policy, even if changing several agents' policies together would do better. To mitigate this, some earlier step needs to select reasonably good starting trees before JESP runs.
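The local-optimality point can be checked by brute force on a small example. The sketch below again assumes a made-up one-shot, two-agent payoff table rather than real horizon-T policy trees. It lists every joint action that no single agent can improve on alone and compares those against the true optimum. The function names are illustrative only.

```python
from itertools import product

# Hypothetical shared payoff table: PAYOFF[a0][a1] is the team reward.
PAYOFF = [
    [5, 0, 0],
    [0, 8, 0],
    [0, 0, 10],
]


def is_equilibrium(payoff, joint):
    """True if neither agent can raise the team reward by deviating alone."""
    a0, a1 = joint
    current = payoff[a0][a1]
    # Agent 0 deviates alone (changes the row).
    if any(payoff[alt][a1] > current for alt in range(len(payoff))):
        return False
    # Agent 1 deviates alone (changes the column).
    if any(payoff[a0][alt] > current for alt in range(len(payoff[0]))):
        return False
    return True


def find_equilibria(payoff):
    """All joint actions where a one-agent-at-a-time search would stop."""
    joints = product(range(len(payoff)), range(len(payoff[0])))
    return [j for j in joints if is_equilibrium(payoff, j)]


def global_optimum(payoff):
    """Joint action with the highest team reward, found by enumeration."""
    joints = product(range(len(payoff)), range(len(payoff[0])))
    return max(joints, key=lambda j: payoff[j[0]][j[1]])


print(find_equilibria(PAYOFF))
print(global_optimum(PAYOFF))
```

In this table all three diagonal cells are stopping points, but only one of them is the global optimum, so where the search ends depends entirely on where it starts.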