Tour de Heuristics: Memory-Bounded Dynamic Programming
May-2014
Memory bounded dynamic programming is another technique offered in Cooperative Decision Making. This is the first sub-optimal heuristic that is brought up. It takes the same techniques as seen before with an exhaustive backup, but at each stage, only a specific number of trees remain at the end of these operations. Due to this, the entire algorithm is bounded by the selected memory boundary. It can run efficiently, but not optimally.
It combines exhaustive backup for actual tree generation, but, like A*, it uses a heuristic to estimate potential tree value. It then trims the number of trees until only the top N trees remain.
The Algorithm
def memory_bound_dynamic_programming(max_trees=100, max_depth, heuristic): depth = 0 policy[0] = [] while depth < max_depth: policy[depth + 1] = exhaustive_backup(policy[depth]) compute_expected_values(policy[depth + 1]) tweaked_policy[depth + 1] = [] for k in range(0,max_trees): belief[k] = generate_belief(heuristic, max_trees - depth - 1) tweaked_policy[depth + 1].push(get_best_tree(policy[depth + 1]) depth += 1 policy[depth] = tweaked_policy[depth] return policy[depth] |
How bad is a heuristic, really?
IF your sensors and location data get you “almost there” (i.e. MDP vs Dec-POMDP), an MDP-like heuristic is probably going to be pretty close to what you could expect. If your sensor information really is quite terrible and your transitions are so bad that you really do need to know precisely where you are to have any confidence in this, then you probably won’t get a particularly optimal answer. IF, however, you are relatively able to determine where you are in whatever graph you are traversing, an MDP-based estimator is going to be fine.