We consider the multi-armed bandit problem. We show that, when the state space is finite, the dynamic allocation indices can be computed by linear programming methods.
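To illustrate how an index of this kind can be written as a linear program, the following is a minimal sketch based on the well-known restart-in-state characterization of Katehakis and Veinott. It is not necessarily the formulation developed in this paper. All notation is assumed for illustration: a single arm with finite state space $S$, one-step rewards $r(j)$, transition probabilities $p(j,k)$, and discount factor $\beta \in (0,1)$. For a fixed state $i$, the variable $v_j$ stands for the optimal value, starting from $j$, of a problem in which one may either continue from the current state or restart the arm in state $i$.

```latex
% Sketch under assumed notation (S, r, p, \beta, v are illustrative, not taken from the paper).
% Restart-in-state linear program for a fixed state i:
\begin{align*}
\text{minimize}   \quad & \sum_{j \in S} v_j \\
\text{subject to} \quad & v_j \ge r(j) + \beta \sum_{k \in S} p(j,k)\, v_k, && j \in S
    \quad \text{(continue from } j\text{)} \\
                        & v_j \ge r(i) + \beta \sum_{k \in S} p(i,k)\, v_k, && j \in S
    \quad \text{(restart in } i\text{)}
\end{align*}
% With v^* the optimal solution, the index of state i
% (normalized to the scale of one-step rewards) is
\[
  m(i) = (1-\beta)\, v^*_i .
\]
```

As a quick check under these assumptions, take two states with $r(0)=0$, $r(1)=1$, $p(0,1)=p(1,1)=1$ and $\beta = 1/2$. For $i=0$ the program gives $v^*_1 = 2$ and $v^*_0 = 1$, so $m(0) = 1/2$, which matches the ratio of discounted reward to discounted time when the arm is never stopped. Solving one such program per state yields all indices, at the cost of $|S|$ separate linear programs.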