|
Friday, February 1, 2013 |
|
|
|
|
The performance of the simplex method on deterministic Markov decision processes |
|
|
Motivated by the search for strongly-polynomial algorithms for linear programming
(an algorithm whose runtime depends only on the number of constraints and
variables, not the sizes of the numbers involved), we analyze the performance of
the simplex method on Markov decision processes (MDPs), a class of LPs that is
"close" to being strongly-polynomial but beyond the reach of existing
techniques. We prove that for deterministic MDPs the simplex method with the
highest gain/most-negative-reduced cost pivoting rule converges in
strongly-polynomial time, regardless of the discount factor of the MDP and even
when each action has a distinct discount. Previously the simplex method was known
to run in polynomial time only for discounted MDPs where the discount was bounded
away from 1. Unlike in the discounted case the algorithm does not greedily
converge to the optimum, and we require a more complex measure of progress. We
identify a set of layers in which the values of variables must lie and show that
the simplex method always makes progress optimizing the variables in one
layer. |