Numerical Dynamic Programming in Economics¹

John Rust
Yale University

Contents

1. Introduction
2. Markov Decision Processes (MDP’s) and the Theory of Dynamic Programming
   2.1 Definitions of MDP’s, DDP’s, and CDP’s
   2.2 Bellman’s Equation, Contraction Mappings, and Blackwell’s Theorem
   2.3 Error Bounds for Approximate Fixed Points of Approximate Bellman Operators
   2.4 A Geometric Series Representation for MDP’s
   2.5 Examples of Analytic Solutions to Bellman’s Equation for Specific “Test Problems”
   2.6 Euler Equations and Euler Operators
3. Computational Complexity and Optimal Algorithms
   3.1 Discrete Computational Complexity
   3.2 Continuous Computational Complexity
   3.3 Computational Complexity of the Approximation Problem
4. Numerical Methods for Contraction Fixed Points
5. Numerical Methods for MDP’s
   5.1 Discrete Finite Horizon MDP’s
   5.2 Discrete Infinite Horizon MDP’s
       5.2.1 Successive Approximation, Policy Iteration, and Related Methods
       5.2.2 Numerical Comparison of Methods in Auto Replacement Problem
       5.2.3 New Approaches to Solving Large Scale Problems
   5.3 Continuous Finite Horizon MDP’s
       5.3.1 Discrete Approximation Methods
       5.3.2 Smooth Approximation Methods
   5.4 Continuous Infinite Horizon MDP’s
       5.4.1 Discrete Approximation Methods
       5.4.2 Smooth Approximation Methods
6. Conclusions
7. References

¹ Revised November 1994 draft for Handbook of Computational Economics, H. Amman, D. Kendrick and J. Rust (eds.). I am grateful for helpful comments by Hans Amman, Ken Judd, David Kendrick, Martin Puterman, and two not very anonymous referees, Charles Tapiero and John Tsitsiklis.

1. Introduction

This chapter surveys numerical methods for solving dynamic programming (DP) problems. The DP framework has been extensively used in economic modeling because it is sufficiently rich to model almost any problem involving sequential decision making over time and under uncertainty.² By a simple re-definition of variables virtually any DP problem can be formulated as a Markov decision process (MDP) in which a decision maker who is in state s_t at time t = 1,...,T takes an action a_t that determines current utility u(s_t,a_t) and affects the distribution of next period’s state s_{t+1} via a Markov transition probability p(s_{t+1}|s_t,a_t). The problem is to determine an optimal decision rule α that solves

   V(s) ≡ max_α E_α { Σ_{t=0}^{T} β^t u(s_t,a_t) | s_0 = s },

where E_α denotes expectation with respect to the controlled stochastic process {s_t,a_t} induced by the decision rule α ≡ {α_1,...,α_T}, and β ∈ (0,1) denotes the discount factor. What makes these problems especially difficult is that instead of optimizing over an ex ante fixed sequence of actions {a_0,...,a_T} one needs to optimize over sequences of functions {α_0,...,α_T} that allow the ex post decision a_t to vary as a best response to the current state of the process, i.e. a_t = α_t(s_t). The method of dynamic programming provides a constructive, recursive procedure for computing α using the value function V as a “shadow price” to decentralize a complicated stochastic/multiperiod optimization problem into a sequence of simpler deterministic/static optimization problems.³ In infinite horizon problems V is the unique solution to Bellman’s equation, V = Γ(V), where the Bellman operator Γ is defined by:

   Γ(V)(s) = max_{a ∈ A(s)} [ u(s,a) + β ∫ V(s′) p(ds′|s,a) ],      (1.1)

and the optimal decision rule can be recovered from V by finding a value α(s) ∈ A(s) that attains the maximum in (1.1) for each s ∈ S.

² See Stokey and Lucas 1987 for examples of DP models in economic theory. See Rust 1994a, 1994b for examples of DP models in econometrics.

³ In finite horizon problems V actually denotes an entire sequence of value functions, V ≡ {V_0,...,V_T}, just as α ≡ {α_0,...,α_T} denotes a sequence of decision rules. In the infinite-horizon case, the solution (V,α) reduces to a pair of functions of the current state s.
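To fix ideas, the listing below is a minimal illustrative sketch (not taken from the chapter) of a single application of the Bellman operator (1.1) to a discrete MDP with finitely many states and actions, returning both Γ(V) and a decision rule attaining the maximum. The arrays u and p and the discount factor beta are hypothetical inputs with u[s,a] = u(s,a) and p[s,a,s2] = p(s2|s,a).

import numpy as np

def bellman_operator(V, u, p, beta):
    # One application of the Bellman operator (1.1) to a value vector V
    # for a discrete MDP: u[s, a] = u(s,a), p[s, a, s2] = p(s2 | s, a).
    EV = p @ V                                # E[V(s') | s, a] for every (s, a) pair
    Q = u + beta * EV                         # u(s,a) + beta * E[V(s') | s, a]
    return Q.max(axis=1), Q.argmax(axis=1)    # Gamma(V)(s) and a maximizing alpha(s)

Backward induction in a finite horizon problem applies this operator recursively from the terminal period, while the infinite horizon methods discussed below instead look for its fixed point V = Γ(V).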
We review the main theoretical results about MDP’s in section 2, providing an intuitive derivation of Bellman’s equation in the infinite horizon case and showing that the value function V_α corresponding to a policy α is simply a multidimensional generalization of a geometric series. This implies that V_α can be computed as the solution to a system of linear equations, the key step in the Bellman 1955, 1957 and Howard 1960 policy iteration algorithm. The Bellman operator has a particularly nice mathematical property: Γ is a contraction mapping. A large number of numerical solution methods exploit the contraction property to yield convergent, numerically stable methods for computing approximate solutions to Bellman’s equation, including the classic method of successive approximations. Since V can be recovered from α and vice versa, the rest of this chapter focuses on methods for computing the value function V, with separate discussions of numerical problems involved in approximating α where appropriate.⁴

⁴ Special care must be taken in problems with a continuum of actions, since discontinuities or kinks in numerical estimates of the conditional expectation of V can create problems for numerical optimization algorithms used to recover α. Also, certain solution methods focus on computing α directly rather than indirectly by first computing V. We will provide special discussions of these methods in section 5.

From the standpoint of computation, there is an important distinction between discrete MDP’s whose state and control variables can assume only a finite number of possible values, and continuous MDP’s whose state and control variables can assume a continuum of possible values. The value functions for discrete MDP’s live in a subset B of the finite-dimensional Euclidean space R^|S| (where |S| is the number of elements in S), whereas the value functions of continuous MDP’s live in a subset B of the infinite-dimensional Banach space B(S) of bounded, measurable real-valued functions on S. Thus, discrete MDP problems can be solved exactly (modulo rounding error in arithmetic operations), whereas the solutions to continuous MDP’s can generally only be approximated. Approximate solution methods may also be attractive for solving discrete MDP’s with a large number of possible states |S| or actions |A|.

There are two basic strategies for approximating the solution to a continuous MDP: 1) discrete approximation and 2) smooth approximation. Discrete approximation methods solve a finite state MDP problem that approximates the original continuous MDP problem over a finite grid of points in the state space S and action space A. Since the methods for solving discrete MDP’s have been well developed and exposited in the operations research literature (e.g. Bertsekas 1987, Porteus 1980 and Puterman 1990, 1994), this chapter will only briefly review the standard approaches, focusing instead on recent research on discretization methods for solving continuous MDP’s.
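As a point of reference for the standard discrete methods reviewed in section 5, the sketch below (again purely illustrative, using the same hypothetical u, p and beta arrays as above) implements Howard policy iteration for a discrete infinite horizon MDP. The policy evaluation step exploits the geometric series representation mentioned above: V_α solves the linear system V_α = u_α + β M_α V_α, so V_α = (I − β M_α)^{-1} u_α.

import numpy as np

def policy_iteration(u, p, beta, max_iter=1000):
    # Policy iteration for a discrete infinite horizon MDP with
    # u[s, a] = u(s,a) and p[s, a, s2] = p(s2 | s, a).
    n_states = u.shape[0]
    alpha = np.zeros(n_states, dtype=int)          # arbitrary initial policy
    for _ in range(max_iter):
        idx = np.arange(n_states)
        M = p[idx, alpha, :]                       # M_alpha[s, s2] = p(s2 | s, alpha(s))
        u_alpha = u[idx, alpha]                    # u_alpha[s] = u(s, alpha(s))
        # policy evaluation: solve (I - beta * M_alpha) V = u_alpha
        V = np.linalg.solve(np.eye(n_states) - beta * M, u_alpha)
        # policy improvement: one step of the Bellman operator
        new_alpha = (u + beta * (p @ V)).argmax(axis=1)
        if np.array_equal(new_alpha, alpha):
            return V, alpha
        alpha = new_alpha
    return V, alpha

The method of successive approximations would instead iterate V ← Γ(V) directly, converging geometrically at rate β by the contraction property.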
Although smooth approximation methods also have a long history (dating back to Bellman 1957, Bellman and Dreyfus 1962, and Bellman, Kalaba and Kotkin 1963), there has been a recent resurgence of interest in these methods as a potentially more efficient alternative to discretization methods (Christiano and Fisher 1994, Johnson et al. 1993, Judd and Solnick 1994, Marcet and Marshall 1994, and Smith 1991). Smooth approximation methods treat the value function V and/or the decision rule α as smooth, flexible functions of s and a finite-dimensional parameter vector θ; examples include linear interpolation and a variety of more sophisticated approximation methods.
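As a purely illustrative example of this parametric approach, the sketch below represents V by its values θ at a fixed (increasing) grid of points and evaluates it between grid points by linear interpolation; the functions utility and transition and the quadrature nodes and weights are hypothetical model primitives, and the more sophisticated methods referred to above would replace the interpolation step with a richer parameterization.

import numpy as np

def approx_bellman(theta, grid, actions, utility, transition, nodes, weights, beta):
    # One step of an approximate Bellman operator for a continuous-state MDP.
    # V is parameterized by theta, its values at the grid points, and is
    # evaluated off the grid by linear interpolation; the conditional
    # expectation of V is replaced by a quadrature rule over shocks.
    new_theta = np.empty_like(theta)
    for i, s in enumerate(grid):
        values = []
        for a in actions:
            s_next = transition(s, a, nodes)   # next states at the quadrature nodes
            EV = weights @ np.interp(s_next, grid, theta)
            values.append(utility(s, a) + beta * EV)
        new_theta[i] = max(values)
    return new_theta

Iterating this mapping on θ (or solving for its fixed point) yields an approximate value function whose accuracy depends on both the grid and the quadrature rule.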