Following is a list of typos (and questionable items) in the text: Numerical Optimization, by Jorge Nocedal, and Stephen Wright, 1999, Springer Verlag. (Thanks go to the graduate students and the auditors in the course.)
  1. On page 16, the definition of positive definite and positive semidefinite does not include B=BT whereas the definition in the appendix correctly does state that the matrix A is symmetric, see page 594.
  2. On page 24, in the equation above (2.15), the Hessian should be at xk rather than at xk+1, i.e. that is what follows from the Taylor series. Then the approximation is done for Bk+1.
    Alternatively, the Taylor series is approximated at xk+1 in the direction (xk-xk+1).
    Also: the third line above (2.15); it should be the Hessian, not the gradient.
  3. Page 47, above (3.24), the gradient at xk changed notation to gk
  4. In problem 3.2 on Page 62: as stated, it is unclear whether the constants c1,c2 are given or are to be shown to exist. In fact, there exist functions for which any choice of constants 1>c1>c2>0 result in no (Wolfe) step lengths. However, there also exist functions where any choice of constants 1>c1>c2>0 result in (Wolfe) step. ( see comments on homework solutions)
  5. In problems 3.3 and 3.4 on Page 62, the term strongly convex quadratic is used. For a quadratic, strictly convex is equivalent. It is unsure that the terms are defined somewhere in the text. The definitions can be found e.g. at the Mathematical Programming glossary: http://carbon.cudenver.edu/~hgreenbe/glossary/S.html
  6. Page 69: paragraphing error in the middle of the page (due likely to missing end-center in latex)
  7. Page 78: In Theorem 4.3, the primal feasibility condition on p* is missing.
  8. Page 133: In problem 5.4, the assumption that the vectors p_0 to p_{k-1} are linearly independent is missing. (These vectors were conjugate directions in previous questions.)
  9. Page 156: The comment about large-scale situations for trust-region subproblems has changed. There are now codes that will solve the large-scale case using a Lanczos approach, i.e. it is no longer necessary to sovle several linear systems per iteration, but rather employ a 'restarted' Lanczos eigenvalue routine. See e.g. the following references (and the references cited therein):
  10. Pages 258-259: There is a problem with x* below equation (10.17) on page 258. There should be a minus in front of the right-hand-side. The next two formulae should be updated appropriately.
  11. Page 286, equation (11:20): The LHS is a vector while the RHS is a scalar. It appears that the norm of the LHS is intended to be <= to the RHS.
  12. Pages 320-322: The discussion here that derives a Lagrange multiplier condition does not assume a constraint qualification. The discussion seems to imply that the results are true without a constraint qualification, but this clear fails e.g. if \nabla c_1(x)=0 but \nabla f(x) \neq 0.
  13. Page 339: the second sentence following "INTRODUCING LAGRANGE MULTIPLIERS" on page 339 does not make appear to make sense. Possibly < 0 should read \geq 0. Note that the linearizing cone F1 contains the tangent cone.
  14. Page 486, exercise 16.1: This is a quadratic maximization program but the objective is strictly convex. I assume that minimization is intended.
  15. Page 579 (A.3): the RHS is actually the span. This will differ if 0 is not in the affine hull. The true definition of affine combinations are those that are linear combinations whose coefficients add up to 1, i.e.
    {z: z=ax+by, for all x,y in F, and for all a,b with a+b=1}
  16. Page 398, the equation above equation (14.7) has delta s and delta lambda out of order.
  17. Page 406, extra comma in 3rd line from the bottom in the expression for RHS.
  18. Page 424, second line after (15.2): increasing should read decreasing.
    "exact penalty function" is used but not explained.
    sentence starting with "In barrier methods": "but approach zero" should read "but approach infinity".
  19. page 435, between (15.25) and (15.26): referring to the ell_2 norm is pointless as it is not used.
    (15.26): A(x) refers to the Jacobian of the constraint function c(x)
    last sentence: grammatically wrong: "constraints are MORE AND MORE penalized by decreasing mu"
  20. page 437, top half: psi should be replaced twice by phi.
  21. page 495, in Framework 17.1: The tolerance $\tau_{k+1} needs to be updated (specified).
  22. Page 509, Theorem 17.4, iii: The Hessian \nabla^2 P(x(\mu),\mu) is positive definite, i.e. $x(\mu)$ not $x$.
  23. Page 575, Problem 18.2: The starting point and the solution are exchanged.
  24. Page 614, Reference [57]: It should be "G. F. Corliss", not "B. F. Corliss".