Following is a list of typos (and questionable items) in the text: Numerical Optimization, by Jorge Nocedal, and Stephen Wright, 1999, Springer Verlag. (Thanks go to the graduate students and the auditors in the course.)

On page 16, the definition of positive definite and positive semidefinite does not include B=B^T whereas the definition in the appendix correctly does state that the matrix A is symmetric, see page 594.
On page 24, in the equation above (2.15), the Hessian should be at x_k rather than at x_k+1, i.e. that is what follows from the Taylor series. Then the approximation is done for B_k+1.
Alternatively, the Taylor series is approximated at x_k+1 in the direction (x_k-x_k+1).
Also: the third line above (2.15); it should be the Hessian, not the gradient.
Page 47, above (3.24), the gradient at x_k changed notation to g_k
In problem 3.2 on Page 62: as stated, it is unclear whether the constants c₁,c₂ are given or are to be shown to exist. In fact, there exist functions for which any choice of constants 1>c₁>c₂>0 result in no (Wolfe) step lengths. However, there also exist functions where any choice of constants 1>c₁>c₂>0 result in (Wolfe) step. ( see comments on homework solutions)
In problems 3.3 and 3.4 on Page 62, the term strongly convex quadratic is used. For a quadratic, strictly convex is equivalent. It is unsure that the terms are defined somewhere in the text. The definitions can be found e.g. at the Mathematical Programming glossary: http://carbon.cudenver.edu/~hgreenbe/glossary/S.html
Page 69: paragraphing error in the middle of the page (due likely to missing end-center in latex)
Page 78: In Theorem 4.3, the primal feasibility condition on p^* is missing.
Page 133: In problem 5.4, the assumption that the vectors p_0 to p_{k-1} are linearly independent is missing. (These vectors were conjugate directions in previous questions.)
Page 156: The comment about large-scale situations for trust-region subproblems has changed. There are now codes that will solve the large-scale case using a Lanczos approach, i.e. it is no longer necessary to sovle several linear systems per iteration, but rather employ a 'restarted' Lanczos eigenvalue routine. See e.g. the following references (and the references cited therein):
- A Survey of the Trust Region Subproblem within a Semidefinite Framework, (Master's thesis of Charles Fortin, supervisor Henry Wolkowicz)
- A Semidefinite Framework for Trust Region Subproblems with Applications to Large Scale Minimization, Franz Rendl and Henry Wolkowicz, Math Progr 1997, vol 77, Mathematical Review
- author = "N. GOULD and S. LUCIDI and M. ROMA and Ph. L. TOINT", TITLE = {Solving the trust-region subproblem using the {L}anczos method}, JOURNAL = siopt, VOLUME = {9}, number = {2}, YEAR = {1999}, PAGES = {504--525}
- author = "S.A. SANTOS and D.C. SORENSEN", title = "A new matrix-free algorithm for the large-scale trust-region subproblem", institution = "Rice University", year = "1995", type = "Technical Report", number = "TR95-20", address = "Houston, TX"}
- author = " W.W. HAGER", title = "Minimizing a quadratic over a sphere", institution = "University of Florida", year = "2000", address = "Gainsville, Fa"}
Pages 258-259: There is a problem with x^* below equation (10.17) on page 258. There should be a minus in front of the right-hand-side. The next two formulae should be updated appropriately.
Page 286, equation (11:20): The LHS is a vector while the RHS is a scalar. It appears that the norm of the LHS is intended to be <= to the RHS.
Pages 320-322: The discussion here that derives a Lagrange multiplier condition does not assume a constraint qualification. The discussion seems to imply that the results are true without a constraint qualification, but this clear fails e.g. if \nabla c_1(x)=0 but \nabla f(x) \neq 0.
Page 339: the second sentence following "INTRODUCING LAGRANGE MULTIPLIERS" on page 339 does not make appear to make sense. Possibly < 0 should read \geq 0. Note that the linearizing cone F₁ contains the tangent cone.
Page 486, exercise 16.1: This is a quadratic maximization program but the objective is strictly convex. I assume that minimization is intended.
Page 579 (A.3): the RHS is actually the span. This will differ if 0 is not in the affine hull. The true definition of affine combinations are those that are linear combinations whose coefficients add up to 1, i.e.
{z: z=ax+by, for all x,y in F, and for all a,b with a+b=1}
Page 398, the equation above equation (14.7) has s and out of order.
Page 406, extra comma in 3rd line from the bottom in the expression for RHS.
Page 424, second line after (15.2): increasing should read decreasing.
"exact penalty function" is used but not explained.
sentence starting with "In barrier methods": "but approach zero" should read "but approach infinity".
page 435, between (15.25) and (15.26): referring to the ell_2 norm is pointless as it is not used.
(15.26): A(x) refers to the Jacobian of the constraint function c(x)
last sentence: grammatically wrong: "constraints are MORE AND MORE penalized by decreasing mu"
page 437, top half: psi should be replaced twice by phi.
page 495, in Framework 17.1: The tolerance $\tau_{k+1} needs to be updated (specified).
Page 509, Theorem 17.4, iii: The Hessian \nabla^2 P(x(\mu),\mu) is positive definite, i.e. $x(\mu)$ not $x$.
Page 575, Problem 18.2: The starting point and the solution are exchanged.
Page 614, Reference [57]: It should be "G. F. Corliss", not "B. F. Corliss".