Harmonic Analysis and Signal Processing Seminar
Efficiently recovering second-order models in derivative-free optimization
Katya Scheinberg
Columbia University and CIMS
Wednesday, February 17, 2010, 2:00pm, WWH 1314
Abstract
Derivative-free optimization is a field of nonlinear programming which
deals with optimization methods that do not require explicit
computation of derivative information. In many engineering
applications the objective function values are computed by
computer simulations which do not provide derivatives of these
functions. It is typical that the function value computations are
comparatively costly and prone to noise. In such cases a polynomial
interpolation model is often used to provide a local approximation of
the objective function based on a few carefully selected sample
points. To build a full quadratic interpolation (or regression) model
one needs a set of sample points whose cardinality is quadratic in
the dimension of the parameter space.
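As a point of reference (a standard count, not specific to this talk),
a full quadratic model in $R^n$,
\[
  m(x) = c + g^T x + \tfrac{1}{2}\, x^T H x, \qquad H = H^T,
\]
has $1 + n + n(n+1)/2 = (n+1)(n+2)/2$ free coefficients, so that many
interpolation conditions are needed to determine it uniquely.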
If the dimension of the parameter space is large, it may be too
costly to apply optimization to the whole space. Typically,
statistical methods are used before the optimization stage to select
a lower-dimensional subspace where optimization is likely to be most
beneficial. Thus, the a priori analysis selects the most important
features for optimization. Alternatively, incomplete quadratic models
are built in which the Frobenius norm of the Hessian (or of its
change) is regularized to keep the models "well behaved". This has
been a successful approach in practice, yet such models only
guarantee a linear approximation.
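Schematically (with illustrative notation, in the spirit of minimum
Frobenius norm models), such an incomplete model can be obtained by
solving
\[
  \min_{c,\, g,\, H = H^T} \ \|H\|_F^2
  \quad \text{subject to} \quad
  c + g^T y_i + \tfrac{1}{2}\, y_i^T H y_i = f(y_i), \ \ i = 1, \dots, p,
\]
with $p$ well below $(n+1)(n+2)/2$; replacing $\|H\|_F^2$ by
$\|H - H_{\mathrm{prev}}\|_F^2$ regularizes the change of the Hessian
instead.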
On the other hand, when full quadratic models are built, each
coefficient of the model (each element of the Hessian) is considered
equally important, which means that all pairwise interactions between
variables are taken into account. It is well known, however, that in
many practical instances the pairwise interaction between variables
is negligible for many pairs. Hence the quadratic model approximating
the true function is likely to be sparse. From the sparse-recovery
theory developed recently in the field of compressed sensing (CS) we
can expect that a sparse quadratic model can be recovered using far
fewer points than the square of the dimension. This recovery is
achieved by minimizing the $l_1$-norm of the vector of the model
coefficients.
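In schematic form (again with illustrative notation), if $\alpha$
collects the model coefficients and $M(Y)$ is the matrix of the
quadratic basis functions evaluated at the sample set
$Y = \{y_1, \dots, y_p\}$, the sparse model is sought by solving
\[
  \min_{\alpha} \ \|\alpha\|_1
  \quad \text{subject to} \quad
  M(Y)\, \alpha = f(Y),
\]
the usual convex $l_1$ surrogate for finding the sparsest coefficient
vector consistent with the interpolation data.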
We will discuss how we hope to combine theoretical results from CS
and DFO to obtain a convergence theory for methods based on $l_1$
regularized quadratic models. We will also show preliminary
computational results.