Katya Scheinberg

Wednesday, February 17, 2010, 2:00pm, WWH 1314

Derivative-free optimization is a field of nonlinear programming that deals with optimization methods which do not require explicit computation of derivative information. In many engineering applications the objective function values are computed by computer simulations that do not provide derivatives of these functions. Function value computations are typically costly and prone to noise. In such cases a polynomial interpolation model is often used to provide a local approximation of the objective function based on a few carefully selected sample points. To build a full quadratic interpolation (or regression) model one needs a set of sample points whose cardinality is quadratic in the dimension of the parameter space. If the dimension of the parameter space is large, it may be too costly to apply optimization on the whole space. Typically, statistical methods are used before the optimization stage to select a smaller-dimensional subspace where optimization is likely to be most beneficial. Thus, the a priori analysis selects the most important features for optimization. Alternatively, incomplete quadratic models are built in which the Frobenius norm of the Hessian (or of its change) is regularized to keep the models "well behaved". This has been a successful approach in practice, yet such models only guarantee linear approximation. On the other hand, when full quadratic models are built, each coefficient of the model (element of the Hessian) is considered equally important, which means that all pairwise interactions between variables are taken into account. It is well known, however, that in many practical instances the pairwise interaction between variables is negligible for many pairs. Hence the quadratic model approximating the true function is likely to be sparse.
From the sparse solution recovery theory recently developed in the field of compressed sensing (CS), we can expect that a sparse quadratic model can be recovered using far fewer points than the square of the dimension. This recovery is achieved by minimizing the $l_1$-norm of the vector of model coefficients. We will discuss how we hope to combine theoretical results from CS and DFO to obtain a convergence theory for methods based on $l_1$-regularized quadratic models. We will also show preliminary computational results.
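The recovery idea above can be sketched as a linear program (a minimal illustration, not the speaker's implementation; the dimensions, sparsity level, and use of scipy's `linprog` are my own choices): given underdetermined interpolation conditions $Ac = b$ with far fewer sample points than coefficients, minimizing $\|c\|_1$ recovers a sparse coefficient vector exactly, with high probability, for generic sample sets.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, p = 50, 100                    # 50 interpolation conditions, 100 model coefficients
A = rng.standard_normal((m, p))   # stand-in for the (underdetermined) basis matrix

c_true = np.zeros(p)              # sparse "true" model: only 5 nonzero coefficients
c_true[rng.choice(p, 5, replace=False)] = rng.standard_normal(5)
b = A @ c_true

# min ||c||_1  s.t.  A c = b, as an LP over [c, t] with |c_i| <= t_i:
I = np.eye(p)
res = linprog(
    c=np.hstack([np.zeros(p), np.ones(p)]),        # minimize sum of t_i
    A_ub=np.block([[I, -I], [-I, -I]]),            # c - t <= 0 and -c - t <= 0
    b_ub=np.zeros(2 * p),
    A_eq=np.hstack([A, np.zeros((m, p))]),         # interpolation conditions on c only
    b_eq=b,
    bounds=[(None, None)] * p + [(0, None)] * p,   # c free, t >= 0
)
c_rec = res.x[:p]
```

Here 50 sample points suffice to recover all 100 coefficients of the 5-sparse model, whereas full quadratic interpolation would need a number of points on the order of the square of the dimension.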