# Machine Learning Seminar

## Fall 2014

• Mehryar Mohri - NYU Courant
• Date: 09/16/14
• Title: Deep Boosting
• Abstract:
We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers hypothesis sets containing deep decision trees, or members of other rich and complex families, and succeed in achieving high accuracy without overfitting the data. The key to the success of the algorithm is a capacity-conscious criterion for the selection of the hypotheses. We give new data-dependent learning bounds for convex ensembles expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees since it seeks to minimize the corresponding learning bound. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost and Logistic Regression and their L_1-regularized variants.
• Paper
• Levent Sagun - NYU Courant
• Date: 09/23/14
• Title: TBD
• Abstract:
We explore basic properties of the loss functions that arise in deep learning algorithms. We then look at spin glass models, for which a great deal is known about the landscape. With experimental support, we conjecture what we should expect for loss surfaces in deep learning.
• Aryeh Kontorovich
• Date: 09/30/14 (note: time changed to 4:00 pm)
• Title: Good Margins Make Good Neighbors
• Abstract:
Although well known by practitioners to be an effective classification tool, nearest-neighbor methods have been somewhat neglected by learning theory of late. The goal of this talk is to revive interest in this time-tested technique by recasting it in a modern perspective. We will present a paradigm of margin-regularized 1-nearest neighbor classification which: (i) is Bayes-consistent; (ii) yields simple, usable finite-sample error bounds; (iii) provides for very efficient algorithms with a principled speed-accuracy tradeoff; and (iv) allows for near-optimal sample compression. Further extensions include multiclass, regression, and metric dimensionality reduction. I will argue that the regularized 1-nearest neighbor is superior to k-nearest neighbors in several crucial statistical and computational aspects.
• Felix Yu - Columbia University
• Date: 10/07/14
• Title: Learning from Label Proportions: Algorithm, Theory and Applications.
• Abstract:
Learning from Label Proportions (LLP) is a learning setting where the training data is provided in groups, and only the proportion of each class in each group is known. The task is to learn a model to predict the class labels of the individual instances. LLP has broad applications in political science, marketing, healthcare, and computer vision. To solve LLP, we propose the pSVM algorithm, which jointly optimizes the latent instance labels and the classification model in a large-margin framework. Unlike existing works, our approach avoids making restrictive assumptions on the data, leading to state-of-the-art results. To further understand when and why LLP is possible, we provide an analysis under an intuitive framework called Empirical Proportion Risk Minimization (EPRM), which learns an instance label classifier to match the given label proportions on the training data. The study sheds light on privacy protection when releasing group statistics. We have applied the developed tools to solving various challenging problems in computer vision, including attribute modeling, pinpointing discriminative video shots for video events, and discovering discriminative image patches in visual recognition. If time permits, I will also introduce my other research efforts, including using circulant structure to speed up binary embeddings, deep neural networks, and kernel approximation, and learning meaningful attribute representations for visual recognition and retrieval.
• Fall recess
• Date: 10/14/14
• Andres Munoz Medina - NYU Courant
• Date: 10/21/14 and 10/28/14
• Title: An introduction to Bandits
• Abstract:
I will present a tutorial on bandit algorithms. First, I will discuss solutions to the original k-armed bandit problem. In the second session, I will talk about contextual bandits and their connections with supervised learning.
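As background for the tutorial, here is a minimal sketch of one classical solution to the k-armed bandit problem, the UCB1 index policy; the talk may cover different algorithms, and the function name, simulated Bernoulli arm means, and horizon below are illustrative assumptions:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 sketch for the k-armed bandit problem.

    arm_means: true Bernoulli reward probability of each arm (unknown
    to the algorithm; used here only to simulate pulls).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # pulls per arm
    totals = [0.0] * k        # cumulative reward per arm
    reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # pull every arm once to initialize
        else:
            # pick the arm maximizing empirical mean + confidence bonus
            arm = max(range(k), key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += r
        reward += r
    return reward, counts

reward, counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
# the best arm (index 2) should receive the bulk of the pulls
```

The confidence bonus shrinks as an arm is pulled more, which is exactly the exploration-exploitation tradeoff the tutorial will formalize.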
• Krzysztof Choromanski - Google Research
• Date: 11/04/14
• Title: Learning how to rank from heavily perturbed statistics - digraph clustering approach
• Abstract:
Ranking is one of the most fundamental problems in machine learning, with applications in many branches of computer science such as information retrieval, recommendation systems, machine translation, and computational biology. Ranking objects based on possibly conflicting preferences is a central problem in voting research and social choice theory. In this talk we present a new, simple combinatorial ranking algorithm adapted to the preference-based setting. We apply this new algorithm to the well-known scenario where the edges of the preference tournament are determined by the majority-voting model. It outperforms existing methods when it cannot be assumed that a global ranking of good enough quality exists, and applies combinatorial techniques that have not been used in the ranking context before. We conduct several experiments showing the superiority of the new algorithm over existing methods, including those designed to handle heavily perturbed statistics. By combining our techniques with those presented by Ailon and Mohri, we obtain a purely combinatorial algorithm that answers most queries correctly in the heterogeneous scenario, where the preference tournament is only locally of good quality and is not necessarily pseudo-transitive. As a byproduct of our methods, we obtain an algorithm solving the clustering problem for the directed planted partition model. To the best of our knowledge, it is the first purely combinatorial algorithm tackling this problem.
• Vitaly Kuznetsov - NYU Courant
• Giulia DeSalvo - NYU Courant
• Scott Yang - NYU Courant
• Date: 11/25/14
• Title: Randomized Features for Nonlinear Learning
• Abstract:
Linear methods are ubiquitous in machine learning, and by using PDS kernels, their nonlinear extensions are often theoretically straightforward to construct. However, direct use of kernel methods requires storing the kernel matrix, which scales very poorly with the sample size. In this talk, we will survey some techniques for approximating kernel functions using randomness. In particular, we will show how one can exploit randomization to approximate kernel SVMs and kernel PCA in linear time (with respect to the sample size), as well as delve into the theory of why these methods actually work.
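One widely used instance of such randomized approximation is the random Fourier features construction of Rahimi and Recht for the Gaussian (RBF) kernel; the talk may survey other techniques as well, and the function name, parameters, and sanity check below are our own illustrative choices:

```python
import numpy as np

def rff_features(X, D, gamma, seed=0):
    """Random Fourier features approximating the Gaussian kernel
    k(x, y) = exp(-gamma * ||x - y||^2) (Rahimi & Recht).

    Returns Z with Z @ Z.T ~ the kernel matrix, so a linear method
    trained on Z approximates its kernelized counterpart in linear time.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # frequencies sampled from the kernel's Fourier transform
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# sanity check: feature inner products approximate the true kernel
X = np.random.default_rng(1).normal(size=(50, 5))
gamma = 0.5
Z = rff_features(X, D=4000, gamma=gamma)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)
err = np.abs(Z @ Z.T - K).max()   # shrinks like 1/sqrt(D)
```

Training on the n x D matrix Z instead of the n x n kernel matrix is what buys the linear scaling in the sample size.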
• Robert Almgren - NYU Courant
• Date: 12/16/14
• Title: Quantitative Techniques for Interest Rate Products.
• Abstract:
I will talk about the quantitative techniques that we use to develop agency execution algorithms for interest rate products, including futures as well as cash US Treasuries. I will discuss the special features of these markets, such as pro rata matching, information events, and interrelationships. Special emphasis will be placed on the use of a market simulator to develop and test algorithm functionality and short-term pricing signals.

## Spring 2014

• Andres Munoz Medina - NYU Courant.
• Date: 02/04/14; postponed to 02/11/14.
• Title: Learning Theory and Algorithms for Revenue Optimization in Second-Price Auctions with reserve.
• Abstract:
Second-price auctions with reserve play a critical role in the revenue of modern search engines and popular online sites, since the revenue of these companies often directly depends on the outcome of such auctions. The choice of the reserve price is the main mechanism through which the auction revenue can be influenced in these electronic markets. We cast the problem of selecting the reserve price to optimize revenue as a learning problem and present a full theoretical analysis dealing with the complex properties of the corresponding loss function. We further give novel algorithms for solving this problem and report the results of several experiments demonstrating their effectiveness.
• Slides
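To make the learning problem concrete, here is a small illustrative sketch (not the authors' algorithm; function names and numbers are our own) of the objective: in a second-price auction with reserve r, the seller collects max(second bid, r) when the highest bid clears r, and nothing otherwise, which makes revenue discontinuous in r:

```python
def revenue(r, bids):
    """Seller revenue of one second-price auction with reserve r."""
    top = sorted(bids, reverse=True)
    b1 = top[0]
    b2 = top[1] if len(top) > 1 else 0.0
    if b1 < r:
        return 0.0          # no sale: highest bid is below the reserve
    return max(b2, r)       # winner pays the larger of second bid and reserve

def empirical_best_reserve(auctions, candidates):
    """Pick the candidate reserve maximizing total revenue on past auctions."""
    return max(candidates, key=lambda r: sum(revenue(r, b) for b in auctions))

auctions = [[1.0, 0.3], [0.8, 0.6], [2.0, 0.1]]
best = empirical_best_reserve(auctions, [0.0, 0.5, 0.8, 1.0, 1.5])
```

The drop to zero revenue the moment r exceeds the highest bid is the discontinuity that makes the associated loss function hard to analyze and optimize.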
• Liang Huang - CUNY
• Date: 02/18/14
• Title: Latent Variable Max-Violation Perceptron for Scalable Training in Machine Translation.
• Abstract:
While large-scale discriminative training has triumphed in many NLP problems, its definite success on machine translation has been largely elusive. Most recent efforts along this line are not scalable (training on the small dev set with features from top ∼100 most frequent words) and overly complicated. We instead present a very simple yet theoretically motivated approach by extending the recent framework of “violation-fixing perceptron” to handle structured latent variables which correspond to MT derivations. Extensive phrase-based translation experiments on both Chinese-to-English and Spanish-to-English tasks show substantial gains in BLEU by up to +2.3/+2.0 on dev/test over MERT, thanks to 20M+ sparse features. This is the first successful effort of large-scale online discriminative training for MT.
• Yoni Halpern - NYU.
• Date: 02/25/14
• Title: TBD.
• Date: 03/04/14.
• Title: Learning binary representations for fast similarity search in massive databases.
• Abstract:
Binary coding based Approximate Nearest Neighbor (ANN) search in huge databases has attracted much attention recently due to its fast query time and drastically reduced storage needs. There are several challenges in developing a good ANN search system. A fundamental question that comes up often is: how difficult is ANN search in a given dataset? In other words, which data properties affect the quality of ANN search and how? Moreover, for different application scenarios, different types of learning methods are appropriate. In this talk, I will discuss what makes ANN search difficult, and a variety of binary coding techniques for non-negative data, data that lives on a manifold, and matrix data.
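As a minimal baseline illustrating binary-coding ANN search (the talk concerns learned codes; this data-independent random-hyperplane scheme and all names below are our own illustrative assumptions):

```python
import numpy as np

def hash_codes(X, n_bits, seed=0):
    """Binary codes via random hyperplanes (signs of random projections),
    the simplest data-independent baseline for binary-coding ANN search."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_bits))
    return (X @ W > 0).astype(np.uint8)   # one bit per hyperplane

def hamming_knn(query_code, db_codes, k=5):
    """Indices of the k database codes closest to the query in Hamming distance."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 32))
codes = hash_codes(X, n_bits=64)          # 64 bits per point vs 32 floats
neighbors = hamming_knn(hash_codes(X[:1], 64), codes)
# the query itself should come back at Hamming distance 0
```

Hamming distances over packed bits are what give binary-coding methods their fast query time and small memory footprint; learned codes aim to make those distances track the similarity structure of a specific dataset.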
• Maria Florina Balcan
• Date: 03/11/14.
• Title: Foundations for Learning in the Age of Big Data.
• Abstract:
With the variety of applications of machine learning across science, engineering, and computing in the age of Big Data, re-examining the underlying foundations of the field has become imperative. In this talk, I will describe new models and algorithms for important emerging paradigms, specifically, interactive learning and distributed learning. For active learning, where the algorithm itself can ask for labels of carefully chosen examples from a large pool of unannotated data with the goal of minimizing human labeling effort, I will present results giving computationally efficient, optimal label complexity algorithms. I will also discuss learning with more general forms of interaction, as well as unexpected implications of these results for classic supervised learning paradigms. For distributed learning, I will discuss a model that for the first time addresses the core question of what the fundamental communication requirements are for achieving accurate learning. Broadly, we consider a framework where massive amounts of data are distributed among several locations, and our goal is to learn a low-error predictor with respect to the overall distribution of data using as little communication, and as few rounds of interaction, as possible. We provide general upper and lower bounds on the amount of communication needed to learn a given class, as well as broadly applicable techniques for achieving communication-efficient learning.
• Slides
• Spring Recess: No scheduled talk
• Date: 03/18/14.
• Vitaly Kuznetsov - NYU Courant.
• Date: 03/25/14.
• Title: TBD.
• Rachel Hodos - NYU Courant.
• Date: 04/15/14.
• Title: Metric Learning for Drug Repurposing.
• Abstract:
Sometimes standard distance (or similarity) metrics to compare data points are not well suited for a particular application. For example, while standard Euclidean distance treats each feature on equal footing, certain features may have more importance than others, e.g. in image classification. Metric learning is a field of machine learning that is aimed at learning more appropriate metrics for a given domain and task. I will give an introduction and review of metric learning, primarily based on the attached article by Kulis. I will focus on global, linear metric learning but will also discuss extensions to nonlinear and local metric learning techniques. I will cover mathematical formulations, as well as specific algorithms and applications.
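As a small illustration of the central object in global linear metric learning (names and numbers below are our own): a Mahalanobis metric d_M with M = L^T L is exactly the Euclidean distance after applying the linear map L, so learning M amounts to learning a feature reweighting:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Learned squared distance d_M(x, y)^2 = (x - y)^T M (x - y),
    with M positive semidefinite; M = I recovers squared Euclidean."""
    d = x - y
    return float(d @ M @ d)

# writing M = L^T L shows metric learning == learning a linear map:
# d_M(x, y) equals the Euclidean distance between Lx and Ly
rng = np.random.default_rng(0)
L = rng.normal(size=(2, 3))   # can also reduce dimension (here 3 -> 2)
M = L.T @ L
x, y = rng.normal(size=3), rng.normal(size=3)
lhs = mahalanobis_sq(x, y, M)
rhs = float(np.sum((L @ x - L @ y) ** 2))
# lhs == rhs up to floating point
```

This equivalence is why global linear methods can be trained with standard convex or eigenvalue machinery and then plugged into any Euclidean-based pipeline via the transform L.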
• Abstract:
I'll describe a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes an action in response to the observed context, observing the reward only for that action. The method assumes access to an oracle for solving cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\tilde{O}(\sqrt{T})$ oracle calls across all $T$ rounds. Joint work with Alekh Agarwal, Satyen Kale, John Langford, Lihong Li, and Rob Schapire.