#### Non-Parametric Transformation Networks

ConvNets have been very effective in many applications where it is required to learn invariances to within-class nuisance transformations. However, through their architecture, ConvNets only enforce invariance to translation. In this paper, we introduce a new class of convolutional architectures cal…

#### Gradient descent GAN optimization is locally stable

game Generative Adversarial Network gradient

Despite the growing prominence of generative adversarial networks (GANs), optimization in GANs is still a poorly understood topic. In this paper, we analyze the “gradient descent” form of GAN optimization i.e., the natural setting where we simultaneously take small gradient steps in bot…

#### On the convergence properties of GAN training

Generative Adversarial Network gradient

Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this note we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distribu…

#### Generalization Error Bounds for Noisy, Iterative Algorithms

In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)] has established a bound on the generalization error of empirical risk minimization based on the mut…

#### Demystifying MMD GANs

Generative Adversarial Network gradient

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators …

#### GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

CIFAR Generative Adversarial Network gradient image

Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved. We propose a two time-scale update rule (TTUR) for training GANs with stochastic gradient des…

#### Weighted Contrastive Divergence

Learning algorithms for energy based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically due to the exponential number of terms involved in computing the partition function. In this way one has to resort to approximation schemes for the evalua…

#### Overcoming catastrophic forgetting with hard attention to the task

Catastrophic forgetting occurs when a neural network loses the information learned with the first task, after training on a second task. This problem remains a hurdle for general artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard atten…

#### Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories

Convolutional Neural Network DNN gradient image

In this paper, we predict the likelihood of a player making a shot in basketball from multiagent trajectories. Previous approaches to similar problems center on hand-crafting features to capture domain specific knowledge. Although intuitive, recent work in deep learning has shown this approach is p…

#### Gradient Regularization Improves Accuracy of Discriminative Models

Regularizing the gradient norm of the output of a neural network with respect to its inputs is a powerful technique, first proposed by Drucker & LeCun (1991) who named it Double Backpropagation. The idea has been independently rediscovered several times since then, most often with the goal of m…

#### Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of s…

#### Exploring the Space of Black-box Attacks on Deep Neural Networks

Existing black-box attacks on deep neural networks (DNNs) so far have largely focused on transferability, where an adversarial instance generated for a locally trained model can “transfer” to attack other learning models. In this paper, we propose novel Gradient Estimation black-box att…

#### IHT dies hard: Provable accelerated Iterative Hard Thresholding

We study –both in theory and practice– the use of momentum motions in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with non-convex constraints, under standard assumptions. In div…

#### Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy

We show that Entropy-SGD (Chaudhari et al., 2016), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by opt…

#### Building Robust Deep Neural Networks for Road Sign Detection

car Convolutional Neural Network DNN gradient

Deep Neural Networks are built to generalize outside of training set in mind by using techniques such as regularization, early stopping and dropout. But considerations to make them more resilient to adversarial examples are rarely taken. As deep neural networks become more prevalent in mission-crit…

#### Algorithmic Regularization in Over-parameterized Matrix Recovery

We study the problem of recovering a low-rank matrix $X^star$ from linear measurements using an over-parameterized model. We parameterize the rank-$r$ matrix $X^star$ by $UU^top$ where $Uin mathbb{R}^{dtimes d}$ is a square matrix, whereas the number of linear measurements is much less than $d^2$. …

#### The Robust Manifold Defense: Adversarial Training using Generative Models

DNN gradient image IMAGENET MNIST

Deep neural networks are demonstrating excellent performance on several classical vision problems. However, these networks are vulnerable to adversarial examples, minutely modified images that induce arbitrary attacker-chosen output from the network. We propose a mechanism to protect against these …

#### Mean Field Residual Networks: On the Edge of Chaos

We study randomly initialized residual networks using mean field theory and the theory of difference equations. Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on the average when propagating inputs forward or gradients backward. The exponent…

#### CSGNet: Neural Shape Parser for Constructive Solid Geometry

We present a neural architecture that takes as input a 2D or 3D shape and induces a program to generate it. The in- structions in our program are based on constructive solid geometry principles, i.e., a set of boolean operations on shape primitives defined recursively. Bottom-up techniques for this…

#### Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm

Learning to learn is a powerful paradigm for enabling models to learn from data more effectively and efficiently. A popular approach to meta-learning is to train a recurrent model to read in a training dataset as input and output the parameters of a learned model, or output predictions for new test…

#### Data-Driven Sparse Structure Selection for Deep Neural Networks

Convolutional Neural Network DNN gradient

Deep convolutional neural networks have liberated its extraordinary power on various tasks. However, it is still very challenging to deploy state-of-the-art models into real-world applications due to their high computational complexity. How can we design a compact and effective network without mass…

#### ES Is More Than Just a Traditional Finite-Difference Approximator

gradient Reinforcement Learning

An evolution strategy (ES) variant recently attracted significant attention due to its surprisingly good performance at optimizing neural networks in challenging deep reinforcement learning domains. It searches directly in the parameter space of neural networks by generating perturbations to the cu…

#### Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

DNN DQN game Genetic Algorithm gradient Reinforcement Learning

Deep artificial neural networks (DNNs) are typically trained via gradient-based learning algorithms, namely backpropagation. Evolution strategies (ES) can rival backprop-based algorithms such as Q-learning and policy gradients on challenging deep reinforcement learning (RL) problems. However, ES ca…

#### On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent

gradient MNIST Reinforcement Learning

Because stochastic gradient descent (SGD) has shown promise optimizing neural networks with millions of parameters and few if any alternatives are known to exist, it has moved to the heart of leading approaches to reinforcement learning (RL). For that reason, the recent result from OpenAI showing t…

#### Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients

DNN Genetic Algorithm gradient Reinforcement Learning RNN

While neuroevolution (evolving neural networks) has a successful track record across a variety of domains from reinforcement learning to artificial life, it is rarely applied to large, deep neural networks. A central reason is that while random mutation generally works in low dimensions, a random p…