#### Quantum-assisted learning of hardware-embedded probabilistic graphical models

Mainstream machine learning techniques such as deep learning and probabilistic programming rely heavily on sampling from generally intractable probability distributions. There is increasing interest in the potential advantages of using quantum computing technologies as sampling engines to speed up …

#### Recent Advances in Convolutional Neural Networks

Convolutional Neural Network DNN image language speech

In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging…

#### Deep Extreme Multi-label Learning

Extreme multi-label learning (XML) or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space which involves $2^L$ possible label sets when the label dimension $L$ is very large, e.g., in millions for Wikipedia lab…

#### Meta-Learning via Feature-Label Memory Network

Deep learning typically requires training a very capable architecture using large datasets. However, many important learning problems demand an ability to draw valid inferences from small size datasets, and such problems pose a particular challenge for deep learning. In this regard, various researc…

#### Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data. In light of this capacity for overfitting, it is remarkable that simple algorithms like SGD reliably return solutions with low test error. One roadblock to explaining…

#### Asynchronous Decentralized Parallel Stochastic Gradient Descent

Recent work shows that decentralized parallel stochastic gradient decent (D-PSGD) can outperform its centralized counterpart both theoretically and practically. While asynchronous parallelism is a powerful technology to improve the efficiency of parallelism in distributed machine learning platforms…

#### Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

The past decade has witnessed a successful application of deep learning to solving many challenging problems in machine learning and artificial intelligence. However, the loss functions of deep neural networks (especially nonlinear networks) are still far from being well understood from a theoretic…

#### Photo-Guided Exploration of Volume Data Features

In this work, we pose the question of whether, by considering qualitative information such as a sample target image as input, one can produce a rendered image of scientific data that is similar to the target. The algorithm resulting from our research allows one to ask the question of whether featur…

#### DNN and CNN with Weighted and Multi-task Loss Functions for Audio Event Detection

audio Convolutional Neural Network DNN

This report presents our audio event detection system submitted for Task 2, “Detection of rare sound events”, of DCASE 2017 challenge. The proposed system is based on convolutional neural networks (CNNs) and deep neural networks (DNNs) coupled with novel weighted and multi-task loss fun…

#### Stochastic Weighted Function Norm Regularization

Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of functions defined by a network and the difficulty in measuring …

#### Calibrated Boosting-Forest

Excellent ranking power along with well calibrated probability estimates are needed in many classification tasks. In this paper, we introduce a technique, Calibrated Boosting-Forest that captures both. This novel technique is an ensemble of gradient boosting machines that can support both continuou…

#### Domain Randomization and Generative Models for Robotic Grasping

Deep learning-based robotic grasping has made significant progress the past several years thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalizat…

#### Multi-task Domain Adaptation for Deep Learning of Instance Grasping from Simulation

Learning-based approaches to robotic manipulation are limited by the scalability of data collection and accessibility of labels. In this paper, we present a multi-task domain adaptation framework for instance grasping in cluttered scenes by utilizing simulated robot experiments. Our neural network …

#### Feature versus Raw Sequence: Deep Learning Comparative Study on Predicting Pre-miRNA

Should we input known genome sequence features or input sequence itself in deep learning framework? As deep learning more popular in various applications, researchers often come to question whether to generate features or use raw sequences for deep learning. To answer this question, we study the pr…

#### Spontaneous Symmetry Breaking in Neural Networks

We propose a framework to understand the unprecedented performance and robustness of deep neural networks using field theory. Correlations between the weights within the same layer can be described by symmetries in that layer, and networks generalize better if such symmetries are broken to reduce t…

#### Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation

How can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information? To tackle this challenge, this paper introduces a two-level hierarchical approach, which integrates model-free deep learning and model-based path planning. At the low level, a neura…

#### Generalization in Deep Learning

This paper explains why deep learning can generalize well, despite large capacity and possible algorithmic instability, nonrobustness, and sharp minima, effectively addressing an open problem in the literature. Based on our theoretical insight, this paper also proposes a family of new regularizatio…

#### Emergence of Invariance and Disentangling in Deep Representations

Using established principles from Information Theory and Statistics, we show that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network to…

#### A systematic study of the class imbalance problem in convolutional neural networks

CIFAR Convolutional Neural Network DNN IMAGENET MNIST

In this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine l…

#### Efficient Computation in Adaptive Artificial Spiking Neural Networks

Artificial Neural Networks (ANNs) are bio-inspired models of neural computation that have proven highly effective. Still, ANNs lack a natural notion of time, and neural units in ANNs exchange analog values in a frame-based manner, a computationally and energetically inefficient form of communicatio…

#### On orthogonality and learning recurrent networks with long term dependencies

It is well known that it is challenging to train deep neural networks and recurrent neural networks for tasks that exhibit long term dependencies. The vanishing or exploding gradient problem is a well known issue associated with these challenges. One approach to addressing vanishing and exploding g…

#### Mixed Precision Training

DNN Generative Adversarial Network RNN

Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increases. We introduce a technique to train deep neur…

#### High-dimensional dynamics of generalization error in neural networks

We perform an average case analysis of the generalization dynamics of large neural networks trained using gradient descent. We study the practically-relevant “high-dimensional” regime where the number of free parameters in the network is on the order of or even larger than the number of…

#### Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks

Minimizing non-convex and high-dimensional objective functions is challenging, especially when training modern deep neural networks. In this paper, a novel approach is proposed which divides the training process into two consecutive phases to obtain better generalization performance: Bayesian sampl…

#### Energy-efficient Amortized Inference with Cascaded Deep Classifiers

Deep neural networks have been remarkable successful in various AI tasks but often cast high computation and energy cost for energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes the prediction accuracy and energy cost simultan…