Topic Tag: gradient


 Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks


Compared with artificial neural networks (ANNs), spiking neural networks (SNNs) are promising for exploring brain-like behaviors, since spikes can encode more spatio-temporal information. Although pre-training from an ANN or direct training based on backpropagation (BP) makes the supervised trai…
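Direct BP training of SNNs typically hinges on a surrogate gradient for the non-differentiable spike function. Below is a minimal NumPy sketch of that idea, assuming a leaky integrate-and-fire neuron and a rectangular surrogate window; the names and constants are illustrative, not the paper's exact formulation.

```python
import numpy as np

def lif_step(v, x, v_th=1.0, decay=0.9):
    """One leaky integrate-and-fire step: leak, integrate input, spike, reset."""
    v_pre = decay * v + x                  # membrane potential before reset
    spike = (v_pre >= v_th).astype(float)  # Heaviside spike: non-differentiable
    v_post = v_pre * (1.0 - spike)         # hard reset where a spike occurred
    return v_post, v_pre, spike

def surrogate_grad(v_pre, v_th=1.0, width=0.5):
    """Rectangular surrogate for d(spike)/d(v): pass gradient only near threshold."""
    return (np.abs(v_pre - v_th) < width).astype(float) / (2.0 * width)

v, v_pre, s = lif_step(v=np.zeros(4), x=np.array([0.3, 1.2, 0.8, 1.5]))
# In a manual backward pass, dL/dv_pre = dL/dspike * surrogate_grad(v_pre)
# stands in for the zero-almost-everywhere true derivative of the step function.
print(s, surrogate_grad(v_pre))
```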


 Deep Mean-Shift Priors for Image Restoration


In this paper we introduce a natural image prior that directly represents a Gaussian-smoothed version of the natural image distribution. We include our prior in a formulation of image restoration as a Bayes estimator that also allows us to solve noise-blind image restoration problems. We show that …
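The "mean-shift" connection here is that the gradient of the log of a Gaussian-smoothed density is proportional to a mean-shift vector, which a trained denoiser can approximate. A hedged sketch of one gradient step for the noise-only case, where `toy_denoiser` is a hypothetical stand-in for a learned denoising autoencoder:

```python
import numpy as np

def toy_denoiser(x):
    """Hypothetical stand-in for a trained denoising autoencoder (DAE)."""
    return 0.5 * (x + np.median(x))

def restoration_step(x, y, sigma=0.1, lam=0.01, lr=0.1):
    """One gradient step for noise-only restoration (identity degradation):
    minimize ||x - y||^2 - lam * log p_sigma(x), using the mean-shift identity
    grad log p_sigma(x) ~ (DAE(x) - x) / sigma**2."""
    prior_grad = (toy_denoiser(x) - x) / sigma**2  # mean-shift vector
    data_grad = x - y                              # gradient of the data term
    return x - lr * (data_grad - lam * prior_grad)

y = np.array([0.2, 0.8, 0.5, 0.9])  # noisy observation
x = y.copy()
for _ in range(10):
    x = restoration_step(x, y)
```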


 RRA: Recurrent Residual Attention for Sequence Learning


In this paper, we propose a recurrent neural network (RNN) with residual attention (RRA) to learn long-range dependencies from sequential data. We propose to add residual connections across timesteps to the RNN, which explicitly enhances the interaction between the current state and hidden states that are …
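As a sketch of the core idea, here is a vanilla tanh RNN with an identity shortcut to the hidden state a few steps back; the real RRA cell also weights past states with attention, so treat the `skip` connection below as illustrative only.

```python
import numpy as np

def rnn_residual_unroll(xs, Wx, Wh, skip=2):
    """Vanilla tanh RNN whose hidden state adds an identity shortcut to the
    state `skip` steps back, illustrating a residual connection across timesteps."""
    hs = [np.zeros(Wh.shape[0])]
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ hs[-1])
        if len(hs) >= skip:
            h = h + hs[-skip]          # residual connection across timesteps
        hs.append(h)
    return np.stack(hs[1:])

rng = np.random.default_rng(0)
T, D, H = 6, 3, 4
out = rnn_residual_unroll(rng.normal(size=(T, D)),
                          0.1 * rng.normal(size=(H, D)),
                          0.1 * rng.normal(size=(H, H)))
print(out.shape)  # (6, 4)
```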


 Expected Policy Gradients


We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates across the action when estimating the gradient, instead of relying only on the action in the samp…
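For a discrete action space the idea can be made concrete: weight each action's score-function term by its probability and sum, instead of using only the sampled action. A NumPy sketch under that assumption (the paper's EPG also covers continuous actions, e.g. by quadrature or analytic integration; `Q` here is a given critic estimate):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def all_action_pg(theta, phi_s, Q):
    """Policy gradient for a softmax policy that integrates (sums) over ALL
    actions instead of only the sampled one:
    sum_a pi(a|s) * Q(s,a) * grad log pi(a|s).
    theta: (A, D) weights, phi_s: (D,) state features, Q: (A,) critic estimates."""
    pi = softmax(theta @ phi_s)
    A = theta.shape[0]
    grad = np.zeros_like(theta)
    for a in range(A):
        dlogp = (np.eye(A)[a] - pi)[:, None] * phi_s[None, :]  # grad log pi(a|s)
        grad += pi[a] * Q[a] * dlogp
    return grad

rng = np.random.default_rng(0)
g = all_action_pg(rng.normal(size=(3, 5)), rng.normal(size=5),
                  Q=np.array([1.0, 0.0, -1.0]))
print(g.shape)  # (3, 5)
```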


 On better training the infinite restricted Boltzmann machines


The infinite restricted Boltzmann machine (iRBM) is an extension of the classic RBM. It enjoys the desirable property of automatically deciding the size of the hidden layer according to the specific training data. With sufficient training, the iRBM can achieve performance competitive with that of the classi…


 A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection in Network Traffic Data


Gated Recurrent Unit (GRU) is a recently published variant of the Long Short-Term Memory (LSTM) network, designed to solve the vanishing and exploding gradient problems. However, its main objective is to solve the long-term dependency problem in Recurrent Neural Networks (RNNs), which prev…
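For reference, the standard GRU update looks roughly like the sketch below (biases omitted); the paper's twist, per the title, is pairing the recurrent features with an SVM-style output, which the hypothetical `svm_hinge_loss` hints at.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """Standard GRU update (biases omitted). The gates give gradients a short
    path through time, which is what eases vanishing/exploding gradients."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)              # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)              # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_tilde                  # blend old and new

def svm_hinge_loss(w, b, h, y):
    """Hypothetical SVM-style output on the final recurrent state h, with label
    y in {-1, +1}, hinting at the paper's GRU+SVM pairing."""
    return max(0.0, 1.0 - y * (w @ h + b)) ** 2  # squared hinge (L2-SVM)
```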


 Learned Optimizers that Scale and Generalize


Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalize…


 Distributed Bayesian Learning with Stochastic Natural-gradient Expectation Propagation and the Posterior Server


This paper makes two contributions to Bayesian machine learning algorithms. Firstly, we propose stochastic natural gradient expectation propagation (SNEP), a novel alternative to expectation propagation (EP), a popular variational inference algorithm. SNEP is a black box variational algorithm, in t…


 Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors


Human activity recognition (HAR) has become a popular research topic because of its wide range of applications. With the development of deep learning, new ideas have emerged for addressing HAR problems. Here, a deep network architecture using residual bidirectional long short-term memory (LSTM) cells is prop…


 Towards Neural Machine Translation with Latent Tree Attention


Building models that take advantage of the hierarchical structure of language without a priori annotation is a longstanding goal in natural language processing. We introduce such a model for the task of machine translation, pairing a recurrent neural network grammar encoder with a novel attentional…


 Stochastic Gradient Descent: Going As Fast As Possible But Not Faster


When applied to training deep neural networks, stochastic gradient descent (SGD) often incurs steady progression phases, interrupted by catastrophic episodes in which loss and gradient norm explode. A possible mitigation of such events is to slow down the learning process. This paper presents a nov…
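The snippet does not spell out the paper's method, so here is only a generic guard in that spirit: detect an exploding loss and slow down. Everything below (the rollback-and-halve rule, the factor 3, the callables) is an assumption for illustration, not the paper's approach.

```python
import copy

def guarded_sgd_epoch(params, batches, grad_fn, loss_fn, lr, factor=3.0):
    """One epoch of SGD with a naive guard: if the batch loss explodes relative
    to a running average, roll the step back and halve the learning rate.
    grad_fn/loss_fn are hypothetical callables; params is a dict of arrays."""
    avg = None
    for batch in batches:
        backup = copy.deepcopy(params)
        g = grad_fn(params, batch)
        for k in params:
            params[k] = params[k] - lr * g[k]
        loss = loss_fn(params, batch)
        if avg is not None and loss > factor * avg:
            params = backup   # undo the catastrophic step
            lr *= 0.5         # slow down the learning process
        else:
            avg = loss if avg is None else 0.9 * avg + 0.1 * loss
    return params, lr
```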


 Hierarchical Attentive Recurrent Tracking


Class-agnostic object tracking is particularly difficult in cluttered environments as target specific discriminative models cannot be learned a priori. Inspired by how the human visual cortex employs spatial attention and separate “where” and “what” processing pathways to ac…


 Speeding-up the decision making of a learning agent using an ion trap quantum processor

We report a proof-of-principle experimental demonstration of a quantum speed-up for learning agents utilizing a small-scale quantum information processor based on radiofrequency-driven trapped ions. The decision-making process of a quantum learning agent within the projective simulation paradigm fo…


On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation

We study the proximal gradient descent (PGD) method for the $\ell^{0}$ sparse approximation problem, as well as its accelerated optimization with randomized algorithms. We first offer a theoretical analysis of PGD, showing the bounded gap between the sub-optimal solution found by PGD and the globall…
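The PGD iteration for the $\ell^{0}$ problem is concrete enough to sketch: a gradient step on the smooth least-squares term followed by the $\ell^{0}$ prox, which is hard thresholding. A minimal NumPy version (the step size rule and λ are illustrative):

```python
import numpy as np

def pgd_l0(A, b, lam, steps=200):
    """Proximal gradient descent for 0.5*||Ax - b||^2 + lam*||x||_0.
    The prox of the l0 penalty is hard thresholding: keep x_i iff x_i^2 > 2*eta*lam."""
    eta = 1.0 / np.linalg.norm(A, 2) ** 2            # step size 1/L, L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        z = x - eta * A.T @ (A @ x - b)              # gradient step on smooth term
        x = np.where(z ** 2 > 2.0 * eta * lam, z, 0.0)  # hard-threshold prox
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 58]] = [1.5, -2.0, 1.0]
b = A @ x_true
print(np.nonzero(pgd_l0(A, b, lam=2.0))[0])  # with a suitable lam: {3, 17, 58}
```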


 Deep Learning: A Bayesian Perspective


Deep learning is a form of machine learning for nonlinear, high-dimensional pattern matching and prediction. By taking a Bayesian probabilistic perspective, we provide a number of advantages: more efficient algorithms for optimisation and hyper-parameter tuning, and an explanation of predictive…


 TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning


High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical leve…
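The three-level idea can be sketched directly: scale by the maximum magnitude and stochastically round each component to {-s, 0, +s} so the quantized gradient stays unbiased. A NumPy sketch in the spirit of the paper (details such as layer-wise scaling and gradient clipping are omitted):

```python
import numpy as np

def ternarize(g, rng=np.random.default_rng()):
    """Stochastic ternarization in the spirit of TernGrad: each component becomes
    -s, 0, or +s with s = max|g|, keeping the estimate unbiased (E[out] = g)."""
    s = np.abs(g).max()
    if s == 0.0:
        return g
    keep = rng.random(g.shape) < np.abs(g) / s  # Bernoulli(|g_i| / s)
    return s * np.sign(g) * keep

g = np.array([0.4, -0.1, 0.0, 0.8])
print(ternarize(g))  # only three levels appear: {-0.8, 0.0, 0.8}
```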


 First-Order Adaptive Sample Size Methods to Reduce Complexity of Empirical Risk Minimization

This paper studies empirical risk minimization (ERM) problems for large-scale datasets and incorporates the idea of adaptive sample size methods to improve the guaranteed convergence bounds for first-order stochastic and deterministic methods. In contrast to traditional methods that attempt to solv…
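The adaptive sample size idea is simple to sketch: solve the ERM problem on a small subsample only to about its statistical accuracy, then double the sample and warm-start from the previous solution. In the sketch below, `solve` is a hypothetical first-order solver and the `1/m` tolerance is a stand-in for the statistical error bound:

```python
import numpy as np

def adaptive_sample_erm(X, y, solve, n0=64):
    """Adaptive sample size scheme: solve ERM on a subsample to (roughly) its
    statistical accuracy, then double the sample and warm-start.
    `solve(X, y, w_init, tol) -> w` is a hypothetical first-order solver."""
    n, d = X.shape
    w = np.zeros(d)
    m = n0
    while True:
        m = min(m, n)
        tol = 1.0 / m                    # target accuracy ~ statistical error
        w = solve(X[:m], y[:m], w, tol)  # warm-started subproblem
        if m == n:
            return w
        m *= 2
```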


 Model based learning for accelerated, limited-view 3D photoacoustic tomography


Recent advances in deep learning for tomographic reconstructions have shown great potential to create accurate and high quality images with a considerable speed-up. In this work we present a deep neural network that is specifically designed to provide high resolution 3D images from restricted photo…


 A parameterized activation function for learning fuzzy logic operations in deep neural networks


We present a deep learning architecture for learning fuzzy logic expressions. Our model uses an innovative, parameterized, differentiable activation function that can learn a number of logical operations by gradient descent. This activation function allows a neural network to determine the relation…
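One plausible way such an activation can work is to interpolate differentiably between a fuzzy AND (a t-norm) and a fuzzy OR (a t-conorm) with a learnable parameter; the sketch below is an assumption in that spirit, not the paper's exact function.

```python
def fuzzy_gate(x1, x2, p):
    """Differentiable gate interpolating between fuzzy AND (product t-norm) and
    fuzzy OR (probabilistic sum) via a learnable parameter p in [0, 1]; smooth
    in x1, x2, and p, so it is trainable by gradient descent."""
    t_and = x1 * x2                      # product t-norm
    t_or = x1 + x2 - x1 * x2             # probabilistic sum (t-conorm)
    return (1.0 - p) * t_and + p * t_or

print(fuzzy_gate(0.9, 0.8, p=0.0))  # behaves like AND: 0.72
print(fuzzy_gate(0.9, 0.8, p=1.0))  # behaves like OR:  0.98
```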


 Boltzmann machines and energy-based models

We review Boltzmann machines and energy-based models. A Boltzmann machine defines a probability distribution over binary-valued patterns. One can learn the parameters of a Boltzmann machine via gradient-based approaches that increase the log-likelihood of the data. The gradient and Laplacian of a …
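The gradient mentioned here is the classic difference between data and model statistics, usually approximated with contrastive divergence. A minimal NumPy sketch of CD-1 for a binary RBM (biases omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_grad(W, v0, rng=np.random.default_rng()):
    """One step of contrastive divergence (CD-1) for an RBM with binary units.
    Approximates d(log-likelihood)/dW = <v h>_data - <v h>_model."""
    ph0 = sigmoid(v0 @ W)                      # P(h=1 | v0): data phase
    h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sample hidden units
    pv1 = sigmoid(h0 @ W.T)                    # reconstruct visibles
    v1 = (rng.random(pv1.shape) < pv1) * 1.0
    ph1 = sigmoid(v1 @ W)                      # model phase (1-step chain)
    return v0.T @ ph0 - v1.T @ ph1             # positive minus negative statistics

rng = np.random.default_rng(0)
W = 0.01 * rng.normal(size=(6, 4))             # 6 visible, 4 hidden units
v_batch = (rng.random((8, 6)) < 0.5) * 1.0     # toy binary data
W += 0.1 * cd1_grad(W, v_batch, rng) / v_batch.shape[0]  # one ascent step
```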


 Imposing higher-level Structure in Polyphonic Music Generation using Convolutional Restricted Boltzmann Machines and Constraints


We introduce a method for imposing higher-level structure on generated, polyphonic music. A Convolutional Restricted Boltzmann Machine (C-RBM) as a generative model is combined with gradient descent constraint optimization to provide further control over the generation process. Among other things, …