Topic Tag: speech

home Forums Topic Tag: speech

 Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation

 

Domain mismatch between training and testing can lead to significant degradation in performance in many machine learning scenarios. Unfortunately, this is not a rare situation for automatic speech recognition deployments in real-world applications. Research on robust speech recognition can be regar…


 Attention-based Wav2Text with Feature Transfer Learning

  

Conventional automatic speech recognition (ASR) typically performs multi-level pattern recognition tasks that map the acoustic speech waveform into a hierarchy of speech units. But, it is widely known that information loss in the earlier stage can propagate through the later stages. After the resur…


 BreathRNNet: Breathing Based Authentication on Resource-Constrained IoT Devices using RNNs

   

Recurrent neural networks (RNNs) have shown promising results in audio and speech processing applications due to their strong capabilities in modelling sequential data. In many applications, RNNs tend to outperform conventional models based on GMM/UBMs and i-vectors. Increasing popularity of IoT de…


 Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding

  

In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (…


 Using NLU in Context for Question Answering: Improving on Facebook’s bAbI Tasks

  

For the next step in human to machine interaction, Artificial Intelligence (AI) should interact predominantly using natural language because, if it worked, it would be the fastest way to communicate. Facebook’s toy tasks (bAbI) provide a useful benchmark to compare implementations for convers…


 Using NLU in Context for Question Answering: Improving on Facebook’s bAbI Tasks

  

For the next step in human to machine interaction, Artificial Intelligence (AI) should interact predominantly using natural language because, if it worked, it would be the fastest way to communicate. Facebook’s toy tasks (bAbI) provide a useful benchmark to compare implementations for convers…


 Unsupervised Machine Learning for Networking: Techniques, Applications and Research Challenges

   

While machine learning and artificial intelligence have long been applied in networking research, the bulk of such works has focused on supervised learning. Recently there has been a rising trend of employing unsupervised machine learning using unstructured raw network data to improve network perfo…


 Language modeling with Neural trans-dimensional random fields

    

Trans-dimensional random field language models (TRF LMs) have recently been introduced, where sentences are modeled as a collection of random fields. The TRF approach has been shown to have the advantages of being computationally more efficient in inference than LSTM LMs with close performance and …


 Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification

      

Deep neural networks (DNNs) have transformed several artificial intelligence research areas including computer vision, speech recognition, and natural language processing. However, recent studies demonstrated that DNNs are vulnerable to adversarial manipulations at testing time. Specifically, suppo…


 Nonnegative HMM for Babble Noise Derived from Speech HMM: Application to Speech Enhancement

Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g. noise reduction, to reduce the so-called cocktail party difficulty. In the available systems, the fact that the babble waveform is generated as a sum of N different speech waveforms is not…


 Speech Dereverberation Using Nonnegative Convolutive Transfer Function and Spectro temporal Modeling

This paper presents two single channel speech dereverberation methods to enhance the quality of speech signals that have been recorded in an enclosed space. For both methods, the room acoustics are modeled using a nonnegative approximation of the convolutive transfer function (NCTF), and to additio…


 A segmental framework for fully-unsupervised large-vocabulary speech recognition

    

Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-c…


 Minimax Filter: Learning to Preserve Privacy from Inference Attacks

 

Preserving privacy of continuous and/or high-dimensional data such as images, videos and audios, can be challenging with syntactic anonymization methods which are designed for discrete attributes. Differential privacy, which provides a more formal definition of privacy, has shown more success in sa…


 Using NLU in Context for Question Answering: Improving on Facebook’s bAbI Tasks

  

For the next step in human to machine interaction, Artificial Intelligence (AI) should interact predominantly using natural language because, if it worked, it would be the fastest way to communicate. Facebook’s toy tasks (bAbI) provide a useful benchmark to compare implementations for convers…


 Using NLU in Context for Question Answering: Improving on Facebook’s bAbI Tasks

  

For the next step in human to machine interaction, Artificial Intelligence (AI) should interact predominantly using natural language because, if it worked, it would be the fastest way to communicate. Facebook’s toy tasks (bAbI) provide a useful benchmark to compare implementations for convers…


 Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

  

Neural models have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict te…


 Sequence Prediction with Neural Segmental Models

Segments that span contiguous parts of inputs, such as phonemes in speech, named entities in sentences, actions in videos, occur frequently in sequence prediction problems. Segmental models, a class of models that explicitly hypothesizes segments, have allowed the exploration of rich segment featur…


 End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

 

Speech enhancement model is used to map a noisy speech to a clean speech. In the training stage, an objective function is often adopted to optimize the model parameters. However, in most studies, there is an inconsistency between the model optimization criterion and the evaluation criterion on the …


 Training RNNs as Fast as CNNs

   

Recurrent neural networks scale poorly due to the intrinsic difficulty in parallelizing their state computations. For instance, the forward pass computation of $h_t$ is blocked until the entire computation of $h_{t-1}$ finishes, which is a major bottleneck for parallel computing. In this work, we p…


 Gaussian Quadrature for Kernel Features

  

Kernel methods have recently attracted resurgent interest, matching the performance of deep neural networks in tasks such as speech recognition. The random Fourier features map is a technique commonly used to scale up kernel machines, but employing the randomized feature map means that $O(epsilon^{…


 A Deep Reinforcement Learning Chatbot

   

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ense…


 Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

   

Improving speech system performance in noisy environments remains a challenging task, and speech enhancement (SE) is one of the effective techniques to solve the problem. Motivated by the promising results of generative adversarial networks (GANs) in a variety of image processing tasks, we explore …


 An embedded segmental K-means model for unsupervised segmentation and clustering of speech

  

Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing. Most approaches lie at methodological extremes: some use probabilistic Bayesian models with convergence guarantees, while others opt for more efficient heuristic techniques. Despite c…


 Unsupervised Submodular Rank Aggregation on Score-based Permutations

 

Unsupervised rank aggregation on score-based permutations, which is widely used in many applications, has not been deeply explored yet. This work studies the use of submodular optimization for rank aggregation on score-based permutations in an unsupervised way. Specifically, we propose an unsupervi…


 Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

  

With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classifi…