Topic Tag: speech


 Recent Advances in Convolutional Neural Networks

    

In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging…


 Learning weakly supervised multimodal phoneme embeddings

  

Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing lip movements, …


 Learning Scalable Deep Kernels with Recurrent Structure

     

Many applications in speech, robotics, finance, and biology deal with sequential data, where ordering matters and recurrent structures are common. However, this structure cannot be easily captured by standard kernel functions. To model such structure, we propose expressive closed-form kernel functi…
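
A minimal sketch of the general idea of a kernel with recurrent structure, assuming an RNN-embedding-plus-RBF construction (the weights, the tanh recurrence and the bandwidth below are illustrative stand-ins, not the paper's learned kernel):

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative RNN parameters (input size 8, hidden size 16); in the actual
    # method these would be learned jointly with the GP, which this sketch skips.
    W_in = rng.normal(scale=0.3, size=(16, 8))
    W_h = rng.normal(scale=0.3, size=(16, 16))

    def rnn_embed(seq):
        """Map a (T, 8) sequence to a fixed-length state with a vanilla RNN."""
        h = np.zeros(16)
        for x_t in seq:
            h = np.tanh(W_in @ x_t + W_h @ h)
        return h

    def recurrent_kernel(seq_a, seq_b, lengthscale=1.0):
        """RBF kernel evaluated on the RNN embeddings of two sequences."""
        d = rnn_embed(seq_a) - rnn_embed(seq_b)
        return np.exp(-0.5 * np.dot(d, d) / lengthscale**2)

    # Toy usage: two random sequences of different lengths.
    a = rng.normal(size=(10, 8))
    b = rng.normal(size=(7, 8))
    print(recurrent_kernel(a, b))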


 LSTM: A Search Space Odyssey

  

Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in …
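
As context for what these variants modify, one step of the standard LSTM cell can be sketched as follows (a generic numpy formulation with illustrative dimensions, not code from the paper):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        """One standard LSTM step: input, forget and output gates plus a cell update.
        W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases."""
        H = h_prev.shape[0]
        z = W @ x + U @ h_prev + b
        i = sigmoid(z[0:H])        # input gate
        f = sigmoid(z[H:2*H])      # forget gate
        o = sigmoid(z[2*H:3*H])    # output gate
        g = np.tanh(z[3*H:4*H])    # candidate cell state
        c = f * c_prev + i * g     # new cell state
        h = o * np.tanh(c)         # new hidden state
        return h, c

    # Toy usage with input size 3 and hidden size 4.
    rng = np.random.default_rng(0)
    D, H = 3, 4
    h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H),
                     rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H))
    print(h, c)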


 Modular Representation of Layered Neural Networks

   

Layered neural networks have greatly improved the performance of various applications including image processing, speech recognition, natural language processing, and bioinformatics. However, it is still difficult to discover or interpret knowledge from the inference provided by a layered neural ne…


 How Important is Syntactic Parsing Accuracy? An Empirical Evaluation on Rule-Based Sentiment Analysis

  

Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has …


 DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks

       

Deep neural networks have become widely used, obtaining remarkable results in domains such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bio-informatics, where they have produced results comparable to human…


 Improving speech recognition by revising gated recurrent units

   

Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory networks (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their a…
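
As context for the kind of recurrent unit being revised, a standard GRU step (the baseline formulation such revisions start from; the weights and sizes below are illustrative) looks roughly like:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
        """One standard GRU step with update and reset gates."""
        z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate
        r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate state
        return (1.0 - z) * h_prev + z * h_tilde        # interpolated new state

    # Toy usage with input size 3 and hidden size 4.
    rng = np.random.default_rng(0)
    D, H = 3, 4
    Wz, Wr, Wh = (rng.normal(size=(H, D)) for _ in range(3))
    Uz, Ur, Uh = (rng.normal(size=(H, H)) for _ in range(3))
    print(gru_step(rng.normal(size=D), np.zeros(H), Wz, Uz, Wr, Ur, Wh, Uh))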


 Research on several key technologies in practical speech emotion recognition

  

In this dissertation, practical speech emotion recognition technology is studied, including several cognition-related emotion types, namely fidgetiness, confidence and tiredness. High-quality naturalistic emotional speech data is the basis of this research. The following techniques are us…


 Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

  

A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural network (DNN) techniques can be applied to artificially synthesize speech waveforms, the synthetic speech quality is low compared with that of natura…
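
A minimal conceptual sketch of combining a conventional acoustic-model loss with a GAN-style adversarial term, in the spirit of the proposed method but not its exact formulation (the stand-in discriminator, the weighting and the 40-dimensional parameter vectors are illustrative assumptions):

    import numpy as np

    def acoustic_model_loss(y_pred, y_natural, discriminator, adv_weight=1.0):
        """Conventional MSE to the natural speech parameters, plus an adversarial
        term that rewards predictions the discriminator scores as 'natural'."""
        mse = np.mean((y_pred - y_natural) ** 2)
        d_score = discriminator(y_pred)            # discriminator output in (0, 1)
        adv = -np.log(d_score + 1e-8)              # generator-style GAN loss
        return mse + adv_weight * adv

    # Toy usage with a fixed logistic model standing in for a trained discriminator.
    rng = np.random.default_rng(0)
    w = rng.normal(size=40)
    discriminator = lambda y: 1.0 / (1.0 + np.exp(-(y @ w)))
    y_pred, y_nat = rng.normal(size=(2, 40))
    print(acoustic_model_loss(y_pred, y_nat, discriminator))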


 Learning Latent Representations for Speech Generation and Transformation

An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as Variational Autoencoders (VAEs) have achieved tremendous succ…
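
A minimal VAE forward pass illustrating the kind of latent-variable model involved, with untrained linear layers standing in for the encoder and decoder (the sizes and the Gaussian likelihood are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    D, Z = 80, 16    # e.g. an 80-dimensional spectral frame and a 16-dimensional latent
    W_mu, W_lv = rng.normal(scale=0.1, size=(2, Z, D))
    W_dec = rng.normal(scale=0.1, size=(D, Z))

    def vae_forward(x):
        """Encode x to a Gaussian, sample with the reparameterisation trick,
        decode, and return the reconstruction and an ELBO-style objective."""
        mu, logvar = W_mu @ x, W_lv @ x
        z = mu + np.exp(0.5 * logvar) * rng.normal(size=Z)   # reparameterised sample
        x_hat = W_dec @ z
        recon = -np.mean((x - x_hat) ** 2)                   # Gaussian log-likelihood up to scale
        kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
        return x_hat, recon - kl

    x_hat, elbo = vae_forward(rng.normal(size=D))
    print(elbo)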


 Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation

 

Domain mismatch between training and testing can lead to significant degradation in performance in many machine learning scenarios. Unfortunately, this is not a rare situation for automatic speech recognition deployments in real-world applications. Research on robust speech recognition can be regar…


 Attention-based Wav2Text with Feature Transfer Learning

  

Conventional automatic speech recognition (ASR) typically performs multi-level pattern recognition tasks that map the acoustic speech waveform into a hierarchy of speech units. However, it is widely known that information loss in the earlier stages can propagate through the later stages. After the resur…


 BreathRNNet: Breathing Based Authentication on Resource-Constrained IoT Devices using RNNs

   

Recurrent neural networks (RNNs) have shown promising results in audio and speech processing applications due to their strong capabilities in modelling sequential data. In many applications, RNNs tend to outperform conventional models based on GMM/UBMs and i-vectors. Increasing popularity of IoT de…


 Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding

  

In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (…
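
A minimal sketch of the underlying idea, running a fixed number of nonnegative ISTA iterations for a sparse NMF coding problem; unfolding these iterations into layers (and learning per-layer parameters, which this sketch does not do) is what yields the recurrent architecture. Sizes and the sparsity weight are illustrative:

    import numpy as np

    def unfolded_ista(v, W, lam=0.1, n_layers=10):
        """Sparse nonnegative coding of a spectrogram frame v against a fixed
        dictionary W by iterating a gradient step and a nonnegative soft-threshold;
        each iteration corresponds to one layer of the unfolded network."""
        step = 1.0 / np.linalg.norm(W, 2) ** 2     # safe step size: 1 / (largest singular value)^2
        h = np.zeros(W.shape[1])
        for _ in range(n_layers):
            grad = W.T @ (W @ h - v)
            h = np.maximum(h - step * (grad + lam), 0.0)
        return h

    # Toy usage: a 257-bin magnitude frame and 40 nonnegative dictionary atoms.
    rng = np.random.default_rng(0)
    W = np.abs(rng.normal(size=(257, 40)))
    v = np.abs(rng.normal(size=257))
    print(unfolded_ista(v, W).round(3))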


 Using NLU in Context for Question Answering: Improving on Facebook’s bAbI Tasks

  

For the next step in human-to-machine interaction, Artificial Intelligence (AI) should interact predominantly using natural language because, if it worked, it would be the fastest way to communicate. Facebook’s toy tasks (bAbI) provide a useful benchmark to compare implementations for convers…


 Unsupervised Machine Learning for Networking: Techniques, Applications and Research Challenges

   

While machine learning and artificial intelligence have long been applied in networking research, the bulk of such works has focused on supervised learning. Recently there has been a rising trend of employing unsupervised machine learning using unstructured raw network data to improve network perfo…


 Language modeling with Neural trans-dimensional random fields

    

Trans-dimensional random field language models (TRF LMs) have recently been introduced, where sentences are modeled as a collection of random fields. The TRF approach has been shown to have the advantages of being computationally more efficient in inference than LSTM LMs with close performance and …
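
In the general TRF formulation, a sentence x^l of length l is assigned a joint probability roughly of the form (notation follows the broader TRF literature and may differ in detail from this paper)

    p(l, x^l; \lambda) = \frac{\pi_l}{Z_l(\lambda)} \exp\big( \lambda^\top f(x^l) \big),

where \pi_l is a prior over sentence lengths, Z_l(\lambda) the per-length normalising constant, and f(x^l) a feature vector that, in the neural variant, is computed by a neural network rather than built from hand-crafted n-gram features.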


 Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification

      

Deep neural networks (DNNs) have transformed several artificial intelligence research areas including computer vision, speech recognition, and natural language processing. However, recent studies demonstrated that DNNs are vulnerable to adversarial manipulations at testing time. Specifically, suppo…
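
A rough sketch of the region-based idea, classifying by majority vote over points sampled from a small hypercube around the input; the radius, sample count and toy base classifier below are illustrative assumptions rather than the paper's settings:

    import numpy as np

    def region_predict(classify, x, radius=0.02, n_samples=100, seed=0):
        """Sample points uniformly from a hypercube centred at x and return the
        majority vote of the base classifier over those samples."""
        rng = np.random.default_rng(seed)
        noise = rng.uniform(-radius, radius, size=(n_samples,) + x.shape)
        votes = [classify(x + n) for n in noise]
        labels, counts = np.unique(votes, return_counts=True)
        return labels[np.argmax(counts)]

    # Toy usage with a stand-in threshold "classifier" on 2-D inputs.
    classify = lambda x: int(x.sum() > 0)
    print(region_predict(classify, np.array([0.01, -0.005])))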


 Nonnegative HMM for Babble Noise Derived from Speech HMM: Application to Speech Enhancement

Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g. noise reduction, to reduce the so-called cocktail party difficulty. In the available systems, the fact that the babble waveform is generated as a sum of N different speech waveforms is not…
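
A toy illustration of the stated generative view, in which the babble waveform is formed as a (normalised) sum of N individual speech waveforms; the random arrays below merely stand in for real speech signals:

    import numpy as np

    def make_babble(speech_signals):
        """Sum N individual speech waveforms, rescaling to keep the power comparable."""
        stacked = np.stack(speech_signals)
        return stacked.sum(axis=0) / np.sqrt(len(speech_signals))

    # Toy usage: N = 8 one-second "speech" signals at 16 kHz.
    rng = np.random.default_rng(0)
    babble = make_babble([rng.normal(size=16000) for _ in range(8)])
    print(babble.shape)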


 Speech Dereverberation Using Nonnegative Convolutive Transfer Function and Spectro-Temporal Modeling

This paper presents two single channel speech dereverberation methods to enhance the quality of speech signals that have been recorded in an enclosed space. For both methods, the room acoustics are modeled using a nonnegative approximation of the convolutive transfer function (NCTF), and to additio…
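
In the NCTF model used in this line of work, the reverberant magnitude spectrogram is approximated, per frequency bin, as a convolution over time of the clean-speech spectrogram with a nonnegative filter (generic notation that may differ from the paper's):

    |Y(f, t)| \approx \sum_{\tau=0}^{L-1} H(f, \tau) \, |S(f, t - \tau)|,

where S is the clean speech spectrogram, H the length-L nonnegative approximation of the room's convolutive transfer function, and Y the observed reverberant spectrogram.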


 A segmental framework for fully-unsupervised large-vocabulary speech recognition

    

Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-c…


 Minimax Filter: Learning to Preserve Privacy from Inference Attacks

 

Preserving privacy of continuous and/or high-dimensional data such as images, videos and audio can be challenging with syntactic anonymization methods which are designed for discrete attributes. Differential privacy, which provides a more formal definition of privacy, has shown more success in sa…
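
One common way to formalise such a learned privacy filter is a minimax objective of roughly the form (an assumed generic formulation, not necessarily the paper's exact one)

    \min_{\theta} \Big[ \min_{\psi} L_{util}(\theta, \psi) \; - \; \rho \, \min_{\phi} L_{priv}(\theta, \phi) \Big],

where \theta parameterises the filter applied to the data, \psi a model for the intended utility task, \phi the inference adversary, and \rho trades utility against privacy.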

