Topic Tag: audio


 DNN and CNN with Weighted and Multi-task Loss Functions for Audio Event Detection

  

This report presents our audio event detection system submitted for Task 2, “Detection of rare sound events”, of the DCASE 2017 challenge. The proposed system is based on convolutional neural networks (CNNs) and deep neural networks (DNNs) coupled with novel weighted and multi-task loss fun…
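As a rough illustration of the weighting idea (a minimal PyTorch sketch under assumed settings, not the report's actual loss), a class-weighted binary cross-entropy can up-weight the rare "event active" frames so they are not drowned out by background frames:

```python
# Minimal sketch, not the submitted system: per-frame binary cross-entropy
# with a larger weight on the rare positive (event-active) frames.
import torch
import torch.nn.functional as F

def weighted_bce(logits, targets, pos_weight=10.0):
    # pos_weight is a hypothetical factor compensating for event rarity
    weight = torch.where(targets > 0.5,
                         torch.full_like(targets, pos_weight),
                         torch.ones_like(targets))
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weight)

logits = torch.randn(4, 100)                      # 4 clips, 100 frames each
targets = (torch.rand(4, 100) > 0.95).float()     # event active in ~5% of frames
print(weighted_bce(logits, targets).item())
```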


 Learning weakly supervised multimodal phoneme embeddings

  

Recent works have explored deep architectures for learning multimodal speech representations (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing lip movements, …


 DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks

       

Deep neural networks have become widely used, obtaining remarkable results in domains such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bio-informatics, where they have produced results comparable to human…


 Computer Assisted Composition with Recurrent Neural Networks

  

Sequence modeling with neural networks has led to powerful models of symbolic music data. We address the problem of exploiting these models to reach creative musical goals by combining them with human input. To this end, we generalise previous work, which sampled Markovian sequence models under the con…


 BreathRNNet: Breathing Based Authentication on Resource-Constrained IoT Devices using RNNs

   

Recurrent neural networks (RNNs) have shown promising results in audio and speech processing applications due to their strong capabilities in modelling sequential data. In many applications, RNNs tend to outperform conventional models based on GMM/UBMs and i-vectors. The increasing popularity of IoT de…
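For readers unfamiliar with the setup, a minimal sketch of what such a sequence model could look like (PyTorch, with made-up layer sizes, not the BreathRNNet architecture) is a small LSTM that maps per-frame breathing-audio features to an accept/reject score:

```python
# Illustrative only: an LSTM scoring a sequence of per-frame audio features
# (e.g. MFCCs) for authentication; sizes are arbitrary assumptions.
import torch
import torch.nn as nn

class BreathLSTM(nn.Module):
    def __init__(self, n_feats=20, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # authentication logit

    def forward(self, x):                           # x: (batch, frames, n_feats)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)

scores = BreathLSTM()(torch.randn(8, 200, 20))      # 8 breathing clips, 200 frames
print(scores.shape)                                  # torch.Size([8])
```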


 Temporal Multimodal Fusion for Video Emotion Classification in the Wild

  

This paper addresses the question of emotion classification. The task consists in predicting emotion labels (taken among a set of possible labels) best describing the emotions contained in short video clips. Building on a standard framework, based on describing videos by audio and visual fea…


 Depression Scale Recognition from Audio, Visual and Text Analysis

      

Depression is a major mental health disorder that is rapidly affecting lives worldwide. Depression impacts not only the emotional but also the physical and psychological state of a person. Its symptoms include lack of interest in daily activities, feeling low, anxiety, frustration, loss of weight and eve…


 Continuous Multimodal Emotion Recognition Approach for AVEC 2017

  

This paper reports the analysis of audio and visual features in predicting the emotion dimensions under the seventh Audio/Visual Emotion Challenge (AVEC 2017). For visual features we used the HOG (Histogram of Oriented Gradients) features, Fisher encodings of SIFT (Scale-Invariant Feature Transform) feat…
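As a hedged example of the visual-feature step (scikit-image, with illustrative parameters rather than the paper's settings), HOG descriptors can be extracted from a face crop like this:

```python
# Sketch: HOG (Histogram of Oriented Gradients) features from a grayscale
# face crop; parameter values are assumptions, not the paper's configuration.
import numpy as np
from skimage.feature import hog

face = np.random.rand(128, 128)                   # stand-in for a face crop
descriptor = hog(face,
                 orientations=9,
                 pixels_per_cell=(16, 16),
                 cells_per_block=(2, 2))
print(descriptor.shape)                           # one fixed-length vector per frame
```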


 Similarity graphs for the concealment of long duration data loss in music

We present a novel method for the compensation of long duration data gaps in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A suitable candidate segment for the substitutio…
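A minimal sketch of the underlying similarity computation (librosa; not the paper's implementation) is a cosine-similarity matrix between spectrogram frames, from which a graph of time-persistent spectral similarity could be built:

```python
# Sketch: frame-to-frame spectral similarity from a magnitude spectrogram.
import numpy as np
import librosa

sr = 22050
t = np.arange(sr * 3) / sr
y = np.sin(2 * np.pi * 440 * t)                   # stand-in for a music signal
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))
S = S / (np.linalg.norm(S, axis=0, keepdims=True) + 1e-8)
similarity = S.T @ S                              # frames x frames, cosine similarity
print(similarity.shape)
```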


 A segmental framework for fully-unsupervised large-vocabulary speech recognition

    

Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-c…


 Basic Filters for Convolutional Neural Networks: Training or Design?

 

When convolutional neural networks are used to tackle learning problems based on time series, e.g., audio data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which are then used as input to the actual neural network. In this contribution,…
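The pre-processing step referred to here looks roughly like the following (librosa, with common default parameters; a sketch rather than the paper's pipeline): a raw one-dimensional waveform is turned into log mel-spectrogram coefficients that the CNN then consumes:

```python
# Sketch of the usual front end: waveform -> log mel-spectrogram for a CNN.
import numpy as np
import librosa

sr = 22050
y = np.random.randn(sr * 2).astype(np.float32)    # stand-in for 2 s of audio
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
log_mel = librosa.power_to_db(mel)                # CNN input: (n_mels, frames)
print(log_mel.shape)
```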


 A Comparison on Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging

 

Deep neural networks (DNNs) have been successfully applied to music classification tasks, including music tagging. In this paper, we investigate the effect of audio preprocessing on music tagging with neural networks. We perform comprehensive experiments involving audio preprocessing using different…
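To make the kind of comparison concrete (the specific variants below are assumptions, not necessarily the ones studied), the same mel-spectrogram can be fed to the network with linear magnitude, log compression, or per-clip standardization:

```python
# Illustrative pre-processing variants for one clip; not the paper's exact list.
import numpy as np
import librosa

sr = 22050
y = np.random.randn(sr * 2).astype(np.float32)    # stand-in for a music clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=96)

variants = {
    "linear": mel,
    "log": librosa.power_to_db(mel),
    "standardized": (mel - mel.mean()) / (mel.std() + 1e-8),
}
for name, x in variants.items():
    print(name, x.shape, round(float(x.mean()), 3))
```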


 Launching the Speech Commands Dataset

    

Posted by Pete Warden, Software Engineer, Google Brain Team

At Google, we’re often asked how to get started using deep learning for speech and other audio recognition problems, like detecting keywords or commands. And while there are some great open source speech recognition systems like Kaldi th…
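One convenient way to load the dataset in Python is via torchaudio (shown here as a quick-start sketch, not necessarily the workflow the post describes):

```python
# Sketch: fetching the Speech Commands dataset and inspecting one example.
import torchaudio

dataset = torchaudio.datasets.SPEECHCOMMANDS(root=".", download=True)
waveform, sample_rate, label, speaker_id, utt_number = dataset[0]
print(label, sample_rate, waveform.shape)         # e.g. a 1-second, 16 kHz clip
```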


 WaveNet: A Generative Model for Raw Audio

WaveNet: A Generative Model for Raw Audio by DeepMind
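For orientation, the core building block of WaveNet is a stack of dilated causal convolutions; a simplified PyTorch sketch (without the gated units, residual connections, and skip connections of the full model) looks like this:

```python
# Simplified sketch of a dilated causal convolution stack (WaveNet-style):
# dilation doubles per layer, so the receptive field grows exponentially.
import torch
import torch.nn as nn

class CausalConv1d(nn.Conv1d):
    def forward(self, x):
        pad = (self.kernel_size[0] - 1) * self.dilation[0]
        return super().forward(nn.functional.pad(x, (pad, 0)))  # left-pad only

layers = nn.Sequential(*[
    nn.Sequential(CausalConv1d(32, 32, kernel_size=2, dilation=2 ** i), nn.Tanh())
    for i in range(8)                       # dilations 1, 2, 4, ..., 128
])
x = torch.randn(1, 32, 16000)               # one second of 16 kHz feature frames
print(layers(x).shape)                       # same length: causal, no look-ahead
```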