Topic Tag: text


 Checkpoint Ensembles: Ensemble Methods from a Single Training Process


We present checkpoint ensembles, a method that learns ensemble models from a single training process. Although checkpoint ensembles can be applied to any parametric iterative learning technique, here we focus on neural networks. Neural networks’ composable and simple neurons make it possibl…
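
As a rough sketch of the idea (not the paper's code): snapshot the parameters at several points during one training run, then average the snapshots' predictions at test time. The toy model, data, and hyperparameters below are illustrative assumptions.

```python
# Toy checkpoint ensembling: one gradient-descent run on a linear model,
# with periodic parameter snapshots whose predictions are averaged.

def predict(w, b, x):
    return w * x + b

def train_with_checkpoints(data, steps=2000, lr=0.02, every=500):
    """Fit y = w*x + b by gradient descent, snapshotting (w, b) periodically."""
    w, b = 0.0, 0.0
    checkpoints = []
    for step in range(1, steps + 1):
        # Gradient of the mean squared error over the data.
        gw = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / len(data)
        gb = sum(2 * (predict(w, b, x) - y) for x, y in data) / len(data)
        w -= lr * gw
        b -= lr * gb
        if step % every == 0:
            checkpoints.append((w, b))
    return checkpoints

def ensemble_predict(checkpoints, x):
    """Average the predictions of all saved checkpoints."""
    return sum(predict(w, b, x) for w, b in checkpoints) / len(checkpoints)

data = [(x, 2 * x + 1) for x in range(10)]  # noise-free data from y = 2x + 1
ckpts = train_with_checkpoints(data)
print(round(ensemble_predict(ckpts, 5.0), 2))
```

With real, noisy training the checkpoints disagree more, which is where the ensemble's benefit would come from.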


 Data Science 101: Sentiment Analysis in R Tutorial


Welcome back to Data Science 101! Do you have text data? Do you want to figure out whether the opinions expressed in it are positive or negative? Then you’ve come to the right place! Today, we’re going to get you up to speed on sentiment analysis. By the end of this tutorial you will: U…
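
The simplest form of the technique the tutorial covers is lexicon-based scoring: count positive and negative word matches. A minimal sketch, with tiny illustrative word lists standing in for a real sentiment lexicon:

```python
# Minimal lexicon-based sentiment scorer; the word sets are toy stand-ins.
import string

POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment_score(text):
    """Return (#positive - #negative) word matches; > 0 means positive."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("I love this great product"))  # 2
print(sentiment_score("terrible, I hate it"))        # -2
```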


 How Important is Syntactic Parsing Accuracy? An Empirical Evaluation on Rule-Based Sentiment Analysis


Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has …


 Personalized Fuzzy Text Search Using Interest Prediction and Word Vectorization


In this paper we study the personalized text search problem. Keyword-based search in conventional algorithms is inefficient at understanding users’ intent, since semantic meaning, user profiles, and user interests are not always considered. Firstly, we propose a novel text s…
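
One ingredient of such a system, fuzzy keyword matching, can be sketched with the standard library's difflib; the documents, query, and cutoff below are illustrative, and the paper's interest prediction and word vectorization are out of scope here.

```python
# Fuzzy keyword search: a query word matches a document word if they are
# close under difflib's similarity ratio, tolerating typos.
import difflib

DOCUMENTS = {
    "doc1": "machine learning for text search",
    "doc2": "cooking recipes and kitchen tips",
}

def fuzzy_search(query, documents, cutoff=0.7):
    """Return ids of documents where every query word has a fuzzy match."""
    hits = []
    for doc_id, text in documents.items():
        words = text.split()
        if all(difflib.get_close_matches(q, words, n=1, cutoff=cutoff)
               for q in query.split()):
            hits.append(doc_id)
    return hits

print(fuzzy_search("machne learing", DOCUMENTS))  # ['doc1']
```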


 Upper Bound of Bayesian Generalization Error in Non-negative Matrix Factorization


Non-negative matrix factorization (NMF) is a new knowledge discovery method that is used for text mining, signal processing, bioinformatics, and consumer analysis. However, its basic properties as a learning machine are not yet clarified, as it is not a regular statistical model, with the result that theore…
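
To illustrate what the factorization computes (the paper's Bayesian analysis concerns this model's generalization error), here is a pure-Python sketch of the classical multiplicative-update NMF on a tiny toy matrix; the data and iteration count are illustrative assumptions.

```python
# Lee-Seung-style multiplicative updates for V ~ W @ H with non-negative
# entries, written with plain lists for self-containment.
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, eps=1e-9):
    """Factor a non-negative n x m matrix V into W (n x rank) and H (rank x m)."""
    random.seed(0)
    n, m = len(V), len(V[0])
    W = [[random.random() for _ in range(rank)] for _ in range(n)]
    H = [[random.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise.
        Wt, WH = transpose(W), matmul(W, H)
        num, den = matmul(Wt, V), matmul(Wt, WH)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        Ht, WH = transpose(H), matmul(W, H)
        num, den = matmul(V, Ht), matmul(WH, Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

V = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]  # exactly rank 2
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H) < 0.05)
```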


 A Neural Comprehensive Ranker (NCR) for Open-Domain Question Answering

This paper proposes a novel neural machine reading model for open-domain question answering at scale. Existing machine comprehension models typically assume that a short piece of relevant text containing answers is already identified and given to the models, from which the models are designed to ex…


 KeyVec: Key-semantics Preserving Document Representations

Previous studies have demonstrated the empirical success of word embeddings in various applications. In this paper, we investigate the problem of learning distributed representations for text documents which many machine learning algorithms take as input for a number of NLP tasks. We propose a neur…


 Extracting Ontological Knowledge from Textual Descriptions through Grammar-based Transformation


Authoring of OWL-DL ontologies is intellectually challenging, and to make this process simpler, many systems accept natural language text as input. A text-based ontology authoring approach can be successful only when it is combined with an effective method for extracting ontological axioms from text…
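
A toy transformation in this spirit: a single regex pattern (a stand-in for a real grammar) maps "X is a Y" sentences to SubClassOf axioms. Both the pattern and the output syntax are illustrative assumptions, not the paper's grammar.

```python
# Map simple copular sentences to OWL-style SubClassOf axioms via a
# grammar-like regex; optional determiners ("a", "an", "every") are skipped.
import re

PATTERN = re.compile(r"^(?:a |an |every )?(\w+) is an? (\w+)\.?$",
                     re.IGNORECASE)

def extract_axioms(sentences):
    axioms = []
    for s in sentences:
        m = PATTERN.match(s.strip())
        if m:
            sub, sup = m.group(1).capitalize(), m.group(2).capitalize()
            axioms.append(f"SubClassOf(:{sub} :{sup})")
    return axioms

print(extract_axioms(["A dog is an animal.", "Every car is a vehicle."]))
```

A real system would, of course, use a full grammar and handle far more sentence shapes; this only shows the transformation step.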


 Object-oriented Neural Programming (OONP) for Document Understanding


We propose Object-oriented Neural Programming (OONP), a framework for semantically parsing documents in specific domains. Basically, OONP reads a document and parses it into a predesigned object-oriented data structure (referred to as ontology in this paper) that reflects the domain-specific semant…


 Glass-Box Program Synthesis: A Machine Learning Approach

Recently proposed models that learn to write computer programs from data use either input/output examples or rich execution traces. Instead, we argue that a novel alternative is to use a glass-box loss function, given as a program itself that can be directly inspected. Glass-box optimization cover…


 EZLearn: Exploiting Organic Supervision in Large-Scale Data Annotation


We propose Extreme Zero-shot Learning (EZLearn) for classifying data into potentially thousands of classes, with zero labeled examples. The key insight is to leverage the abundant unlabeled data together with two sources of organic supervision: a lexicon for the annotation classes, and text descrip…
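
The lexicon half of this "organic supervision" can be sketched in a few lines: a mapping from class labels to known synonyms annotates free-text descriptions with zero labeled examples. The lexicon and the example description below are illustrative assumptions.

```python
# Zero-labeled-example annotation: a class applies to a description whenever
# one of its lexicon synonyms occurs in the text.
LEXICON = {
    "rna-seq": ["rna-seq", "rnaseq", "transcriptome sequencing"],
    "microarray": ["microarray", "expression array"],
}

def annotate(description):
    """Return every class whose lexicon entry appears in the description."""
    text = description.lower()
    return sorted(label for label, synonyms in LEXICON.items()
                  if any(s in text for s in synonyms))

print(annotate("Samples profiled by RNA-seq (transcriptome sequencing)"))
```

The paper's contribution is combining such noisy lexicon matches with abundant unlabeled data; this shows only the seed-supervision step.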


 HDLTex: Hierarchical Deep Learning for Text Classification


The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supe…


 Long Text Generation via Adversarial Training with Leaked Information


Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GAN) that use a discriminative model to guide the training of the gener…


 Piecewise Latent Variables for Neural Variational Text Processing


Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural …


 Attention-based Wav2Text with Feature Transfer Learning


Conventional automatic speech recognition (ASR) typically performs multi-level pattern recognition tasks that map the acoustic speech waveform into a hierarchy of speech units. However, it is widely known that information loss in the earlier stages can propagate through the later stages. After the resur…


 Deconvolutional Latent-Variable Model for Text Sequence Matching


A latent-variable model is introduced for text matching, inferring sentence representations by jointly optimizing generative and discriminative objectives. To alleviate typical optimization challenges in latent-variable models for text, we employ deconvolutional networks as the sequence decoder (ge…


 Text Compression for Sentiment Analysis via Evolutionary Algorithms

Can textual data be compressed intelligently without losing accuracy in evaluating sentiment? In this study, we propose a novel evolutionary compression algorithm, PARSEC (PARts-of-Speech for sEntiment Compression), which makes use of Parts-of-Speech tags to compress text in a way that sacrifices m…
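
The core move can be illustrated with a toy version: keep only words whose part of speech is assumed useful for sentiment, and drop everything else. The POS dictionary and kept tags below are illustrative stand-ins for a real tagger and for PARSEC's learned choices.

```python
# POS-driven text compression: look up each word's tag in a toy dictionary
# and keep only tags assumed to carry sentiment.
POS = {
    "the": "DET", "a": "DET", "movie": "NOUN", "acting": "NOUN",
    "was": "VERB", "felt": "VERB", "really": "ADV",
    "wonderful": "ADJ", "flat": "ADJ",
}
KEEP = {"ADJ", "ADV", "VERB"}  # assumption: these tags matter for sentiment

def compress(text):
    words = text.lower().split()
    return " ".join(w for w in words if POS.get(w) in KEEP)

print(compress("The movie was really wonderful"))  # "was really wonderful"
```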


 Neural Networks for Text Correction and Completion in Keyboard Decoding


Despite the ubiquity of mobile and wearable text messaging applications, the problem of keyboard text decoding has not been tackled sufficiently in light of the enormous success of deep learning Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) for natural language understand…
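
For contrast with the paper's neural approach, the classical baseline for keyboard correction is edit-distance lookup: pick the vocabulary word closest to each typed token. A minimal sketch, with an illustrative vocabulary:

```python
# Dictionary-based correction via Levenshtein distance (dynamic programming),
# a non-neural baseline for keyboard decoding.
def edit_distance(a, b):
    """Minimum insertions/deletions/substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

VOCAB = ["hello", "world", "keyboard", "decoding"]

def correct(token):
    """Return the vocabulary word nearest to the typed token."""
    return min(VOCAB, key=lambda w: edit_distance(token, w))

print(correct("helo"), correct("keybord"))  # hello keyboard
```

The neural models in the paper go beyond this by using sentence context rather than per-token distance.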


 GaKCo: a Fast GApped k-mer string Kernel using COunting

String Kernel (SK) techniques, especially those using gapped $k$-mers as features (gk), have obtained great success in classifying sequences like DNA, protein, and text. However, the state-of-the-art gk-SK runs extremely slow when we increase the dictionary size ($\Sigma$) or allow more mismatches (…
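
The feature map behind these kernels can be sketched directly: from each length-$l$ window, keep $k$ positions and replace the rest with gaps, then count the resulting patterns; the kernel value is the inner product of two sequences' count vectors. The parameters and sequences below are illustrative (GaKCo's contribution is computing this fast, which this naive version does not attempt).

```python
# Naive gapped k-mer counting and the induced string kernel.
from collections import Counter
from itertools import combinations

def gapped_kmer_counts(seq, l=3, k=2):
    """Count patterns from l-length windows with k kept (non-gap) positions."""
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for positions in combinations(range(l), k):
            pattern = "".join(window[i] if i in positions else "_"
                              for i in range(l))
            counts[pattern] += 1
    return counts

def kernel(s, t, l=3, k=2):
    """Inner product of the two sequences' gapped k-mer count vectors."""
    cs, ct = gapped_kmer_counts(s, l, k), gapped_kmer_counts(t, l, k)
    return sum(cs[p] * ct[p] for p in cs)

print(kernel("ACGT", "ACGT"))  # 6
```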


 Leveraging Distributional Semantics for Multi-Label Learning


We present ExMLDS (Extreme Multi-Label Learning using Distributional Semantics), a novel and scalable label embedding framework for large-scale multi-label learning. Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS…


 Depression Scale Recognition from Audio, Visual and Text Analysis


Depression is a major mental health disorder that is rapidly affecting lives worldwide. Depression impacts not only the emotional but also the physical and psychological state of the person. Its symptoms include lack of interest in daily activities, feeling low, anxiety, frustration, loss of weight and eve…


 Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model for Short Text Multi-class Classification Problems

The bag-of-words model is a standard representation of text for many linear classifier learners. In many problem domains, linear classifiers are preferred over more complex models due to their efficiency, robustness and interpretability, and the bag-of-words text representation can capture sufficie…
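
The representation being enriched here is easy to pin down concretely: a bag-of-words vectorizer maps each short text to a word-count vector over a shared vocabulary. A minimal sketch with illustrative texts:

```python
# Bag-of-words vectorization: fit a vocabulary, then map texts to count vectors.
from collections import Counter

def fit_vocabulary(texts):
    """Assign each distinct lowercase word an index, in sorted order."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def to_vector(text, vocab):
    """Count vector over the vocabulary, in index order."""
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in sorted(vocab, key=vocab.get)]

texts = ["the cat sat", "the dog sat down"]
vocab = fit_vocabulary(texts)  # cat, dog, down, sat, the
print(to_vector("the cat sat", vocab))  # [1, 0, 0, 1, 1]
```

Low-frequency words get sparse, uninformative counts in this representation, which is the problem the word-vector enrichment targets.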


 Unsupervised, Efficient and Semantic Expertise Retrieval

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-th…


 A segmental framework for fully-unsupervised large-vocabulary speech recognition


Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-c…


 Disentangled Variational Auto-Encoder for Semi-supervised Learning


In this paper, we develop a novel approach for a semi-supervised VAE without a classifier. Specifically, we propose a new model called SDVAE, which encodes the input data into a disentangled representation and a non-interpretable representation; the category information is then directly utilized to regular…