#### Dynamic Island Model based on Spectral Clustering in Genetic Algorithm

Maintaining relatively high diversity is important for avoiding premature convergence in population-based optimization methods. The island model is widely considered a major approach to achieving this because of its flexibility and high efficiency. The model maintains a group of sub-populations on diffe…

#### The information bottleneck and geometric clustering

The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X,Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similari…
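
As a rough illustration of "retaining information about $Y$", the sketch below computes the mutual information $I(T;Y)$ for a hard assignment of $X$ values to cluster labels. It is not the algorithm from the paper: the toy joint distribution, the function names, and the restriction to hard assignments are all assumptions made for illustration, and the compression side of the IB trade-off is not modeled.

```python
from math import log2

def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a list of rows."""
    row_tot = [sum(row) for row in joint]
    col_tot = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * log2(p / (row_tot[i] * col_tot[j]))
    return mi

def cluster_joint(joint_xy, labels, n_clusters):
    """Merge the rows of P(X,Y) that share a cluster label to obtain P(T,Y)."""
    n_y = len(joint_xy[0])
    joint_ty = [[0.0] * n_y for _ in range(n_clusters)]
    for row, t in zip(joint_xy, labels):
        for j, p in enumerate(row):
            joint_ty[t][j] += p
    return joint_ty

# Hypothetical toy distribution: x0 and x1 share one conditional p(y|x),
# x2 and x3 share another.
P_XY = [[0.20, 0.05],
        [0.20, 0.05],
        [0.05, 0.20],
        [0.05, 0.20]]

i_xy = mutual_information(P_XY)
i_good = mutual_information(cluster_joint(P_XY, [0, 0, 1, 1], 2))
i_bad = mutual_information(cluster_joint(P_XY, [0, 1, 0, 1], 2))
print(f"I(X;Y)={i_xy:.4f}  grouped by p(y|x): {i_good:.4f}  mixed groups: {i_bad:.4f}")
```

In this toy case, grouping points with identical conditionals $p(y \mid x)$ loses no information about $Y$, while mixing them loses all of it.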

#### Deep Unsupervised Clustering Using Mixture of Autoencoders

Unsupervised clustering is one of the most fundamental challenges in machine learning. A popular hypothesis is that data are generated from a union of low-dimensional nonlinear manifolds; thus an approach to clustering is identifying and separating these manifolds. In this paper, we present a novel…

#### Intelligent Device Discovery in the Internet of Things – Enabling the Robot Society

The Internet of Things (IoT) is continuously growing to connect billions of smart devices anywhere and anytime in an Internet-like structure, which enables a variety of applications, services and interactions between humans and objects. In the future, the smart devices are supposed to be able to aut…

#### DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer

We have witnessed the rapid evolution of deep neural network architecture design in recent years. This progress has greatly facilitated developments in various areas such as computer vision and natural language processing. However, along with the extraordinary performance, these state-of-the…

#### Clustering with Missing Features: A Penalized Dissimilarity Measure based approach

Many real-world clustering problems are plagued by incomplete data characterized by missing or absent features for some or all of the data instances. Traditional clustering methods cannot be directly applied to such data without preprocessing by imputation or marginalization techniques. In this art…

#### A Bayesian Nonparametric Method for Clustering, Imputation, and Forecasting in Multivariate Time Series

This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time serie…

#### Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering

In this paper, we propose a novel end-to-end neural architecture for ranking answers from candidates that adapts a hierarchical recurrent neural network and a latent topic clustering module. With our proposed model, a text is encoded to a vector representation from a word-level to a chunk-level to…

#### Function space analysis of deep learning representation layers

In this paper we propose a function space approach to Representation Learning and the analysis of the representation layers in deep learning architectures. We show how to compute a weak-type Besov smoothness index that quantifies the geometry of the clustering in the feature space. This approach wa…

#### From Subspaces to Metrics and Beyond: Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data

The emergence of high-dimensional data in various areas has brought new challenges to the ensemble clustering research. To deal with the curse of dimensionality, considerable efforts in ensemble clustering have been made by incorporating various subspace-based techniques. Besides the emphasis on su…

#### Forecasting Across Time Series Databases using Long Short-Term Memory Networks on Groups of Similar Series

With the advent of Big Data, databases containing large quantities of similar time series are now available in many applications. Forecasting time series in these domains with traditional univariate forecasting procedures leaves great potential for producing accurate forecasts untapped. Recur…

#### A New Spectral Clustering Algorithm

We present a new clustering algorithm that is based on searching for natural gaps in the components of the lowest energy eigenvectors of the Laplacian of a graph. In comparing the performance of the proposed method with a set of other popular methods (KMEANS, spectral-KMEANS, and an agglomerative m…
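
To make the idea of "gaps in eigenvector components" concrete, here is a simplified sketch that is not the paper's algorithm. It approximates only the second-lowest eigenvector of the graph Laplacian (the Fiedler vector) by power iteration and cuts the vertices at the largest gap between sorted components. The toy graph, the function names, and the use of a single eigenvector are assumptions made for illustration.

```python
def fiedler_vector(adj, iters=500):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue
    by power iteration on (c*I - L), projecting out the constant vector."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    c = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L = D - A
    v = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant (eigenvalue 0) direction
        # w = (c*I - L) v = (c - deg) * v + A v
        w = [(c - deg[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def split_at_largest_gap(values):
    """Sort vertices by component value and cut at the widest gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=lambda k: gaps[k])
    return sorted(order[:cut + 1]), sorted(order[cut + 1:])

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

clusters = split_at_largest_gap(fiedler_vector(A))
print(clusters)
```

Power iteration is used here only to keep the sketch dependency-free; a real implementation would more likely call a sparse eigensolver.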

#### Reliable Learning of Bernoulli Mixture Models

In this paper, we have derived a set of sufficient conditions for reliable clustering of data produced by Bernoulli Mixture Models (BMM), when the number of clusters is unknown. A BMM refers to a random binary vector whose components are independent Bernoulli trials with cluster-specific frequencie…
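
The generative model described above can be sketched as follows. This is a minimal sampler, assuming a standard mixture setup in which a cluster is first drawn from mixing weights and each binary component is then drawn independently. The weights, the per-cluster probabilities, and the function name are hypothetical values chosen for illustration, not taken from the paper.

```python
import random

def sample_bmm(weights, freqs, n_samples, rng):
    """Draw (cluster, binary vector) pairs from a Bernoulli mixture.

    weights: mixing proportions, one per cluster (assumed, not from the paper)
    freqs:   freqs[k][d] is the probability that component d equals 1 in cluster k
    """
    samples = []
    for _ in range(n_samples):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        x = [1 if rng.random() < f else 0 for f in freqs[k]]
        samples.append((k, x))
    return samples

rng = random.Random(0)  # seeded for reproducibility
weights = [0.5, 0.5]
freqs = [[0.9, 0.9, 0.1],   # cluster 0 favors the first two components
         [0.1, 0.1, 0.9]]   # cluster 1 favors the last component
for k, x in sample_bmm(weights, freqs, 5, rng):
    print(k, x)
```

The clustering task is then the inverse problem: recover the labels from the binary vectors alone.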

#### DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks

Deep neural networks have become widely used, obtaining remarkable results in domains such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bio-informatics, where they have produced results comparable to human…

#### Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case

In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine le…

#### A generalization of the Jensen divergence: The chord gap divergence

We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties. It follows a generalization of the celebrated statistical Bhattacharyya distance that is frequently met in applicatio…
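
For context, the Jensen divergence that the chord gap divergence generalizes is commonly defined, for a strictly convex generator $F$, as $J_F(p,q) = \frac{F(p)+F(q)}{2} - F\left(\frac{p+q}{2}\right)$. The sketch below evaluates this baseline only and does not implement the chord gap divergence itself. The choice of negative Shannon entropy as generator (which yields the Jensen-Shannon divergence) and the function names are assumptions made for illustration.

```python
from math import log2

def neg_entropy(p):
    """Negative Shannon entropy in bits, a convex generator F."""
    return sum(x * log2(x) for x in p if x > 0.0)

def jensen_divergence(F, p, q):
    """J_F(p, q) = (F(p) + F(q)) / 2 - F((p + q) / 2)."""
    mid = [(a + b) / 2.0 for a, b in zip(p, q)]
    return (F(p) + F(q)) / 2.0 - F(mid)

same = jensen_divergence(neg_entropy, [0.5, 0.5], [0.5, 0.5])
disjoint = jensen_divergence(neg_entropy, [1.0, 0.0], [0.0, 1.0])
print(same, disjoint)
```

Geometrically, the value is the vertical gap at the midpoint between the chord joining $(p, F(p))$ and $(q, F(q))$ and the graph of $F$.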

#### Demystifying Relational Latent Representations

Latent features learned by deep learning approaches have proven to be a powerful tool for machine learning. They serve as a data abstraction that makes learning easier by capturing regularities in data explicitly. Their benefits motivated their adaptation to the relational learning context. In our prev…

#### Robust nonparametric nearest neighbor random process clustering

We consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their generative models without prior knowledge of the model statistics and the number of generative models. Two algorithms, both using the $L^1$-distance between estimated pow…

#### Research on several key technologies in practical speech emotion recognition

In this dissertation the practical speech emotion recognition technology is studied, including several cognitive related emotion types, namely fidgetiness, confidence and tiredness. The high quality of naturalistic emotional speech data is the basis of this research. The following techniques are us…

#### A Compressive Sensing Approach to Community Detection with Applications

The community detection problem for graphs asks one to partition the n vertices V of a graph G into k communities, or clusters, such that there are many intracluster edges and few intercluster edges. Of course this is equivalent to finding a permutation matrix P such that, if A denotes the adjacenc…
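
The quality criterion stated above (many intracluster edges, few intercluster edges) can be checked directly for any candidate partition. The following sketch only evaluates a given labeling; it does not perform the compressive sensing method of the paper. The toy graph and the function name are assumptions made for illustration.

```python
def count_edges(edges, labels):
    """Return (intracluster, intercluster) edge counts for a vertex labeling."""
    intra = inter = 0
    for u, v in edges:
        if labels[u] == labels[v]:
            intra += 1
        else:
            inter += 1
    return intra, inter

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

good = count_edges(edges, [0, 0, 0, 1, 1, 1])  # one community per triangle
bad = count_edges(edges, [0, 1, 0, 1, 0, 1])   # arbitrary alternating labels
print(good, bad)
```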

#### A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering

We propose an effective method to solve event sequence clustering problems based on a novel Dirichlet mixture model of a special but significant type of point process, the Hawkes process. In this model, each event sequence belonging to a cluster is generated via the same Hawkes process wit…

#### Class-Splitting Generative Adversarial Networks

Generative Adversarial Networks (GANs) produce systematically better quality samples when class label information is provided, i.e., in the conditional GAN setup. This is still observed for the recently proposed Wasserstein GAN formulation which stabilized adversarial training and allows considerin…

#### Scalable Support Vector Clustering Using Budget

Owing to its application in solving difficult and diverse clustering and outlier detection problems, support-based clustering has recently drawn plenty of attention. Support-based clustering methods always undergo two phases: finding the domain of novelty and performing clustering assignment. To…

#### Estimating Mutual Information for Discrete-Continuous Mixtures

Estimating mutual information from observed samples is a basic primitive, useful in several machine learning tasks including correlation mining, information bottleneck clustering, learning a Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information i…

#### A Comparative Quantitative Analysis of Contemporary Big Data Clustering Algorithms for Market Segmentation in Hospitality Industry

The hospitality industry is one of the data-rich industries that receives huge Volumes of data streaming at high Velocity with considerable Variety, Veracity, and Variability. These properties make the data analysis in the hospitality industry a big data problem. Meeting the customers' expect…