Machine Learning

Clustering with Missing Features: A Penalized Dissimilarity Measure based approach



  • arXiv

    Many real-world clustering problems are plagued by incomplete data, characterized by missing or absent features for some or all of the data instances. Traditional clustering methods cannot be applied to such data without preprocessing by imputation or marginalization techniques. In this article, we put forth the concept of Penalized Dissimilarity Measures, which estimate the actual distance between two data points (i.e., the distance between them if they were fully observed) by adding a penalty to the distance computed from the features observed in both instances. We then propose one such measure, the Feature Weighted Penalty based Dissimilarity (FWPD). Using the proposed dissimilarity measure, we modify the traditional k-means clustering algorithm and the standard hierarchical agglomerative clustering techniques so that they can be applied directly to datasets with missing features. We present time complexity analyses for the new techniques and show that the FWPD-based k-means algorithm converges to a local optimum within a finite number of iterations. Extensive experiments on various benchmark datasets show that the proposed clustering techniques generally outperform some popular imputation methods commonly used to handle such incomplete data. We also append a possible extension of the proposed dissimilarity measure to the case of absent features (where the unobserved features are known to be non-existent).
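
    The abstract only outlines the idea, so here is a minimal, hedged sketch of what a penalized dissimilarity of this kind could look like in NumPy. The convex combination controlled by alpha, the Euclidean distance over commonly observed features, the observation-frequency feature weights, and the scaling of the distance term are assumptions of this illustration, not the authors' exact FWPD formulation (see the linked paper for that).

```python
import numpy as np

def penalized_dissimilarity(X, alpha=0.5):
    """Pairwise dissimilarity matrix for data with NaN-coded missing features.

    Illustrative sketch only: the distance over commonly observed features is
    combined with a penalty proportional to the frequency-weighted share of
    features that the two instances do NOT both observe. The exact FWPD
    definition is given in the paper; this is an approximation of the idea.
    """
    n, m = X.shape
    obs = ~np.isnan(X)                        # observation mask, shape (n, m)
    w = obs.sum(axis=0).astype(float)         # feature weight = how often it is observed
    w_total = w.sum()

    d_obs = np.zeros((n, n))                  # distance over commonly observed features
    pen = np.zeros((n, n))                    # penalty for features not shared by both
    for i in range(n):
        for j in range(i + 1, n):
            common = obs[i] & obs[j]
            if common.any():
                diff = X[i, common] - X[j, common]
                d_obs[i, j] = d_obs[j, i] = float(np.sqrt(diff @ diff))
            pen[i, j] = pen[j, i] = w[~common].sum() / w_total

    d_max = d_obs.max()
    if d_max > 0:
        d_obs /= d_max                        # scale the distance term into [0, 1]
    return (1.0 - alpha) * d_obs + alpha * pen
```

    As a usage note, a matrix produced this way can be handed to standard agglomerative clustering, e.g. scipy.cluster.hierarchy.linkage(squareform(D), method="average"), which accepts a precomputed condensed dissimilarity matrix.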

    Clustering with Missing Features: A Penalized Dissimilarity Measure based approach
    by Shounak Datta, Supritam Bhattacharjee, Swagatam Das
    https://arxiv.org/pdf/1604.06602v6.pdf
