To prune, or not to prune: exploring the efficacy of pruning for model compression

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to 10x reduction in number of nonzero parameters with minimal loss in accuracy.
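The gradual pruning technique the abstract refers to ramps a sparsity target from an initial to a final value over the course of training, using the cubic schedule proposed in the paper, and zeroes out the smallest-magnitude weights at each pruning step. Below is a minimal illustrative sketch; the function names and default parameters are my own, not from the paper:

```python
import numpy as np

def sparsity_schedule(step, s_i=0.0, s_f=0.9, t0=0, n=100, dt=1):
    """Cubic sparsity ramp from s_i to s_f over n pruning steps,
    starting at step t0 (the schedule form proposed in the paper).
    Illustrative defaults; tune per model/dataset."""
    t = min(max(step - t0, 0), n * dt)
    return s_f + (s_i - s_f) * (1.0 - t / (n * dt)) ** 3

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights
    (simple magnitude-based pruning, applied per connection matrix)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value; keep strictly larger.
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In practice the schedule is evaluated every `dt` training steps and the resulting mask is held fixed between pruning events, so pruned weights stay zero while the surviving weights continue to train.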
by Michael Zhu, Suyog Gupta
https://arxiv.org/pdf/1710.01878v1.pdf