
Training RNNs as Fast as CNNs




    Recurrent neural networks scale poorly due to the intrinsic difficulty of parallelizing their state computations. For instance, the forward-pass computation of $h_t$ is blocked until the computation of $h_{t-1}$ finishes entirely, which is a major bottleneck for parallel computing. In this work, we propose an alternative RNN implementation that deliberately simplifies the state computation to expose more parallelism. The proposed recurrent unit operates as fast as a convolutional layer and runs 5–10x faster than a cuDNN-optimized LSTM. We demonstrate the unit's effectiveness across a wide range of applications, including classification, question answering, language modeling, translation, and speech recognition. We open-source our implementation in PyTorch and CNTK.
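    The core idea the abstract describes can be sketched in a few lines: move every matrix multiplication out of the time loop (so it can be batched over all timesteps in parallel) and leave only cheap elementwise operations inside the sequential recurrence. The sketch below is an illustrative simplification, not the paper's exact unit; the function name `sru_forward`, the gate structure, and the weight shapes are assumptions for the example.

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sru_forward(x, W, Wf, bf, Wr, br):
        """Illustrative simplified recurrent unit (an assumption, not the
        paper's exact formulation): all matrix multiplies run up front over
        the whole sequence; the time loop is elementwise only."""
        T, d = x.shape
        # Parallelizable part: one batched matmul per gate over all timesteps.
        xt = x @ W                    # candidate values, shape (T, d)
        f = sigmoid(x @ Wf + bf)      # forget gate, shape (T, d)
        r = sigmoid(x @ Wr + br)      # highway/output gate, shape (T, d)
        # Sequential part: O(d) elementwise work per step, no matmul.
        c = np.zeros(d)
        h = np.empty((T, d))
        for t in range(T):
            c = f[t] * c + (1.0 - f[t]) * xt[t]       # internal state
            h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]  # highway output
        return h
    ```

    Because the remaining loop body contains no matrix products, it is far cheaper per step than an LSTM cell, and the heavy matmuls over `(T, d)` batches map well onto GPU hardware.
    
    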

    Training RNNs as Fast as CNNs
    by Tao Lei, Yu Zhang
