Exploiting model-level parallelism in recurrent neural network accelerators
Document Type
Conference Proceeding
Publication Date
10-1-2019
Abstract
Recurrent Neural Networks (RNNs) continue to drive rapid progress across a variety of academic and industrial fields, yet their complexity makes efficient deployment difficult: when the RNN model size is not properly matched to hardware resources, performance suffers from hardware under-utilization. In this work, we explore model-level parallelism for LSTM-RNN accelerators at different levels of the model using a multi-core design. The proposed design operates in three computing modes: multi-programming mode, in which independent models are executed; multithreading mode, in which parallelism among the layers of an LSTM model is exploited and properly scheduled; and helper-core mode, in which cores collaborate on a single LSTM layer, a lower model level than in multithreading mode. Our design achieves up to a 1.98x speedup in multi-programming mode, a 1.91x speedup in multithreading mode, and a 1.88x speedup in helper-core mode over a single-core design.
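To make the three computing modes concrete, the sketch below illustrates one plausible way work could be partitioned across cores in each mode. This is not the paper's implementation; the names (`Model`, `Layer`, `num_cores`) and the timestep-chunk split used for helper-core mode are assumptions chosen purely for illustration.

```python
# Hypothetical sketch of the three computing modes; not the accelerator's actual scheduler.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    name: str
    timesteps: int  # assumed sequence length processed by this layer


@dataclass
class Model:
    name: str
    layers: List[Layer]


def multi_programming(models: List[Model], num_cores: int) -> Dict[int, List[str]]:
    """Multi-programming mode: each core runs independent models (round-robin)."""
    schedule: Dict[int, List[str]] = {c: [] for c in range(num_cores)}
    for i, model in enumerate(models):
        schedule[i % num_cores].append(model.name)
    return schedule


def multithreading(model: Model, num_cores: int) -> Dict[int, List[str]]:
    """Multithreading mode: layers of one model are spread across cores so that
    different layers can execute in parallel."""
    schedule: Dict[int, List[str]] = {c: [] for c in range(num_cores)}
    for i, layer in enumerate(model.layers):
        schedule[i % num_cores].append(layer.name)
    return schedule


def helper_core(layer: Layer, num_cores: int) -> Dict[int, str]:
    """Helper-core mode: all cores collaborate on a single layer; here the work
    is split into timestep chunks only as a stand-in for the intra-layer split."""
    chunk = (layer.timesteps + num_cores - 1) // num_cores
    return {
        c: f"{layer.name}[t={c * chunk}:{min((c + 1) * chunk, layer.timesteps)}]"
        for c in range(num_cores)
    }


if __name__ == "__main__":
    lstm_a = Model("lstm_a", [Layer("layer0", 100), Layer("layer1", 100)])
    lstm_b = Model("lstm_b", [Layer("layer0", 50)])
    print(multi_programming([lstm_a, lstm_b], num_cores=2))
    print(multithreading(lstm_a, num_cores=2))
    print(helper_core(lstm_a.layers[0], num_cores=2))
```

The sketch only shows how the three modes differ in granularity: whole models per core, layers per core, or one layer shared by all cores.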
Publication Source (Journal or Book title)
Proceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019
First Page
241
Last Page
248
Recommended Citation
Peng, L., Shi, W., Zhang, J., & Irving, S. (2019). Exploiting model-level parallelism in recurrent neural network accelerators. Proceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019, 241-248. https://doi.org/10.1109/MCSoC.2019.00042