Convolutional Neural Networks (CNNs) have reached outstanding results in several complex visual recognition tasks, such as classification and scene parsing. CNNs are composed of multiple filtering layers that perform 2D convolutions over input images. The intrinsic parallelism in such a computation kernel makes it suitable to be effectively accelerated on parallel hardware. In this paper we propose a highly flexible and scalable architectural template for acceleration of CNNs on FPGA devices, based on the cooperation between a set of software cores and a parallel convolution engine that communicate via a tightly coupled L1 shared scratchpad. Our accelerator structure, tested on a Xilinx Zynq XC-Z7045 device, delivers peak performance up to 80 GMAC/s, corresponding to 100 MMAC/s for each DSP slice in the programmable fabric. Thanks to the flexible architecture, convolution operations can be scheduled in order to reduce input/output bandwidth down to 8 bytes per cycle without degrading the performance of the accelerator in most of the meaningful use cases.
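
To make the abstract's terms concrete, the following is a minimal Python sketch of the 2D filtering kernel a CNN layer performs, with a count of its multiply-accumulate (MAC) operations. It is a plain software reference, not the paper's accelerator. The function name `conv2d_valid` and the "valid" (no padding, stride 1) border handling are assumptions made for illustration. The final lines divide the two throughput figures quoted in the abstract; the resulting number of DSP slices is only implied by those figures and is not stated in the abstract.

```python
def conv2d_valid(image, kernel):
    """Direct 2D filtering over the valid region (no padding, stride 1).

    As is common in CNNs, the kernel is applied without flipping
    (i.e. cross-correlation). Returns the output map and the MAC count.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    macs = 0
    # Every output pixel depends only on its own input window, so the two
    # outer loops can run in parallel: this is the parallelism that
    # hardware accelerators exploit.
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
                    macs += 1
            out[y][x] = acc
    return out, macs


# Small worked example (illustrative data, not from the paper).
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
out, macs = conv2d_valid(image, kernel)

# Figures quoted in the abstract.
PEAK_MACS_PER_S = 80_000_000_000      # 80 GMAC/s peak
MACS_PER_S_PER_DSP = 100_000_000      # 100 MMAC/s per DSP slice
# Derived here, not stated in the abstract: DSP slices implied by the ratio.
implied_dsp_slices = PEAK_MACS_PER_S // MACS_PER_S_PER_DSP
```

For an output of size H×W and a K×K kernel the loop performs H·W·K² MACs, which is why peak throughput is naturally expressed in MAC/s.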
Title: | Curbing the roofline: A scalable and flexible architecture for CNNs on FPGA |
Authors: | |
Publication date: | 2016 |
Handle: | http://hdl.handle.net/11584/177741 |
ISBN: | 9781450341288 |
Type: | 4.1 Conference proceedings contribution |
Files in this record:
File | Description | Type | License | |
---|---|---|---|---|
p376-meloni.pdf | publisher's version | Administrator |