Abstract

State-of-the-art convolutional neural networks (CNNs) usually have a large number of layers and filter weights, which brings substantial computation and communication overhead. A general-purpose instruction set architecture (ISA) is flexible but suffers from low code density and high power consumption. Existing CNN-specific accelerators are far more efficient but are usually inflexible or require a complex controller to handle the computation and data transfer of different CNNs. In this brief, we propose a new CNN-specific ISA that embeds parallel-computation and data-reuse parameters in the instructions. An instruction generator sets these parameters according to the features of the target CNN and the hardware's computation and storage resources. In addition, a reconfigurable accelerator with 225 multipliers and 24 adder trees is implemented to achieve efficient parallel computation and data transfer. Compared with x86 processors, our design achieves 392 times better energy efficiency and 16 times higher code density. Compared with other state-of-the-art accelerators, our solution offers higher flexibility, supporting all popular CNNs, as well as higher energy efficiency.