Abstract

Emerging integrated CPU + FPGA hybrid platforms, such as the Extensible Processing Platform architecture from Xilinx [1], offer an unprecedented opportunity to achieve both multifunctionality and real-time responsiveness in memory-intensive embedded applications. However, cost-effectively synthesizing application-specific hardware constructs that fully exploit memory-level parallelism remains a key challenge. To address this problem, we propose a new FPGA-based embedded computer architecture, ASTRO (Application-Specific Hardware Traces with Reconfigurable Optimization). Our main contribution is an integrated methodology for constructing an application-specific memory access network that extracts the maximum amount of memory-level parallelism on a per-application basis. In particular, the proposed ASTRO architecture can (1) perform dynamic memory analysis to extract as much of the target application's instruction-, loop-, and memory-level parallelism as possible for performance enhancement, (2) synthesize highly efficient accelerators that enable parallelized memory accesses, and therefore (3) accomplish effective data orchestration by exploiting two capabilities of modern FPGA devices: abundant distributed block RAMs and reprogrammability. To validate the ASTRO methodology empirically, we have implemented a baseline embedded processor platform, a conventional CPU + accelerator platform with a single centralized memory, and a prototype ASTRO machine based on Xilinx MicroBlaze technology. Averaged over 10 benchmark applications from SPEC2006 and MiBench [2], the ASTRO machine achieves an 8.6x speedup over the baseline embedded processor platform and a 1.7x speedup over the conventional CPU + accelerator platform. Moreover, the ASTRO platform reduces the energy-delay product by more than 40% compared to the conventional CPU + accelerator platform with a centralized memory.
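A small model helps explain why spreading data across many distributed block RAMs can outperform a single centralized memory. The sketch below is a generic illustration, not the ASTRO synthesis flow. It assumes simple cyclic (interleaved) partitioning, in which element i is stored in bank i mod N. It then counts how many cycles a group of simultaneously issued accesses needs when each bank can serve a fixed number of accesses per cycle. The functions `cyclic_bank` and `cycles_needed` and the `ports_per_bank` parameter are hypothetical names chosen for this example.

```python
"""Hypothetical sketch: cyclic partitioning of an array across memory banks.

This is NOT the ASTRO tool flow. It is a generic model of why distributing
data over several block RAMs lets one cycle serve several memory accesses.
"""
from collections import Counter
from typing import Iterable, Tuple


def cyclic_bank(index: int, num_banks: int) -> Tuple[int, int]:
    """Map a flat array index to (bank, offset) under cyclic partitioning."""
    return index % num_banks, index // num_banks


def cycles_needed(accesses: Iterable[int], num_banks: int,
                  ports_per_bank: int = 1) -> int:
    """Cycles needed to serve a group of accesses issued together.

    Each bank serves at most `ports_per_bank` accesses per cycle (an assumed,
    simplified port model), so the most heavily loaded bank sets the count.
    """
    per_bank = Counter(index % num_banks for index in accesses)
    if not per_bank:
        return 0
    busiest = max(per_bank.values())
    return -(-busiest // ports_per_bank)  # ceiling division


if __name__ == "__main__":
    window = [8, 9, 10, 11]   # e.g. a loop body reading a[i] .. a[i+3]
    strided = [0, 4, 8, 12]   # stride equal to the bank count
    print(cycles_needed(window, num_banks=1))   # one centralized memory
    print(cycles_needed(window, num_banks=4))   # four distributed banks
    print(cycles_needed(strided, num_banks=4))  # every access hits bank 0
```

With one memory, the four consecutive reads serialize over four cycles. With four banks, they complete in one cycle. The strided pattern, however, still conflicts on a single bank. This is why a per-application analysis of the actual access pattern, rather than one fixed partitioning scheme, determines how much memory-level parallelism distributed block RAMs can deliver.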

  • Publication date: 2015-10