Abstract

This paper presents a pipelined reconfigurable processor for scientific computing and high-resolution imaging applications that implements variable-length single-precision floating-point FFT/IFFT and DCT/IDCT computations compatible with the IEEE 754 standard. To minimize total hardware overhead and power consumption, a reconfigurable radix-4 butterfly (RR4BF) is proposed that uses 75% fewer adders than the conventional parallel radix-4 butterfly. A partially shared Ping-Pong structured register bank (PSPPRB) provides a dedicated caching mechanism for intermediate data, which maximizes adder utilization in the RR4BF and sustains the high throughput of the pipelined design. Moreover, a fused floating-point 4-input adder and a fused floating-point 2-term dot product unit are proposed; they improve the signal-to-quantization-noise ratio (SQNR) by about 3 dB and reduce hardware overhead by 28% and 19% compared with discrete implementations and the previous state-of-the-art design, respectively. Simulation results show that the latency for FFT computations is about 25% of that of the R4SDF design without any throughput loss, and that an SQNR of over 139 dB is achieved. Logic synthesis results in a 65 nm CMOS technology show that the power consumption ranges from 43.5 mW to 372.3 mW for 16- to 1024-point FFTs at 500 MHz, and that the total hardware overhead is equivalent to 543k NAND2 gates.
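For readers unfamiliar with the radix-4 butterfly referred to above, the following is a minimal software sketch of the textbook operation, namely a 4-point DFT computed in two add/subtract stages with 8 complex additions. It is not the proposed RR4BF and says nothing about its adder sharing or reconfiguration; the function names `radix4_butterfly` and `dft4` are hypothetical and chosen only for illustration.

```python
import cmath


def radix4_butterfly(x0, x1, x2, x3):
    """Textbook radix-4 butterfly: 4-point forward DFT in two stages.

    Illustrative only; this is NOT the paper's RR4BF architecture.
    """
    # Stage 1: sums and differences of inputs that are two samples apart.
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    # Multiplying by -j needs no real multiplier in hardware:
    # it swaps the real and imaginary parts and negates one of them.
    d = (x1 - x3) * -1j
    # Stage 2: combine the stage-1 results into the four outputs.
    return (a + c, b + d, a - c, b - d)


def dft4(xs):
    """Direct 4-point DFT from the definition, used as a reference."""
    return [
        sum(x * cmath.exp(-2j * cmath.pi * n * k / 4) for n, x in enumerate(xs))
        for k in range(4)
    ]
```

Comparing `radix4_butterfly(*xs)` with `dft4(xs)` on arbitrary complex inputs is a quick way to confirm that the two-stage form matches the DFT definition up to rounding error.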