摘要

The line-based method has been one of the most commonly-used methods of hardware implementation of two-dimensional (2D) discrete wavelet transform (DWT). However, data buffer is required between the row DWT processor and the column DWT processor to solve the data flow mismatch, which increases the on-chip memory size and the output latency. Since the incompatible data flow is induced from the intrinsic property of adopted lifting-based algorithm, a decomposed lifting algorithm (DLA) is presented by rearranging the data path of lifting steps to ensure that image data is processed in raster scan manner in row processor and column processor. Theoretical analysis indicates that the precision issue of DLA outperforms other lifting-based algorithms in terms of round-off noise and internal word-length. A memory-efficient and high-performance line-based architecture is proposed based on DLA without the implementation of data buffer. For an N x M image, only 2N internal memory is required for 5/3 filter and 4N of that is required for 9/7 fitter to perform 2D DWT, where N and M indicate the width and height of an image, Compared with related 2D DWT architectures, the size of on-chip memory is reduced significantly under the same arithmetic cost, memory bandwidth and timing constraint. This design was implemented in SMIC 0.18 mu m CMOS logic fabrication with 32 kbits dual-port RAM and 20 K equivalent 2-input NAND gates in a 1.00 mm x 1.00 mm die, which can process 512 x 512 image under 100 MHz.

全文