Abstract

A novel systolic linear-array modular multiplier is presented that performs parallel modular multiplication based on Montgomery's algorithm. An n-bit modular multiplication takes 2n+11 clock cycles in total. To further increase throughput, a three-stage pipeline is adopted inside each processing element, so that one result bit is output per clock cycle once the pipeline is full. Each pipeline stage contains only a single one-bit full-adder operation, and because communication is purely nearest-neighbor, the interconnect delay is also very short; the design can therefore run at a high clock frequency. Each processing element is also simple, consisting mainly of four full adders and fourteen flip-flops. For n-bit modular multiplication, the hardware cost is 46n+184 gates. The proposed linear systolic array is thus optimized for both speed and area and is suitable for VLSI implementation. It can be used for modular exponentiation, a kernel operation in many public-key cryptosystems such as RSA. At a clock frequency of 200 MHz in 0.8 μm CMOS technology, a single modular multiplier chip can reach a throughput of 129 kb/s.
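
For orientation, the bit-serial (radix-2) form of Montgomery multiplication that arrays of this kind typically realize can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the architecture described above: the function names `mont_mul`, `cycle_count`, and `gate_count` are hypothetical, the modulus m is assumed odd with a, b < m < 2^n, and the final conditional subtraction is a software convenience that a hardware design may handle differently. The two helper functions simply evaluate the cycle and gate formulas quoted in the abstract.

```python
def mont_mul(a: int, b: int, m: int, n: int) -> int:
    """Radix-2 Montgomery product: returns a * b * 2^(-n) mod m.

    Assumes m is odd and 0 <= a, b < m < 2^n (illustrative sketch only).
    """
    if m % 2 == 0:
        raise ValueError("Montgomery multiplication requires an odd modulus")
    s = 0
    for i in range(n):
        a_i = (a >> i) & 1        # i-th bit of the multiplier a
        s += a_i * b              # conditionally add the multiplicand
        q = s & 1                 # quotient bit: makes s + q*m even
        s = (s + q * m) >> 1      # exact division by 2
    # The loop keeps s < 2m, so one conditional subtraction suffices.
    if s >= m:
        s -= m
    return s


def cycle_count(n: int) -> int:
    """Clock cycles for one n-bit modular multiplication (2n + 11, from the abstract)."""
    return 2 * n + 11


def gate_count(n: int) -> int:
    """Hardware cost in gates for n-bit operands (46n + 184, from the abstract)."""
    return 46 * n + 184


if __name__ == "__main__":
    m, n = 239, 8                 # small odd modulus chosen only for illustration
    r = mont_mul(123, 45, m, n)
    # Multiplying back by 2^n must recover the ordinary product mod m.
    assert (r * pow(2, n, m)) % m == (123 * 45) % m
    print(cycle_count(1024), gate_count(1024))
```

Each loop iteration needs only additions of single bits with carries, which is why the operation maps naturally onto a chain of full-adder cells with nearest-neighbor communication.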
