摘要

This article proposes an effective way of implementing a multiply accumulate circuit (MAC) for high-speed floating point arithmetic operations. The real-world applications related to digital signal processing and the like demand high-performance computation with greater accuracy. In general, digital signals are represented as a sequence of signed/unsigned fixed/floating point numbers. The final result of a MAC operation can be computed by feeding the mantissa of the previous MAC result as one of the partial products to a Wallace tree multiplier or Braun multiplier. Thus, the separate accumulation circuit can be avoided by keeping the circuit depth still within the bounds of the Wallace tree multiplier, namely O(log(2) n), or Braun multiplier, namely O(n). In this article, three kinds of floating point MACs are proposed. The experimental results show 48.54% of improvement in worst path delay achieved by the proposed floating point MAC using a radix-2 Wallace structure compared with a conventional floating point MAC without a pipeline using a 45nm technology library. The same proposed design gives 39.92% of improvement in worst path delay without a pipeline using a radix-4 Braun structure as compared with a conventional design. In this article, a radix-32 Q(32.32)-format-based floating point MAC is proposed using a Wallace tree/Braun multiplier. Also this article discusses the msb prediction problem and its solution in floating point arithmetic that is not available in modern fused multiply-add designs. The performance results show comparisons between the proposed floating point MAC with various floating point MAC designs for radix-2,-4,-8, and -16. The proposed design has lesser depth than a conventional floating point MAC as well as a lower area requirement than other ways of floating point MAC implementation, both with/without a pipeline.

  • 出版日期2014-11