摘要

A stream processor is a power-efficient, high-level-language programmable option for embedded applications that are computation intensive and admit high levels of data parallelism. Many signal-processing algorithms for communications are well matched to stream-processor architectures, including partially parallel implementations of layered decoding algorithms such as the turbo-decoding message-passing (TDMP) algorithm. Communication among clusters of functional units in the stream processor impose a latency cost during both the message-passing phase and the parity-check phase of the TDMP algorithm with early termination; the inter-cluster communications latency is a significant factor in limiting the throughput of the decoder. We consider two modifications of the schedule for the TDMP algorithm with early termination; each halves the communication required between functional-unit clusters of the stream processor in each iteration. We show that these can provide a substantial increase in the information throughput of the decoder without increasing the probability of error.

  • 出版日期2012

全文