Abstract

Multimedia computation has become a primary and demanding application segment for new architectures targeted at portable devices. The main challenge for such architectures is to keep pace with the computational requirements of ever-evolving media standards and applications while meeting the power and energy budgets imposed by smaller form factors and longer battery lifetimes. One technique aimed at reducing both the energy consumption and the execution time of an application is reuse. This technique records the outcome of an instruction or set of instructions so that it can be reused the next time the same operation is performed with the same inputs. In this paper, we analyze a region reuse scheme focused specifically on multimedia applications. While the technique appears, in theory, to be a promising vehicle for improving both timing and energy for low-end media applications, we show that the extra hardware cost it requires becomes a severe shortcoming: we reach the undesirable situation in which more energy must be consumed in order to reduce the execution time, which makes it a poor power-oriented solution. To mitigate the overhead of the reuse hardware, we advocate exploiting a third variable in the power-time trade-off, and we evaluate tolerant region reuse, a technique that relies on the tolerance in the output precision of media algorithms to improve reuse. With this technique, we can afford to use less power-hungry hardware structures, which yields benefits in both energy and timing. As a trade-off, tolerant region reuse introduces non-noticeable errors in the output data. The main drawbacks of tolerant region reuse are its strong reliance on application profiling, the need for careful tuning by the application developer, and its inability to adapt to the variability of the media content used as input. To address that inflexibility, we introduce dynamic tolerant region reuse.
This novel technique overcomes the drawbacks of tolerant region reuse by allowing the hardware to monitor the precision of the region reuse output. Our mechanism allows the programmer to set a minimum threshold on the signal-to-noise ratio (SNR) while letting the technique adapt to the characteristics of the specific application and workload to minimize time and energy consumption. This leads to greater energy-delay savings while keeping the output error below noticeable levels and, at the same time, avoiding the need for profiling. We applied our mechanism to a set of three different processors, from low end to high end. As we will show, our technique leads to consistent performance improvements in all of our benchmark programs while reducing energy consumption. We report savings of up to 30 percent in the energy-delay product for all three processors.
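The idea of tolerant reuse with an SNR floor can be illustrated with a small software model. The sketch below is not the hardware mechanism evaluated in the paper; it is a toy under stated assumptions. The class name `DynamicTolerantReuse`, the parameters `max_drop_bits` and `check_every`, and the policy of re-executing the region on every few hits to measure error are all hypothetical choices made for illustration. Inputs are assumed to be non-negative integers, and tolerance is modeled by ignoring low-order input bits when matching table entries.

```python
import math


class DynamicTolerantReuse:
    """Toy software model of tolerant region reuse with an SNR floor.

    Hypothetical illustration only: the names and the adaptation policy
    are assumptions, not the paper's hardware design.
    """

    def __init__(self, region, min_snr_db, max_drop_bits=4, check_every=4):
        self.region = region            # function standing in for a code region
        self.min_snr_db = min_snr_db    # programmer-supplied SNR threshold
        self.drop_bits = max_drop_bits  # low-order input bits ignored on lookup
        self.check_every = check_every  # verify every N-th hit (assumed policy)
        self.table = {}
        self.hits = 0
        self.signal = 0.0               # accumulated exact-output energy
        self.noise = 0.0                # accumulated squared reuse error

    def _key(self, inputs):
        # Inputs that differ only in the dropped bits share one table entry.
        return tuple(x >> self.drop_bits for x in inputs)

    def snr_db(self):
        if self.noise == 0:
            return math.inf
        if self.signal == 0:
            return -math.inf
        return 10 * math.log10(self.signal / self.noise)

    def __call__(self, *inputs):
        key = self._key(inputs)
        if key not in self.table:
            # Miss: execute the region and remember its result.
            self.table[key] = self.region(*inputs)
            return self.table[key]
        self.hits += 1
        reused = self.table[key]
        if self.hits % self.check_every == 0:
            # Occasionally execute the region anyway to measure the error.
            exact = self.region(*inputs)
            self.signal += exact * exact
            self.noise += (exact - reused) ** 2
            if self.snr_db() < self.min_snr_db and self.drop_bits > 0:
                # Quality fell below the floor: tighten the tolerance.
                self.drop_bits -= 1
                self.table.clear()      # old keys used the previous bit width
                self.signal = self.noise = 0.0
        return reused
```

With `max_drop_bits=0` the model degenerates to exact region reuse, since only identical inputs match. Larger values trade output precision for a higher hit rate, and the SNR check pulls the tolerance back when the measured error grows too large.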

  • Publication date: 2012-5