Abstract

Dynamic binary translation (DBT), which executes binary code compiled for a source ISA on a different target platform, has been plagued by high overhead for many years. The GPU, as a many-core processor, offers tremendous computational power. Employing the GPU as a coprocessor to execute the hot spots of binary code in parallel holds great promise for substantially reducing the overhead of DBT. This paper presents a novel translation framework for constructing a virtual execution environment that accelerates DBT on CPU/GPU-based architectures. Given the parallelizable parts (hot spots) of the binary code and their related information, the framework converts the sequential code into PTX form and executes it on the GPU. The framework requires no rewriting of the source code, and it also resolves the binary compatibility issues between different GPUs. Experimental results on several programs from the CUDA SDK Code Samples and the Parboil Benchmark Suite show that the framework significantly improves performance, achieving a 10x speedup on average over native x86 platforms. The gains become even larger as the input scale grows.

Full text