摘要

System designers exploit cache to make up for performance gap between CPU and main memory. Since data access speed of cache is much faster than that of memory, it is important for database operations to take maximum advantage of cache to obtain higher performance. Disk-based join operation is a common but time-consuming database operation. Unfortunately, most of traditional join algorithms do not take cache into consideration. This paper analyzes low cache utilization problem in traditional join algorithms and proposes a disk-based cache-conscious join algorithm DBCC-Join. Join positional index pair table (JPIPT) is a data structure which specifies the positional index pairs of join tuples in each table. The execution of DBCC-Join consists of two stages: JPIPT construction stage and result output stage. JPIPT construction stage performs cache-conscious construction algorithm on join attributes which are kept in column-oriented model, to obtain join positional index pair table of join results. The obtained JPIPT is used in result output stage to retrieve results in a one-pass sequential scan on each table. To the best of our knowledge, this paper is the first to exploit cache to improve performance of disk-based join algorithm. Experimental results show that compared to traditional join algorithms, DBCC-Join can be improved by a factor of an order of magnitude.

  • 出版日期2010

全文