Adaptive Database Schema Design for Multi-Tenant Data Management

作者:Ni Jiacai*; Li Guoliang; Wang Lijun; Feng Jianhua; Zhang Jun; Li Lei
来源:IEEE Transactions on Knowledge and Data Engineering, 2014, 26(9): 2079-2093.
DOI:10.1109/TKDE.2013.94

摘要

Multi-tenant data management is a major application of Software as a Service (SaaS). For example, many companies want to outsource their data to a third party that hosts a multi-tenant database system to provide data management services. The multi-tenant database system needs to have high performance, low space requirement, and excellent scalability. One big challenge is devising a high-quality database schema. Independent Tables Shared Instances (ITSI) and Shared Tables Shared instances (STSI) are two state-of-the-art approaches to designing the schema. However, they suffer from some limitations. ITSI has poor scalability since it needs to maintain large numbers of tables. STSI achieves good scalability at the expense of poor performance and high space overhead. Thus, an effective schema design method that addresses these problems is needed. In this paper, we propose an adaptive database schema design method for multi-tenant applications. We trade-off ITSI and STSI and find a balance between them to achieve good scalability and high performance with low space requirement. To this end, we identify the important attributes and use them to generate an appropriate number of base tables. For the remaining attributes, we construct supplementary tables. We discuss how to use the kernel matrix to determine the number of the base tables, apply graph-partitioning algorithms to construct the base tables, and evaluate the importance of attributes using the well-known PageRank algorithm. We propose a cost-based model to adaptively generate the base tables and supplementary tables. Our method has the following advantages. First, our method achieves high scalability. Second, our method achieves high performance and can trade-off the performance and space requirement. Third, our method can be easily applied to existing databases (e.g., MySQL) with minor revisions. Fourth, our method can adapt to any schemas and query workloads including both OLAP and OLTP applications. Experimental results on both real and synthetic datasets show that our method achieves high performance and good scalability with low space requirement and outperforms state-of-the-art methods.