Abstract

The Lambda paradigm, also known as Function as a Service (FaaS), is a novel event-driven concept that allows companies to build scalable and reliable enterprise applications in an off-premise data center as a serverless solution. In practice, however, an important goal for the service provider of a Lambda platform is to devise an efficient way to consolidate multiple Lambda functions on a single host. While the majority of existing resource management solutions use only operating-system-level metrics (e.g., average utilization of computing and I/O resources) to allocate the available resources among the submitted workloads in a balanced way, a resource allocation scheme that is oblivious to shared-resource contention can cause significant performance variability and degradation across the entire platform. This paper proposes a predictive controller scheme that dynamically allocates resources in a Lambda platform. The scheme uses a prediction tool to estimate the future rate of every event stream and takes into account the quality-of-service (QoS) requirements requested by the owner of each Lambda function. This is formulated as an optimization problem in which a set of cost functions is introduced (i) to reduce the total number of QoS violation incidents; (ii) to keep the CPU utilization level within an accepted range; and (iii) to avoid fierce contention among collocated applications for shared resources. Performance evaluation is carried out by comparing the proposed solution with an enhanced interference-aware version of three well-known heuristics: spread and binpack (the two native scheduling strategies employed by Docker Swarm), and a best-effort resource allocation scheme. Experimental results show that the proposed controller improves overall performance (in terms of reducing the end-to-end response time) by 14.9 percent on average compared to the best result of the other heuristics. The proposed solution also increases overall CPU utilization by 18 percent on average (for lightweight workloads), while achieving an average 87 percent (maximum 146 percent) improvement in preventing QoS violation incidents.
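The three cost terms in the optimization above can be illustrated with a minimal sketch: a weighted sum that penalizes predicted QoS violations, CPU utilization outside an accepted band, and shared-resource contention, minimized over candidate hosts. All function names, weights, thresholds, and the simple linear contention proxy below are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch of a multi-objective placement cost for consolidating
# Lambda functions on hosts. Weights, thresholds, and the contention proxy
# are assumptions for illustration only.

def placement_cost(qos_violations, cpu_util, contention,
                   w_qos=1.0, w_util=0.5, w_cont=0.5,
                   util_low=0.4, util_high=0.8):
    """Return a scalar cost (lower is better) for one candidate host.

    qos_violations: predicted number of QoS violation incidents
    cpu_util:       predicted CPU utilization in [0, 1]
    contention:     scalar proxy for shared-resource contention
    """
    # (i) penalize predicted QoS violation incidents
    cost = w_qos * qos_violations
    # (ii) penalize CPU utilization outside the accepted range
    if cpu_util < util_low:
        cost += w_util * (util_low - cpu_util)
    elif cpu_util > util_high:
        cost += w_util * (cpu_util - util_high)
    # (iii) penalize contention among collocated applications
    cost += w_cont * contention
    return cost

def best_host(candidates):
    """Pick the (name, predicted_state) pair with minimal cost."""
    return min(candidates, key=lambda h: placement_cost(*h[1]))
```

A controller would re-evaluate such a cost periodically, feeding it the event-rate predictions mentioned above, rather than reacting only to current OS-level metrics.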

  • Publication date: 2018-7-1