A path to achieving a self-managed Grid middleware

Authors: Nou Ramon*; Julia Ferran; Hogan Kevin; Torres Jordi
Source: Future Generation Computer Systems, 2011, 27(1): 10-19.
DOI: 10.1016/j.future.2010.07.002

Abstract

The overall performance delivered by a Grid environment depends heavily on the quality of the middleware on which distributed Grid applications run. Because of its complexity, this middleware can be difficult to investigate in full detail and problematic to tune efficiently, especially when running in a production-type environment.
Thanks to the BSC Monitoring Framework (BSC-MF), a set of tools that can instrument and analyze Java applications as well as the entire system, we were able to undertake both global and fine-grained investigation into one of the most popular Grid middleware packages of the moment, Globus Toolkit 4 (GT4). The steps taken revealed some interesting findings and resulted in the detection of some job management problems in this middleware. The main issue was that jobs could be lost on the node when the system was overloaded by the number of jobs being processed. The BSC-MF was used again to investigate this issue further and helped derive a possible solution that prevents the node from becoming a point of contention in the architecture. A simple but effective policy, which prioritizes the acceptance and completion of jobs over response time and throughput, was formulated and evaluated as a solution to the problem.
It was determined that, due to the dynamic nature of the problem, it could best be resolved by adding self-managing capabilities to the middleware. Using the new policy, a prototype of an autonomous system was built, and it succeeded in allowing more jobs to be accepted and finished correctly. The improvement over the original GT4 middleware was significant: performance improved by 30%.
The path from investigation to development, as described in this paper, might serve as a guide to others involved in the field who are interested in extracting knowledge about a Grid node, extending the Grid middleware or adding self-managing behaviour to their applications.

  • Publication date: 2011-1