Using automated planning for improving data mining processes

Authors: Fernandez Susana*; de la Rosa Tomas; Fernandez Fernando; Suarez Ruben; Ortiz Javier; Borrajo Daniel; Manzano David
Source: Knowledge Engineering Review, 2013, 28(2): 157-173.
DOI: 10.1017/S0269888912000409

Abstract

This paper presents a distributed architecture for automating data mining (DM) processes using standard languages. DM is a difficult task that relies on exploratory analysis of large quantities of data in order to discover meaningful patterns. The increasing heterogeneity and complexity of available data require expert knowledge of how to combine the many alternative DM tasks that process the data. Here, we describe DM tasks in terms of Automated Planning, which allows us to automate the construction of the DM knowledge flow. The work is based on standards defined in both the DM and automated-planning communities. Thus, we use PMML (Predictive Model Markup Language) to describe DM tasks. From the PMML, a problem description in PDDL (Planning Domain Definition Language) can be generated, so any current planning system can be used to generate a plan. This plan is, in turn, translated into a DM workflow description in the Knowledge Flow for Machine Learning format (the Knowledge Flow file format of the WEKA (Waikato Environment for Knowledge Analysis) tool), so that the plan, or DM workflow, can be executed in WEKA.
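
The PMML-to-PDDL step described in the abstract can be illustrated with a minimal sketch. It is not the authors' translator: the PMML fragment is simplified and namespace-free, and the domain name `data-mining`, the type `field`, and the predicates `field-type` and `model-evaluated` are hypothetical names chosen only for illustration. The sketch reads the `DataField` entries of a PMML `DataDictionary` and emits a PDDL problem that a planner could take as input alongside a matching domain file.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free PMML-like fragment (illustrative only).
PMML_EXAMPLE = """
<PMML version="4.1">
  <DataDictionary>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="class" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def pmml_to_pddl_problem(pmml_text, problem_name="dm-task"):
    """Build a PDDL problem string from the data fields of a PMML document.

    The PDDL domain name, type, and predicates used here are hypothetical;
    a real translator would follow its own planning domain definition.
    """
    root = ET.fromstring(pmml_text)
    objects = []
    init = []
    for field in root.findall("./DataDictionary/DataField"):
        name = field.get("name")
        # Each data field becomes a typed PDDL object ...
        objects.append(f"{name} - field")
        # ... plus an initial-state fact recording its operational type.
        init.append(f"(field-type {name} {field.get('optype')})")
    lines = [
        f"(define (problem {problem_name})",
        "  (:domain data-mining)",
        "  (:objects " + " ".join(objects) + ")",
        "  (:init " + " ".join(init) + ")",
        "  (:goal (model-evaluated))",
        ")",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(pmml_to_pddl_problem(PMML_EXAMPLE))
```

In the architecture the abstract describes, the plan returned by the planner would then be translated into a WEKA Knowledge Flow file; that step is omitted here because its format details are not given in this record.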

  • Publication date: 2013-06