KDDML (KDD Markup Language) is a middleware language and system designed to support the development of end-user applications or higher-level systems that combine data access, data preprocessing, and the extraction and deployment of data mining models. The KDDML language is XML-based, both for query syntax and for data/model representation. A KDDML query is an XML document in which XML tags correspond to operations on data or models, XML attributes correspond to parameters of those operations, and XML sub-elements define the arguments passed to the operators. The operators cover data access and preprocessing, model extraction and deployment, and control flow. The core of the KDDML system is an interpreter for the KDDML language, designed with modularity and extensibility as its main goals: additional data sources, preprocessing algorithms, and mining algorithms can be plugged into the system. The KDDML system is implemented in Java and includes a graphical user interface for editing queries.
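To make the mapping between XML and queries concrete, here is a minimal Java sketch that parses a query-like document with the JDK's DOM parser and lists each operator with its parameters, indented by nesting depth. The tag and attribute names (`APPLY_MODEL`, `MINE_TREE`, `LOAD_TABLE`, `algorithm`, and so on) and the class `KddmlQuerySketch` are hypothetical illustrations, not taken from the KDDML DTD or API; see the KDDML manual for the actual operator set.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class KddmlQuerySketch {

    // Hypothetical query: tag and attribute names are illustrative only,
    // they are NOT the operators defined by the real KDDML DTD.
    static final String QUERY =
          "<APPLY_MODEL target=\"class\">"
        +   "<MINE_TREE algorithm=\"c45\" confidence=\"0.25\">"
        +     "<LOAD_TABLE name=\"train\"/>"
        +   "</MINE_TREE>"
        +   "<LOAD_TABLE name=\"test\"/>"
        + "</APPLY_MODEL>";

    /** Parses the query and returns one line per operator, indented by nesting depth. */
    static List<String> describe(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            List<String> lines = new ArrayList<>();
            walk(root, "", lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed query", e);
        }
    }

    private static void walk(Element op, String indent, List<String> out) {
        // XML attributes play the role of operator parameters (sorted by name).
        Map<String, String> params = new TreeMap<>();
        NamedNodeMap attrs = op.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            params.put(attr.getNodeName(), attr.getNodeValue());
        }
        out.add(indent + op.getTagName() + " " + params);

        // XML sub-elements play the role of arguments: each is itself an operator.
        for (Node child = op.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, indent + "  ", out);
            }
        }
    }

    public static void main(String[] args) {
        describe(QUERY).forEach(System.out::println);
    }
}
```

An interpreter along these lines would evaluate the innermost sub-elements first and pass their results up to the enclosing operator; the sketch only prints the structure and does not evaluate anything.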


The project leader is Prof. Franco Turini. The researchers currently working on the language and system are Andrea Romei, Miriam Baglioni, Salvatore Ruggieri, Valerio Grossi and Elena Ratti. Past students and researchers who worked on the project include: Piero Alcamo, Francesco Domenichini, Davide Bruno, Daniele Cerra, Claudia De Angeli, Angela Fazio, Daniela Iozzia, Francesca Pietra, Sandra Zimei and Marlis Valentini.

The KDDML team is part of the Pisa KDD Laboratory.


KDDML Manual and API

Reference papers

Past papers and slides

Master theses (in Italian)

The KDDML System

The KDDML system is being developed in Java, with WEKA as the data mining library, XML (and the related XSL and DOM technologies) as the representation language, IBM Alphaworks XML4J as the DTD parser, IBM Lotus XSL as the XSLT processor, and XQuery as the XML query language, through the Qizx implementation. The system can also access external data sources, including Microsoft SQL Server, PostgreSQL, and Oracle databases.
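As an illustration of how an XML model representation and XSL fit together, the following Java sketch renders a small rule model as plain text with an XSLT stylesheet. It is only a sketch under stated assumptions: it uses the XSLT processor bundled with the JDK (`javax.xml.transform`) in place of IBM Lotus XSL, and the model layout (`RULES`, `RULE`, `IF`, `THEN`) and the class `ModelRenderSketch` are hypothetical, not the KDDML or PMML format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ModelRenderSketch {

    // Hypothetical association-rule model; element names are illustrative only.
    static final String MODEL =
          "<RULES>"
        +   "<RULE support=\"0.4\" confidence=\"0.9\"><IF>bread</IF><THEN>butter</THEN></RULE>"
        +   "<RULE support=\"0.2\" confidence=\"0.7\"><IF>beer</IF><THEN>chips</THEN></RULE>"
        + "</RULES>";

    // XSLT 1.0 stylesheet that emits one text line per rule.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        +   "<xsl:output method=\"text\"/>"
        +   "<xsl:template match=\"/RULES\">"
        +     "<xsl:for-each select=\"RULE\">"
        +       "<xsl:value-of select=\"IF\"/>"
        +       "<xsl:text> =&gt; </xsl:text>"
        +       "<xsl:value-of select=\"THEN\"/>"
        +       "<xsl:text> (conf </xsl:text>"
        +       "<xsl:value-of select=\"@confidence\"/>"
        +       "<xsl:text>)&#10;</xsl:text>"
        +     "</xsl:for-each>"
        +   "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the given model document and returns the text output. */
    static String render(String modelXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(modelXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("could not render model", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(MODEL));
    }
}
```

Keeping models in XML means that the same document can be queried, transformed into a report, or handed to another operator without a custom parser for each model type.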



Related Links

  • Related Systems

    The most closely related environment is the YALE system, which allows data preprocessing and mining algorithms to be composed.

  • XML as representation language for data mining models

    XML (see W3C, XML.org, XML.com, XML at Microsoft) has been adopted to represent data mining models by the PMML (Predictive Modeling Markup Language) standard. The KDDML system adopts a representation that is quite similar to PMML, with some syntactic differences due to historical reasons (the core of KDDML was designed at about the same time as PMML).

  • XML Query Languages

    The current version of KDDML adopts the Qizx implementation of XQuery. A complete list of alternative implementations is available from W3C. A description of the core of XQuery can be found in this paper. Also, see the XQuery home page at W3C. Comparisons of XML query languages have been reported in several papers. In addition to XQuery and the query languages covered in those papers, recent XML query language proposals include: EquiX, Quilt, TQL, XQL, Xylemme, and many others.

  • SQL Extensions including data mining operators

    Several extensions of SQL include data mining operators. We recall here: MSQL (vol.3 issue 4), MineRule (vol.2 issue 2), MineSQL, NonStop SQL/MX, PanQ, Inductive Database Framework, OLE DB for UDA and for DM, and MINE RULE.

    Last updated: June 2005, Contact us.