KDDML (KDD Markup Language) is a middleware language and system
designed to support the development of final applications or
higher-level systems that combine data access, data
preprocessing, and the extraction and deployment of data mining models.
The KDDML language is XML-based, both for query syntax and for
data/model representation. A KDDML query is an XML document in which
XML tags correspond to operations on data/models, XML attributes
correspond to parameters of those operations, and XML sub-elements
define the arguments passed to the operators. We survey operators
for data access and preprocessing, for model extraction and deployment,
and for control flow.
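The mapping between query structure and operator semantics (tags as operations, attributes as parameters, sub-elements as arguments) can be illustrated with a minimal Java sketch that parses a query with the standard DOM API and prints its operator tree. The element and attribute names used here (MODEL_EXTRACTOR, DATA_LOADER, algorithm, target, source) are hypothetical placeholders, not taken from the KDDML DTD, and the sketch only reads the tree; it does not evaluate any operator.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class QueryTreeSketch {

    // Hypothetical query: element and attribute names are placeholders,
    // not taken from the KDDML DTD.
    static final String QUERY =
        "<MODEL_EXTRACTOR algorithm=\"decision_tree\" target=\"play\">"
        + "<DATA_LOADER source=\"weather.arff\"/>"
        + "</MODEL_EXTRACTOR>";

    /** Appends one line per operator: tag name and parameters, then its arguments indented. */
    static void describe(Element op, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(op.getTagName());                 // XML tag -> operation
        NamedNodeMap attrs = op.getAttributes();      // XML attributes -> parameters
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            line.append(' ').append(attr.getNodeName()).append('=').append(attr.getNodeValue());
        }
        out.add(line.toString());
        NodeList children = op.getChildNodes();       // XML sub-elements -> arguments
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                describe((Element) child, depth + 1, out);
            }
        }
    }

    /** Parses a query string and returns its operator tree as indented lines. */
    static List<String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> lines = new ArrayList<>();
            describe(doc.getDocumentElement(), 0, lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed query", e);
        }
    }

    public static void main(String[] args) {
        parse(QUERY).forEach(System.out::println);
    }
}
```

Running `main` prints `MODEL_EXTRACTOR algorithm=decision_tree target=play` followed by the indented argument `DATA_LOADER source=weather.arff`. A real interpreter would evaluate the arguments first and pass their results to the enclosing operator.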
The core of the KDDML system is a KDDML language interpreter designed
with modularity and extensibility as its main goals.
Additional data sources, preprocessing algorithms and mining algorithms
can easily be plugged into the system.
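One common way to obtain this kind of plug-in extensibility is a registry that maps each tag name to an operator object, so that the interpreter dispatches on tags without referring to concrete classes. The sketch below assumes a hypothetical `Operator` interface and a toy `SAMPLE_EVERY` sampling operator; it illustrates the design idea and is not the actual KDDML plug-in API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PluginRegistrySketch {

    /** Hypothetical plug-in contract: one implementation per XML tag. */
    public interface Operator {
        String tag();
        Object apply(Map<String, String> params, List<Object> args);
    }

    /** Toy preprocessing operator (hypothetical): keeps every n-th input row. */
    public static class EveryNth implements Operator {
        @Override
        public String tag() {
            return "SAMPLE_EVERY";
        }

        @Override
        public Object apply(Map<String, String> params, List<Object> rows) {
            int step = Integer.parseInt(params.get("step"));
            if (step < 1) {
                throw new IllegalArgumentException("step must be positive: " + step);
            }
            List<Object> kept = new ArrayList<>();
            for (int i = 0; i < rows.size(); i += step) {
                kept.add(rows.get(i));
            }
            return kept;
        }
    }

    private final Map<String, Operator> operators = new HashMap<>();

    /** Plugging in a new data source or algorithm is just one more registration. */
    public void register(Operator op) {
        if (operators.putIfAbsent(op.tag(), op) != null) {
            throw new IllegalArgumentException("duplicate operator: " + op.tag());
        }
    }

    /** Dispatches on the tag name; the caller never names concrete operator classes. */
    public Object evaluate(String tag, Map<String, String> params, List<Object> args) {
        Operator op = operators.get(tag);
        if (op == null) {
            throw new IllegalArgumentException("unknown operator: " + tag);
        }
        return op.apply(params, args);
    }

    public static void main(String[] args) {
        PluginRegistrySketch registry = new PluginRegistrySketch();
        registry.register(new EveryNth());
        System.out.println(registry.evaluate("SAMPLE_EVERY", Map.of("step", "2"), List.of(1, 2, 3, 4, 5)));
    }
}
```

Running `main` prints `[1, 3, 5]`. Rejecting duplicate tags at registration time keeps a plug-in from silently shadowing an existing operator.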
The KDDML system is implemented in Java and includes a graphical
user interface for editing queries.
The project leader is Prof. Franco Turini.
Current researchers on the language and system are
Andrea Romei, Miriam Baglioni, Salvatore Ruggieri, Valerio Grossi and Elena Ratti.
Past students and researchers who have worked on the project include
Piero Alcamo, Francesco Domenichini, Davide Bruno, Daniele Cerra,
Claudia De Angeli, Angela Fazio, Daniela Iozzia, Francesca Pietra,
Sandra Zimei and Marlis Valentini.
The KDDML team is part of the Pisa KDD Laboratory.
KDDML Manual and API
Reference papers
Past papers and slides
- P. Alcamo, F. Domenichini, F. Turini.
An XML based environment in support of the overall
KDD process, 2000. This is an extended version of the paper that appeared
in Proceedings of the Fourth International Conference on
Flexible Query Answering Systems, Physica-Verlag, Heidelberg, New York, pp. 413-424.
- A. Romei.
Inside KDDML, October 2002.
Seminar slides on the KDDML internals (in Italian).
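Master's theses (in Italian)
- Piero Alcamo, Francesco Domenichini. Un ambiente basato su XML per l'estrazione di conoscenza [An XML-based environment for knowledge extraction], 2000.
- Miriam Baglioni. MQL: una proposta di query language per Data Mining [MQL: a proposal for a data mining query language], 2001.
- Davide Bruno. Estensione e sperimentazione di un sistema di knowledge discovery basato su XML [Extension and experimentation of an XML-based knowledge discovery system], 2001.
- Andrea Romei. Implementazione di un query language per knowledge discovery [Implementation of a query language for knowledge discovery], 2002.
- Francesca Pietra. Estrazione di conoscenza da testi letterari annotati [Knowledge extraction from annotated literary texts], 2002.
- Claudia De Angeli, Angela Fazio. Implementazione parallela di un ambiente di knowledge discovery [Parallel implementation of a knowledge discovery environment], 2003.
- Elena Ratti. Estensione del sistema KDDML per la scoperta degli itemsets frequenti [Extending the KDDML system for frequent itemset discovery], 2003.
- Marlis Valentini. DPL: un formalismo algebrico per la specifica della preparazione dei dati all'estrazione di conoscenza [DPL: an algebraic formalism for specifying data preparation for knowledge extraction], 2003.
- Daniela Iozzia. Studio, progettazione ed implementazione di un algoritmo per il calcolo di sequential patterns [Study, design and implementation of an algorithm for computing sequential patterns], 2003.
- Sandra Zimei. KDDML: Estensione alla fase di Preprocessing [KDDML: extension to the preprocessing phase], 2004.
- Daniele Cerra. Estensione del linguaggio e del sistema KDDML con operatori per pattern sequenziali [Extending the KDDML language and system with operators for sequential patterns], 2005.
- Valerio Grossi. Linguaggi grafici per Knowledge Discovery [Graphical languages for knowledge discovery], 2006.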
The KDDML system is being developed using Java as the programming language,
WEKA as the data mining library, XML (and the related XSL and DOM technologies)
as the representation language, IBM Alphaworks XML4J as the DTD parser,
IBM Lotus XSL as the XSLT processor, and XQuery, through the Qizx
implementation, as the XML query language.
The system can also access external data sources,
including Microsoft SQL Server, PostgreSQL and Oracle databases.
Features
- Data Access: Relational Databases via JDBC, ARFF Text Files, C4.5 Text Files.
- Mining Model Access: PMML 2.0 standard.
- Data Preprocessing: Attribute Manipulation, Data Reduction and Sampling, Data Discretization, Data Cleaning, Data Transformation.
- Data Mining Algorithms: Association Rules, Classification, Clustering, Sequential Patterns.
- Postprocessing: Model Filtering, Model Application, Model Evaluation, Model Meta-Reasoning.
- XQuery and Control Flow Operators.
Download
Related Systems
The most closely related environment is the YALE
system, which allows data preprocessing and mining
algorithms to be composed.
XML as a representation language for data mining models
XML (see W3C, XML.org, XML.com, XML at Microsoft)
has been adopted by the PMML (Predictive Model Markup Language)
standard to represent data mining models. The KDDML system adopts a representation that is quite similar to PMML, yet
with some syntactic differences for historical reasons (the core of KDDML was designed at about the same
time as PMML).
XML Query Languages
The current version of KDDML adopts the Qizx
implementation of XQuery.
A complete list of alternative implementations is
available from W3C.
A description of the core of XQuery can be found
in this paper.
Also, see the XQuery home page at W3C.
Comparisons of XML query languages are reported in the following
papers:
In addition to XQuery and the query languages mentioned in the papers above, recent
XML query language proposals include EquiX, Quilt, TQL, XQL, Xyleme
and many others.
SQL Extensions including data mining operators
Several extensions of SQL include data mining operators. Notable examples are
MSQL (vol. 3, issue 4), MineRule (vol. 2, issue 2), MineSQL, NonStop SQL/MX, PanQ,
the Inductive Database Framework, OLE DB for UDA and for DM, and MINE RULE.
Last updated: June 2005.