YaDT version 1.2.1
Yet another Decision Tree builder
(c) Salvatore Ruggieri, 2002-2005
http://www.di.unipi.it/~ruggieri/YaDT
Decision trees are widely used classification models in
data/web mining suites and vertical applications. A recent paper [1]
describes EC4.5, an implementation of entropy-based decision tree construction
algorithms that is considerably more efficient than the well-known C4.5 system [2].
The improvements reduce tree construction time to as little as 1/5 of the time
needed by C4.5, while requiring the same amount of memory.
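To make "entropy-based" concrete, the following is a minimal sketch of the quantity that such tree builders evaluate when choosing a split: the reduction in class entropy (information gain) obtained by partitioning the cases. It is an illustration only and is not taken from C4.5, EC4.5 or YaDT; the function names and the toy data are assumptions made for this example. C4.5 additionally normalizes the gain by the split information (gain ratio), which is omitted here for brevity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, partition):
    """Entropy reduction obtained by splitting `labels` into `partition`.

    `partition` is a list of label lists, one per branch of a candidate split.
    """
    total = len(labels)
    # Weighted average entropy of the branches after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partition if part)
    return entropy(labels) - remainder


# Hypothetical toy data: four cases, two classes.
labels = ["yes", "yes", "no", "no"]
perfect = [["yes", "yes"], ["no", "no"]]   # split separates the classes
useless = [["yes", "no"], ["yes", "no"]]   # split leaves classes mixed

print(information_gain(labels, perfect))
print(information_gain(labels, useless))
```

A tree builder would compute this score for every candidate attribute (and, for continuous attributes, every candidate threshold) at each node. That repeated evaluation is where most of the construction time is spent.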
Based on the achievements of [1] and on many further optimizations, a new
from-scratch implementation of the entropy-based tree construction algorithm
has been designed and implemented in C++. This new implementation, called YaDT,
provides the benefits of:
and still
it requires as little as 1/4 of the tree construction time needed by EC4.5,
it requires as little as 1/3 of the main memory needed by EC4.5 for tree construction.
In summary, YaDT is among the world's fastest main-memory implementations
of entropy-based tree construction algorithms.
As an example, the following figure compares YaDT with EC4.5 from [1] in both time and
memory (on a Pentium III 650 MHz computer running Windows 2000) on the forest cover type
dataset (obtained from [4]), which consists of 70 MB of
input with 581,012 cases, each with 54 attributes.
For any further information, please contact:
Salvatore Ruggieri
Dept. of Computer Science,
University of Pisa,
Via F. Buonarroti 2, 56100 Pisa, ITALY
ruggieri@di.unipi.it
References
[0] S. Ruggieri. YaDT: Yet another Decision Tree builder,
Proceedings of the 16th International Conference on Tools with Artificial
Intelligence (ICTAI 2004): 260-265. IEEE Press, November 2004.
[1] S. Ruggieri. Efficient C4.5,
IEEE Transactions on Knowledge and Data Engineering, 14(2):438-444, March-April 2002.
[2] J.R. Quinlan. C4.5: Programs for Machine Learning,
Morgan Kaufmann, 1993.
[3] Data Mining Group. Predictive Model Markup Language (PMML), version 2.0,
http://www.dmg.org
[4] S. Hettich and S.D. Bay. The UCI KDD Archive,
Irvine, CA: University of California, Department of Information and Computer Science.
http://kdd.ics.uci.edu