YaDT version 1.2.1
Yet another Decision Tree builder

(c) Salvatore Ruggieri, 2002-2005
http://www.di.unipi.it/~ruggieri/YaDT

Decision trees are widely used classification models in data/web mining suites and vertical applications. A recent paper [1] describes EC4.5, an implementation of entropy-based decision tree construction algorithms that is considerably more efficient than the well-known C4.5 system [2]. EC4.5 reduces tree construction time to as little as 1/5 of the time needed by C4.5, while requiring the same amount of memory. Building on the results of [1] and on many further optimizations, a new from-scratch implementation of the entropy-based tree construction algorithm has been designed and written in C++. This new implementation, called YaDT, provides the following benefits:
  • a structured object-oriented programming implementation;
  • portable code, which has been tested under:
      • Microsoft Windows 2000/XP,
      • Red Hat Linux 7.2;
  • a documented C++ library of classes;
  • access to DBMS via Microsoft ADO (only under Microsoft Windows);
  • PMML [3] compliant XML output of trees;
  • compressed binary output/input of trees;
  • further tree construction features: case weighting, hold-out, error-based pruning;
  • as examples of how to use the YaDT classes, a command line tree builder and a Java GUI are also part of the YaDT distribution;
  • and still:
      • tree construction time is as little as 1/4 of the time needed by EC4.5,
      • main memory used during tree construction is as little as 1/3 of the memory required by EC4.5.

Summarizing, YaDT is among the world's fastest main-memory implementations of entropy-based tree construction algorithms. As an example, the following figure compares YaDT with EC4.5 from [1] in both time and memory (on a 650 MHz Pentium III computer running Windows 2000) on the forest cover type dataset (obtained from [4]), which consists of 70 MB of input with 581,012 cases, each with 54 attributes.
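
As background for the entropy-based construction mentioned above, the following is a minimal C++ sketch of how the information gain of a split on one discrete attribute can be computed. It is not taken from the YaDT sources and does not use the YaDT classes: the names Case, entropy and information_gain are hypothetical and chosen only for this illustration. It shows plain information gain; refinements used by the C4.5 family, such as the gain ratio, continuous attributes and case weights, are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// One training case: a discrete attribute value and a class label.
// Hypothetical type for illustration only; not part of the YaDT library.
struct Case {
    int attribute_value;
    int class_label;
};

// Shannon entropy (in bits) of a class distribution given as
// class label -> number of cases, over `total` cases.
double entropy(const std::map<int, std::size_t>& class_counts, std::size_t total) {
    if (total == 0) return 0.0;
    double h = 0.0;
    for (const auto& kv : class_counts) {
        if (kv.second == 0) continue;  // 0 * log(0) is taken as 0
        const double p = static_cast<double>(kv.second) / static_cast<double>(total);
        h -= p * std::log2(p);
    }
    return h;
}

// Information gain of splitting `cases` on their discrete attribute:
// entropy before the split minus the size-weighted entropy of the subsets.
double information_gain(const std::vector<Case>& cases) {
    const std::size_t n = cases.size();
    if (n == 0) return 0.0;

    std::map<int, std::size_t> all_counts;                       // class -> count
    std::map<int, std::map<int, std::size_t>> counts_by_value;   // value -> class -> count
    std::map<int, std::size_t> size_by_value;                    // value -> subset size
    for (const Case& c : cases) {
        ++all_counts[c.class_label];
        ++counts_by_value[c.attribute_value][c.class_label];
        ++size_by_value[c.attribute_value];
    }

    double weighted_subset_entropy = 0.0;
    for (const auto& kv : counts_by_value) {
        const std::size_t subset_size = size_by_value[kv.first];
        assert(subset_size > 0);  // every recorded value has at least one case
        const double weight = static_cast<double>(subset_size) / static_cast<double>(n);
        weighted_subset_entropy += weight * entropy(kv.second, subset_size);
    }
    return entropy(all_counts, n) - weighted_subset_entropy;
}
```

With two equally frequent classes, an attribute that separates them perfectly has a gain of 1 bit, while an attribute whose values each contain both classes in equal proportion has a gain of 0. A tree builder of this family evaluates such a measure for every candidate attribute at a node and splits on the best one.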

    For any further information, please contact:

    Salvatore Ruggieri
    Dept. of Computer Science,
    University of Pisa,
    Via F. Buonarroti 2, 56100 Pisa, ITALY
    ruggieri@di.unipi.it

    References

    [0] S. Ruggieri. YaDT: Yet another Decision Tree builder, Proceedings of the 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004): 260-265. IEEE Press, November 2004.

    [1] S. Ruggieri. Efficient C4.5, IEEE Transactions on Knowledge and Data Engineering, 14(2):438-444, March-April 2002.

    [2] J. R. Quinlan. C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.

    [3] Data Mining Group. Predictive Model Markup Language (PMML), version 2.0, http://www.dmg.org

    [4] S. Hettich and S.D. Bay. The UCI KDD Archive, Irvine, CA: University of California, Department of Information and Computer Science. http://kdd.ics.uci.edu