YaDT: Yet another Decision Tree builder.
YaDT is a from-scratch implementation of the C4.5 entropy-based tree construction
algorithm. It has been designed and implemented in C++ with strong emphasis on
efficiency (time and space) and portability (Windows/Linux). It includes: decision
tree building, tree simplification, bagging, random forest, feature selection,
multi-core parallelism, and a Python 3 wrapper.
Download software: YaDT 2.3.0 (April 2020) for Windows/Linux.
- S. Ruggieri. YaDT: Yet another Decision Tree builder. 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004): 260-265. IEEE Press, November 2004.
- S. Ruggieri. Subtree Replacement in Decision Tree Simplification. 12th SIAM Conference on Data Mining (SDM 2012): 379-390. SIAM, April 2012.
- M. Aldinucci, S. Ruggieri, M. Torquati. Decision tree building on multi-core using FastFlow. Concurrency and Computation: Practice & Experience. Vol. 26, Issue 3, March 2014, 800-820.
- S. Ruggieri. Enumerating Distinct Decision Trees. International Conference on Machine Learning (ICML 2017) JMLR Workshop and Conference Proceedings, 70, August 2017, 2960-2968.
- S. Ruggieri. Complete Search for Feature Selection in Decision Trees. Journal of Machine Learning Research. Vol. 20, Article No 104, 2019.
Following an analytic evaluation of the run-time behavior of the C4.5
algorithm, EC4.5 is a more efficient version of the decision tree builder
algorithm. It improves on C4.5 by adopting the best
among three strategies for computing information gain of continuous
attributes. EC4.5 computes the same decision trees as C4.5 with a
performance gain of up to 5 times.
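As an illustration of the computation being optimized, the following is a minimal sketch of the textbook sort-and-scan way of finding the best threshold of a continuous attribute by information gain. It is not claimed to be any of the three strategies adopted by EC4.5, and the names `entropy` and `best_threshold` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a class distribution given as label counts."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_threshold(values, labels):
    """Return (threshold, gain) for the split `value <= threshold` with the
    highest information gain, or (None, 0.0) if no split improves on the parent.

    Hypothetical helper: sorts the cases once, then scans them while moving
    class counts from the right partition to the left one.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    right = Counter(label for _, label in pairs)  # cases above the threshold
    left = Counter()                              # cases at or below it
    base = entropy(right, n)                      # entropy before splitting
    best = (None, 0.0)
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        right[label] -= 1
        if value == pairs[i + 1][0]:
            continue  # no threshold can separate two equal values
        n_left = i + 1
        n_right = n - n_left
        gain = (base
                - (n_left / n) * entropy(left, n_left)
                - (n_right / n) * entropy(right, n_right))
        if gain > best[1]:
            best = (value, gain)
    return best
```

For example, `best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"])` selects the threshold 2, which separates the two classes perfectly. The sort dominates the cost of this scan, which is why the choice of strategy for continuous attributes matters for the overall run time.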
Obtaining software: a patch from C4.5 release 8 to EC4.5 Beta 1.0 for Linux platforms is available here for educational and research use. I suggest, however, downloading the even faster YaDT tree builder.
Reference paper: S. Ruggieri. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering. Vol 14, Issue 2, March-April 2002, 438-444.
For other decision tree software see www.KDnuggets.com - Analytics and Data Mining Resources.
QVolume: Estimating the Total Volume of Queries to a Search Engine.
Obtaining software: on GitHub.
- F. Lillo, S. Ruggieri. Estimating the Total Volume of Queries to a Search Engine. IEEE Transactions on Knowledge and Data Engineering. To appear. DOI: 10.1109/TKDE.2021.3054668.
- F. Lillo, S. Ruggieri. Estimating the Total Volume of Queries to Google. 28th World Wide Web Conference on World Wide Web (WebConf 2019): 1051-1060. ACM, May 2019.
SCube: A Tool for Segregation Discovery.
Obtaining software: on GitHub.
- A. Baroni, S. Ruggieri. SCube: A Tool for Segregation Discovery. 22nd International Conference on Extending Database Technology (EDBT 2019): 542-545. OpenProceedings.org, March 2019.
- A. Baroni, S. Ruggieri. Segregation Discovery in a Social Network of Companies. Journal of Intelligent Information Systems. Vol. 51, Issue 1, August 2018, 71-96.
dd: a Java library for discrimination discovery and sanitization.
A Java library with implementations of algorithms presented in various papers.
Obtaining software: Ver. July 2015.
Reference paper: S. Ruggieri. Using t-closeness anonymity to control for non-discrimination. Transactions on Data Privacy. Vol. 7, Issue 2, August 2014, 99-129.
k-NN for Discrimination Analysis.
A variant of k-NN classification for discrimination discovery and prevention.
Obtaining software: Ver. 0.1 for Windows.
Reference paper: B. T. Luong, S. Ruggieri, F. Turini. k-NN as an Implementation of Situation Testing for Discrimination Discovery and Prevention. 17th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2011): 502-510. ACM, August 2011.
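The idea of situation testing with k-NN can be sketched as follows: for a given individual, compare the rate of negative decisions among the k nearest neighbors in the protected group with the rate among the k nearest neighbors in the unprotected group. This is only a minimal sketch under stated assumptions, not the released software: the record fields (`x`, `protected`, `granted`), the Euclidean distance, and the function names are all hypothetical, and the reference paper should be consulted for the actual definitions.

```python
import math


def nearest(target, candidates, k):
    """Return the k candidates closest to target (Euclidean distance on 'x')."""
    return sorted(candidates, key=lambda r: math.dist(r["x"], target["x"]))[:k]


def denial_rate(group):
    """Fraction of records in group with a negative decision."""
    if not group:
        raise ValueError("empty neighbor set")
    return sum(1 for r in group if not r["granted"]) / len(group)


def situation_testing_diff(target, records, k):
    """Difference between the denial rate of the k nearest protected neighbors
    and that of the k nearest unprotected neighbors of target.

    Hypothetical helper: a large positive value suggests that similar
    individuals are treated worse when they belong to the protected group.
    """
    others = [r for r in records if r is not target]
    protected = nearest(target, [r for r in others if r["protected"]], k)
    unprotected = nearest(target, [r for r in others if not r["protected"]], k)
    return denial_rate(protected) - denial_rate(unprotected)


# Toy data, invented for illustration only.
records = [
    {"x": (0.0, 0.0), "protected": True, "granted": False},   # the individual
    {"x": (0.0, 1.0), "protected": True, "granted": False},
    {"x": (1.0, 0.0), "protected": True, "granted": False},
    {"x": (0.0, 1.0), "protected": False, "granted": True},
    {"x": (1.0, 1.0), "protected": False, "granted": True},
    {"x": (9.0, 9.0), "protected": False, "granted": False},  # too far to matter
]
diff = situation_testing_diff(records[0], records, k=2)
```

In this sketch, an individual could be flagged for further inspection when `diff` exceeds a chosen threshold; the threshold and the value of k are parameters that an analyst would have to set.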
DCUBE: discrimination discovery in databases.
An analytical tool for discrimination discovery in databases through SQL queries over
an Oracle database of classification rules.
Obtaining software: visit the DCUBE web site.
Reference paper: S. Ruggieri, D. Pedreschi, F. Turini. Data mining for discrimination discovery. ACM Transactions on Knowledge Discovery from Data. Vol 4, Issue 2, May 2010, Article 9.
Demo paper: S. Ruggieri, D. Pedreschi, F. Turini. DCUBE: Discrimination Discovery in Databases. ACM International Conference on Management of Data (SIGMOD 2010): 1127-1130. ACM, June 2010. Honorable mention at SIGMOD 2010 Best-demo Award Competition.
LP2DD: logic programming to discover discrimination.
A data mining + logic programming system intended as an analytical tool supporting DSS owners and
control authorities in the interactive and iterative process of discrimination discovery.
Obtaining software: LP2DD 1.0 (April 2009) for SWI Prolog on Windows.
Reference paper: S. Ruggieri, D. Pedreschi, F. Turini. Integrating induction and deduction for finding evidence of discrimination. Artificial Intelligence and Law. Vol 18, Issue 1, March 2010, 1-43.
Learning from Polyhedral Sets.
The software implements a learning procedure that abstracts a collection of polyhedra (solutions of linear systems) to a minimal and representative parameterized linear system.
Checking whether a given polyhedron is obtainable by some parameter instance (and computing such values) is also implemented. The software is written in SWI Prolog.
Obtaining software: lps 1.0 (August 2013).
- S. Ruggieri. Learning from Polyhedral Sets. 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013): 1069-1075. AAAI Press, August 2013.
- S. Ruggieri. Deciding Membership in a Class of Polyhedra. 20th European Conference on Artificial Intelligence (ECAI 2012): 702-707. IOS Press, August 2012.
Typing linear constraints and moding CLP(R) programs.
The software implements a type system for linear constraints and a well-moding
checker for CLP(R) programs.
Obtaining software: clpt 1.3 beta (November 2008).
Reference paper: S. Ruggieri, F. Mesnard. Typing linear constraints. ACM Transactions on Programming Languages and Systems. Vol 32, Issue 6, July 2010, Article 21.
Environments for Knowledge Discovery in Databases
KDD Markup Language - Mining Query Language. KDDML-MQL is an environment that supports the specification and execution of
complex Knowledge Discovery in Databases (KDD) processes in the form of high-level
queries. The environment is made of two layers, the bottom one called KDDML and
the top one called MQL.
Obtaining software: visit the KDDML/MQL web site.
Reference paper: A. Romei, S. Ruggieri, F. Turini. KDDML: a middleware language and system for knowledge discovery in databases. Data and Knowledge Engineering. Vol 57, Issue 2, May 2006, 179-220.