Decision trees
-
YaDT: Yet another Decision Tree builder.
YaDT is a from-scratch implementation of the C4.5 entropy-based tree construction
algorithm. It has been designed and implemented in C++ with strong emphasis on
efficiency (time and space) and portability (Windows/Linux). It includes: decision
tree building, tree simplification, bagging, random forest, feature selection,
multi-core parallelism, and a Python 3 wrapper.
Download software: YaDT 2.3.0 (April 2020) for Windows/Linux.
Reference papers:
- S. Ruggieri. YaDT: Yet another Decision Tree builder. 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004): 260-265. IEEE Press, November 2004.
- S. Ruggieri. Subtree Replacement in Decision Tree Simplification. 12th SIAM Conference on Data Mining (SDM 2012): 379-390. SIAM, April 2012.
- M. Aldinucci, S. Ruggieri, M. Torquati. Decision tree building on multi-core using FastFlow. Concurrency and Computation: Practice & Experience. Vol. 26, Issue 3, March 2014, 800-820.
- S. Ruggieri. Enumerating Distinct Decision Trees. International Conference on Machine Learning (ICML 2017) JMLR Workshop and Conference Proceedings, 70, August 2017, 2960-2968.
- S. Ruggieri. Complete Search for Feature Selection in Decision Trees. Journal of Machine Learning Research. Vol. 20, Article No 104, 2019.
-
Efficient C4.5.
Following an analytic evaluation of the run-time behavior of the C4.5
algorithm, EC4.5 is a more efficient version of the decision tree builder
algorithm. It improves on C4.5 by adopting the best
among three strategies for computing information gain of continuous
attributes. EC4.5 computes the same decision trees as C4.5 with a
performance gain of up to 5 times.
Obtaining software: a patch from C4.5 release 8 to EC4.5 Beta 1.0 for Linux platforms is available here for educational and research use. I suggest, however, downloading the even faster YaDT tree builder.
Reference paper: S. Ruggieri. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering. Vol 14, Issue 2, March-April 2002, 438-444.
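To illustrate what "computing information gain of continuous attributes" involves, here is a minimal Python sketch of the plain sort-and-scan approach. It is not the EC4.5 code and does not reproduce any of its three strategies. The function names (entropy_from_counts, best_threshold) are hypothetical, and C4.5 refinements such as gain ratio and the handling of unknown values are left out.

```python
import math
from collections import Counter


def entropy_from_counts(counts, n):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`.

    Hypothetical helper: sorts the cases once, then scans the candidate
    cut points between distinct consecutive values, updating the class
    counts on the left side incrementally.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total = Counter(label for _, label in pairs)
    base = entropy_from_counts(total, n)

    left = Counter()
    best = (None, 0.0)
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        if value == pairs[i + 1][0]:
            continue  # a cut is only meaningful between distinct values
        n_left = i + 1
        n_right = n - n_left
        right = total - left  # Counter subtraction keeps positive counts only
        split_entropy = (
            (n_left / n) * entropy_from_counts(left, n_left)
            + (n_right / n) * entropy_from_counts(right, n_right)
        )
        gain = base - split_entropy
        if gain > best[1]:
            best = (value, gain)  # assumption: the lower value is the threshold
    return best


if __name__ == "__main__":
    print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

The sort dominates the cost of this scan, which is one reason a tree builder may prefer a different strategy depending on the number of cases at a node.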
For other decision tree software see www.KDnuggets.com - Analytics and Data Mining Resources.
Estimation methods
-
QVolume: Estimating the Total Volume of Queries to a Search Engine.
Obtaining software: QVolume on GitHub.
Reference papers:
- F. Lillo, S. Ruggieri. Estimating the Total Volume of Queries to a Search Engine. IEEE Transactions on Knowledge and Data Engineering. Vol. 34 Issue 11, November 2022, 5351-5363.
- F. Lillo, S. Ruggieri. Estimating the Total Volume of Queries to Google. 28th World Wide Web Conference (WebConf 2019): 1051-1060. ACM, May 2019.
eXplainable AI (XAI)
-
X-SPELLS - Explaining Short Text Classification with Diverse Synthetic Exemplars and Counter-Exemplars.
A Python library for local explanation of short text classifiers.
Obtaining software: X-SPELLS on GitHub.
Reference paper:
- O. Lampridis, L. State, R. Guidotti, S. Ruggieri. Explaining short text classification with diverse synthetic exemplars and counter-exemplars. Machine Learning Journal. Vol. 112, 4289–4322, 2023.
-
LORE: LOcal Rule-based Explanations.
A Python library for local explanation of black boxes.
Obtaining software: LORE on GitHub.
Reference paper:
- R. Guidotti, A. Monreale, F. Giannotti, D. Pedreschi, S. Ruggieri, F. Turini. Factual and Counterfactual Explanations for Black-Box Decision Making. IEEE Intelligent Systems. Vol. 34, Issue 6, 14-23, Nov.-Dec. 2019.
-
ECE: Ensemble of Counterfactual Explainers.
A Python library for composing counterfactual explainers.
Obtaining software: ECE on GitHub.
Reference paper:
- R. Guidotti, S. Ruggieri. Ensemble of Counterfactual Explainers. Discovery Science (DS 2021). 358-368. Vol. 12986 of LNCS, Springer, October 2021.
-
InterpretableModels: Stability of Interpretable Models.
A Python library for evaluating the stability of interpretable AI models.
Obtaining software: InterpretableModels on GitHub.
Reference paper:
- R. Guidotti, S. Ruggieri. On The Stability of Interpretable Models. International Joint Conference on Neural Networks (IJCNN 2019): paper N-19575. IEEE, July 2019.
Discrimination and fairness in AI
-
DD: a Python library for group discrimination/unfairness discovery.
A Python library with implementations of algorithms presented in various papers.
Obtaining software: DD on GitHub.
Reference paper:
- S. Ruggieri, D. Pedreschi, F. Turini. Data mining for discrimination discovery. ACM Transactions on Knowledge Discovery from Data. Vol 4, Issue 2, May 2010, Article 9.
-
dd: a Java library for group discrimination/unfairness discovery and data sanitization.
A Java library with implementations of algorithms presented in various papers.
Obtaining software: Ver. July 2015.
Reference paper: S. Ruggieri. Using t-closeness anonymity to control for non-discrimination. Transactions on Data Privacy. Vol. 7, Issue 2, August 2014, 99-129.
- [discontinued, see DD Python version]
k-NN for Discrimination Analysis.
A variant of k-NN classification for individual discrimination discovery and prevention.
Obtaining software: Ver. 0.1 for Windows.
Reference paper: B. T. Luong, S. Ruggieri, F. Turini. k-NN as an Implementation of Situation Testing for Discrimination Discovery and Prevention. 17th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2011): 502-510. ACM, August 2011.
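The idea of using k-NN for situation testing can be sketched as follows: for a given individual, compare how often the k most similar members of the protected group and of the unprotected group received a negative decision. This Python sketch is not the code of the tool above. The function name, the record layout (feature tuple, denied flag), and the use of Euclidean distance are all assumptions made for the example.

```python
import math


def situation_testing_diff(target, protected, unprotected, k, distance=math.dist):
    """Difference in denial rates among the k nearest neighbours of `target`.

    `protected` and `unprotected` are lists of (features, denied) pairs,
    where `denied` is True for a negative decision. A large positive
    result suggests that similar individuals are treated worse when they
    belong to the protected group. All names here are hypothetical.
    """
    def denial_rate(group):
        if not group:
            raise ValueError("each group needs at least one record")
        nearest = sorted(group, key=lambda rec: distance(target, rec[0]))[:k]
        return sum(1 for _, denied in nearest if denied) / len(nearest)

    return denial_rate(protected) - denial_rate(unprotected)


if __name__ == "__main__":
    protected = [((0.0,), True), ((1.0,), True), ((5.0,), False)]
    unprotected = [((0.0,), False), ((1.0,), False), ((5.0,), True)]
    print(situation_testing_diff((0.0,), protected, unprotected, k=2))
```

In practice the caller would exclude the target's own record from its group and flag the individual only when the difference exceeds a chosen threshold. Both the threshold and the distance function are domain-dependent choices.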
- [discontinued, see DD Python version]
DCUBE: discrimination discovery in databases.
An analytical tool for group discrimination discovery in databases through SQL queries over an Oracle database of classification rules.
Website: DCUBE web site on webarchive.org.
Reference paper: S. Ruggieri, D. Pedreschi, F. Turini. DCUBE: Discrimination Discovery in Databases. ACM International Conference on Management of Data (SIGMOD 2010): 1127-1130. ACM, June 2010.
- [discontinued, see DD Python version]
LP2DD: logic programming to discover discrimination.
A data mining + logic programming system intended as an analytical tool supporting DSS owners and control authorities in the interactive and iterative process of group discrimination discovery.
Obtaining software: LP2DD 1.0 (April 2009) for SWI Prolog on Windows.
Reference paper: S. Ruggieri, D. Pedreschi, F. Turini. Integrating induction and deduction for finding evidence of discrimination. Artificial Intelligence and Law. Vol 18, Issue 1, March 2010, 1-43.
Segregation discovery
-
SCube: A Tool for Segregation Discovery.
Obtaining software: SCube on GitHub.
Reference papers:
- A. Baroni, S. Ruggieri. SCube: A Tool for Segregation Discovery. 22nd International Conference on Extending Database Technology (EDBT 2019): 542-545. OpenProceedings.org, March 2019.
- A. Baroni, S. Ruggieri. Segregation Discovery in a Social Network of Companies. Journal of Intelligent Information Systems. Vol. 51, Issue 1, August 2018, 71–96.
Polyhedral analysis
-
Learning from Polyhedral Sets.
The software implements a learning procedure that abstracts a collection of polyhedra (solutions of linear systems) to a minimal and representative parameterized linear system.
Checking whether a given polyhedron is obtainable by some parameter instance (and computing such values) is also implemented. The software is written in SWI Prolog.
Obtaining software: lps 1.0 (August 2013).
Reference papers:
- S. Ruggieri. Learning from Polyhedral Sets. 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013): 1069-1075. AAAI Press, August 2013.
- S. Ruggieri. Deciding Membership in a Class of Polyhedra. 20th European Conference on Artificial Intelligence (ECAI 2012): 702-707. IOS Press, August 2012.
-
Typing linear constraints and moding CLP(R) programs.
The software implements a type system for linear constraints and a well-moding
checker for CLP(R) programs.
Obtaining software: clpt 1.3 beta (November 2008).
Reference paper: S. Ruggieri, F. Mesnard. Typing linear constraints. ACM Transactions on Programming Languages and Systems. Vol 32, Issue 6, July 2010, Article 21.
Environments for Knowledge Discovery in Databases
-
KDD Markup Language - Mining Query Language.
KDDML-MQL is an environment that supports the specification and execution of
complex Knowledge Discovery in Databases (KDD) processes in the form of high-level
queries. The environment is made of two layers, the bottom one called KDDML and
the top one called MQL.
Obtaining software: visit the KDDML/MQL web site.
Reference paper: A. Romei, S. Ruggieri, F. Turini. KDDML: a middleware language and system for knowledge discovery in databases. Data and Knowledge Engineering. Vol 57, Issue 2, May 2006, 179-220.