Decision trees

  • YaDT: Yet another Decision Tree builder. YaDT is a from-scratch implementation of the C4.5 entropy-based tree construction algorithm. It has been designed and implemented in C++ with a strong emphasis on efficiency (time and space) and portability (Windows/Linux). Features include decision tree building, tree simplification, bagging, random forests, feature selection, multi-core parallelism, and a Python 3 wrapper.

    Download software: YaDT 2.3.0 (April 2020) for Windows/Linux.

    Reference papers:

  • Efficient C4.5. Building on an analytic evaluation of the run-time behavior of C4.5, EC4.5 is a more efficient version of the C4.5 decision tree builder. It improves on C4.5 by adopting the best of three strategies for computing the information gain of continuous attributes. EC4.5 builds the same decision trees as C4.5 while running up to 5 times faster.

    Obtaining software: a patch from C4.5 release 8 to EC4.5 Beta 1.0 for Linux platforms is available here for educational and research use. However, I suggest downloading the even faster YaDT tree builder.

    Reference paper: S. Ruggieri. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering. Vol. 14, Issue 2, March-April 2002, 438-444.

    For other decision tree software, see www.KDnuggets.com - Analytics and Data Mining Resources.
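    The entropy-based test that C4.5-style builders evaluate for a continuous attribute can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions: the names `entropy` and `best_threshold` are hypothetical and not part of any YaDT or EC4.5 API, the sketch implements only the plain sort-and-scan strategy (not EC4.5's choice among three strategies), and it ignores gain ratio, missing values, and case weights.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def best_threshold(values, labels):
    """Scan the sorted values of one continuous attribute and return
    (threshold, gain) for the binary test value <= threshold that maximizes
    information gain; (None, 0.0) if no cut improves on the parent node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total = Counter(labels)
    base = entropy(list(total.values()))

    left = Counter()  # class counts of cases on the left of the current cut
    best = (None, 0.0)
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        if value == pairs[i + 1][0]:
            continue  # cut only between distinct attribute values
        right = total - left  # Counter subtraction keeps positive counts only
        n_left = i + 1
        split = ((n_left / n) * entropy(list(left.values()))
                 + ((n - n_left) / n) * entropy(list(right.values())))
        gain = base - split
        if gain > best[1]:
            # Use the largest attribute value on the left of the cut as threshold.
            best = (value, gain)
    return best


print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (2, 1.0)
```

    In the example, the cut after value 2 separates the two classes perfectly, so the gain equals the full parent entropy of 1 bit. Returning an actual data value as the threshold, rather than a midpoint, follows the convention of the original C4.5.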

Estimation methods

eXplainable AI (XAI)

Discrimination and fairness in AI

  • DD: a Python library for group discrimination/unfairness discovery. It implements algorithms presented in several papers.

    Obtaining software: DD on GitHub.

    Reference paper:

  • dd: a Java library for group discrimination/unfairness discovery and data sanitization. It implements algorithms presented in several papers.

    Obtaining software: Ver. July 2015.

    Reference paper: S. Ruggieri. Using t-closeness anonymity to control for non-discrimination. Transactions on Data Privacy. Vol. 7, Issue 2, August 2014, 99-129.

  • k-NN for Discrimination Analysis. A variant of k-NN classification for individual discrimination discovery and prevention.

    Obtaining software: Ver. 0.1 for Windows.

    Reference paper: B. T. Luong, S. Ruggieri, F. Turini. k-NN as an Implementation of Situation Testing for Discrimination Discovery and Prevention. 17th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2011): 502-510. ACM, August 2011.

  • DCUBE: discrimination discovery in databases [discontinued; see the DD Python library]. An analytical tool for group discrimination discovery in databases through SQL queries over an Oracle database of classification rules.

    Website: DCUBE web site on webarchive.org.

    Reference paper: S. Ruggieri, D. Pedreschi, F. Turini. DCUBE: Discrimination Discovery in Databases. ACM International Conference on Management of Data (SIGMOD 2010): 1127-1130. ACM, June 2010.

  • LP2DD: logic programming to discover discrimination. A system combining data mining and logic programming, intended as an analytical tool that supports DSS (decision support system) owners and control authorities in the interactive and iterative process of group discrimination discovery.

    Obtaining software: LP2DD 1.0 (April 2009) for SWI Prolog on Windows.

    Reference paper: S. Ruggieri, D. Pedreschi, F. Turini. Integrating induction and deduction for finding evidence of discrimination. Artificial Intelligence and Law. Vol. 18, Issue 1, March 2010, 1-43.
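    As a rough illustration of the two kinds of analysis listed above, group discrimination (as in DD/dd) and individual discrimination via k-NN situation testing, here is a Python sketch. It is not the API of DD, dd, or the k-NN tool: the function names are hypothetical, risk difference is just one common group-level measure, and the situation-testing loop assumes numeric features with Euclidean distance and excludes the tested individual from its own neighborhood. The reference paper should be consulted for the actual distance function and parameters.

```python
import math


def risk_difference(negative, protected):
    """Share of negative decisions in the protected group minus the share in
    the unprotected group; 0.0 means parity. negative[i] is True when
    individual i received a negative decision (e.g. a denied loan)."""
    prot = [d for d, p in zip(negative, protected) if p]
    unprot = [d for d, p in zip(negative, protected) if not p]
    return sum(prot) / len(prot) - sum(unprot) / len(unprot)


def situation_testing(X, negative, protected, k, t):
    """Flag protected individuals with a negative decision whose k nearest
    protected neighbors got negative decisions at a rate at least t higher
    than their k nearest unprotected neighbors (Euclidean distance assumed)."""
    flagged = []
    for i in range(len(X)):
        if not (protected[i] and negative[i]):
            continue  # only protected individuals with a negative decision are tested
        rates = []
        for group in (True, False):  # protected first, then unprotected
            # Neighbor candidates from one group, excluding individual i itself.
            cands = [j for j in range(len(X)) if protected[j] == group and j != i]
            nearest = sorted(cands, key=lambda j: math.dist(X[i], X[j]))[:k]
            # An empty group counts as rate 0.0 (a simplifying choice).
            rate = sum(negative[j] for j in nearest) / len(nearest) if nearest else 0.0
            rates.append(rate)
        if rates[0] - rates[1] >= t:
            flagged.append(i)
    return flagged


X = [(0, 0), (0, 1), (5, 5), (0, 0.2), (0, 1.2), (5, 5.2), (5, 4.8)]
negative = [True, True, True, False, False, True, True]
protected = [True, True, True, False, False, False, False]
print(risk_difference(negative, protected))  # 0.5
print(situation_testing(X, negative, protected, k=1, t=0.5))  # [0, 1]
```

    In the toy data, individuals 0 and 1 are flagged because their closest unprotected counterparts received positive decisions, while individual 2 is not, since similar unprotected individuals were also denied. The group-level measure alone (0.5) cannot make this distinction, which is the motivation for testing individuals.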

Segregation discovery

Polyhedral analysis

  • Learning from Polyhedral Sets. The software implements a learning procedure that abstracts a collection of polyhedra (solutions of linear systems) into a minimal and representative parameterized linear system. It also checks whether a given polyhedron can be obtained from some instance of the parameters and, if so, computes those parameter values. The software is written in SWI Prolog.

    Obtaining software: lps 1.0 (August 2013).

    Reference papers:

  • Typing linear constraints and moding CLP(R) programs. The software implements a type system for linear constraints and a well-moding checker for CLP(R) programs.

    Obtaining software: clpt 1.3 beta (November 2008).

    Reference paper: S. Ruggieri, F. Mesnard. Typing linear constraints. ACM Transactions on Programming Languages and Systems. Vol. 32, Issue 6, July 2010, Article 21.
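    To make "abstracting polyhedra into a parameterized linear system" concrete, here is a toy example of my own, not produced by lps (whose actual procedure is described in the reference papers). Two squares are both instances of one system with a single parameter a; a third square is obtainable with a = 3, while a rectangle is not obtainable for any value of a.

```latex
P_1 = \{(x,y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}, \qquad
P_2 = \{(x,y) \mid 0 \le x \le 2,\ 0 \le y \le 2\}

S(a) = \{(x,y) \mid 0 \le x \le a,\ 0 \le y \le a\}, \qquad
P_1 = S(1), \quad P_2 = S(2)

Q = \{(x,y) \mid 0 \le x \le 3,\ 0 \le y \le 3\} = S(3), \qquad
R = \{(x,y) \mid 0 \le x \le 1,\ 0 \le y \le 2\} \neq S(a) \ \text{for every } a
```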


Environments for Knowledge Discovery in Databases

  • KDD Markup Language - Mining Query Language. KDDML-MQL is an environment that supports the specification and execution of complex Knowledge Discovery in Databases (KDD) processes in the form of high-level queries. The environment consists of two layers: KDDML at the bottom and MQL on top.

    Obtaining software: visit the KDDML/MQL web site.

    Reference paper: A. Romei, S. Ruggieri, F. Turini. KDDML: a middleware language and system for knowledge discovery in databases. Data and Knowledge Engineering. Vol. 57, Issue 2, May 2006, 179-220.