Discrimination discovery / Bias elicitation
dd: a Java library for discrimination discovery and sanitization.
A Java library with implementations of algorithms presented in various papers.
Obtaining software: Ver. July 2015.
Reference paper: S. Ruggieri. Using t-closeness anonymity to control for non-discrimination. Transactions on Data Privacy. Vol. 7, Issue 2, August 2014, 99-129.
k-NN for Discrimination Analysis.
A variant of k-NN classification for discrimination discovery and prevention.
Obtaining software: Ver. 0.1 for Windows.
Reference paper: B. T. Luong, S. Ruggieri, F. Turini. k-NN as an Implementation of Situation Testing for Discrimination Discovery and Prevention. 17th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2011): 502-510. ACM, August 2011.
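The idea of using k-NN as a form of situation testing can be sketched as follows. This is a minimal illustration written for this page, not the code of the distributed tool: the record layout (`id`, `x`, `protected`, `denied`), the Euclidean distance, the function names and the threshold value are all assumptions. For each member of the protected group who received a negative decision, the sketch compares the rate of negative decisions among the k nearest protected neighbours with the rate among the k nearest unprotected neighbours, and flags the record when the difference reaches a threshold.

```python
import math

def k_nearest(target, candidates, k):
    """Return the k records in `candidates` closest to `target` (excluding it)."""
    others = [r for r in candidates if r is not target]
    return sorted(others, key=lambda r: math.dist(r["x"], target["x"]))[:k]

def denial_rate(neighbours):
    """Fraction of neighbours with a negative decision (0.0 if there are none)."""
    if not neighbours:
        return 0.0
    return sum(r["denied"] for r in neighbours) / len(neighbours)

def situation_test(records, k, threshold):
    """Flag protected, denied records whose similar protected neighbours are
    denied much more often than their similar unprotected neighbours."""
    protected = [r for r in records if r["protected"]]
    unprotected = [r for r in records if not r["protected"]]
    flagged = []
    for r in protected:
        if not r["denied"]:
            continue  # only negative decisions are examined in this sketch
        p_protected = denial_rate(k_nearest(r, protected, k))
        p_unprotected = denial_rate(k_nearest(r, unprotected, k))
        if p_protected - p_unprotected >= threshold:
            flagged.append(r["id"])
    return flagged

# Hypothetical toy data: two numeric features per individual.
records = [
    {"id": 1, "x": (1.0, 1.0), "protected": True,  "denied": True},
    {"id": 2, "x": (1.1, 1.0), "protected": True,  "denied": True},
    {"id": 3, "x": (0.9, 1.1), "protected": True,  "denied": True},
    {"id": 4, "x": (1.0, 1.1), "protected": False, "denied": False},
    {"id": 5, "x": (1.1, 1.1), "protected": False, "denied": False},
    {"id": 6, "x": (5.0, 5.0), "protected": True,  "denied": False},
    {"id": 7, "x": (5.1, 5.0), "protected": False, "denied": False},
    {"id": 8, "x": (5.0, 5.1), "protected": True,  "denied": False},
]

flagged_ids = situation_test(records, k=2, threshold=0.5)
```

In this toy data the three denied protected individuals sit next to unprotected individuals who were all accepted, so each of them is flagged. Real data would need a distance suited to mixed categorical and numeric attributes, which this sketch does not attempt.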
DCUBE: discrimination discovery in databases.
An analytical tool for discrimination discovery in databases through SQL queries over
an Oracle database of classification rules.
Obtaining software: visit the DCUBE web site.
Reference paper: S. Ruggieri, D. Pedreschi, F. Turini. Data mining for discrimination discovery. ACM Transactions on Knowledge Discovery from Data. Vol 4, Issue 2, May 2010, Article 9.
Demo paper: S. Ruggieri, D. Pedreschi, F. Turini. DCUBE: Discrimination Discovery in Databases. ACM International Conference on Management of Data (SIGMOD 2010): 1127-1130. ACM, June 2010. Honorable mention at SIGMOD 2010 Best-demo Award Competition.
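To give a flavour of what querying a database of classification rules can look like, here is a small sketch. It is not DCUBE's actual schema or query set: the `rules` table, its column names, the sample rows and the use of SQLite (in place of Oracle) are assumptions made for illustration. The query ranks rules of the form A, B -> C by extended lift, that is, the confidence of the rule with the protected item A divided by the confidence of the same rule without it.

```python
import sqlite3

# Hypothetical schema: one row per classification rule "A, B -> C".
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rules (
        rule_id        INTEGER PRIMARY KEY,
        protected_item TEXT,   -- A: potentially discriminatory item
        context        TEXT,   -- B: the remaining premise
        outcome        TEXT,   -- C: the decision
        conf_with_a    REAL,   -- confidence of A, B -> C
        conf_without_a REAL    -- confidence of B -> C
    )
""")
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "sex=female", "purpose=new_car", "credit=deny", 0.75, 0.25),
        (2, "age>60", "housing=rent", "credit=deny", 0.3125, 0.25),
        (3, "foreign_worker=yes", "job=unskilled", "credit=deny", 0.375, 0.25),
    ],
)

def rules_above_elift(conn, min_elift):
    """Return (rule_id, elift) pairs with extended lift >= min_elift."""
    query = """
        SELECT rule_id, conf_with_a / conf_without_a AS elift
        FROM rules
        WHERE conf_without_a > 0
          AND conf_with_a / conf_without_a >= ?
        ORDER BY elift DESC
    """
    return conn.execute(query, (min_elift,)).fetchall()

suspicious = rules_above_elift(conn, 1.5)
```

With these made-up rows, rules 1 and 3 pass the threshold of 1.5 and rule 2 does not. Keeping the rules in a relational table is what lets an analyst tighten or relax the threshold, or restrict the context, with ordinary SQL.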
LP2DD: logic programming to discover discrimination.
A data mining + logic programming system intended as an analytical tool supporting DSS owners and
control authorities in the interactive and iterative process of discrimination discovery.
Obtaining software: LP2DD 1.0 (April 2009) for SWI Prolog on Windows.
Reference paper: S. Ruggieri, D. Pedreschi, F. Turini. Integrating induction and deduction for finding evidence of discrimination. Artificial Intelligence and Law. Vol 18, Issue 1, March 2010, 1-43.
YaDT: Yet another Decision Tree builder.
YaDT is a new from-scratch implementation of the entropy-based tree construction
algorithm. It has been designed and implemented in C++ with strong emphasis on
efficiency (time and space) and portability (Windows/Linux, 32/64 bit executable).
Obtaining software: YaDT 1.2.5 (October 2010) with 32/64 bit libraries for VisualStudio 2010/GCC 4.1, and YaDT 1.2.3 (February 2007) with libraries for VisualStudio 2005/GCC 4.0, and YaDT 1.2.1 (January 2005) with libraries for VisualStudio 2003/GCC 3.2.
Reference papers:
- S. Ruggieri. YaDT: Yet another Decision Tree builder. 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004): 260-265. IEEE Press, November 2004.
- M. Aldinucci, S. Ruggieri, M. Torquati. Porting Decision Tree Algorithms to Multicore using FastFlow. 21st European Conference on Machine Learning and 14th Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2010), Part I: 7-23. Vol. 6321 of LNCS, Springer, September 2010.
- S. Ruggieri. Subtree Replacement in Decision Tree Simplification. 12th SIAM Conference on Data Mining (SDM 2012): 379-390. SIAM, April 2012.
- M. Aldinucci, S. Ruggieri, M. Torquati. Decision tree building on multi-core using FastFlow. Concurrency and Computation: Practice & Experience. Vol. (to appear), 2014.
EC4.5: Efficient C4.5.
EC4.5 is a more efficient version of the C4.5 decision tree builder,
derived from an analytic evaluation of the run-time behavior of the C4.5
algorithm. It improves on C4.5 by adopting the best
among three strategies for computing the information gain of continuous
attributes. EC4.5 computes the same decision trees as C4.5 with a
performance gain of up to 5 times.
Obtaining software: a patch from C4.5 release 8 to EC4.5 Beta 1.0 for Linux platforms is available for educational and research use. I suggest, however, downloading the even faster YaDT tree builder.
Reference paper: S. Ruggieri. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering. Vol 14, Issue 2, March-April 2002, 438-444.
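For readers unfamiliar with the computation that EC4.5 speeds up, the following sketch shows the plain sort-and-scan way of finding the best binary split of a continuous attribute by information gain. It is only a baseline illustration under assumed names (`entropy`, `best_threshold`); it does not reproduce the three strategies of EC4.5 or the way EC4.5 selects among them.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def best_threshold(values, labels):
    """Return (threshold, gain) of the best split `value <= threshold`.

    Cases are sorted once by attribute value; class counts on each side of
    the candidate split are then updated incrementally during a single scan.
    Returns (None, 0.0) when no split has positive gain.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    right = Counter(labels)          # class counts of cases above the split
    left = Counter()                 # class counts of cases at or below it
    base = entropy(right, n)
    best_thr, best_gain = None, 0.0
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        right[label] -= 1
        if value == pairs[i + 1][0]:
            continue                 # cannot split between equal values
        n_left, n_right = i + 1, n - i - 1
        gain = (base
                - (n_left / n) * entropy(left, n_left)
                - (n_right / n) * entropy(right, n_right))
        if gain > best_gain:
            best_thr, best_gain = value, gain
    return best_thr, best_gain

threshold, gain = best_threshold([1.0, 2.0, 3.0, 4.0],
                                 ["no", "no", "yes", "yes"])
```

On this four-case example the split `value <= 2.0` separates the two classes perfectly, so the gain equals the full entropy of the class distribution, 1 bit. The sort dominates the cost of this baseline, which is the kind of per-attribute work a more efficient strategy aims to reduce.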
For other decision tree software see www.KDnuggets.com - Analytics and Data Mining Resources.
Learning from Polyhedral Sets.
The software implements a learning procedure that abstracts a collection of polyhedra (solutions of linear systems) to a minimal and representative parameterized linear system.
Checking whether a given polyhedron is obtainable by some parameter instance (and computing such values) is also implemented. The software is written in SWI Prolog.
Obtaining software: lps 1.0 (August 2013).
Reference papers:
- S. Ruggieri. Learning from Polyhedral Sets. 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013): 1069-1075. AAAI Press, August 2013.
- S. Ruggieri. Deciding Membership in a Class of Polyhedra. 20th European Conference on Artificial Intelligence (ECAI 2012): 702-707. IOS Press, August 2012.
Typing linear constraints and moding CLP(R) programs.
The software implements a type system for linear constraints and a well-moding
checker for CLP(R) programs.
Obtaining software: clpt 1.3 beta (November 2008).
Reference paper: S. Ruggieri, F. Mesnard. Typing linear constraints. ACM Transactions on Programming Languages and Systems. Vol 32, Issue 6, July 2010, Article 21.
Environments for Knowledge Discovery in Databases
KDDML-MQL: KDD Markup Language - Mining Query Language.
KDDML-MQL is an environment that supports the specification and execution of
complex Knowledge Discovery in Databases (KDD) processes in the form of high-level
queries. The environment consists of two layers: the bottom one, called KDDML, and
the top one, called MQL.
Obtaining software: visit the KDDML/MQL web site.
Reference paper: A. Romei, S. Ruggieri, F. Turini. KDDML: a middleware language and system for knowledge discovery in databases. Data and Knowledge Engineering. Vol 57, Issue 2, May 2006, 179-220.