kddml.Operators.Preprocessing.DiscretizationAlgorithms.NATURAL_BINNING_DISCRETIZATION_RESOLVER Class Reference

Inheritance diagram for kddml.Operators.Preprocessing.DiscretizationAlgorithms.NATURAL_BINNING_DISCRETIZATION_RESOLVER:

kddml.Operators.Preprocessing.DiscretizationAlgorithms.DiscretizationAlgorithmResolverTask
kddml.Operators.Preprocessing.PPAlgorithmResolverTask
kddml.Operators.AlgorithmResolverTask

Public Member Functions

void readParameters (Hashtable< String, KDDMLScalarManager > parameters) throws ResolverException, KDDMLCoreException
void readDiscretizationAttributeStatistics (NumericalStatisticManager stat) throws ResolverException, KDDMLCoreException
Object[] discretize (Double[] values) throws ResolverException
boolean isNumericLabeling ()
String getHistoryDescription ()

Detailed Description

Settings class for the Natural Binning Discretization (EWD) algorithm.
The natural binning discretization method divides the range of a numeric attribute A into k intervals of equal width. The method is also known as Equal Width Discretization (EWD).
Suppose that there are n training instances for which the value of A is known (missing values are ignored), and that the minimum and maximum values are vmin and vmax respectively. The algorithm sorts the observed values and then divides the range between vmin and vmax into k intervals of (approximately) equal width. Thus the intervals have width

w = (vmax - vmin) / k
and the cut points are at vmin+w, vmin+2w, . . . , vmin+(k-1)w. The number of output intervals k and the width of the interval w are mutually exclusive parameters.
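The width and cut point computation can be illustrated with a minimal Java sketch. It is not the KDDML or WEKA implementation: the class and method names are hypothetical, and rounding the interval count up when only a width w is given is an assumption, since the page does not say how that case is resolved.

```java
class EqualWidthCutPoints {
    /** Returns the k-1 interior cut points vmin+w, ..., vmin+(k-1)w. */
    static double[] cutPoints(double vmin, double vmax, int k) {
        if (k < 1 || vmax < vmin) {
            throw new IllegalArgumentException("need k >= 1 and vmax >= vmin");
        }
        double w = (vmax - vmin) / k;          // interval width
        double[] cuts = new double[k - 1];
        for (int i = 1; i < k; i++) {
            cuts[i - 1] = vmin + i * w;        // i-th cut point
        }
        return cuts;
    }

    /** Derives k from a fixed width w (assumption: round up, at least one interval). */
    static int intervalsForWidth(double vmin, double vmax, double w) {
        if (w <= 0) {
            throw new IllegalArgumentException("need w > 0");
        }
        return Math.max(1, (int) Math.ceil((vmax - vmin) / w));
    }
}
```

For example, cutPoints(5.0, 95.0, 3) uses w = 30 and yields the cut points 35.0 and 65.0.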
The algorithm takes as input a preprocessing table containing at least one numeric field, which represents the discretization attribute.
When the intervals have been computed, the algorithm replaces each training instance value of A with an interval label. Either numeric or nominal labeling is allowed.
A numeric interval label is the mean, the median, the minimum or the maximum of the values belonging to the interval.
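As an illustration of numeric labeling, the following sketch replaces each value with the mean of its equal-width bin. The names are hypothetical, and placing vmax in the last bin is an assumption of this sketch rather than documented KDDML behavior.

```java
class NumericLabelSketch {
    /** Index of the equal-width bin holding v; vmax is placed in the last bin. */
    static int binIndex(double v, double vmin, double vmax, int k) {
        if (vmax == vmin) {
            return 0;                                   // degenerate range: one bin
        }
        int i = (int) ((v - vmin) / ((vmax - vmin) / k));
        return Math.min(Math.max(i, 0), k - 1);         // clamp into [0, k-1]
    }

    /** Replaces each non-null value with the mean of its bin; nulls stay null. */
    static Double[] labelWithMean(Double[] values, double vmin, double vmax, int k) {
        double[] sum = new double[k];
        int[] count = new int[k];
        for (Double v : values) {
            if (v == null) continue;                    // missing values are ignored
            int b = binIndex(v, vmin, vmax, k);
            sum[b] += v;
            count[b]++;
        }
        Double[] out = new Double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) continue;
            int b = binIndex(values[i], vmin, vmax, k);
            out[i] = sum[b] / count[b];                 // numeric label: bin mean
        }
        return out;
    }
}
```

With the values 10, 20, 40 and 90, vmin = 0, vmax = 90 and k = 3, the first two values share a bin and are both replaced with 15.0, while 40 and 90 are alone in their bins and keep their own value as the mean.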
A nominal interval label is taken from a list of strings, each of which is the label used to replace every training instance value belonging to one interval. The system guarantees that the number of nominal labels is equal to the number of output intervals k. The mapping between the intervals computed by the algorithm and the nominal labels starts from the interval containing the lowest values. For instance, suppose that the algorithm computes the intervals I1 = [6, 35), I2 = [35, 65) and I3 = [65, 95), and that the nominal labels provided are "young", "adult" and "elder", in that order. For each training instance, a value v of the discretization attribute is replaced with "young", "adult" or "elder" if v belongs to I1, I2 or I3 respectively. With nominal interval labeling, the type of the discretization attribute becomes enumerated.
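The "young" / "adult" / "elder" example can be sketched as a lookup over the interior cut points 35 and 65. The class and method names are hypothetical; the only behavior taken from the description is that labels are assigned in order, starting from the lowest interval, and that intervals are closed on the left.

```java
class NominalLabelSketch {
    /**
     * Returns the label of the interval containing v.
     * cuts holds the interior cut points in ascending order;
     * labels[0] belongs to the lowest interval.
     */
    static String label(double v, double[] cuts, String[] labels) {
        if (labels.length != cuts.length + 1) {
            throw new IllegalArgumentException("need exactly one label per interval");
        }
        int i = 0;
        while (i < cuts.length && v >= cuts[i]) {
            i++;                                // v lies at or above this cut point
        }
        return labels[i];
    }
}
```

With cuts {35, 65} and the three labels above, the value 20 maps to "young", 35 maps to "adult" (the interval [35, 65) includes its lower bound) and 70 maps to "elder".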
At present, the algorithm is implemented using (in part) the WEKA system library.

Title: KDDML

Description: Knowledge Discovery in Database Environment

Copyright: Copyright (c) 2003-2005

Company: Universita' di Pisa - Dipartimento di Informatica

Author:
Andrea Romei (romei@di.unipi.it)

Sandra Zimei

Version:
2.0.16


Member Function Documentation

void kddml.Operators.Preprocessing.DiscretizationAlgorithms.NATURAL_BINNING_DISCRETIZATION_RESOLVER.readParameters (Hashtable< String, KDDMLScalarManager > parameters) throws ResolverException, KDDMLCoreException
 

Reads the XML parameters related to a generic algorithm stored in the ALGORITHM entity. An algorithm settings object captures the parameters associated with a particular algorithm and allows a knowledgeable user to fine-tune them. Not all parameters need to be specified, but those that are specified are taken into account by KDDML.
Parameters are given as a hashtable, where the key is the name of the parameter related to the algorithm and the value is a KDDMLScalar object containing the parameter value. The parameter value has already been checked by the interpreter layer, so its type is correct.

Parameters:
parameters Hashtable the parameters related to the algorithm. The key of the hashtable is the name of the parameter. The value of the hashtable is a KDDMLScalar representing the value of the parameter.
Exceptions:
ResolverException if a resolving error occurs.
KDDMLCoreException if a core-level error occurs.

Implements kddml.Operators.AlgorithmResolverTask.
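Since the number of intervals k and the interval width w are mutually exclusive, a resolver reading the hashtable has to decide between them. The sketch below shows one way such a check might look. It is only an assumption-laden illustration: the parameter names "intervals" and "width" are hypothetical, plain Double values stand in for the KDDMLScalarManager objects, and rejecting the case where neither parameter is present is a choice of this sketch, since the real resolver may apply a default instead.

```java
import java.util.Hashtable;

class ParameterSketch {
    /** Resolves the interval count from exactly one of two hypothetical parameters. */
    static int resolveIntervals(Hashtable<String, Double> params, double vmin, double vmax) {
        boolean hasK = params.containsKey("intervals");   // hypothetical name
        boolean hasW = params.containsKey("width");       // hypothetical name
        if (hasK == hasW) {
            // Both or neither given: treated as an error in this sketch.
            throw new IllegalArgumentException("specify exactly one of 'intervals' or 'width'");
        }
        if (hasK) {
            return params.get("intervals").intValue();
        }
        double w = params.get("width");
        return Math.max(1, (int) Math.ceil((vmax - vmin) / w));
    }
}
```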

void kddml.Operators.Preprocessing.DiscretizationAlgorithms.NATURAL_BINNING_DISCRETIZATION_RESOLVER.readDiscretizationAttributeStatistics (NumericalStatisticManager stat) throws ResolverException, KDDMLCoreException
 

Reads the data statistics related to the input discretization attribute. Data statistics can be used to provide additional information to the preprocessing algorithm, such as the minimum and maximum value of the attribute.

Parameters:
stat NumericalStatisticManager
Exceptions:
ResolverException 
KDDMLCoreException 

Implements kddml.Operators.Preprocessing.DiscretizationAlgorithms.DiscretizationAlgorithmResolverTask.

Object[] kddml.Operators.Preprocessing.DiscretizationAlgorithms.NATURAL_BINNING_DISCRETIZATION_RESOLVER.discretize (Double[] values) throws ResolverException
 

Main method that discretizes the input values related to an attribute. Input values are given as an array of Doubles, where missing values are represented as null objects. The operator returns the discretized values as an array, where missing values are again represented as null objects. The order in which values appear in the arrays corresponds to the order in which they appear in the preprocessing table, so the size of both the input and the output array is equal to the total number of instances. Depending on the labeling technique, the result of a discretization process can be either numeric (e.g. the mean of the bin) or nominal (e.g. a label used to replace each instance value belonging to the bin). In the first case, the method returns an array of Double objects. Otherwise, it returns an array of String objects.

Parameters:
values Double[] the input values to discretize. Null objects correspond to missing values for that instance.
Returns:
Object[] the discretized values. Returns an array of Double if the method isNumericLabeling() returns true. Returns an array of String if the method isNumericLabeling() returns false. Null objects correspond to missing values for that instance.
Exceptions:
ResolverException if an error occurs.

Implements kddml.Operators.Preprocessing.DiscretizationAlgorithms.DiscretizationAlgorithmResolverTask.
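The array contract described for discretize (same length and order as the input, null for missing values) can be sketched for the nominal case as follows. This is a stand-alone illustration, not the resolver's code: the extra cuts and labels arguments are hypothetical, whereas the real method obtains its configuration through readParameters and readDiscretizationAttributeStatistics.

```java
class DiscretizeContractSketch {
    /** Maps each value to a nominal label; null (missing) inputs stay null. */
    static Object[] discretize(Double[] values, double[] cuts, String[] labels) {
        String[] out = new String[values.length];   // same size and order as the input
        for (int i = 0; i < values.length; i++) {
            Double v = values[i];
            if (v == null) {
                continue;                           // missing value: leave null
            }
            int bin = 0;
            while (bin < cuts.length && v >= cuts[bin]) {
                bin++;                              // intervals are closed on the left
            }
            out[i] = labels[bin];
        }
        return out;
    }
}
```

Given the input {20.0, null, 70.0, 35.0}, the cut points {35, 65} and the labels "young", "adult", "elder", the result is {"young", null, "elder", "adult"}.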

boolean kddml.Operators.Preprocessing.DiscretizationAlgorithms.NATURAL_BINNING_DISCRETIZATION_RESOLVER.isNumericLabeling ()
 

Specifies the type of labeling to be used. Returns true if the result of the discretization process is numeric (e.g. the mean of the bin). Returns false if the result of the discretization process is nominal (e.g. a label used to replace each instance value belonging to the bin).

Returns:
boolean

Implements kddml.Operators.Preprocessing.DiscretizationAlgorithms.DiscretizationAlgorithmResolverTask.
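A caller can use isNumericLabeling() to decide how to interpret the elements of the Object[] returned by discretize. The sketch below assumes a hypothetical stand-in interface that mirrors only these two method signatures (without the checked exceptions); it is not the KDDML DiscretizationAlgorithmResolverTask type.

```java
/** Hypothetical stand-in mirroring two of the documented methods. */
interface LabelingDiscretizer {
    Object[] discretize(Double[] values);
    boolean isNumericLabeling();
}

class LabelingCallerSketch {
    /** Renders the discretized values as text, printing "?" for missing values. */
    static String describe(LabelingDiscretizer d, Double[] values) {
        Object[] result = d.discretize(values);
        boolean numeric = d.isNumericLabeling();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < result.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            Object o = result[i];
            if (o == null) {
                sb.append("?");                              // missing value
            } else if (numeric) {
                sb.append(((Double) o).doubleValue());       // numeric label
            } else {
                sb.append((String) o);                       // nominal label
            }
        }
        return sb.toString();
    }
}
```

Casting element by element, rather than casting the whole array to Double[] or String[], keeps the caller independent of the runtime array type that the implementation happens to allocate.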

String kddml.Operators.Preprocessing.DiscretizationAlgorithms.NATURAL_BINNING_DISCRETIZATION_RESOLVER.getHistoryDescription () [virtual]
 

Returns a description of the actions performed by this preprocessing algorithm. This description will be reported in the history related to the preprocessing data source.

Returns:
String
Exceptions:
KDDMLCoreException 

Implements kddml.Operators.Preprocessing.PPAlgorithmResolverTask.


Generated on Thu Feb 23 13:04:55 2006 for kddml by  doxygen 1.4.3