
kddml.Operators.Preprocessing.DiscretizationAlgorithms.EQUAL_FREQUENCY_DISCRETIZATION_RESOLVER Class Reference

Inheritance diagram for kddml.Operators.Preprocessing.DiscretizationAlgorithms.EQUAL_FREQUENCY_DISCRETIZATION_RESOLVER:

kddml.Operators.Preprocessing.DiscretizationAlgorithms.DiscretizationAlgorithmResolverTask
kddml.Operators.Preprocessing.PPAlgorithmResolverTask
kddml.Operators.AlgorithmResolverTask

Public Member Functions

void readParameters (Hashtable< String, KDDMLScalarManager > parameters) throws ResolverException, KDDMLCoreException
void readDiscretizationAttributeStatistics (NumericalStatisticManager stat) throws ResolverException, KDDMLCoreException
Object[] discretize (Double[] values) throws ResolverException
boolean isNumericLabeling ()
String getHistoryDescription ()

Detailed Description

Resolver class for the Equal Frequency Discretization (EFD) algorithm.
The EFD technique divides the range of a numeric attribute A into k intervals, each containing approximately the same number of samples. Suppose there are n training instances for which the value of A is known (missing values are ignored). The algorithm sorts the observed values and then divides the sorted sequence into k intervals, so that each interval contains approximately n/k adjacent (possibly duplicated) values. The number of output intervals k and the number of required samples per interval are mutually exclusive parameters: only one of them may be specified.
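The sort-and-split step described above can be sketched in Java as follows. This is a minimal illustration, not the KDDML implementation: the class EqualFrequencySketch and its methods are hypothetical, and both the way the remainder n mod k is spread over the first bins and the fact that equal values may end up in different bins are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of equal-frequency binning; not part of KDDML. */
final class EqualFrequencySketch {

    /** Sizes of k bins that hold n values as evenly as possible. */
    static int[] binSizes(int n, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        int[] sizes = new int[k];
        for (int i = 0; i < k; i++) {
            // Assumption: the first (n mod k) bins receive one extra value.
            sizes[i] = n / k + (i < n % k ? 1 : 0);
        }
        return sizes;
    }

    /** Drops missing (null) values, sorts the rest and cuts them into k adjacent bins. */
    static List<double[]> split(Double[] values, int k) {
        double[] known = Arrays.stream(values)
                .filter(v -> v != null)
                .mapToDouble(Double::doubleValue)
                .sorted()
                .toArray();
        List<double[]> bins = new ArrayList<>();
        int start = 0;
        for (int size : binSizes(known.length, k)) {
            bins.add(Arrays.copyOfRange(known, start, start + size));
            start += size;
        }
        return bins;
    }
}
```

With seven known values and k = 3, this sketch produces bins of sizes 3, 2 and 2, which is one possible reading of "approximately n/k".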
The algorithm takes as input a preprocessing table containing at least a numeric field, representing the discretization attribute.
When the intervals have been computed, the algorithm replaces each training instance value of A with an interval label. Either numeric or nominal labeling is allowed.
A numeric interval label is the mean, the median, the minimum or the maximum of the values belonging to the interval.
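As a rough sketch of the four numeric labels, assuming the values of one interval are already sorted in ascending order: the class NumericLabelSketch and the enum Kind are hypothetical names, and the convention of averaging the two middle values for an even-sized interval is an assumption, since the documentation does not say how the median is defined in that case.

```java
import java.util.Arrays;

/** Hypothetical illustration of numeric interval labels; not part of KDDML. */
final class NumericLabelSketch {

    enum Kind { MEAN, MEDIAN, MIN, MAX }

    /** Computes the chosen label for one interval whose values are sorted ascending. */
    static double label(double[] sortedBin, Kind kind) {
        int n = sortedBin.length;
        if (n == 0) {
            throw new IllegalArgumentException("empty interval");
        }
        switch (kind) {
            case MEAN:
                return Arrays.stream(sortedBin).average().getAsDouble();
            case MEDIAN:
                // Assumption: an even-sized interval uses the mean of its two middle values.
                return n % 2 == 1
                        ? sortedBin[n / 2]
                        : (sortedBin[n / 2 - 1] + sortedBin[n / 2]) / 2.0;
            case MIN:
                return sortedBin[0];
            case MAX:
                return sortedBin[n - 1];
            default:
                throw new AssertionError(kind);
        }
    }
}
```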
A nominal interval label is drawn from a list of strings, one per interval; every training instance value that falls in an interval is replaced with that interval's label. The system guarantees that the number of nominal labels equals the number of output intervals k. Labels are assigned to intervals in order, starting from the interval containing the lowest values. For example, suppose that the algorithm computes the intervals I1 = [6, 35), I2 = [35, 65) and I3 = [65, 95), and that the nominal labels provided are "young", "adult" and "elder", in that order. For each training instance, a value v of the discretization attribute is replaced with "young" if v belongs to I1, with "adult" if v belongs to I2, and with "elder" if v belongs to I3. With nominal interval labeling, the type of the discretization attribute becomes enumerated.
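The "young" / "adult" / "elder" example can be reproduced with a short sketch. The class NominalLabelSketch and its method signature are hypothetical; the half-open intervals are taken from the example above, and returning null for a missing value follows the convention documented for discretize.

```java
/** Hypothetical illustration of nominal interval labeling; not part of KDDML. */
final class NominalLabelSketch {

    /**
     * Maps a value to the label of the half-open interval that contains it.
     * Interval i spans from the previous upper bound (or lower, for i = 0)
     * up to, but excluding, upperBounds[i]; bounds are in ascending order.
     */
    static String label(Double v, double lower, double[] upperBounds, String[] labels) {
        if (labels.length != upperBounds.length) {
            throw new IllegalArgumentException("one label per interval is required");
        }
        if (v == null) {
            return null; // missing values stay missing
        }
        if (v < lower) {
            throw new IllegalArgumentException("value below all intervals: " + v);
        }
        for (int i = 0; i < upperBounds.length; i++) {
            if (v < upperBounds[i]) {
                return labels[i];
            }
        }
        throw new IllegalArgumentException("value above all intervals: " + v);
    }
}
```

Called with lower = 6, upper bounds {35, 65, 95} and labels {"young", "adult", "elder"}, the value 35 maps to "adult" because each interval excludes its upper bound.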
At present, the algorithm uses a proprietary Java implementation.

Title: KDDML

Description: Knowledge Discovery in Database Environment

Copyright: Copyright (c) 2003-2005

Company: Universita' di Pisa - Dipartimento di Informatica

Author:
Andrea Romei (romei@di.unipi.it)
Version:
2.0.16


Member Function Documentation

void kddml.Operators.Preprocessing.DiscretizationAlgorithms.EQUAL_FREQUENCY_DISCRETIZATION_RESOLVER.readParameters (Hashtable< String, KDDMLScalarManager > parameters) throws ResolverException, KDDMLCoreException
 

Reads the XML parameters of a generic algorithm, as stored in the ALGORITHM entity. An algorithm settings object captures the parameters associated with a particular algorithm and allows a knowledgeable user to fine-tune them. In general, not every parameter has to be specified, but those that are specified are taken into account by KDDML.
Parameters are given as a hashtable in which each key is the name of an algorithm parameter and each value is a KDDMLScalar object containing the parameter value. Parameter values have already been checked by the interpreter layer, so their types are correct.

Parameters:
parameters Hashtable the parameters related to the algorithm. The key of the hashtable is the name of the parameter. The value of the hashtable is a KDDMLScalar representing the value of the parameter.
Exceptions:
ResolverException if a resolving error occurs.
KDDMLCoreException if a core-level error occurs.

Implements kddml.Operators.AlgorithmResolverTask.
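Because the class description states that the number of intervals and the number of samples per interval are mutually exclusive, a resolver reading its parameters would plausibly check that exactly one of them is present. The sketch below illustrates such a check under several assumptions: the parameter names are invented for illustration, plain Integer values stand in for KDDMLScalarManager, IllegalArgumentException stands in for ResolverException, and deriving k as the ceiling of n divided by the samples per interval is a guess rather than documented behavior.

```java
import java.util.Hashtable;

/** Hypothetical illustration of a mutual-exclusion check; not part of KDDML. */
final class ParameterCheckSketch {

    // Hypothetical parameter names; the real names are defined by KDDML.
    static final String INTERVALS = "number_of_intervals";
    static final String SAMPLES = "samples_per_interval";

    /** Returns the number of intervals k to use for n known values. */
    static int resolveIntervalCount(Hashtable<String, Integer> parameters, int n) {
        Integer k = parameters.get(INTERVALS);
        Integer samples = parameters.get(SAMPLES);
        // Exactly one of the two parameters must be given.
        if ((k == null) == (samples == null)) {
            throw new IllegalArgumentException(
                    "specify exactly one of " + INTERVALS + " and " + SAMPLES);
        }
        if (k != null) {
            if (k < 1) {
                throw new IllegalArgumentException(INTERVALS + " must be at least 1");
            }
            return k;
        }
        if (samples < 1) {
            throw new IllegalArgumentException(SAMPLES + " must be at least 1");
        }
        // Assumption: k is the ceiling of n / samples, with a minimum of one interval.
        return Math.max(1, (n + samples - 1) / samples);
    }
}
```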

void kddml.Operators.Preprocessing.DiscretizationAlgorithms.EQUAL_FREQUENCY_DISCRETIZATION_RESOLVER.readDiscretizationAttributeStatistics (NumericalStatisticManager stat) throws ResolverException, KDDMLCoreException
 

Reads the data statistics related to the input discretization attribute. Data statistics can provide additional information to the preprocessing algorithm, such as the minimum and maximum value of the attribute.

Parameters:
stat NumericalStatisticManager
Exceptions:
ResolverException 
KDDMLCoreException 

Implements kddml.Operators.Preprocessing.DiscretizationAlgorithms.DiscretizationAlgorithmResolverTask.

Object[] kddml.Operators.Preprocessing.DiscretizationAlgorithms.EQUAL_FREQUENCY_DISCRETIZATION_RESOLVER.discretize (Double[] values) throws ResolverException
 

Main method that discretizes the input values of an attribute. Input values are given as an array of Double objects in which missing values are represented as null. The operator returns the discretized values as an array in which missing values are likewise represented as null. The order in which values appear in the arrays corresponds to the order in which they appear in the preprocessing table, so the size of both the input and the output array equals the total number of instances. Depending on the labeling technique, the result of the discretization can be either numeric (e.g. the mean of the bin) or nominal (e.g. a label used to replace each instance value belonging to the bin). In the first case, the method returns an array of Double objects; otherwise, it returns an array of String objects.

Parameters:
values Double[] the input values to discretize. Null objects correspond to missing values for that instance.
Returns:
Object[] the discretized values. Returns an array of Double if the method isNumericLabeling() returns true. Returns an array of String if the method isNumericLabeling() returns false. Null objects correspond to missing values for that instance.
Exceptions:
ResolverException if an error occurs.

Implements kddml.Operators.Preprocessing.DiscretizationAlgorithms.DiscretizationAlgorithmResolverTask.
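A caller has to consult isNumericLabeling() before casting the elements of the returned Object[]. The sketch below shows one way to do that. The nested Discretizer interface is a stand-in with the two method names documented here, not the KDDML type, and it omits the checked ResolverException to keep the example self-contained; DiscretizeUsageSketch and describe are hypothetical names.

```java
/** Hypothetical illustration of consuming the discretize result; not part of KDDML. */
final class DiscretizeUsageSketch {

    /** Stand-in for the two resolver methods used below. */
    interface Discretizer {
        Object[] discretize(Double[] values);
        boolean isNumericLabeling();
    }

    /** Renders the discretized column as text, printing "?" for missing values. */
    static String describe(Discretizer resolver, Double[] values) {
        Object[] out = resolver.discretize(values);
        // Input and output are documented to have one entry per instance.
        if (out.length != values.length) {
            throw new IllegalStateException("output size differs from input size");
        }
        boolean numeric = resolver.isNumericLabeling();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < out.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            if (out[i] == null) {
                sb.append("?");
            } else if (numeric) {
                sb.append((Double) out[i]); // numeric labeling: elements are Double
            } else {
                sb.append((String) out[i]); // nominal labeling: elements are String
            }
        }
        return sb.toString();
    }
}
```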

boolean kddml.Operators.Preprocessing.DiscretizationAlgorithms.EQUAL_FREQUENCY_DISCRETIZATION_RESOLVER.isNumericLabeling ()
 

Specifies the type of labeling to be used. Returns true if the result of the discretization process is numeric (e.g. the mean of the bin). Returns false if the result is nominal (e.g. a label used to replace each instance value belonging to the bin).

Returns:
boolean

Implements kddml.Operators.Preprocessing.DiscretizationAlgorithms.DiscretizationAlgorithmResolverTask.

String kddml.Operators.Preprocessing.DiscretizationAlgorithms.EQUAL_FREQUENCY_DISCRETIZATION_RESOLVER.getHistoryDescription () [virtual]
 

Returns a description of the actions performed by this preprocessing algorithm. This description will be reported in the history related to the preprocessing data source.

Returns:
String
Exceptions:
KDDMLCoreException 

Implements kddml.Operators.Preprocessing.PPAlgorithmResolverTask.


Generated on Thu Feb 23 13:04:54 2006 for kddml by  doxygen 1.4.3