
kddml.Operators.Preprocessing.PP_MARK_DUPLICATES_RESOLVER Class Reference

Inheritance diagram for kddml.Operators.Preprocessing.PP_MARK_DUPLICATES_RESOLVER:

Base classes: kddml.Operators.Preprocessing.InstanceLevelDependentTransformation, kddml.Operators.Preprocessing.PreprocessingResolver, kddml.Operators.OperatorResolver, kddml.Operators.HWResourcesDescription
Derived classes: kddml.Operators.Preprocessing.PP_MERGE_DUPLICATES_RESOLVER

Public Member Functions

void readAttributes (Hashtable< String, KDDMLScalarManager > parameters) throws ResolverException, KDDMLCoreException

Protected Member Functions

void readDataStatistics (DataStatisticsManager stat, DataStatisticsManager PPstat) throws ResolverException, KDDMLCoreException
String getHistoryDescription ()
Instances[] runCore (Instances tuple, Instances metatuple) throws ResolverException, KDDMLCoreException

Detailed Description

The operator marks duplicated instances. Two instances are considered duplicates on the basis of a key composed of a list of attributes. For example, if the attributes temperature and outlook form the key, two instances are duplicates when they have the same values for both attributes. When instances are identified as duplicates, all their key attributes are marked with a specified value (i.e. a string is added to the preprocessing information). Note that the operator only marks duplicates: the mark is inserted as preprocessing information in the output table. In other words, no physical data are modified by the operator. Duplicated instances can be merged by using the PP_MERGE_DUPLICATES operator.
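The marking rule described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the KDDML implementation: each instance is assumed to be a String[] of attribute values (instead of weka.core.Instances), key attributes are identified by column index, and the preprocessing information is modeled as a parallel String[][] in which null means "not marked". The class DuplicateMarker and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: instances are String[] rows, and the preprocessing
// information is a parallel String[][] (null = not marked).
class DuplicateMarker {

    // Marks the key columns of every instance whose key occurs more than once.
    // The input rows are read but never modified.
    static String[][] markDuplicates(List<String[]> rows, int[] keyIndexes, String markValue) {
        // Group row positions by their key values, e.g. (outlook, temperature).
        Map<List<String>, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            List<String> key = new ArrayList<>();
            for (int k : keyIndexes) {
                key.add(rows.get(i)[k]);
            }
            groups.computeIfAbsent(key, unused -> new ArrayList<>()).add(i);
        }

        // One preprocessing row per instance, same width as the physical row.
        String[][] meta = new String[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            meta[i] = new String[rows.get(i).length];
        }

        // A key shared by two or more instances marks all of them as duplicates.
        for (List<Integer> positions : groups.values()) {
            if (positions.size() > 1) {
                for (int i : positions) {
                    for (int k : keyIndexes) {
                        meta[i][k] = markValue;
                    }
                }
            }
        }
        return meta;
    }
}
```

Grouping by key in a HashMap keeps the pass linear in the number of instances, and returning the marks in a separate structure mirrors the fact that the operator leaves the physical data untouched.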


Member Function Documentation

void kddml.Operators.Preprocessing.PP_MARK_DUPLICATES_RESOLVER.readAttributes (Hashtable< String, KDDMLScalarManager > parameters) throws ResolverException, KDDMLCoreException [virtual]
 

Reads the XML attributes related to a generic preprocessing operator. An operator settings object captures the attributes associated with a particular operator and allows a knowledgeable user to fine-tune its parameters. Generally, not all parameters need to be specified; those that are specified are taken into account by KDDML.
Attributes are given as a hashtable, where the key is the name of an attribute of the operator and the value is a KDDMLScalar object containing the attribute value. Attribute values are checked by the interpreter layer, so their types are guaranteed to be correct.

Parameters:
parameters Hashtable the attributes related to the operator. The key of the hashtable is the name of an attribute; the value is a KDDMLScalar representing the value of that attribute.
Exceptions:
ResolverException if a resolving error occurs.
KDDMLCoreException if a core-level error occurs.

Implements kddml.Operators.Preprocessing.PreprocessingResolver.
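A readAttributes implementation for this operator might look roughly like the following sketch. The attribute names key_attributes and mark, the default mark value, and the ScalarStub class (standing in for KDDMLScalarManager) are all hypothetical; the real method also signals errors with ResolverException rather than IllegalArgumentException.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Hypothetical stand-in for KDDMLScalarManager: only a string value is modeled.
class ScalarStub {
    private final String value;

    ScalarStub(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }
}

// Hypothetical settings holder for a mark-duplicates operator.
class MarkDuplicatesSettings {
    // Hypothetical attribute names, not taken from the KDDML sources.
    static final String KEY_ATTR = "key_attributes";
    static final String MARK_ATTR = "mark";

    final List<String> keyAttributes = new ArrayList<>();
    String mark = "duplicate"; // hypothetical default when no mark is given

    void readAttributes(Hashtable<String, ScalarStub> parameters) {
        ScalarStub keys = parameters.get(KEY_ATTR);
        if (keys == null) {
            // Without a key, duplicates cannot be defined.
            throw new IllegalArgumentException("missing attribute " + KEY_ATTR);
        }
        // The key is assumed to be a comma-separated list of attribute names.
        for (String name : keys.getValue().split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                keyAttributes.add(trimmed);
            }
        }
        // Optional attribute: overrides the default only when present.
        ScalarStub markValue = parameters.get(MARK_ATTR);
        if (markValue != null) {
            mark = markValue.getValue();
        }
    }
}
```

Treating the key as mandatory and the mark as optional reflects the description above: not every parameter has to be specified, but a key is needed to define duplicates at all.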

void kddml.Operators.Preprocessing.PP_MARK_DUPLICATES_RESOLVER.readDataStatistics DataStatisticsManager  stat,
DataStatisticsManager  PPstat
throws ResolverException, KDDMLCoreException [protected]
 

Reads data statistics related to the input preprocessing table. Data statistics can provide additional information to the preprocessing operator, such as the total number of instances or attributes in the data source. By default, this method does nothing; it can be overridden in an operator implementation if necessary.

Parameters:
stat DataStatisticsManager statistics related to physical instances
PPstat DataStatisticsManager statistics related to preprocessing instances
Exceptions:
ResolverException if a resolving error occurs.
KDDMLCoreException if a core-level error occurs.

Reimplemented from kddml.Operators.Preprocessing.PreprocessingResolver.
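The default no-op behaviour is a classic hook method: the base class calls it unconditionally, and subclasses override it only when they need statistics. The sketch below illustrates that pattern; StatsStub (standing in for DataStatisticsManager), ResolverBase, CountingResolver, process, and the run driver are hypothetical names, not KDDML classes or methods.

```java
// Hypothetical stand-in for DataStatisticsManager: only an instance count is modeled.
class StatsStub {
    final int numInstances;

    StatsStub(int numInstances) {
        this.numInstances = numInstances;
    }
}

// Hypothetical base class showing the hook-method pattern.
abstract class ResolverBase {
    // Default hook: does nothing, like the documented default.
    protected void readDataStatistics(StatsStub stat, StatsStub ppStat) {
    }

    // Hypothetical driver: the hook is always called before the main step.
    final String run(StatsStub stat, StatsStub ppStat) {
        readDataStatistics(stat, ppStat);
        return process();
    }

    protected abstract String process();
}

// An operator that needs statistics overrides the hook.
class CountingResolver extends ResolverBase {
    private int totalInstances = -1; // -1 until statistics have been read

    @Override
    protected void readDataStatistics(StatsStub stat, StatsStub ppStat) {
        totalInstances = stat.numInstances;
    }

    @Override
    protected String process() {
        return "instances=" + totalInstances;
    }
}
```

Operators that do not need statistics simply inherit the empty hook, so they do not have to provide an implementation.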

String kddml.Operators.Preprocessing.PP_MARK_DUPLICATES_RESOLVER.getHistoryDescription () [protected, virtual]
 

Returns a description of the actions performed by this preprocessing operator. This description will be reported in the history related to the preprocessing data source.

Returns:
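String a description of the actions performed by the operator, to be reported in the history of the preprocessing data source.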
Exceptions:
KDDMLCoreException 

Implements kddml.Operators.Preprocessing.PreprocessingResolver.

Reimplemented in kddml.Operators.Preprocessing.PP_MERGE_DUPLICATES_RESOLVER.
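Because PP_MERGE_DUPLICATES_RESOLVER reimplements this method, each operator can report its own wording in the history. A minimal sketch of that override relationship follows; the class names and the description text are hypothetical, since the actual KDDML history strings are not documented on this page.

```java
import java.util.List;

// Hypothetical mark-duplicates operator reduced to its history text.
class MarkDuplicatesHistory {
    protected final List<String> keyAttributes;
    protected final String mark;

    MarkDuplicatesHistory(List<String> keyAttributes, String mark) {
        this.keyAttributes = keyAttributes;
        this.mark = mark;
    }

    // Hypothetical wording for the history entry.
    protected String getHistoryDescription() {
        return "Marked duplicates on key [" + String.join(", ", keyAttributes)
                + "] with value '" + mark + "'";
    }
}

// A merging subclass reimplements the description, as PP_MERGE_DUPLICATES_RESOLVER does.
class MergeDuplicatesHistory extends MarkDuplicatesHistory {
    MergeDuplicatesHistory(List<String> keyAttributes, String mark) {
        super(keyAttributes, mark);
    }

    @Override
    protected String getHistoryDescription() {
        return "Merged duplicates on key [" + String.join(", ", keyAttributes) + "]";
    }
}
```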

Instances [] kddml.Operators.Preprocessing.PP_MARK_DUPLICATES_RESOLVER.runCore Instances  tuple,
Instances  metatuple
throws ResolverException, KDDMLCoreException [protected, virtual]
 

Core operator method. Given the physical tuples and the related preprocessing tuples as weka.core.Instances, the operator returns the modified instances as a two-element array.

Parameters:
tuple Instances the entire input dataset as weka.core.Instances
metatuple Instances the entire input preprocessing dataset as weka.core.Instances. The number of metatuples equals the number of physical tuples.
Returns:
Instances[] a two-element array containing the computed output instances. The first element contains the physical instances; the second element contains the related preprocessing instances. The output schemata must be compatible with the input schemata, and the number of output instances must equal the number of input instances.
Exceptions:
ResolverException if a resolving error occurs.
KDDMLCoreException if a core-level error occurs.

Implements kddml.Operators.Preprocessing.InstanceLevelDependentTransformation.

Reimplemented in kddml.Operators.Preprocessing.PP_MERGE_DUPLICATES_RESOLVER.
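The postconditions stated for runCore can be expressed as a small checker. In this sketch, Table is a hypothetical stand-in for weka.core.Instances, RunCoreContract is a hypothetical helper, and schema compatibility is simplified to "same number of attributes"; the real compatibility rules may be stricter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for weka.core.Instances: a fixed number of attributes per row.
class Table {
    final int numAttributes;
    final List<String[]> rows = new ArrayList<>();

    Table(int numAttributes) {
        this.numAttributes = numAttributes;
    }

    int size() {
        return rows.size();
    }
}

class RunCoreContract {
    // Throws if the inputs or the output of a runCore-style step violate the documented contract.
    static void check(Table tuple, Table metatuple, Table[] output) {
        if (tuple.size() != metatuple.size()) {
            throw new IllegalStateException("metatuple count must equal tuple count");
        }
        if (output == null || output.length != 2) {
            throw new IllegalStateException("output must hold exactly two tables");
        }
        Table physical = output[0]; // first element: physical instances
        Table meta = output[1];     // second element: preprocessing instances
        // Simplified schema check: same number of attributes as the inputs.
        if (physical.numAttributes != tuple.numAttributes
                || meta.numAttributes != metatuple.numAttributes) {
            throw new IllegalStateException("output schema differs from input schema");
        }
        if (physical.size() != tuple.size() || meta.size() != metatuple.size()) {
            throw new IllegalStateException("output instance count differs from input");
        }
    }
}
```

A check like this could wrap any InstanceLevelDependentTransformation-style step during testing, since the contract is the same for every operator that implements runCore.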


Generated on Thu Feb 23 13:04:53 2006 for kddml by  doxygen 1.4.3