Oracle® Data Mining Application Developer's Guide, 10g Release 2 (10.2), Part Number B14340-01
This chapter provides information to help you get started using the Oracle Data Mining Java API. It describes the general design of the API, and it explains how to use the API to perform major mining operations in your application.
See Also:
Oracle Data Mining Java API Reference (javadoc).
JDM 1.0 javadoc at http://www.oracle.com/technology/products/bi/odm
This chapter includes the following topics:
The samples included in this chapter are taken from the Data Mining sample applications available on the Database companion CD. When you install the companion CD, the Data Mining sample applications are copied to the following directory.
$ORACLE_HOME/rdbms/demo (on Unix) or %ORACLE_HOME%\rdbms\demo (on NT)
To obtain a listing of the sample applications, type the following on Unix:
ls $ORACLE_HOME/rdbms/demo/dm*
Use an equivalent command on other operating systems.
Table 7-1 lists the Java sample applications.
Table 7-1 The Java Sample Applications for Data Mining
See Also:
Oracle Data Mining Administrator's Guide for information about installing, running, and viewing the sample programs.
The ODM Java API requires Oracle Database 10g Release 2 (10.2) and J2SE 1.4.2.
To use the ODM Java API, include the following libraries in your CLASSPATH:
$ORACLE_HOME/rdbms/jlib/jdm.jar
$ORACLE_HOME/rdbms/jlib/ojdm_api.jar
$ORACLE_HOME/rdbms/jlib/xdb.jar
$ORACLE_HOME/jdbc/lib/ojdbc14.jar
$ORACLE_HOME/oc4j/j2ee/home/lib/connector.jar
$ORACLE_HOME/jlib/orai18n.jar
$ORACLE_HOME/jlib/orai18n-mapping.jar
$ORACLE_HOME/lib/xmlparserv2.jar
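Because the list is long and the separators differ between Unix and Windows, it can be convenient to assemble it programmatically. The following is a minimal sketch, not part of the ODM Java API: the class name OdmClasspath is hypothetical, it assumes ORACLE_HOME points at a 10.2 installation laid out as listed above, and it uses generics, so it targets a current JDK rather than J2SE 1.4.2.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the ODM Java API): assembles the
// CLASSPATH entries listed above from an Oracle home directory.
final class OdmClasspath {
    // Jar locations relative to $ORACLE_HOME, in the order given in this guide.
    static final List<String> JARS = Arrays.asList(
            "rdbms/jlib/jdm.jar",
            "rdbms/jlib/ojdm_api.jar",
            "rdbms/jlib/xdb.jar",
            "jdbc/lib/ojdbc14.jar",
            "oc4j/j2ee/home/lib/connector.jar",
            "jlib/orai18n.jar",
            "jlib/orai18n-mapping.jar",
            "lib/xmlparserv2.jar");

    private OdmClasspath() {
    }

    // Joins the jar paths with the platform path separator
    // (":" on Unix, ";" on Windows) and the platform file separator.
    static String build(String oracleHome) {
        StringBuilder sb = new StringBuilder();
        for (String jar : JARS) {
            if (sb.length() > 0) {
                sb.append(File.pathSeparator);
            }
            String relative = jar.replace('/', File.separatorChar);
            sb.append(new File(oracleHome, relative).getPath());
        }
        return sb.toString();
    }
}
```

For example, OdmClasspath.build(System.getenv("ORACLE_HOME")) returns a string suitable for the -cp option of the java command.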
The first job of a data mining application is to connect to the Data Mining Server (DMS), which is the data mining engine and metadata repository within the Oracle Database.
Note:
The JDM API uses the general term DME (Data Mining Engine). In the ODM Java API, the term DME refers to the Oracle DMS.
The DMS connection is encapsulated in a Connection object, which provides the framework for a data mining application. The Connection object serves the following purposes:
Authenticates users
Supports retrieval and storage of named objects
Supports the execution of mining tasks
Provides version information for the JDM implementation and provider
The DMS Connection object is described in detail in "Features of a DMS Connection".
A Connection is created from a ConnectionFactory, an interface provided by the JDM standard API. You can look up a ConnectionFactory from the JNDI server, or you can create a ConnectionFactory using an OraConnectionFactory object.
//Create OraConnectionFactory
javax.datamining.resource.ConnectionFactory connFactory =
    new oracle.dmt.jdm.resource.OraConnectionFactory();
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.datamining.resource.ConnectionFactory;

//Set up the initial context to connect to the JNDI server
Hashtable env = new Hashtable();
env.put(Context.INITIAL_CONTEXT_FACTORY, "oracle.dmt.jdm.resource.OraConnectionFactory");
env.put(Context.PROVIDER_URL, "http://myHost:myPort/myService");
env.put(Context.SECURITY_PRINCIPAL, "user");
env.put(Context.SECURITY_CREDENTIALS, "password");
InitialContext jndiContext = new InitialContext(env);
//Perform JNDI lookup to obtain the connection factory
ConnectionFactory connFactory =
    (ConnectionFactory) jndiContext.lookup("java:comp/env/jdm/MyServer");
You can choose to pre-create the JDBC connection to the DMS, or you can manage it through the ODM Java API. If you pre-create the JDBC connection, your data mining application can access the connection caching features of JDBC. When the ODM Java API manages the JDBC connection, caching is not available to your application.
See Also:
Oracle Database JDBC Developer's Guide and Reference for information about connection caching.
To pre-create the JDBC connection, create an OracleDataSource and pass it to the OraConnectionFactory constructor.
import oracle.jdbc.pool.OracleDataSource;

//Create an OracleDataSource
OracleDataSource ods = new OracleDataSource();
ods.setURL(URL);
ods.setUser(user);
ods.setPassword(password);
//Create a connection factory using the OracleDataSource
javax.datamining.resource.ConnectionFactory connFactory =
    new oracle.dmt.jdm.resource.OraConnectionFactory(ods);
//Create DME Connection
javax.datamining.resource.Connection dmeConn = connFactory.getConnection();
To manage the JDBC connection within the ODM Java API, create an empty ConnectionSpec instance using the getConnectionSpec() method of OraConnectionFactory.
//Create ConnectionSpec
ConnectionSpec connSpec = m_dmeConnFactory.getConnectionSpec();
connSpec.setURI("jdbc:oracle:thin:@host:port:sid");
connSpec.setName("user");
connSpec.setPassword("password");
//Create DME Connection
javax.datamining.resource.Connection m_dmeConn = m_dmeConnFactory.getConnection(connSpec);
In the ODM Java API, the DMS Connection is the primary factory object. The Connection instantiates the object factories using the getFactory method. The Connection object provides named object lookup, persistence, and task execution features.
The Connection.getFactory method creates a factory object. For example, to create a factory for the PhysicalDataSet object, pass the absolute name of the object to this method. The getFactory method creates an instance of PhysicalDataSetFactory.
javax.datamining.data.PhysicalDataSetFactory pdsFactory =
    (javax.datamining.data.PhysicalDataSetFactory)dmeConn.getFactory(
        "javax.datamining.data.PhysicalDataSet");
The Connection object provides methods for retrieving metadata about mining objects.
Method | Description | Syntax |
---|---|---|
getCreationDate | Returns the creation date of the specified named object. | getCreationDate(java.lang.String objectName, NamedObject objectType) returns java.util.Date |
getDescription | Returns the description of the specified mining object. | getDescription(java.lang.String objectName, NamedObject objectType) returns java.lang.String |
getObjectNames | Returns a collection of the names of the objects of the specified type. | getObjectNames(NamedObject objectType) returns java.util.Collection |
You can obtain additional information about persistent mining objects by querying the Oracle data dictionary tables.
The Connection object provides methods for retrieving mining objects and saving them in the DMS. Persistent objects are stored as database objects. Transient objects are stored in memory.
Method | Description | Syntax |
---|---|---|
saveObject | Saves the named object in the metadata repository associated with the connection. | saveObject(java.lang.String name, MiningObject object, boolean replace) |
retrieveObject | Retrieves a copy of the specified named object from the metadata repository associated with the connection. | retrieveObject(java.lang.String objectIdentifier) returns MiningObject |
retrieveObject | Retrieves a copy of the object with the specified name and type from the metadata repository associated with the connection. | retrieveObject(java.lang.String name, NamedObject objectType) returns MiningObject |
The Connection object provides an execute method, which can execute mining tasks either asynchronously or synchronously. The DMS uses the database Scheduler to execute mining tasks, which are stored in the user's schema as Scheduler jobs.
Task Execution | execute Method Syntax |
---|---|
asynchronous | execute(java.lang.String taskName) returns ExecutionHandle |
synchronous | execute(Task task, java.lang.Long timeout) returns ExecutionHandle |
Synchronous execution is typically used with single record scoring, but it may be used in other contexts as well.
See Also:
Oracle Database Administrator's Guide for information about the database Scheduler.
The Connection object provides methods for obtaining information about the DMS at runtime.
Method | Description | Syntax |
---|---|---|
getMetaData | Returns information about the underlying DMS instance represented through an active connection. ConnectionMetaData provides version information for the JDM implementation and Oracle Database. | getMetaData() returns ConnectionMetaData |
getSupportedFunctions | Returns an array of mining functions that are supported by the implementation. | getSupportedFunctions() returns MiningFunction[] |
getSupportedAlgorithms | Returns an array of mining algorithms that are supported by the specified mining function. | getSupportedAlgorithms(MiningFunction function) returns MiningAlgorithm[] |
supportsCapability | Returns true if the specified combination of mining capabilities is supported. If an algorithm is not specified, returns true if the specified function is supported. | supportsCapability(MiningFunction function, MiningAlgorithm algorithm, MiningTask taskType) returns boolean |
The Connection object provides methods for retrieving JDM standard version information and Oracle version information.
Method | Description | Syntax |
---|---|---|
getVersion | Returns the version of the JDM Standard API. It must be "JDM 1.0" for the first release of JDM. | getVersion() returns String |
getMajorVersion | Returns the major version number. For the first release of JDM, this is 1. | getMajorVersion() returns int |
getMinorVersion | Returns the minor version number. For the first release of JDM, this is 0. | getMinorVersion() returns int |
getProviderName | Returns the provider name as "Oracle Corporation". | getProviderName() returns String |
getProviderVersion | Returns the version of the Oracle Database that shipped the Oracle Data Mining Java API jar file. | getProviderVersion() returns String |
Object factories are central to the design of JDM. The ODM Java API uses object factories for instantiating mining objects.
javax.datamining is the base package for the JDM standard defined classes.
oracle.dmt.jdm is the base package for the Oracle extensions to the JDM standard.
The packages in the JDM standard API are organized by mining functions and algorithms. For example, the javax.datamining.supervised package contains all the classes that support supervised functions. It has subpackages for classification and regression classes.
javax.datamining.supervised.classification
javax.datamining.supervised.regression
Similarly, javax.datamining.algorithm is the base package for all algorithms. Each algorithm has its own subpackage. The JDM standard supports algorithms such as Naive Bayes and support vector machines.
javax.datamining.algorithm.naivebayes
javax.datamining.algorithm.svm
The ODM Java API follows a similar package structure for the extensions. For example, the ODM Java API supports Feature Extraction, a non-JDM standard function, and the Non-Negative Matrix Factorization algorithm that is used for feature extraction.
oracle.dmt.jdm.featureextraction
oracle.dmt.jdm.algorithm.nmf
The JDM standard has core packages that define common classes and packages for tasks, model details, rules and statistics. Figure 7-1 illustrates the inheritance hierarchy of the named objects.
Figure 7-1 JDM Named Objects Class Diagram
The JDM standard defines physical and logical data objects to describe the mining attribute characteristics of the data as well as statistical computations for describing the data.
In the ODM Java API, only physical data objects are supported. Data can be logically represented with database views. The DBMS_STATS package can be used for statistical computations.
The javax.datamining.data package contains all the data-related classes. The class diagram in Figure 7-2 illustrates the class relationships of the data objects supported by the ODM Java API.
Figure 7-2 Data Objects in Oracle Data Mining Java API
The following code illustrates the creation of a PhysicalDataSet object. It refers to the view DMUSER.MINING_DATA_BUILD_V and specifies the column cust_id as case-id using the PhysicalAttributeRole.
//Create PhysicalDataSetFactory
PhysicalDataSetFactory pdsFactory = (PhysicalDataSetFactory)dmeConn.getFactory(
    "javax.datamining.data.PhysicalDataSet");
//Create a PhysicalDataSet object
PhysicalDataSet buildData = pdsFactory.create("DMUSER.MINING_DATA_BUILD_V", false);
//Create PhysicalAttributeFactory
PhysicalAttributeFactory paFactory = (PhysicalAttributeFactory)dmeConn.getFactory(
    "javax.datamining.data.PhysicalAttribute");
//Create PhysicalAttribute object
PhysicalAttribute pAttr = paFactory.create(
    "cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId);
//Add the attribute to the PhysicalDataSet object
buildData.addAttribute(pAttr);
//Save the physical data set object
dmeConn.saveObject("JDM_BUILD_PDS", buildData, true);
In the ODM Java API, the BuildSettings object is saved as a table in the database. The settings table is compatible with the DBMS_DATA_MINING.CREATE_MODEL procedure. The name of the settings table must be unique in the user's schema. Figure 7-3 illustrates the build settings class hierarchy.
The following code illustrates the creation and storing of a classification settings object with a tree algorithm.
//Create a classification settings factory
ClassificationSettingsFactory clasFactory = (ClassificationSettingsFactory)dmeConn.getFactory(
    "javax.datamining.supervised.classification.ClassificationSettings");
//Create a ClassificationSettings object
ClassificationSettings clas = clasFactory.create();
//Set target attribute name
clas.setTargetAttributeName("AFFINITY_CARD");
//Create a TreeSettingsFactory
TreeSettingsFactory treeFactory = (TreeSettingsFactory)dmeConn.getFactory(
    "javax.datamining.algorithm.tree.TreeSettings");
//Create TreeSettings instance
TreeSettings treeAlgo = treeFactory.create();
treeAlgo.setBuildHomogeneityMetric(TreeHomogeneityMetric.entropy);
treeAlgo.setMaxDepth(10);
treeAlgo.setMinNodeSize(10, SizeUnit.count);
//Set algorithm settings in the classification settings
clas.setAlgorithmSettings(treeAlgo);
//Save the build settings object in the database
dmeConn.saveObject("JDM_TREE_CLAS", clas, true);
The ODM Java API uses the DBMS_SCHEDULER infrastructure for executing mining tasks either synchronously or asynchronously in the database. A mining task is saved as a DBMS_SCHEDULER job in the user's schema. Its initial state is DISABLED. When the user calls the execute method in the DMS Connection, the job state is changed to ENABLED.
The class diagram in Figure 7-4 illustrates the different types of tasks that are available in the ODM Java API.
DBMS_SCHEDULER provides additional scheduling and resource management features. You can extend the capabilities of ODM tasks by using the Scheduler infrastructure.
See Also:
Oracle Database Administrator's Guide for information about the database Scheduler.
The javax.datamining.task.BuildTask class is used to build a mining model. Before building a model, you must save a PhysicalDataSet object and a BuildSettings object.
The following code illustrates the building of a tree model using the PhysicalDataSet described in "Describing the Mining Data" and the BuildSettings described in "Build Settings".
//Create BuildTaskFactory
BuildTaskFactory buildTaskFactory = (BuildTaskFactory)dmeConn.getFactory(
    "javax.datamining.task.BuildTask");
//Create BuildTask object
BuildTask buildTask = buildTaskFactory.create(
    "JDM_BUILD_PDS", "JDM_TREE_CLAS", "JDM_TREE_MODEL");
//Save BuildTask object
dmeConn.saveObject("JDM_BUILD_TASK", buildTask, true);
//Execute build task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_BUILD_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
After you build a model using the BuildTask, a model object is persisted in the database. You can retrieve it to explore the model details.
The class diagram in Figure 7-5 illustrates the different types of model objects and model details objects supported by the ODM Java API.
Figure 7-5 Model and Model Detail Class Diagram
The following code illustrates the retrieval of the classification tree model built in "Building a Mining Model" and its TreeModelDetail.
//Retrieve classification model from the DME
ClassificationModel treeModel = (ClassificationModel)dmeConn.retrieveObject(
    "JDM_TREE_MODEL", NamedObject.model);
//Retrieve tree model detail from the model
TreeModelDetail treeDetail = (TreeModelDetail)treeModel.getModelDetail();
//Get the root node
TreeNode rootNode = treeDetail.getRootNode();
//Get child nodes
TreeNode[] childNodes = rootNode.getChildren();
//Get details of the first child node
int nodeId = childNodes[0].getIdentifier();
long caseCount = childNodes[0].getCaseCount();
Object prediction = childNodes[0].getPrediction();
Once a supervised model has been built, it can be evaluated using a test operation. The JDM standard defines two types of test operations: one that takes the mining model as input, and the other that takes the apply output table with the actual and predicted value columns.
javax.datamining.supervised.TestTask is the base class for the model-based test tasks, and javax.datamining.supervised.TestMetricsTask is the base class for the apply output table-based test tasks.
The test operation creates and persists a test metrics object in the DMS. For classification model testing, either of the following can be used:
javax.datamining.supervised.classification.ClassificationTestTask
javax.datamining.supervised.classification.ClassificationTestMetricsTask
Both of these tasks create a named object:
javax.datamining.supervised.classification.ClassificationTestMetrics
The ClassificationTestMetrics named object is stored as a table in the user's schema. The name of the table is the name of the object. The confusion matrix, lift results, and ROC associated with the ClassificationTestMetrics object are stored in separate tables whose names are the ClassificationTestMetrics object name followed by the suffix _CFM, _LFT, or _ROC. Tools such as Oracle Discoverer can display the test results by querying these tables.
Similarly, for regression model testing, either of the following can be used:
javax.datamining.supervised.regression.RegressionTestTask
javax.datamining.supervised.regression.RegressionTestMetricsTask
Both of these tasks create a named object, javax.datamining.supervised.regression.RegressionTestMetrics, and store it as a table in the user's schema.
The class diagram in Figure 7-6 illustrates the test metrics class hierarchy. For the class hierarchy of test tasks, refer to "Build Settings".
The following code illustrates the test of a tree model JDM_TREE_MODEL using the ClassificationTestTask on the dataset MINING_DATA_TEST_V.
//Create & save the PhysicalDataSet for the test data
PhysicalDataSet testData = pdsFactory.create("MINING_DATA_TEST_V", false);
PhysicalAttribute pa = paFactory.create(
    "cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId);
testData.addAttribute(pa);
dmeConn.saveObject("JDM_TEST_PDS", testData, true);
//Create ClassificationTestTaskFactory
ClassificationTestTaskFactory testTaskFactory = (ClassificationTestTaskFactory)dmeConn.getFactory(
    "javax.datamining.supervised.classification.ClassificationTestTask");
//Create the test task
ClassificationTestTask testTask = testTaskFactory.create(
    "JDM_TEST_PDS", "JDM_TREE_MODEL", "JDM_TREE_TESTMETRICS");
testTask.setNumberOfLiftQuantiles(10);
testTask.setPositiveTargetValue(new Integer(1));
//Save TestTask object
dmeConn.saveObject("JDM_TEST_TASK", testTask, true);
//Execute test task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_TEST_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
//Explore the test metrics after successful completion of the task
if (ExecutionState.success.equals(execStatus.getState())) {
    //Retrieve the test metrics object
    ClassificationTestMetrics testMetrics =
        (ClassificationTestMetrics)dmeConn.retrieveObject("JDM_TREE_TESTMETRICS");
    //Retrieve confusion matrix and accuracy
    Double accuracy = testMetrics.getAccuracy();
    ConfusionMatrix cfm = testMetrics.getConfusionMatrix();
    //Retrieve lift
    Lift lift = testMetrics.getLift();
    //Retrieve ROC
    ReceiverOperatingCharacterics roc = testMetrics.getROC();
}
In the preceding example, a test metrics object is stored as a table called JDM_TREE_TESTMETRICS. The confusion matrix is stored in the JDM_TREE_TESTMETRICS_CFM table, lift is stored in the JDM_TREE_TESTMETRICS_LFT table, and ROC is stored in the JDM_TREE_TESTMETRICS_ROC table. You can use BI tools like Oracle Discoverer to query these tables and create reports.
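The confusion matrix and lift tables are computed in the database, but the arithmetic behind them is easy to reproduce when checking a report. The sketch below is illustrative only: TestMetricsSketch is a hypothetical class, the confusion matrix is assumed to be indexed as cfm[actual][predicted], and cumulative lift is taken to be the positive rate among the top-scored quantiles divided by the overall positive rate. ODM's own handling of ties and uneven quantile sizes may differ.

```java
import java.util.Arrays;

// Illustrative sketch only; ODM computes its own test metrics in the database.
final class TestMetricsSketch {
    private TestMetricsSketch() {
    }

    // cfm[actual][predicted] holds case counts; accuracy = correct / total.
    static double accuracy(long[][] cfm) {
        long correct = 0;
        long total = 0;
        for (int a = 0; a < cfm.length; a++) {
            for (int p = 0; p < cfm[a].length; p++) {
                total += cfm[a][p];
                if (a == p) {
                    correct += cfm[a][p];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // scores[i]: predicted probability that case i is positive;
    // actuals[i]: true if case i is actually positive.
    // Returns the cumulative lift at each of numQuantiles quantiles.
    static double[] cumulativeLift(double[] scores, boolean[] actuals, int numQuantiles) {
        int n = scores.length;
        if (n == 0 || numQuantiles < 1 || numQuantiles > n || actuals.length != n) {
            throw new IllegalArgumentException("invalid input sizes");
        }
        int totalPositives = 0;
        for (boolean positive : actuals) {
            if (positive) {
                totalPositives++;
            }
        }
        if (totalPositives == 0) {
            throw new IllegalArgumentException("no positive cases");
        }
        double overallRate = (double) totalPositives / n;

        // Rank cases from the highest to the lowest score.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (x, y) -> Double.compare(scores[y], scores[x]));

        double[] lift = new double[numQuantiles];
        int casesSoFar = 0;
        int positivesSoFar = 0;
        for (int q = 0; q < numQuantiles; q++) {
            // Exclusive end index of quantile q; spreads any remainder evenly.
            int end = (int) ((long) (q + 1) * n / numQuantiles);
            while (casesSoFar < end) {
                if (actuals[order[casesSoFar]]) {
                    positivesSoFar++;
                }
                casesSoFar++;
            }
            // Positive rate in the top quantiles relative to the overall rate.
            lift[q] = ((double) positivesSoFar / casesSoFar) / overallRate;
        }
        return lift;
    }
}
```

With 4 positives among 10 cases scored into 2 quantiles, where 3 of the positives fall in the top half, the cumulative lift is 1.5 for the first quantile and 1.0 once all cases are included.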
All supervised models can be applied to data to find the prediction. Some of the unsupervised models, such as clustering and feature extraction, support the apply operation to find the cluster id or feature id for new records.
The JDM standard API provides an ApplySettings object to specify the type of output for the scored results. javax.datamining.task.apply.ApplySettings is the base class for all apply settings. In the ODM Java API, the ApplySettings object is transient; it is stored in the Connection context, not in the database.
The class diagram in Figure 7-7 illustrates the class hierarchy of the apply settings available in the ODM Java API.
In the ODM Java API, default apply settings produce the apply output table in fixed format. The list in Table 7-2 illustrates the default output formats for different functions.
Table 7-2 Default Output Formats for Different Functions
Mining Function | Default Output Columns |
---|---|
Classification without Cost | Case ID, Prediction, Probability |
Classification with Cost | Case ID, Prediction, Probability, Cost |
Regression | Case ID, Prediction |
Clustering | Case ID, Cluster ID, Probability |
Feature extraction | Case ID, Feature ID, Value |
All types of apply settings support source and destination attribute mappings. For example, if the original apply table has customer name and age columns that need to be carried forward to the apply output table, you can do so by specifying source-to-destination mappings in the apply settings.
In the ODM Java API, classification apply settings support map by rank, top prediction, map by category, and map all predictions. Regression apply settings support map prediction value. Clustering apply settings support map by rank, map by cluster id, map top cluster, and map all clusters. Feature extraction apply settings support map by rank, map by feature id, map top feature, and map all features.
The following code illustrates applying the tree model JDM_TREE_MODEL using ClassificationApplySettings and a DataSetApplyTask on the dataset MINING_DATA_APPLY_V.
//Create & save the PhysicalDataSet for the apply data
PhysicalDataSet applyData = pdsFactory.create("MINING_DATA_APPLY_V", false);
PhysicalAttribute pa = paFactory.create(
    "cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId);
applyData.addAttribute(pa);
dmeConn.saveObject("JDM_APPLY_PDS", applyData, true);
//Create ClassificationApplySettingsFactory
ClassificationApplySettingsFactory applySettingsFactory =
    (ClassificationApplySettingsFactory)dmeConn.getFactory(
        "javax.datamining.supervised.classification.ClassificationApplySettings");
//Create & save ClassificationApplySettings
ClassificationApplySettings clasAS = applySettingsFactory.create();
dmeConn.saveObject("JDM_APPLY_SETTINGS", clasAS, true);
//Create DataSetApplyTaskFactory
DataSetApplyTaskFactory applyTaskFactory = (DataSetApplyTaskFactory)dmeConn.getFactory(
    "javax.datamining.task.apply.DataSetApplyTask");
//Create the apply task
DataSetApplyTask applyTask = applyTaskFactory.create(
    "JDM_APPLY_PDS", "JDM_TREE_MODEL", "JDM_APPLY_SETTINGS", "JDM_APPLY_OUTPUT_TABLE");
//Save ApplyTask object
dmeConn.saveObject("JDM_APPLY_TASK", applyTask, true);
//Execute apply task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_APPLY_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
The class javax.datamining.supervised.classification.CostMatrix represents the costs of false positive and false negative predictions. It is used in classification problems to specify the costs associated with incorrect predictions.
In the ODM Java API, cost matrix is supported in apply and test operations for all classification models. For the decision tree algorithm, a cost matrix can be specified at build time. For more information about cost matrix, see Oracle Data Mining Concepts.
The following code illustrates how to create a cost matrix object where the target has two classes: YES (1) and NO (0). Suppose a positive (YES) response to the promotion generates $2 and the cost of the promotion is $1. Then the cost of misclassifying a positive responder is $2, and the cost of misclassifying a non-responder is $1.
//Create category set factory & cost matrix factory
CategorySetFactory catSetFactory = (CategorySetFactory)dmeConn.getFactory(
    "javax.datamining.data.CategorySet");
CostMatrixFactory costMatrixFactory = (CostMatrixFactory)dmeConn.getFactory(
    "javax.datamining.supervised.classification.CostMatrix");
//Create categorySet
CategorySet catSet = catSetFactory.create(AttributeDataType.integerType);
//Add category values
catSet.addCategory(new Integer(0), CategoryProperty.valid);
catSet.addCategory(new Integer(1), CategoryProperty.valid);
//Create cost matrix
CostMatrix costMatrix = costMatrixFactory.create(catSet);
costMatrix.setValue(new Integer(0), new Integer(0), 0);
costMatrix.setValue(new Integer(1), new Integer(1), 0);
costMatrix.setValue(new Integer(0), new Integer(1), 2);
costMatrix.setValue(new Integer(1), new Integer(0), 1);
//Save cost matrix in the DME
dmeConn.saveObject("JDM_COST_MATRIX", costMatrix, true);
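A cost matrix changes the decision rule: instead of picking the most probable class, the model picks the class with the lowest expected cost. The following sketch illustrates that rule in plain Java under stated assumptions: CostSensitivePrediction is a hypothetical class, class index 0 is NO and 1 is YES, and the array is indexed cost[actual][predicted] using the costs from the promotion example (missing a responder costs 2, contacting a non-responder costs 1). This indexing is the sketch's own convention and is not a statement about the argument order of CostMatrix.setValue.

```java
// Illustrative sketch, not the ODM scoring code: chooses the class whose
// expected misclassification cost is lowest.
final class CostSensitivePrediction {
    private CostSensitivePrediction() {
    }

    // probs[c]: model probability of class c.
    // cost[actual][predicted]: cost of predicting 'predicted' when the truth is 'actual'.
    static int minimumCostClass(double[] probs, double[][] cost) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int predicted = 0; predicted < probs.length; predicted++) {
            // Expected cost = sum over actual classes of P(actual) * cost.
            double expected = 0.0;
            for (int actual = 0; actual < probs.length; actual++) {
                expected += probs[actual] * cost[actual][predicted];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = predicted;
            }
        }
        return best;
    }
}
```

With these costs the sketch predicts YES whenever P(YES) exceeds 1/3, because the expected cost of predicting NO (2 * P(YES)) then exceeds the expected cost of predicting YES (1 * P(NO)).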
Prior probabilities are used in classification problems when the actual data has a different distribution of target values than the data provided for the model build. You can specify the prior probabilities in the classification function settings using setPriorProbabilitiesMap. For more information about prior probabilities, see Oracle Data Mining Concepts.
Note:
Priors are not supported with decision trees.
The following code illustrates how to set prior probabilities when the target has two classes, YES (1) and NO (0), the probability of YES is 0.05, and the probability of NO is 0.95.
//Set target prior probabilities: NO (0) = 0.95, YES (1) = 0.05
Map priorMap = new HashMap();
priorMap.put(new Double(0), new Double(0.95));
priorMap.put(new Double(1), new Double(0.05));
buildSettings.setPriorProbabilitiesMap("affinity_card", priorMap);
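One common way to account for priors is to re-weight each class probability by the ratio of its prior to its frequency in the build data and then renormalize. The sketch below shows that generic Bayes-rule correction; it is an assumption for illustration, not a description of how ODM applies setPriorProbabilitiesMap internally, and PriorAdjustment is a hypothetical class.

```java
// Illustrative Bayes-rule prior correction; a generic technique, not
// necessarily how ODM applies setPriorProbabilitiesMap internally.
final class PriorAdjustment {
    private PriorAdjustment() {
    }

    // posterior[c]: probability from a model built on data whose class
    // frequencies were buildFreq[c]; priors[c]: the expected real-world frequency.
    // Returns the re-weighted probabilities, renormalized to sum to 1.
    static double[] adjust(double[] posterior, double[] buildFreq, double[] priors) {
        double[] adjusted = new double[posterior.length];
        double sum = 0.0;
        for (int c = 0; c < posterior.length; c++) {
            adjusted[c] = posterior[c] * priors[c] / buildFreq[c];
            sum += adjusted[c];
        }
        for (int c = 0; c < adjusted.length; c++) {
            adjusted[c] /= sum;
        }
        return adjusted;
    }
}
```

For example, if the build data was balanced (50% YES) and a model scores a case at 0.8 for YES, re-weighting to the 5% YES prior lowers that probability to about 0.17.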
The ODM Java API provides oracle.dmt.jdm.task.OraPredictTask and oracle.dmt.jdm.task.OraExplainTask for generating predictions and explaining attribute importance. These tasks automate the predict and explain operations for novice data mining users.
OraPredictTask predicts the value of a target column based on cases where the target is not null. OraPredictTask uses known data values to automatically create a model and populate the unknown values in the target.
OraExplainTask identifies attribute columns that are important for explaining the variation of values in a given column. OraExplainTask analyzes the data and builds a model that identifies the important attributes and ranks their importance.
Both tasks perform automated data preparation where needed.
The following code illustrates OraPredictTask and OraExplainTask.
//Predict task
//Create predict task factory and task object
OraPredictTaskFactory predictFactory = (OraPredictTaskFactory)dmeConn.getFactory(
    "oracle.dmt.jdm.task.OraPredictTask");
OraPredictTask predictTask = predictFactory.create(
    "MINING_DATA_BUILD_V",   //Input table
    "cust_id",               //Case id column
    "affinity_card",         //Target column
    "JDM_PREDICTION_TABLE"); //Prediction output table
//Save predict task object
dmeConn.saveObject("JDM_PREDICT_TASK", predictTask, true);
//Execute predict task asynchronously in the database
ExecutionHandle predictHandle = dmeConn.execute("JDM_PREDICT_TASK");
//Wait for completion of the task
ExecutionStatus predictStatus = predictHandle.waitForCompletion(Integer.MAX_VALUE);

//Explain task
//Create explain task factory and task object
OraExplainTaskFactory explainFactory = (OraExplainTaskFactory)dmeConn.getFactory(
    "oracle.dmt.jdm.task.OraExplainTask");
OraExplainTask explainTask = explainFactory.create(
    "MINING_DATA_BUILD_V",   //Input table
    "affinity_card",         //Explain column
    "JDM_EXPLAIN_TABLE");    //Explain output table
//Save explain task object
dmeConn.saveObject("JDM_EXPLAIN_TASK", explainTask, true);
//Execute explain task asynchronously in the database
ExecutionHandle explainHandle = dmeConn.execute("JDM_EXPLAIN_TASK");
//Wait for completion of the task
ExecutionStatus explainStatus = explainHandle.waitForCompletion(Integer.MAX_VALUE);
In the ODM Java API, data must be prepared before building, applying, or testing a model. The oracle.dmt.jdm.task.OraTransformationTask class supports common transformations used in data mining: binning, normalization, clipping, and text transformations. For more information about transformations, see Oracle Data Mining Concepts.
The class diagram in Figure 7-8 illustrates the OraTransformationTask and its relationship with other objects.
Figure 7-8 OraTransformationTask and its Relationship With Other Objects
Binning is the process of grouping related values together, thus reducing the number of distinct values for an attribute. Having fewer distinct values typically leads to a more compact model and one that builds faster, but it can also lead to some loss in accuracy.
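Equi-width binning is the simplest numerical case: the range from the minimum to the maximum value is split into bins of equal width. The sketch below is illustrative only (EquiWidthBinning is a hypothetical class); it assumes bins that include their lower boundary and exclude their upper one, except for the last bin, which may not match ODM's boundary convention.

```java
// Illustrative equi-width binning; ODM's boundary conventions may differ.
final class EquiWidthBinning {
    private EquiWidthBinning() {
    }

    // Returns numBins + 1 boundaries spaced evenly from the minimum to the maximum value.
    static double[] boundaries(double[] values, int numBins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numBins;
        double[] bounds = new double[numBins + 1];
        for (int i = 0; i < numBins; i++) {
            bounds[i] = min + i * width;
        }
        bounds[numBins] = max; // pin the top edge to avoid rounding drift
        return bounds;
    }

    // Returns a 1-based bin number. Bins are [lower, upper) except the last,
    // which also includes the maximum; out-of-range values go to the end bins.
    static int binOf(double value, double[] bounds) {
        int numBins = bounds.length - 1;
        for (int i = 1; i < numBins; i++) {
            if (value < bounds[i]) {
                return i;
            }
        }
        return numBins;
    }
}
```

Eleven distinct values 0 through 10 split into 5 bins yield the boundaries 0, 2, 4, 6, 8, 10, so the attribute has 5 distinct values after binning instead of 11.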
The class diagram in Figure 7-9 illustrates the binning transformation classes.
Figure 7-9 OraBinningTransformation Class Diagram
Here, OraBinningTransformation contains all the settings required for binning. The ODM Java API supports top-n and custom binning for categorical attributes, and equi-width, quantile, and custom binning for numerical attributes. Running the binning transformation creates a transformed table and bin boundary tables in the user's schema. You can specify the bin boundary table names, or the system will generate them. This makes it possible to reuse the bin boundary tables created when binning the build data to bin the apply and test data.
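Top-N categorical binning keeps the N most frequent categories and groups the remainder. Learning the categories once from the build data and reusing them for apply and test data mirrors the reuse of bin boundary tables described above. This sketch is illustrative: TopNBinning and the catch-all label "OTHER" are hypothetical, and ties are broken alphabetically only to make the result deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative top-N categorical binning, not the ODM implementation.
final class TopNBinning {
    private TopNBinning() {
    }

    // Learns the n most frequent categories (ties broken alphabetically).
    static Set<String> topCategories(List<String> values, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Set<String> keep = new HashSet<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            keep.add(entries.get(i).getKey());
        }
        return keep;
    }

    // Maps every category outside 'keep' to a single catch-all label.
    static List<String> apply(List<String> values, Set<String> keep, String otherLabel) {
        List<String> out = new ArrayList<>(values.size());
        for (String v : values) {
            out.add(keep.contains(v) ? v : otherLabel);
        }
        return out;
    }
}
```

For example, learning the top 2 categories from the build values A, B, A, C, A, B, D keeps A and B; applying that set to new values B, C, E yields B, OTHER, OTHER.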
The following code illustrates the binning operation on the view MINING_DATA_BUILD_V.
//Create binning transformation instance
OraBinningTransformFactory binXformFactory = (OraBinningTransformFactory)dmeConn.getFactory(
    "oracle.dmt.jdm.transform.binning.OraBinningTransform");
OraBinningTransform binTransform = binXformFactory.create(
    "MINING_DATA_BUILD_V", //Name of the input data set
    "BINNED_DATA_BUILD_V", //Name of the transformation result
    true);                 //Result of the transformation is a view
//Specify the number of numeric bins
binTransform.setNumberOfBinsForNumerical(10);
//Specify the number of categorical bins
binTransform.setNumberOfBinsForCategorical(8);
//Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
binTransform.setExcludeColumnList(excludedList);
//Specify the type of numeric binning: equal-width or quantile (default is quantile)
binTransform.setNumericalBinningType(binningType);
//Specify the type of categorical binning as Top-N (by default it is none)
binTransform.setCategoricalBinningType(OraCategoricalBinningType.top_n);
//Create transformation task
OraTransformationTask xformTask = m_xformTaskFactory.create(binTransform);
//Save transformation task object
dmeConn.saveObject("JDM_BINNING_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_BINNING_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
Normalizing converts individual attribute values in such a way that all attribute values lie in the same range. Normally, values are converted to be in the range 0.0 to 1.0 or the range -1 to +1. Normalization ensures that attributes do not receive artificial weighting caused by differences in the ranges that they span.
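Min-max normalization is one way to map values into the range 0.0 to 1.0: each value x becomes (x - min) / (max - min). The following sketch is illustrative (MinMaxNormalization is a hypothetical class); mapping a constant column to 0.0 is an assumption of the sketch.

```java
// Illustrative min-max normalization to the range 0.0 to 1.0.
final class MinMaxNormalization {
    private MinMaxNormalization() {
    }

    static double[] normalize(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column carries no information; map it to 0.0.
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }
}
```

For example, the values 10, 20, 30, and 50 normalize to 0.0, 0.25, 0.5, and 1.0, so an attribute measured in thousands and one measured in single digits end up on the same scale.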
The class diagram in Figure 7-10 illustrates the normalization transformation classes.
Figure 7-10 OraNormalizeTransformation Class Diagram
Here, OraNormalizeTransformation contains all the settings required for normalization. The ODM Java API supports z-score, min-max, and linear scale normalizations. Normalization is required for the SVM, NMF, and k-Means algorithms.
The following code illustrates normalization on the view MINING_DATA_BUILD_V.
// Create the OraNormalizeTransformFactory
OraNormalizeTransformFactory normalizeXformFactory = (OraNormalizeTransformFactory)dmeConn.getFactory(
    "oracle.dmt.jdm.transform.normalize.OraNormalizeTransform");
// Create the OraNormalizeTransform
OraNormalizeTransform normalizeTransform = normalizeXformFactory.create(
    "MINING_DATA_BUILD_V",      // name of the input data set
    "NORMALIZED_DATA_BUILD_V",  // name of the transformation result
    true,                       // result of the transformation is a view
    OraNormalizeType.z_Score,   // normalization type
    new Integer(6));            // rounding number
// Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
normalizeTransform.setExcludeColumnList(excludedList);
// Create the transformation task
// (xformTaskFactory is the transformation task factory obtained earlier from dmeConn)
OraTransformationTask xformTask = xformTaskFactory.create(normalizeTransform);
// Save the transformation task object
dmeConn.saveObject("JDM_NORMALIZE_TASK", xformTask, true);
// Execute the transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_NORMALIZE_TASK");
// Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
Some computations on attribute values can be significantly affected by extreme values. One way to make such computations more robust is to winsorize or trim the data by using clipping transformations.
Winsorizing involves setting the tail values of a particular attribute to some specified value. For example, for a 90% winsorization, the bottom 5% of values are set equal to the minimum value in the 6th percentile, while the upper 5% of values are set equal to the maximum value in the 95th percentile.
Trimming "removes" the tails in the sense that trimmed values are ignored in further computations. This is achieved by setting the tail values to NULL.
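The plain-Java sketch below shows the idea behind both clipping types for a given tail fraction, the same notion as the tail fraction passed to setTailFraction in the API example later in this section (a tail fraction of 0.05 corresponds to the 90% winsorization described above). The class ClippingSketch and its nearest-rank cutoff rule are assumptions for illustration; the database may compute the percentile cutoffs differently.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the ODM Java API.
// Assumes 0 <= tailFraction < 0.5.
class ClippingSketch {

    // Returns {lowCut, highCut}: the smallest and largest values kept after
    // setting aside floor(tailFraction * n) values at each end of the sorted data.
    static double[] cutoffs(double[] x, double tailFraction) {
        double[] s = x.clone();
        Arrays.sort(s);
        int k = (int) Math.floor(tailFraction * s.length);
        return new double[] { s[k], s[s.length - 1 - k] };
    }

    // Winsorize: replace each tail value with the nearest cutoff value.
    static double[] winsorize(double[] x, double tailFraction) {
        double[] c = cutoffs(x, tailFraction);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.min(Math.max(x[i], c[0]), c[1]);
        }
        return out;
    }

    // Trim: replace each tail value with null so later computations ignore it.
    static Double[] trim(double[] x, double tailFraction) {
        double[] c = cutoffs(x, tailFraction);
        Double[] out = new Double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = (x[i] < c[0] || x[i] > c[1]) ? null : Double.valueOf(x[i]);
        }
        return out;
    }
}
```

With the values -50, 1, 2, ..., 8, 100 and a tail fraction of 0.1, winsorizing replaces -50 with 1 and 100 with 8, while trimming replaces both with null.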
The class diagram in Figure 7-11 illustrates the clipping transformation classes.
Figure 7-11 OraClippingTransformation Class Diagram
Here, OraClippingTransformation contains all the settings required for clipping. The ODM Java API supports winsorize and trim types of clipping.
The following code illustrates clipping on the view MINING_DATA_BUILD_V.
// Create the OraClippingTransformFactory
OraClippingTransformFactory clipXformFactory = (OraClippingTransformFactory)dmeConn.getFactory(
    "oracle.dmt.jdm.transform.clipping.OraClippingTransform");
// Create the OraClippingTransform
OraClippingTransform clipTransform = clipXformFactory.create(
    "MINING_DATA_BUILD_V",      // name of the input data set
    "WINSORISED_DATA_BUILD_V",  // name of the transformation result
    true);                      // result of the transformation is a view
// Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
clipTransform.setExcludeColumnList(excludedList);
// Specify the type of clipping
clipTransform.setClippingType(OraClippingType.winsorize);
// Specify the tail fraction as 3% of values on both ends
clipTransform.setTailFraction(0.03);
// Create the transformation task
// (xformTaskFactory is the transformation task factory obtained earlier from dmeConn)
OraTransformationTask xformTask = xformTaskFactory.create(clipTransform);
// Save the transformation task object
dmeConn.saveObject("JDM_CLIPPING_TASK", xformTask, true);
// Execute the transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_CLIPPING_TASK");
// Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
To mine text columns, you must first transform them into a nested table structure. The text transformation converts the text columns to nested table columns and creates a features table. To get correct results, the features table created when transforming the build data must also be used when transforming the data for apply and test tasks.
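Conceptually, the text transformation turns each case's free text into a nested collection of (term, value) pairs, and the features table fixes which terms are recognized. The plain-Java sketch below is a simplified, hypothetical stand-in: the class TextToNestedSketch tokenizes on non-alphanumeric characters and uses raw term counts, whereas the database performs its own feature extraction and weighting. It does show why the build-time features table must be reused: terms that never occurred in the build data are dropped when apply or test data is transformed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the ODM Java API.
class TextToNestedSketch {

    // Lowercase the text and split it on any run of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Stand-in for the features table: every term seen in the build data.
    static Set<String> buildFeatures(List<String> buildDocs) {
        Set<String> features = new TreeSet<String>();
        for (String doc : buildDocs) {
            features.addAll(tokenize(doc));
        }
        return features;
    }

    // Stand-in for one nested table row: (term, count) pairs for one case,
    // keeping only terms present in the build-time features set.
    static Map<String, Integer> toNested(String text, Set<String> features) {
        Map<String, Integer> row = new TreeMap<String, Integer>();
        for (String t : tokenize(text)) {
            if (features.contains(t)) {
                Integer c = row.get(t);
                row.put(t, c == null ? 1 : c + 1);
            }
        }
        return row;
    }
}
```

For example, with features built from "Great service" and "service was slow", the apply text "Slow slow delivery, great service" becomes {great=1, service=1, slow=2}; "delivery" is dropped because it is not in the build-time features.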
The class diagram in Figure 7-12 illustrates the text transformation classes.
Figure 7-12 Text Transformation Class Diagram
Here, OraTextTransformation is used to specify the text columns and the feature tables associated with the text columns.
The following code illustrates the text transformation on the table MINING_BUILD_TEXT.
// Create the OraTextTransformFactory
OraTextTransformFactory textXformFactory = (OraTextTransformFactory)dmeConn.getFactory(
    "oracle.dmt.jdm.transform.text.OraTextTransform");
// Create the OraTextTransform
OraTextTransform txtXform = textXformFactory.create(
    "MINING_BUILD_TEXT",           // name of the input data set
    "NESTED_TABLE_BUILD_TEXT",     // name of the transformation result
    "CUST_ID",                     // case ID column
    new String[] { "COMMENTS" });  // text column names
// Create the transformation task
// (xformTaskFactory is the transformation task factory obtained earlier from dmeConn)
OraTransformationTask xformTask = xformTaskFactory.create(txtXform);
// Save the transformation task object
dmeConn.saveObject("JDM_TEXTXFORM_TASK", xformTask, true);
// Execute the transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_TEXTXFORM_TASK");
// Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);