Skip Headers
Oracle® Data Mining Administrator's Guide
10g Release 2 (10.2)

Part Number B14338-02
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

4 Installing and Using the Demo Programs

A number of demo programs are available with Oracle Data Mining. These programs illustrate the many features of the PL/SQL API, the Data Mining SQL functions, the Java API, and the BLAST table functions.

The demo programs create a set of models in the database. You can experiment with these models using either the APIs or Oracle Data Miner. You can examine the sample source code, which includes numerous comments, to familiarize yourself with the Oracle Data Mining APIs, and you can create your own models by modifying the samples.

This chapter includes the following sections:

Requirements Checklist

Several installation and configuration steps must be completed before you can run the Data Mining demo programs. Many of these steps are explained in this chapter. Some steps, such as installing Oracle Database, are explained in other chapters of this manual.

Complete these steps before attempting to run the Data Mining demo programs:

  1. Install Oracle Database 10g Release 2 (10.2) Enterprise Edition with the sample schemas, or obtain the connection information to an existing installation. See Chapter 1.

  2. Obtain the demo programs by installing the Database Companion or by downloading from the Oracle Technology Network web site. See "Obtaining the Demo Programs".

  3. Create a user ID for running the demo programs. Grant the necessary privileges to this user. See "Creating a Demo User" and "Granting Access Rights".

  4. Populate the schema of the demo user with objects used by the demo programs. See "Populating the Schema"

  5. To run the Java demos, check the requirements described in "Preparing to Run the Java Programs".

  6. To run the BLAST demo, check the requirements described in "Preparing to Run the BLAST Demo".

Obtaining the Demo Programs

The Oracle Data Mining sample programs are provided with Oracle Database Enterprise Edition. They are also available for download from the Oracle Technology Network.

Downloading from OTN

You can download a zip file containing the demo programs and documentation from the Oracle Data Mining page on OTN. Use the following URL.

http://www.oracle.com/technology/products/bi/odm/index.html

Extract the contents of the zip file, making sure to preserve the directory structure. The files will be copied to a directory called DATA MINING demos on your computer.

The zip file contains documentation similar to the information in this chapter. To access the documentation, open the file index.html in a browser.

Ensure that the SQL programs are located in a directory where you can access them using SQL*Plus. You may want to copy the Java programs to a Java development environment such as jDeveloper.

Installing Oracle Database Companion

The Database Companion installation process copies the Oracle Data Mining demo programs, along with examples and demos of other database features, to the \RDBMS\demo subdirectory under ORACLE_HOME.

Note:

In Oracle Database 10.1, Data Mining was installed in ORACLE_HOME\dm, and the Data Mining demo programs were installed in ORACLE_HOME\dm\demo\sample.

In Oracle 10g Release 2 (10.2), Data Mining is installed in the ORACLE_HOME\RDBMS directory, and the Data Mining demo programs are installed in ORACLE_HOME\RDBMS\demo.

To install the Database Companion on a Windows platform, take these steps:

  1. From the Companion installation directory, run SETUP.EXE.

    Oracle Universal Installer opens and displays the Welcome page. Click Next to advance to each page.

  2. On the Select a Product to Install page, select Oracle Database 10g Products 10.2.0.0.0.

    Database Companion installation: select Oracle Database 10g Products
  3. On the Specify Home Details page, select the Oracle home in which you installed Oracle Database 10g Release 2. Do not rely on the default setting to be correct.

  4. On the Product-Specific Prerequisite Checks page, verify that all checks succeeded. If any checks failed, then you must correct the problem before proceeding.

  5. On the Summary page, review your previous choices, then click Install.

  6. On the End of Installation page, confirm that the installation was successful.

Verifying the Installation

You can find the SQL demos by searching for dm*.sql in ORACLE_HOME\RDBMS\demo. You can find the Java demos by searching for dm*.java.

On a Windows platform, you can use Windows File Manager to list the SQL programs as follows.

Windows File Manager: list data mining SQL demo programs

You can use Windows File Manager to list the Java programs as follows.

Windows File Manager: list data mining Java demo programs

Creating a Demo User

Data Mining demo users require several database permissions, as well as SELECT access to tables in the SH sample schema. Users that will primarily use the demo programs, and not be mining large data sets, do not need personal tablespaces.

Note:

In Oracle Database 10.1, a data mining user account, called DMUSER, was provided with the software. In Oracle Database 10g Release 2 (10.2), DMUSER is no longer provided; you must create your own data mining user accounts.

To create a demo user for Oracle Data Mining, take these steps:

  1. Create a user name and password.

  2. Run the dmshgrants script to grant the necessary privileges to the user.

  3. Run the dmsh script to populate the user's schema with objects that support the demo programs.

Using SQL*Plus to Create the User

To create the user in SQL*Plus, log in as the SYSTEM user and type a command like the following:

CREATE USER dmuser IDENTIFIED BY dmpsw
     DEFAULT TABLESPACE users
     TEMPORARY TABLESPACE temp
     QUOTA UNLIMITED on users;

This command creates the user dmuser with the password dmpsw. It provides default access to two tablespaces shared by several other sample schemas.

Using Enterprise Manager to Create the User

To create the user in Enterprise Manager, follow steps like these:

  1. Open Enterprise Manager and log in as the SYSTEM user.

  2. On the Administration page, choose Users.

  3. On the Users page, click the Create button.

  4. On the Create Users page, provide information like this.

    Enterprise Manager: create user for data mining demo programs

    You can use the Quota page to grant unlimited quota to the USERS schema.

Granting Access Rights

The dmshgrants script grants database privileges to a demo user. These privileges are the same as those listed in "Access Rights".

The dmshgrants script is provided by the Database Companion installation. You can locate it in the \rdbms\demo subdirectory of ORACLE_HOME. The dmshgrants script is also included with the Data Mining demos in the zip file you can download from OTN.

In addition to granting database privileges, dmshgrants grants SELECT rights on these tables in the SH schema:


COUNTRIES
CUSTOMERS
PRODUCTS
SUPPLEMENTARY_DEMOGRAPHICS
SALES

For text mining, dmshgrants grants access rights to an Oracle Text package:


EXECUTE ON ctxsys.ctx_ddl

If you do not have access to dmshgrants, you can set permissions using the SQL GRANT command or Enterprise Manager Database Control.

Run DMSHGRANTS

The following command, executed in SQL*Plus on a Windows platform, passes two arguments to dmshgrants: The password for the SH user (shpsw in this case) and the user name receiving the access rights (dmuser).

@%ORACLE_HOME%\rdbms\demo\dmshgrants shpsw dmuser

On Linux, this is the equivalent command:

@$ORACLE_HOME/rdbms/demo/dmshgrants shpsw dmuser

Populating the Schema

The dmsh script populates the schema of the demo user with tables and views that reference the source data in the SH schema. It also creates tables and indexes that support text mining.

The dmsh script, like dmshgrants, is provided by the Database Companion installation. and is included with the Data Mining demos in the zip file you can download from OTN.

Run DMSH

To run dmsh, log in to SQL*Plus as the data mining user (in this case, dmuser with the password dmpsw).

SQL>Connect dmuser/dmpsw
SQL>@%ORACLE_HOME%\rdbms\demo\dmsh
SQL>commit;

Sample Schemas

The Oracle Data Mining demo programs use data in the SH sample schema. If the sample schemas were not originally installed in the database, you must install SH.

To install the sample schemas after installation of Oracle Database, use Database Configuration Assistant. For an example that uses Database Configuration Assistant, see "Adding the Data Mining Option to a Database".

To install SH without the other sample schemas, run the sh_main script. The sh_main.sql file is located in \demo\schema\sales_history in the Oracle home directory. You must run the script as SYS.

See Also:

Oracle Database Sample Schemas for more details about installing the sample schemas.

A Sample User Scenario

Let's assume that you have access to Oracle Database 10g Release 2 (10.2), which has been installed on a Windows host with the Data Mining option, the sample schemas, and the sample programs. You can log in to this database as SYS using SQL*Plus.

Several Data Mining users (dmuser1 and dmuser2) have already been created in this database. You want to create your own user ID (dmuser3) and run the demo programs to create some starter models in your schema. To accomplish this, you would perform the following steps.

  1. Log in to the database as SYS and create the dmuser3 user.

    > sqlplus
        Enter user-name: sys / as sysdba
        Enter password: sys_password
    SQL> create user dmuser3 identified by dmuser3_password 
         default tablespace users temporary tablespace temp 
         quota unlimited on users;
    
    
  2. Run dmshgrants.sql to grant privileges to dmuser3.

    SQL> @ %ORACLE_HOME%\rdbms\demo\dmshgrants SH_password dmuser3
    
    
  3. Connect as dmuser3 and run dmsh.sql to populate the dmuser3 schema with views and tables needed by the demo programs. Save your changes in the database.

    SQL> connect dmuser3/dmuser3_password
    SQL> @ %ORACLE_HOME%\rdbms\demo\dmsh
    SQL> commit;
    
    

You can now run any of the PL/SQL data mining demos. For example, while logged in to SQL*Plus as dmuser3, you could run the Naive Bayes PL/SQL demo with the following statement.

SQL>@ %ORACLE_HOME%\rdbms\demo\dmnbdemo

You can also run any of the Java data mining demos, if you have Java 1.4.2 or higher and your CLASSPATH is set as described in "Preparing to Run the Java Programs". For example, you could run the Naive Bayes Java demo with a command like the following at the operating system prompt.

>java dmnbdemo myserver:1521:orcl dmuser3 dmuser3_password

Note that the BLAST demo uses different data sets and requires a separate setup procedure. See "Preparing to Run the BLAST Demo".

Note:

All the demo programs are re-executable. They start by deleting the results of the previous run before executing the current run.

Examining the Data

The dmsh.sql script creates views, tables, and indexes in the user's schema. The views define columns of customer data from tables in the SH schema. This data is used by the Data Mining demo programs. The tables reference the same columns in SH, but they include an extra COMMENTS column for text mining. The indexes are used to extract terms from the text in the COMMENTS column and build a nested table column.

Customer Data for Data Mining

Views in the data mining user's schema define columns of data from the CUSTOMERS, SALES, PRODUCTS, COUNTRIES, and SUPPLEMENTARY_DEMOGRAPHICS tables in the SH schema. You can list these views with the following SQL statements.

SQL>connect dmuser3/dmuser3_password
SQL>select view_name from user_views;

The views are listed in Table 4-1.

Table 4-1 Views Used by the Data Mining Demo Programs

View Name Description

MINING_DATA_APPLY_STR_V

Scoring data for o-cluster

MINING_DATA_BUILD_STR_V

Build data for o-cluster

MINING_DATA_APPLY_V

Scoring data for data mining (not text mining)

MINING_DATA_BUILD_V

Build data for data mining (not text mining)

MINING_DATA_TEST_V

Test data for data mining (not text mining)

MARKET_BASKET_V

Data for association rules

MINING_DATA_ONE_CLASS_V

Data for one-class SVM


You can see the references to tables in SH by listing the view definitions. The definition of the view MINING_DATA_BUILD_V is shown as follows.

SQL>set long 1000000
SQL>set longc 100000
SQL>select text from all_views where 
    owner='DMUSER3'and view_name='MINING_DATA_BUILD_V';

      SELECT a.CUST_ID, a.CUST_GENDER, 2003-a.CUST_YEAR_OF_BIRTH AGE, 
             a.CUST_MARITAL_STATUS, c.COUNTRY_NAME, a.CUST_INCOME_LEVEL,
             b.EDUCATION, b.OCC UPATION, b.HOUSEHOLD_SIZE, b.YRS_RESIDENCE,
             b.AFFINITY_CARD, b.BULK_PACK_DISKETTES, b.FLAT_PANEL_MONITOR,
             b.HOME_THEATER_PACKAGE, b.BOOKKEEPING _APPLICATION, 
             b.PRINTER_SUPPLIES, b.Y_BOX_GAMES, b.OS_DOC_SET_KANJI 
       FROM  sh.customers a, 
             sh.supplementary_demographics b, 
             sh.countries c 
       WHERE a.CUST_ID = b.CUST_ID AND a.country_id = c.country_id 
             AND a.cust_id between 101501 and 103000 

The views are used to build, test, and score the sample models. Each view has a CUSTOMER_ID column, which is the case ID, and an AFFINITY_CARD column, which is the target used by the predictive models. Most of the views provide data for 1500 customers (1500 rows). The view used by the one-class SVM model has data for 940 customers.

The columns of build data in the MINING_DATA_BUILD_V view are listed in the following example.

SQL> describe MINING_DATA_BUILD_V

CUST_ID                    NOT NULL            NUMBER
CUST_GENDER                NOT NULL            CHAR(1)
AGE                                            NUMBER
CUST_MARITAL_STATUS                            VARCHAR2(20)
COUNTRY_NAME               NOT NULL            VARCHAR2(40)
CUST_INCOME_LEVEL                              VARCHAR2(30)
EDUCATION                                      VARCHAR2(21)
OCCUPATION                                     VARCHAR2(21)
HOUSEHOLD_SIZE                                 VARCHAR2(21)
YRS_RESIDENCE                                  NUMBER
AFFINITY_CARD                                  NUMBER(10)
BULK_PACK_DISKETTES                            NUMBER(10)
FLAT_PANEL_MONITOR                             NUMBER(10)
HOME_THEATER_PACKAGE                           NUMBER(10)
BOOKKEEPING_APPLICATION                        NUMBER(10)
PRINTER_SUPPLIES                               NUMBER(10)
Y_BOX_GAMES                                    NUMBER(10)
OS_DOC_SET_KANJI                               NUMBER(10)

Market Basket Data for Association Rules

The association rules demos use the MARKET_BASKET_V data set, which includes columns of products from the PRODUCTS table and the CUSTOMER_ID column from the CUSTOMERS table in SH. The columns of the MARKET_BASKET_V view are listed in the following example.

SQL> describe MARKET_BASKET_V

CUST_ID                     NOT NULL         NUMBER
EXTENSION_CABLE                              NUMBER
FLAT_PANEL_MONITOR                           NUMBER
CD_RW_HIGH_SPEED_5_PACK                      NUMBER
ENVOY_256MB_40GB                             NUMBER
ENVOY_AMBASSADOR                             NUMBER
EXTERNAL_8X_CD_ROM                           NUMBER
KEYBOARD_WRIST_REST                          NUMBER
SM26273_BLACK_INK_CARTRIDGE                  NUMBER
MOUSE_PAD                                    NUMBER
MULTIMEDIA_SPEAKERS_3INCH                    NUMBER
OS_DOC_SET_ENGLISH                           NUMBER
SIMM_16MB_PCMCIAII_CARD                      NUMBER
STANDARD_MOUSE                               NUMBER 

Customer Data for Text Mining

The text mining demos use the same customer data from tables in SH, but they include either an extra text column or a collection type column. The collection type is a nested table of type DM_NESTED_NUMERICALS.

You can list these tables with the following SQL statements.

SQL>connect dmuser3/dmuser3_password
SQL>select table_name from user_tables where table_name like '%MINING%';

The tables are listed in Table 4-2.

Table 4-2 Tables Used by the Text Mining Demo Programs

Table Name Description

MINING_APPLY_NESTED_TEXT

Apply table with COMMENTS column as DM_NESTED_NUMERICALS

MINING_BUILD_NESTED_TEXT

Build table with COMMENTS column as DM_NESTED_NUMERICALS

MINING_TEST_NESTED_TEXT

Test table with COMMENTS column as DM_NESTED_NUMERICALS

MINING_APPLY_TEXT

Apply table with COMMENTS column as VARCHAR2(4000)

MINING_BUILD_TEXT

Build table with COMMENTS column as VARCHAR2(4000)

MINING_TEST_TEXT

Test table with COMMENTS column as VARCHAR2(4000)


In the MINING_BUILD_TEXT, MINING_TEST_TEXT, and MINING_APPLY_TEXT tables, the COMMENTS column is of type VARCHAR2(4000).

SQL> describe MINING_BUILD_TEXT
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CUST_ID                                   NOT NULL NUMBER
 CUST_GENDER                               NOT NULL CHAR(1)
 AGE                                                NUMBER
 CUST_MARITAL_STATUS                                VARCHAR2(20)
 COUNTRY_NAME                              NOT NULL VARCHAR2(40)
 CUST_INCOME_LEVEL                                  VARCHAR2(30)
 EDUCATION                                          VARCHAR2(21)
 OCCUPATION                                         VARCHAR2(21)
 HOUSEHOLD_SIZE                                     VARCHAR2(21)
 YRS_RESIDENCE                                      NUMBER
 AFFINITY_CARD                                      NUMBER(10)
 BULK_PACK_DISKETTES                                NUMBER(10)
 FLAT_PANEL_MONITOR                                 NUMBER(10)
 HOME_THEATER_PACKAGE                               NUMBER(10)
 BOOKKEEPING_APPLICATION                            NUMBER(10)
 PRINTER_SUPPLIES                                   NUMBER(10)
 Y_BOX_GAMES                                        NUMBER(10)
 OS_DOC_SET_KANJI                                   NUMBER(10)
 COMMENTS                                           VARCHAR2(4000)

In the MINING_*_NESTED_TEXT tables, the COMMENTS column is of type DM_NESTED_NUMERICALS.

SQL> describe MINING_BUILD_NESTED_TEXT
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CUST_ID                                   NOT NULL NUMBER
 CUST_GENDER                               NOT NULL CHAR(1)
 AGE                                                NUMBER
 CUST_MARITAL_STATUS                                VARCHAR2(20)
 COUNTRY_NAME                              NOT NULL VARCHAR2(40)
 CUST_INCOME_LEVEL                                  VARCHAR2(30)
 EDUCATION                                          VARCHAR2(21)
 OCCUPATION                                         VARCHAR2(21)
 HOUSEHOLD_SIZE                                     VARCHAR2(21)
 YRS_RESIDENCE                                      NUMBER
 AFFINITY_CARD                                      NUMBER(10)
 BULK_PACK_DISKETTES                                NUMBER(10)
 FLAT_PANEL_MONITOR                                 NUMBER(10)
 HOME_THEATER_PACKAGE                               NUMBER(10)
 BOOKKEEPING_APPLICATION                            NUMBER(10)
 PRINTER_SUPPLIES                                   NUMBER(10)
 Y_BOX_GAMES                                        NUMBER(10)
 OS_DOC_SET_KANJI                                   NUMBER(10)
 COMMENTS                                           DMSYS.DM_NESTED_NUMERICALS

The process of extracting terms from a text column into a nested table column is described in "Text Mining Demo Programs" and in the Oracle Data Mining Application Developer's Guide.

PL/SQL Demo Programs

The PL/SQL demo programs illustrate the use of the DBMS_DATA_MINING package for creating models and the DBMS_DATA_MINING_TRANSFORM package for performing transformations on the mining data.

Table 4-3 lists the PL/SQL demo programs.

Table 4-3 Mining Functions in PL/SQL demos

Mining Function Algorithm Program File

Association Rules

Apriori

dmardemo.sql

Attribute Importance

Minimum Descriptor Length

dmaidemo.sql

Classification

Adaptive Bayes Network

dmabdemo.sql

Classification

Decision Tree

dmdtdemo.sql

Classification

Decision Tree (cross validation)

dmdtxvlddemo.sql

Classification

Naive Bayes

dmnbdemo.sql

Classification

Support Vector Machines

dmsvcdem.sql

Classification

Support Vector Machines (One Class)

dmsvodem.sql

Clustering

k-Means

dmkmdemo.sql

Clustering

O-Cluster

dmocdemo.sql

Feature Extraction

Non-Negative Matrix Factorization

dmnmdemo.sql

Regression

Support Vector Machines

dmsvrdem.sql

Text Mining

Term extraction using CTX procedures

dmtxtfe.sql

Text Mining

Non-Negative Matrix Factorization

dmtxtnmf.sql

Text Mining

Support Vector Machines (Classification)

dmtxtsvm.sql


Note:

The PL/SQL text mining demos are described in "Text Mining in PL/SQL".

See Also:

Oracle Database PL/SQL Packages and Types Reference and Oracle Data Mining Application Developer's Guide for information on the Oracle Data Mining PL/SQL API. See Oracle Database SQL Reference for information on the SQL functions for data mining.

PL/SQL Program Summaries

Summary descriptions of the PL/SQL demo programs are provided in Table 4-4. For detailed descriptions of the demo programs, see the comments in the source code.

Table 4-4 Overview of the PL/SQL Demo Programs

Mining Function Description

Classification

The classification programs demonstrate various preprocessing techniques and perform the following steps:

  • Build a classification model using training data

  • Display model details and settings

  • Test the model by applying the model on the test data

  • Present test metrics, such as confusion matrix, lift, and ROC

  • Apply the model on the scoring data

  • Present apply results

  • Present ranked apply results, influenced by a cost matrix

The dmdtxvlddemo.sql program demonstrates cross-validation techniques for decision tree based-classification. With minor modifications, this program can be used to perform cross validation using other models/algorithms.

Regression

dmsvrdem.sql uses different test metrics, but otherwise performs most of the same steps used in the classification programs. Selected attributes of the input data are preprocessed (normalized).

Association

dmardemo.sql builds an association model and presents frequent itemsets and association rules as output. Selected attributes of the input data are preprocessed (binned).

Clustering

dmkmdemo.sql (k-Means) and dmocdemo.sql (0-cluster) build clustering models and present cluster details, such as rules, centroid, and histogram for each cluster as output. The models are scored, and the probabilities associated with each cluster are returned as output. Selected attributes of the input data are preprocessed (normalized).

Feature Extraction

dmnmdemo.sql builds a feature extraction model and presents model details as the output. The model is scored, and each feature ID is associated with a probability. Selected attributes of the input data are preprocessed (normalized).

Attribute Importance

dmaidemo.sql builds an attribute importance model and presents a list of important attributes as the output of model details. Selected attributes of the input data are preprocessed (binned).


Data Mining SQL Functions

Note:

The SQL functions for Data Mining are documented in Oracle Database SQL Reference. Information about these functions is also provided in Oracle Data Mining Application Developer's Guide, where they are referred to as SQL scoring functions.

Some of the PL/SQL demo programs use Data Mining SQL functions to apply models created with the DBMS_DATA_MINING package. The Data Mining functions can also be used to apply models created with the Java API.

The programs that demonstrate the Data Mining functions are listed in Table 4-5.

Table 4-5 Data Mining SQL Functions in the Demo Programs

Program Name Algorithm SQL Functions Used

dmkmdemo.sql

k-Means

CLUSTER_IDCLUSTER_PROBABILITYCLUSTER_SET

dmocdemo.sql

O-Cluster

CLUSTER_ID

dmnmdemo.sql

NMF

FEATURE_IDFEATURE_SETFEATURE_VALUE

dmdtdemo.sql

Decision Tree

PREDICTIONPREDICTION_COSTPREDICTION_DETAILSPREDICTION_SET

dmsvcdem.sql

SVM Classification

PREDICTIONPREDICTION_PROBABILITYPREDICTION_SET

dmsvodem.sql

SVM One Class

PREDICTIONPREDICTION_PROBABILITY

dmsvrdem.sql

SVM Regression

PREDICTION

dmtxtsvm.sql

Text Mining

PREDICTIONPREDICTION_PROBABILITY


Running the PL/SQL Programs

In SQL*Plus, use commands like the following to execute the sample programs and list the models created by them.

>sqlplus dmuser3/dmuser3_password
SQL> set serveroutput on
SQL> set echo on
SQL> @ %ORACLE_HOME%\rdbms\demo\program_name
.
.
.
.
SQL> set linesize 200
SQL> set pagesize 100 
SQL> select NAME, FUNCTION_NAME, ALGORITHM_NAME, TARGET_ATTRIBUTE from DM_USER_MODELS; 
 
NAME                   FUNCTION_NAME           ALGORITHM_NAME                  TARGET_ATTRIBUTE
---------------------  ----------------------  ------------------------------  -----------------
T_NMF_SAMPLE           FEATURE_EXTRACTION      NONNEGATIVE_MATRIX_FACTOR                    
T_SVM_CLAS_SAMPLE      CLASSIFICATION          SUPPORT_VECTOR_MACHINES         AFFINITY_CARD
AR_SH_SAMPLE           ASSOCIATION_RULES       APRIORI_ASSOCIATION_RULES                    
AI_SH_SAMPLE           ATTRIBUTE_IMPORTANCE    MINIMUM_DESCRIPTION_LENGTH      AFFINITY_CARD
ABN_SH_CLAS_SAMPLE     CLASSIFICATION          ADAPTIVE_BAYES_NETWORK          AFFINITY_CARD
DT_SH_CLAS_SAMPLE      CLASSIFICATION          DECISION_TREE                   AFFINITY_CARD
NB_SH_CLAS_SAMPLE      CLASSIFICATION          NAIVE_BAYES                     AFFINITY_CARD
SVMC_SH_CLAS_SAMPLE    CLASSIFICATION          SUPPORT_VECTOR_MACHINES         AFFINITY_CARD
OC_SH_CLUS_SAMPLE      CLUSTERING              O_CLUSTER                                    
KM_SH_CLUS_SAMPLE      CLUSTERING              KMEANS                                       
NMF_SH_SAMPLE          FEATURE_EXTRACTION      NONNEGATIVE_MATRIX_FACTOR                    
SVMR_SH_REGR_SAMPLE    REGRESSION              SUPPORT_VECTOR_MACHINES         AGE          

Java Demo Programs

The Java demos illustrate the features of the Oracle Data Mining Java API, which implements Oracle-specific extensions to the Java Data Mining (JDM) 1.0 standard.

The Java demos demonstrate data preprocessing and the basic mining functions. Additional Java samples demonstrate predictive analytics, import/export, and text mining.

Table 4-6 lists the Java programs that illustrate the basic mining functions.

Table 4-6 Mining Functions in Java Demos

Mining Function Algorithm Program File

Association Rules

Apriori

dmardemo.java

Attribute Importance

Minimum Descriptor Length

dmaidemo.java

Classification

Adaptive Bayes Network

dmabdemo.java

Classification

Decision Trees

dmtreedemo.java

Classification

Naive Bayes

dmnbdemo.java

Classification

Support Vector Machines

dmsvcdemo.java

Classification

Support Vector Machines (one class)

dmsvodemo.java

Clustering

K-Means

dmkmdemo.java

Clustering

O-Cluster

dmocdemo.java

Feature Extraction

Non-Negative Matrix Factorization

dmnmdemo.java

Regression

Support Vector Machines

dmsvrdemo.java

Text Mining

Non-Negative Matrix Factorization

dmtxtnmfdemo.java

Text Mining

Support Vector Machines (Classification)

dmtxtsvmdemo.java


Table 4-7 lists the Java programs that illustrate special mining tasks. These features are all supported in the PL/SQL API as well, since the Java API is layered on the PL/SQL API.

Table 4-7 Mining Tasks in Java Demos

Mining Task Description Demo Program File

Data Transformations

Binning, clipping, and normalization

dmxfdemo.java

Predictive Analytics

Automated predict and explain

dmpademo.java

Model Export/Import

To/from Data Pump dump file

dmexpimpdemo.java

Classification Model Scoring

Ways of applying an NB model

dmapplydemo.java


Note:

The Java text mining demos are described in "Text Mining in Java".

See Also:

Oracle Data Mining Java API Reference (javadoc) and the Oracle Data Mining Application Developer's Guide for information on the Java API.

Java Program Summaries

Summary descriptions of the Java demo programs are provided in Table 4-8. For detailed descriptions, see the comments in the source code.

Table 4-8 Overview of the Java Demo Programs

Mining Function or Task Description

Classification

The classification programs demonstrate various preprocessing techniques and perform the following steps:

  • Build a classification model using training data

  • Display model details and settings

  • Test the model by applying the model on the test data

  • Present test metrics, such as confusion matrix, lift, and ROC

  • Apply the model on the scoring data

  • Present apply results

  • Present ranked apply results, influenced by a cost matrix

The dmapplydemo.java program demonstrates several ways of applying a Naive Bayes model.

Regression

dmsvrdemo.java uses different test metrics, but otherwise performs most of the same steps used in the classification programs. Selected attributes of the input data are preprocessed (normalized).

Association

dmardemo.java builds an association model and presents frequent itemsets and association rules as output. Selected attributes of the input data are preprocessed (binned).

Clustering

dmkmdemo.java (k-Means) and dmocdemo.java (0-cluster) build clustering models and present cluster details, such as rules, centroid, and histogram for each cluster as output. The models are scored, and the probabilities associated with each cluster are returned as output. Selected attributes of the input data are preprocessed (normalized).

Feature Extraction

dmnmdemo.java builds a feature extraction model and presents model details as the output. The model is scored, and each feature ID is associated with a probability. Selected attributes of the input data are preprocessed (normalized).

Attribute Importance

dmaidemo.java builds an attribute importance model and presents a list of important attributes as the output of model details. Selected attributes of the input data are preprocessed (binned).

Data Transformations

dmxfdemo.java demonstrates binning, clipping, and normalization transformations.

Predictive Analytics

dmpademo.java demonstrates PREDICT and EXPLAIN functions.

Model Import/Export

dmexpimpdemo.java builds a Naive Bayes model, exports it to a dump file, then imports it from the dump file.


Preparing to Run the Java Programs

Before running the Java programs, do the following:

  1. Check that the version of Java you are using is 1.4.2. You can execute the following in a command window to check the version of Java:

    >java -version
    
    
  2. Ensure that your CLASSPATH includes the following JAR files:

    %ORACLE_HOME%\rdbms\jlib\jdm.jar
                %ORACLE_HOME%\rdbms\jlib\ojdm_api.jar
                %ORACLE_HOME%\rdbms\jlib\xdb.jar
                %ORACLE_HOME%\jdbc\lib\ojdbc14.jar
                %ORACLE_HOME%\oc4j\j2ee\home\lib\connector.jar
                %ORACLE_HOME%\jlib\orai18n.jar   
                %ORACLE_HOME%\jlib\orai18n-mapping.jar
                %ORACLE_HOME%\lib\xmlparserv2.jar
    
    
  3. Compile the programs. Use the javac executable in the \jdk\bin\ directory within Oracle home.

    >javac program_name
    

Running the Java Programs

Use the following syntax to execute the Java demo programs. Use the java executable in the \jdk\bin\ directory within Oracle home.

>java program_name connect_string user_name user_password

The connection string specifies your Oracle database connection. It identifies the machine hosting the database, the port through which the connection is made, and the name of the database instance (the Oracle system identifier).

host_name:port:SID

For example, the following command executes the Naive Bayes demo dmnbdemo.java as dmuser3 in the database instance orcl on host mach05 at port 1521.

>java dmnbdemo mach05:1521:orcl dmuser3 dmuser3_password

You can list the models created by the Java programs with commands like the following in SQL*Plus.

>sqlplus dmuser3/dmuser3_password
SQL> set linesize 200
SQL> set pagesize 100 
SQL> select NAME, FUNCTION_NAME, ALGORITHM_NAME, TARGET_ATTRIBUTE 
            from dm_user_models where NAME like '%JDM'; 

NAME                FUNCTION_NAME          ALGORITHM_NAME                 TARGET_ATTRIBUTE
-----------------   ---------------------  ----------------------------   ------------------
TXTNMFMODEL_JDM     FEATURE_EXTRACTION     NONNEGATIVE_MATRIX_FACTOR
ARMODEL_JDM         ASSOCIATION_RULES      APRIORI_ASSOCIATION_RULES
TREEMODEL_JDM       CLASSIFICATION         DECISION_TREE                  AFFINITY_CARD
AIMODEL_JDM         ATTRIBUTE_IMPORTANCE   MINIMUM_DESCRIPTION_LENGTH     AFFINITY_CARD
ABNMODEL_JDM        CLASSIFICATION         ADAPTIVE_BAYES_NETWORK         AFFINITY_CARD
NBMODEL_JDM         CLASSIFICATION         NAIVE_BAYES                    AFFINITY_CARD
SVMCMODEL_JDM       CLASSIFICATION         SUPPORT_VECTOR_MACHINES        AFFINITY_CARD
SVMOMODEL_JDM       CLASSIFICATION         SUPPORT_VECTOR_MACHINES
KMMODEL_JDM         CLUSTERING             KMEANS
OCMODEL_JDM         CLUSTERING             O_CLUSTER
NMFMODEL_JDM        FEATURE_EXTRACTION     NONNEGATIVE_MATRIX_FACTOR
SVMRMODEL_JDM       REGRESSION             SUPPORT_VECTOR_MACHINES        AGE
TXTSVMMODEL_JDM     CLASSIFICATION         SUPPORT_VECTOR_MACHINES        AFFINITY_CARD

Text Mining Demo Programs

Oracle Data Mining can mine text columns that have undergone pre-processing by Oracle Text routines.

Oracle Text is a technology for building text query and document classification applications. It provides indexing, word and theme searching, and viewing capabilities for text. Oracle Text is included in a general installation of Oracle Database Enterprise Edition, and therefore is already present in a database installed according to the instructions in Chapter 1.

The pre-processing steps for text mining create nested table columns of type DM_NESTED_NUMERICALS from columns of type VARCHAR2 or CLOB. Each row of the nested table specifies an attribute name and a value. The type definition is as follows.

CREATE OR REPLACE TYPE DM_NESTED_NUMERICAL AS OBJECT
  (attribute_name VARCHAR2(30),
   value          NUMBER)
/
CREATE OR REPLACE TYPE DM_NESTED_NUMERICALS AS TABLE OF DM_NESTED_NUMERICAL

Terms extracted from text documents into nested tables can become generic attributes in training or scoring data. Classification, clustering, and feature-extraction models can be built using these attributes.

Sample text mining programs in both PL/SQL and Java illustrate classification and feature extraction of a pre-processed text column.

Text Mining in PL/SQL

Three PL/SQL demo programs illustrate the process of text mining. One program illustrates the pre-processing that is required to prepare the data for mining. The other two programs build models that use the transformed text.

Text Transformation Demo

To prepare a column for text mining using the PL/SQL API, you must use Oracle Text routines to perform the following general steps:

  1. Create a domain index on the column.

  2. Use the index to extract terms from the column to a temporary table.

  3. Populate a column of type DM_NESTED_NUMERICALS with the terms in the temporary table.

The process of term extraction using Oracle Text is illustrated in the demo program dmtxtfe.sql. The source code contains extensive comments that explain the steps involved in transforming text into a set of features that can be mined using Oracle Data Mining.

More details about text transformation are provided in the Oracle Data Mining Application Developer's Guide.

Text Transformation for the PL/SQL Text Mining Demo Programs

The dmsh.sql script performs the text transformation required by the PL/SQL text mining demos. There are two such demos: dmtxtnmf.sql, which builds a feature extraction model using Non-Negative Matrix Factorization, and dmtxtsvm.sql, which builds a classification model using Support Vector Machines. Both of these programs use the tables that have a nested table column of comment data.

MINING_BUILD_NESTED_TEXT
MINING_TEST_NESTED_TEXT
MINING_APPLY_NESTED_TEXT

The Sample Text Mining Models (PL/SQL)

You can run the PL/SQL text mining demo programs, dmtxtnmf.sql and dmtxtsvm.sql, like the other PL/SQL programs. The models created by these programs are listed in the following example.

SQL> @ %ORACLE_HOME%\rdbms\demo\dmtxtnmf.sql
SQL> @ %ORACLE_HOME%\rdbms\demo\dmtxtsvm.sql
SQL> select NAME, FUNCTION_NAME, ALGORITHM_NAME, TARGET_ATTRIBUTE  
           from dm_user_models;NAME               FUNCTION_NAME      ALGORITHM_NAME            TARGET_ATTRIBUTE ----------------   ------------------ ------------------------  ----------------T_NMF_SAMPLE       FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR T_SVM_CLAS_SAMPLE  CLASSIFICATION     SUPPORT_VECTOR_MACHINES   AFFINITY_CARD

Text Mining in Java

Two Java demo programs illustrate the process of text mining. One builds a feature extraction model, the other builds a classification model.

Text Transformation for the Java Demo Text Mining Programs

The Oracle Data Mining Java API provides an interface that handles the term extraction process. If you are developing data mining applications in Java, you do not need to use Oracle Text directly. However, you must ensure that Oracle Text is present in the database.

The OraTextTransform interface is used to perform text transformation within the Java text mining demos. There are two such demos: dmtxtnmfdemo.java, which builds a feature extraction model using Non-Negative Matrix Factorization, and dmtxtsvmdemo.java, which builds a classification model using Support Vector Machines. Both of these programs create build, test, and apply data sets from the tables that have a text column of comment data.

MINING_BUILD_TEXT
MINING_TEST_TEXT
MINING_APPLY_TEXT

The Sample Text Mining Models (Java)

You can run the Java text mining demo programs, dmtxtnmfdemo.java and dmtxtsvmdemo.java, like the other Java programs. The models created by these programs are shown in the following example.

> java dmtxtnmfdemo host:port:SID dmuser3 dmuser3_password
> java dmtxtsvmdemo host:port:SID dmuser3 dmuser3_password
>sqlplus dmuser3/dmuser3_password
SQL> select NAME, FUNCTION_NAME, ALGORITHM_NAME, TARGET_ATTRIBUTE 
           from dm_user_models;NAME               FUNCTION_NAME       ALGORITHM_NAME            TARGET_ATTRIBUTE ----------------   ------------------  ------------------------  ----------------txtnmfModel_jdm    FEATURE_EXTRACTION  NONNEGATIVE_MATRIX_FACTOR txtsvmModel_jdm    CLASSIFICATION      SUPPORT_VECTOR_MACHINES   AFFINITY_CARD

BLAST Demo Program

The Oracle implementation of the Basic Local Alignment Search Tool (BLAST) is demonstrated in the demo program, dmbldemo.sql. This program provides examples of sequence matching queries using the BLAST table functions.

Note:

The BLAST algorithm detects local alignments in nucleotide and protein databases. BLAST performs a kind of data mining, since it finds regions of similarity embedded in otherwise unrelated sequences. However, BLAST does not use Oracle Data Mining technology. It is a completely independent tool.

For more information about the Oracle implementation of BLAST, see Oracle Data Mining Application Developer's Guide

The BLAST demo program and configuration scripts are listed in Table 4-9.

Table 4-9 BLAST Demo Files

File Name Description

dmbldemo.sql

BLAST demo program.

dmblcoli.sql

Script that creates the Ecoli data set.

dmblprot.sql

Script that creates the Swiss Protein data set.

dmblprot.txt

Text for SQL*Loader to load into the Swiss Protein data set.

dmbl.ctl

Control file for SQL*Loader to use in loading the Swiss Protein data set.


Preparing to Run the BLAST Demo

The demo BLAST table functions in dmbldemo.sql use two data sets: SwissProt and ecoli10. To prepare these data sets, log in to SQL*Plus as the data mining user and run the dmblprot and dmblcoli scripts as shown in the following example.

SQL>connect dmuser3/dmuser3_password
SQL>@ %ORACLE_HOME%\rdbms\demo\dmblprot.sql
SQL>@ %ORACLE_HOME%\rdbms\demo\dmblcoli.sql

Exit SQL*Plus and use the SQL*Loader utility to load data into the SwissProt database in the schema of the data mining user. From the command prompt, change to the \rdbms\demo directory under Oracle home and execute the following command.

>sqlldir dmuser3/dmuser3_password 
         control=dmblprot.ctl data=dmblprot.txt log=dmblprot.log

Running the BLAST Table Functions

The demo program dmbldemo.sql contains multiple invocations of BLAST table functions. You can run them all at once by running the dmbldemo.sql script, or you can copy individual table functions to the SQL*Plus command line and execute them individually.

To run the demo program, log in to SQL*Plus as the data mining user and run the dmbldemo.sql script as shown in the following example.

SQL>connect dmuser3/dmuser3_password
SQL> @ %ORACLE_HOME%\rdbms\demo\dmbldemo.sql

The following example shows how you could execute the sample BLASTP_MATCH table function at the SQL*Plus command line. BLASTP_MATCH is run against all the human proteins in the SwissProt database.

SQL>connect dmuser3/dmuser3_password
SQL>SET serveroutput ON
SQL>SET trimspool ON  
SQL>SET pages 10000
SQL>column seq_id format a10
SQL>select T_SEQ_ID AS seq_id, score, EXPECT as evalue
         from TABLE(
              BLASTP_MATCH (
                   (select sequence from query_db),
                    CURSOR(SELECT seq_id, seq_data 
                           FROM swissprot
                           WHERE organism = 'Homo sapiens (Human)'),
                   1,
                  -1,
                   0,
                   0,
                  'BLOSUM62',
                  10,
                   0,
                   0,
                   0,
                   0,
                   0)
          )
     order by score,seq_id;

The output of this table function is shown as follows.

SEQ_ID          SCORE     EVALUE
---------- ---------- ----------
P58107             49 7.24297332
P31947            169 8.8130E-14
P27348            198 3.8228E-17
Q04917            198 3.8228E-17
P31946            205 5.8977E-18