Oracle® XML Developer's Kit Programmer's Guide 10g Release 2 (10.2) Part Number B14252-01
This chapter assumes that you are familiar with the following topics:
XML Pipeline Definition Language. This XML vocabulary enables you to describe the processing relations between XML resources. If you require a more thorough introduction to the Pipeline Definition Language, consult the XML resources listed in "Related Documents" of the preface.
Document Object Model (DOM). DOM is an in-memory tree representation of the structure of an XML document.
Simple API for XML (SAX). SAX is a standard for event-based XML parsing.
XML Schema language. Refer to Chapter 5, "Using the Schema Processor for Java" for an overview and links to suggested reading.
The Oracle XML Pipeline processor is based on the W3C XML Pipeline Definition Language Version 1.0 Note. The W3C Note defines an XML vocabulary rather than an API. You can find the Pipeline specification at the following URL:
http://www.w3.org/TR/xml-pipeline/
"Pipeline Definition Language Standard for the XDK for Java" describes the differences between the Oracle XDK implementation of the Oracle XML Pipeline processor and the W3C Note.
The Oracle XML Pipeline processor is built on the XML Pipeline Definition Language. The processor can take an input XML pipeline document and execute pipeline processes according to derived dependencies. A pipeline document, which is written in XML, specifies the processes to be executed in a declarative manner. You can associate Java classes with processes by using the <processdef/> element in the pipeline document.
Use the Pipeline processor for multistage processing, which occurs when you process XML components sequentially or in parallel. The output of one stage of processing can become the input of another stage of processing. You can write a pipeline document that defines the inputs and outputs of the processes. Figure 7-1 illustrates a possible pipeline sequence.
In addition to the XML Pipeline processor itself, the XDK provides an API for processes that you can pipe together in a pipeline document. Table 7-2 summarizes the classes provided in the oracle.xml.pipeline.processes package.
The typical stages of processing XML in a pipeline are as follows:
Parse the input XML documents. The oracle.xml.pipeline.processes package includes DOMParserProcess for DOM parsing and SAXParserProcess for SAX parsing.
Validate the input XML documents.
Serialize or transform the input documents. Note that the Pipeline processor does not enable you to connect the SAX parser to the XSLT processor, which requires a DOM.
In multistage processing, SAX is ideal for filtering and searching large XML documents. You should use DOM when you need to change XML content or require efficient dynamic access to the content.
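The filtering role of SAX described above can be sketched with the standard JAXP SAX parser rather than the XDK SAXParserProcess class. This is a minimal illustration only: the class name SaxTitleFilter and the element name title are assumptions made for the example, not names taken from the XDK demos.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; uses standard JAXP, not oracle.xml.pipeline.
public class SaxTitleFilter {

    /** Collects the text of every <title> element without building a tree. */
    public static List<String> titles(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("title".equals(qName)) {
                    found.add(text.toString().trim());
                    text = null;
                }
            }
        };
        // The default factory is not namespace aware, so qName is the plain tag name.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>XML Basics</title></book>"
                   + "<book><title>Pipelines</title></book></books>";
        System.out.println(titles(xml));
    }
}
```

Because the handler keeps only the text it is interested in, memory use stays flat regardless of document size. A stage that needs to modify the content or navigate it freely would use a DOM instead.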
See Also:
"Processing XML in a Pipeline" to learn how to write a pipeline document that provides the input for a pipeline application
The oracle.xml.pipeline.controller.Process class is the base class for all pipeline process definitions. The classes in the oracle.xml.pipeline.processes package extend this base class. To create a customized pipeline process, you need to create a class that extends the Process class.
At the minimum, every custom process should override the do-nothing initialize() and execute() methods of the Process class. If the customized process accepts SAX events as input, then it should override the SAXContentHandler() method to return the appropriate ContentHandler that handles incoming SAX events. It should also override the SAXErrorHandler() method to return the appropriate ErrorHandler. Table 7-1 provides further descriptions of the preceding methods.
Table 7-1 Methods in the oracle.xml.pipeline.controller.Process Class
Method | Description |
---|---|
initialize() |
Initializes the process before execution.
Call |
execute() |
Executes the process.
Call Call Call |
SAXContentHandler() |
Returns the SAX ContentHandler .
If dependencies from other processes are not available at this time, then return |
SAXErrorHandler() |
Returns the SAX ErrorHandler .
If you do not override this method, then the XML Pipeline processor uses the default error handler implemented by this class to handle SAX errors. |
See Also:
Oracle Database XML Java API Reference to learn about the oracle.xml.pipeline.processes package
The XML Pipeline processor is accessible through the following packages:
oracle.xml.pipeline.controller, which provides an XML Pipeline controller that executes XML processes in a pipeline based on dependencies.
oracle.xml.pipeline.processes, which provides wrapper classes for XML processes that can be executed by the XML Pipeline controller. The oracle.xml.pipeline.processes package contains the classes that you can use to design a pipeline application framework. Each class extends the oracle.xml.pipeline.controller.Process class.
Table 7-2 lists the components in the package. You can connect these components and processes through a combination of the XML Pipeline processor and a pipeline document.
Table 7-2 Classes in oracle.xml.pipeline.processes
Class | Description |
---|---|
CompressReaderProcess | Receives compressed XML and outputs parsed XML. |
CompressWriterProcess | Receives XML parsed with DOM or SAX and outputs compressed XML. |
DOMParserProcess | Parses incoming XML and outputs a DOM tree. |
SAXParserProcess | Parses incoming XML and outputs SAX events. |
XPathProcess | Accepts a DOM as input, uses an XPath pattern to select one or more nodes from an XML Document or an XML DocumentFragment, and outputs a Document or DocumentFragment. |
XSDSchemaBuilder | Parses an XML schema and outputs a schema object for validation. This process is built into the XML Pipeline processor and builds schema objects used for validating XML documents. |
XSDValProcess | Validates against a local schema, analyzes the results, and reports errors if necessary. |
XSLProcess | Accepts DOM as input, applies an XSL stylesheet, and outputs the result of the transformation. |
XSLStylesheetProcess | Receives an XSL stylesheet as a stream or DOM and creates an XSLStylesheet object. |
Figure 7-2 illustrates how to pass a pipeline document to a Java application that uses the XML Pipeline processor, configure the processor, and execute the pipeline.
The basic steps are as follows:
Instantiate a pipeline document, which forms the input to the pipeline execution. Create the object by passing a FileReader to the constructor as follows:
PipelineDoc pipe;
FileReader f;
pipe = new PipelineDoc((Reader)f, false);
Instantiate a pipeline processor. PipelineProcessor is the top-level class that executes the pipeline. Table 7-3 describes some of the available methods.
Table 7-3 PipelineProcessor Methods
Method | Description |
---|---|
executePipeline() | Executes the pipeline based on the PipelineDoc set by invoking setPipelineDoc(). |
getExecutionMode() | Gets the type of execution mode: PIPELINE_SEQUENTIAL or PIPELINE_PARALLEL. |
setErrorHandler() | Sets the error handler for the pipeline. This call is mandatory to execute the pipeline. |
setExecutionMode() | Sets the execution mode. PIPELINE_PARALLEL is the default and specifies that the processes in the pipeline should execute in parallel. PIPELINE_SEQUENTIAL specifies that the processes in the pipeline should execute sequentially. |
setForce() | Sets execution behavior. If TRUE, then the pipeline executes regardless of whether the target is up-to-date with respect to the pipeline inputs. |
setPipelineDoc() | Sets the PipelineDoc object for the pipeline. |
The following statement instantiates the pipeline processor:
proc = new PipelineProcessor();
Set the processor to the pipeline document. For example:
proc.setPipelineDoc(pipe);
Set the execution mode for the processor and perform any other needed configuration. For example, set the mode by passing a constant to PipelineProcessor.setExecutionMode().
The following statement specifies sequential execution:
proc.setExecutionMode(PipelineConstants.PIPELINE_SEQUENTIAL);
Instantiate an error handler. The error handler must implement the PipelineErrorHandler interface. For example:
errHandler = new PipelineSampleErrHdlr(logname);
Set the error handler for the processor by invoking setErrorHandler(). For example:
proc.setErrorHandler(errHandler);
Execute the pipeline. For example:
proc.executePipeline();
See Also:
Oracle Database XML Java API Reference to learn about the oracle.xml.pipeline subpackages
Demo programs for the XML Pipeline processor are included in $ORACLE_HOME/xdk/demo/java/pipeline. Table 7-4 describes the XML files and Java source files that you can use to test the utility.
Table 7-4 Pipeline Processor Sample Files
File | Description |
---|---|
README | A text file that describes how to set up the Pipeline processor demos. |
PipelineSample.java | A sample Pipeline processor application. The program takes pipedoc.xml as its first argument. |
PipelineSampleErrHdlr.java | A sample program to create an error handler used by PipelineSample. |
book.xml | A sample XML document that describes a series of books. This document is specified as an input by pipedoc.xml, pipedoc2.xml, and pipedocerr.xml. |
book.xsl | An XSLT stylesheet that transforms the list of books in book.xml into an HTML table. |
book_err.xsl | An XSLT stylesheet specified as an input by the pipedocerr.xml pipeline document. This stylesheet contains an intentional error. |
id.xsl | An XSLT stylesheet specified as an input by the pipedoc3.xml pipeline document. |
items.xsd | An XML schema document specified as an input by the pipedoc3.xml pipeline document. |
pipedoc.xml | A pipeline document. This document specifies that process p1 should parse book.xml with DOM, process p2 should parse book.xsl and create a stylesheet object, and process p3 should apply the stylesheet to the DOM to generate myresult.html. |
pipedoc2.xml | A pipeline document. This document specifies that process p1 should parse book.xml with SAX, process p2 should generate compressed XML compxml from the SAX events, and process p3 should regenerate the XML from the compressed stream as myresult2.html. |
pipedoc3.xml | A pipeline document. This document specifies that process p5 should parse po.xml with DOM, process p1 should select a single node from the DOM tree with an XPath expression, process p4 should parse items.xsd and generate a schema object, process p6 should validate the selected node against the schema, process p3 should parse id.xsl and generate a stylesheet object, and the stylesheet should be applied to the validated node to produce myresult3.html. |
pipedocerr.xml | A pipeline document. This document specifies that process p1 should parse book.xml with DOM, process p2 should parse book_err.xsl and generate a stylesheet object if it encounters no errors and apply an inline stylesheet if it encounters errors, and process p3 should apply the stylesheet to the DOM to generate myresulterr.html. Because book_err.xsl contains an error, the program should write the text contents of the input XML to myresulterr.html. |
po.xml | A sample XML document that describes a purchase order. This document is specified as an input by pipedoc3.xml. |
Documentation for how to compile and run the sample programs is located in the README. The basic steps are as follows:
Change into the $ORACLE_HOME/xdk/demo/java/pipeline directory (UNIX) or %ORACLE_HOME%\xdk\demo\java\pipeline directory (Windows).
Make sure that your environment variables are set as described in "Setting Up the Java XDK Environment".
Run make (UNIX) or Make.bat (Windows) at the system prompt to generate class files for PipelineSample.java and PipelineSampleErrHdlr.java and run the demo programs. The programs write output files to the log subdirectory.
Alternatively, you can run the demo programs manually by using the following syntax:
java PipelineSample pipedoc pipelog [ seq | para ]
The pipedoc option specifies which pipeline document to use. The pipelog option specifies the name of the pipeline log file, which is optional unless you specify seq or para, in which case a filename is required. If you do not specify a log file, then the program generates pipeline.log by default. The seq option processes threads sequentially; para processes them in parallel. If you specify neither seq nor para, then the default is parallel processing.
View the files generated from the pipeline, which are all named with the initial string myresult, and the log files.
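The positional argument convention of the manual command line (a required pipeline document, an optional log name defaulting to pipeline.log, and an optional seq or para mode defaulting to parallel) can be sketched as follows. The class DemoArgs is hypothetical and is not part of the demo directory; it only models the defaults stated in this section.

```java
// Hypothetical helper that models the documented argument defaults.
public final class DemoArgs {
    public final String pipedoc;      // required first argument
    public final String logName;      // optional second argument
    public final boolean sequential;  // true only when the third argument starts with "seq"

    private DemoArgs(String pipedoc, String logName, boolean sequential) {
        this.pipedoc = pipedoc;
        this.logName = logName;
        this.sequential = sequential;
    }

    public static DemoArgs parse(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException(
                "usage: pipedoc [pipelog [seq | para]]");
        }
        // The log name is positional, so it must be present before a mode can be given.
        String log = args.length > 1 ? args[1] : "pipeline.log";
        // Anything other than "seq..." leaves the default, which is parallel.
        boolean seq = args.length > 2 && args[2].startsWith("seq");
        return new DemoArgs(args[0], log, seq);
    }

    public static void main(String[] args) {
        DemoArgs a = parse(new String[] {"pipedoc.xml", "run1.log", "seq"});
        System.out.println(a.pipedoc + " " + a.logName + " sequential=" + a.sequential);
    }
}
```

Because the arguments are positional, the mode can only appear in the third position, which is why a log filename becomes mandatory as soon as seq or para is specified.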
The command-line interface for the XML Pipeline processor is named orapipe. The Pipeline processor is packaged with Oracle Database. By default, the Oracle Universal Installer installs the utility on disk in $ORACLE_HOME/bin.
Before running the utility for the first time, make sure that your environment variables are set as described in "Setting Up the Java XDK Environment". Run orapipe at the operating system command line with the following syntax:
orapipe options pipedoc
The pipedoc is the pipeline document, which is required. Table 7-5 describes the available options for the orapipe utility.
Table 7-5 orapipe Command-Line Options
Option | Purpose |
---|---|
-help | Prints the help message. |
-log logfile | Writes errors and messages to the specified log file. The default is pipeline.log. |
-noinfo | Does not log informational items. By default, informational items are logged. |
-nowarning | Does not log warnings. By default, warnings are logged. |
-validate | Validates the input pipedoc against the pipeline schema. Validation is turned off by default. If the outparam feature is used, then validation fails with the current pipeline schema because outparam is an additional feature. |
-version | Prints the release version. |
-sequential | Executes the pipeline in sequential mode. The default is parallel. |
-force | Executes the pipeline even if the target is up-to-date. By default, execution is not forced. |
-attr name value | Sets the value of $name to the specified value. For example, if the attribute name is source and the value is book.xml, then you can pass this value to an element in the pipeline document as follows: <input ... label="$source">. |
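The effect of -attr on a label such as $source can be pictured as a simple lookup. The sketch below is an assumption about the observable behavior only; this chapter does not describe how orapipe performs the substitution internally, and the class AttrSubstitution is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "$name" label substitution; not the orapipe implementation.
public final class AttrSubstitution {

    /**
     * Returns the bound value when the label has the form "$name" and the
     * name is bound; otherwise returns the label unchanged.
     */
    public static String resolve(String label, Map<String, String> attrs) {
        if (label.startsWith("$")) {
            String value = attrs.get(label.substring(1));
            if (value != null) {
                return value;
            }
        }
        return label;
    }

    public static void main(String[] args) {
        // Models: orapipe -attr source book.xml pipedoc
        Map<String, String> attrs = new HashMap<>();
        attrs.put("source", "book.xml");
        System.out.println(resolve("$source", attrs));
        System.out.println(resolve("xmldoc", attrs));
    }
}
```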
To use the Oracle XML Pipeline processor, you must create an XML document according to the rules of the Pipeline Definition Language specified in the W3C Note.
The W3C specification defines the XML processing components and the inputs and outputs for these processes. The XML Pipeline processor includes support for the following XDK components:
XML parser
XML compressor
XML Schema validator
XSLT processor
The XML Pipeline processor executes a sequence of XML processing according to the rules in the pipeline document and returns a result. Example 7-1 shows pipedoc.xml, which is a sample pipeline document included in the demo directory.
Example 7-1 pipedoc.xml
<pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline"
          xml:base="http://example.org/">
  <param name="target" select="myresult.html"/>

  <processdef name="domparser.p"
     definition="oracle.xml.pipeline.processes.DOMParserProcess"/>
  <processdef name="xslstylesheet.p"
     definition="oracle.xml.pipeline.processes.XSLStylesheetProcess"/>
  <processdef name="xslprocess.p"
     definition="oracle.xml.pipeline.processes.XSLProcess"/>

  <process id="p2" type="xslstylesheet.p" ignore-errors="false">
    <input name="xsl" label="book.xsl"/>
    <outparam name="stylesheet" label="xslstyle"/>
  </process>

  <process id="p3" type="xslprocess.p" ignore-errors="false">
    <param name="stylesheet" label="xslstyle"/>
    <input name="document" label="xmldoc"/>
    <output name="result" label="myresult.html"/>
  </process>

  <process id="p1" type="domparser.p" ignore-errors="true">
    <input name="xmlsource" label="book.xml "/>
    <output name="dom" label="xmldoc"/>
    <param name="preserveWhitespace" select="true"></param>
    <error name="dom">
      <html xmlns="http://www/w3/org/1999/xhtml">
        <head>
          <title>DOMParser Failure!</title>
        </head>
        <body>
          <h1>Error parsing document</h1>
        </body>
      </html>
    </error>
  </process>
</pipeline>
In Example 7-1, three processes are called and associated with Java classes in the oracle.xml.pipeline.processes package. The pipeline document uses the <processdef/> element to make the following associations:
domparser.p is associated with the DOMParserProcess class
xslstylesheet.p is associated with the XSLStylesheetProcess class
xslprocess.p is associated with the XSLProcess class
The PipelineSample program accepts the pipedoc.xml document shown in Example 7-1 as input along with XML documents book.xml and book.xsl. The basic design of the pipeline is as follows:
Parse the incoming book.xml document and generate a DOM tree. This task is performed by DOMParserProcess.
Parse book.xsl as a stream and generate an XSLStylesheet object. This task is performed by XSLStylesheetProcess.
Receive the DOM of book.xml as input, apply the stylesheet object, and write the result to myresult.html. This task is performed by XSLProcess.
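The three-stage design above can be approximated with the standard JAXP APIs to show what each stage contributes. This sketch does not use the XDK process classes: DocumentBuilder stands in for DOMParserProcess, a compiled Templates object stands in for the XSLStylesheet object, and Transformer stands in for XSLProcess. The class name, the inline stylesheet, and the book element name are assumptions chosen for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical JAXP analogue of the p1/p2/p3 design; not the XDK API.
public final class ThreeStageSketch {

    /** Illustrative stylesheet that outputs the number of <book> elements as text. */
    public static final String COUNT_BOOKS_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:value-of select=\"count(//book)\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(String xml, String xsl) throws Exception {
        // Stage 1 (role of p1): parse the XML source into a DOM tree.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // Stage 2 (role of p2): compile the stylesheet into a reusable object.
        Templates stylesheet = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xsl)));

        // Stage 3 (role of p3): apply the stylesheet to the DOM and capture the result.
        StringWriter out = new StringWriter();
        stylesheet.newTransformer()
            .transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book/><book/></books>";
        System.out.println(run(xml, COUNT_BOOKS_XSL));
    }
}
```

Stages 1 and 2 have no dependency on each other, which is the property that lets a pipeline processor run the corresponding processes in either order or in parallel.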
Note the following aspects of the processing architecture used in the pipeline document:
The target information set, http://example.org/myresult.html, is inferred from the default value of the target parameter and the xml:base setting.
The process p2 has an input of book.xsl and an output parameter with the label xslstyle, so it has to run to produce the input for p3.
The p3 process depends on input parameter xslstyle and document xmldoc.
The p3 process has an output with the label myresult.html, which resolves against xml:base to http://example.org/myresult.html, so it has to run to produce the target.
The process p1 depends on input document book.xml and outputs xmldoc, so it has to run to produce the input for p3.
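The way the target http://example.org/myresult.html follows from the xml:base setting and the relative label can be checked with standard URI resolution. This is a small sketch with java.net.URI; the class name TargetResolver is hypothetical, and the sketch does not claim to reproduce the processor's own resolution code.

```java
import java.net.URI;

// Hypothetical helper that resolves a relative label against xml:base.
public final class TargetResolver {

    /** Resolves a label against the base URI using RFC 3986 reference resolution. */
    public static String resolve(String xmlBase, String label) {
        return URI.create(xmlBase).resolve(label).toString();
    }

    public static void main(String[] args) {
        // Prints http://example.org/myresult.html
        System.out.println(resolve("http://example.org/", "myresult.html"));
    }
}
```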
In Example 7-1, more than one order of processing can satisfy all of the dependencies. Given the rules, the XML Pipeline processor must process p3 last but can process p1 and p2 in either order or process them in parallel.
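One way to see why p3 must come last is to derive an execution order from the input and output labels alone. The following sketch is a simple dependency ordering written for illustration; the class DependencyOrder and its Proc type are hypothetical, and the actual scheduling algorithm of the XML Pipeline processor is not documented in this chapter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of label-based dependency ordering.
public final class DependencyOrder {

    /** A process with the labels it consumes and the labels it produces. */
    public static final class Proc {
        final String id;
        final List<String> inputs;
        final List<String> outputs;

        public Proc(String id, List<String> inputs, List<String> outputs) {
            this.id = id;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /** Returns ids ordered so that each process follows the producers of its inputs. */
    public static List<String> order(List<Proc> procs) {
        // Labels produced by some process; every other label is an external input.
        Set<String> produced = new HashSet<>();
        for (Proc p : procs) {
            produced.addAll(p.outputs);
        }
        Set<String> available = new HashSet<>();
        List<Proc> pending = new ArrayList<>(procs);
        List<String> result = new ArrayList<>();
        while (!pending.isEmpty()) {
            boolean progressed = false;
            for (Iterator<Proc> it = pending.iterator(); it.hasNext();) {
                Proc p = it.next();
                boolean ready = true;
                for (String in : p.inputs) {
                    if (produced.contains(in) && !available.contains(in)) {
                        ready = false; // waits for another process's output
                        break;
                    }
                }
                if (ready) {
                    result.add(p.id);
                    available.addAll(p.outputs);
                    it.remove();
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic dependency among processes");
            }
        }
        return result;
    }

    /** The three processes of Example 7-1, in document order. */
    public static List<Proc> pipedocExample() {
        return Arrays.asList(
            new Proc("p2", Arrays.asList("book.xsl"), Arrays.asList("xslstyle")),
            new Proc("p3", Arrays.asList("xslstyle", "xmldoc"),
                     Arrays.asList("myresult.html")),
            new Proc("p1", Arrays.asList("book.xml"), Arrays.asList("xmldoc")));
    }

    public static void main(String[] args) {
        System.out.println(order(pipedocExample()));
    }
}
```

In this model p1 and p2 are both ready immediately because their inputs are external files, while p3 becomes ready only after both have produced their outputs.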
The PipelineSample.java source file illustrates a basic pipeline application. You can use the application with any of the pipeline documents in Table 7-4 to parse and transform an input XML document.
The basic steps of the program are as follows:
Perform the initial setup. The program declares references of type FileReader (for the input XML file), PipelineDoc (for the input pipeline document), and PipelineProcessor (for the processor). The first argument is the pipeline document, which is required. If a second argument is received, then it is stored in the logname String. The following code fragment illustrates this technique:
public static void main(String[] args)
{
  FileReader f;
  PipelineDoc pipe;
  PipelineProcessor proc;

  // The pipeline document is mandatory; the log name and mode are optional.
  if (args.length < 1)
  {
    System.out.println("First argument needed, other arguments are " +
                       "optional:");
    System.out.println("pipedoc.xml <output_log> <'seq'>");
    return;
  }
  if (args.length > 1)
    logname = args[1];
  ...
Create a FileReader object by passing the first command-line argument to the constructor as the filename. For example:
f = new FileReader(args[0]);
Create a PipelineDoc object by passing the reference to the FileReader object. The following example casts the FileReader to a Reader and specifies no validation:
pipe = new PipelineDoc((Reader)f, false);
Instantiate an XML Pipeline processor. The following statement instantiates the pipeline processor:
proc = new PipelineProcessor();
Set the processor to the pipeline document. For example:
proc.setPipelineDoc(pipe);
Set the execution mode for the processor and perform any other configuration. The following code fragment uses a condition to determine the execution mode. If three or more arguments are passed to the program, then it sets the mode to sequential or parallel depending on which argument is passed. For example:
String execMode = null;
if (args.length > 2)
{
  execMode = args[2];
  if (execMode.startsWith("seq"))
    proc.setExecutionMode(PipelineConstants.PIPELINE_SEQUENTIAL);
  else if (execMode.startsWith("para"))
    proc.setExecutionMode(PipelineConstants.PIPELINE_PARALLEL);
}
Instantiate an error handler. The error handler must implement the PipelineErrorHandler interface. The program uses the PipelineSampleErrHdlr class shown in PipelineSampleErrHdlr.java. The following code fragment illustrates this technique:
errHandler = new PipelineSampleErrHdlr(logname);
Set the error handler for the processor by invoking setErrorHandler(). The following statement illustrates this technique:
proc.setErrorHandler(errHandler);
Execute the pipeline. The following statement illustrates this technique:
proc.executePipeline();
An application calling the XML Pipeline processor must implement the PipelineErrorHandler interface to handle errors received from the processor. Set the error handler in the processor by calling setErrorHandler(). When writing the error handler, you can choose to throw an exception for different types of errors.
The oracle.xml.pipeline.controller.PipelineErrorHandler interface declares the methods shown in Table 7-6, all of which return void.
Table 7-6 PipelineErrorHandler Methods
Method | Description |
---|---|
error(java.lang.String msg, PipelineException e) | Handles PipelineException errors. |
fatalError(java.lang.String msg, PipelineException e) | Handles fatal PipelineException errors. |
warning(java.lang.String msg, PipelineException e) | Handles PipelineException warnings. |
info(java.lang.String msg) | Prints optional, additional information about errors. |
The first three methods in Table 7-6 receive a reference to an oracle.xml.pipeline.controller.PipelineException object. The following methods of the PipelineException class are especially useful:
getExceptionType(), which obtains the type of exception thrown
getProcessId(), which obtains the process ID where the exception occurred
getMessage(), which returns the message string of this Throwable error
The PipelineSampleErrHdlr.java source file implements a basic error handler for use with the PipelineSample program. The basic steps are as follows:
Implement a constructor. The constructor accepts the name of a log file and wraps it in a FileWriter object as follows:
PipelineSampleErrHdlr(String logFile) throws IOException
{
  log = new PrintWriter(new FileWriter(logFile));
}
Implement the error() method. This implementation prints the process ID, exception type, and error message. It also increments a variable holding the error count. For example:
public void error (String msg, PipelineException e) throws Exception
{
  log.println("\nError in: " + e.getProcessId());
  log.println("Type: " + e.getExceptionType());
  log.println("Message: " + e.getMessage());
  log.println("Error message: " + msg);
  log.flush();
  errCount++;
}
Implement the fatalError() method. This implementation follows the pattern of error(). For example:
public void fatalError (String msg, PipelineException e) throws Exception
{
  log.println("\nFatalError in: " + e.getProcessId());
  log.println("Type: " + e.getExceptionType());
  log.println("Message: " + e.getMessage());
  log.println("Error message: " + msg);
  log.flush();
  errCount++;
}
Implement the warning() method. This implementation follows the basic pattern of error() except it increments the warnCount variable rather than the errCount variable. For example:
public void warning (String msg, PipelineException e) throws Exception
{
  log.println("\nWarning in: " + e.getProcessId());
  log.println("Message: " + e.getMessage());
  log.println("Error message: " + msg);
  log.flush();
  warnCount++;
}
Implement the info() method. Unlike the preceding methods, this method does not receive a PipelineException reference as input. The following implementation prints the String received by the method and increments the value of the warnCount variable:
public void info (String msg)
{
  log.println("\nInfo : " + msg);
  log.flush();
  warnCount++;
}
Implement a method to close the PrintWriter. The following code implements the method closeLog(), which prints the number of errors and warnings and calls PrintWriter.close():
public void closeLog()
{
  log.println("\nTotal Errors: " + errCount + "\nTotal Warnings: " + warnCount);
  log.flush();
  log.close();
}
See Also:
Oracle Database XML Java API Reference to learn about the PipelineErrorHandler interface and the PipelineException class