Oracle9iAS TopLink Foundation Library Guide Release 2 (9.0.3) Part Number B10064-01
Designing for peak efficiency ensures that your TopLink application is fast, smooth, and accurate. This chapter discusses how to optimize TopLink-enabled applications, covering read and write optimizations, Java code optimization, and database schema design.
Performance considerations should be factored into every part of the development cycle. This means that you should be aware of performance issues in your design and implementation. This does not mean, however, that you should try to optimize performance in the first iteration. Optimizations that complicate the design or implementation should be left until the final iteration of your application. However, you should plan for these performance optimizations from your first iteration to make it easier to integrate them later.
The single most important aspect of performance optimization is knowing what to optimize. To improve the performance of your application, you must fully understand exactly what areas of your application have performance problems. You must also fully understand the causes of performance problems.
TopLink provides a diverse set of features to optimize performance. Most of these features can be turned on or off in the descriptors and/or database session and result in a global system performance improvement, without any changes to application code.
When optimizing the performance of your application, you should first check to see if a TopLink feature can solve the optimization problem. If no such feature is present then you should consider more complex optimizations, such as those provided in the later sections of this chapter.
Certain read and write operations can be optimized through TopLink. Two key concepts are used to optimize reading: reducing the amount of data read from the database, and reducing the number of database calls required to read it.
Table 6-1 lists the read optimization features provided with TopLink.
An application often asks the user to choose a particular element from a list. The list displays only a subset of the information contained in the objects, and therefore it is wasteful to query all of the information for all of the objects from the database. It is possible to query only the information required to display in the list, and then, when the user chooses one, read only that object from the database.
TopLink has two features, partial object reading and report query, that allow the performance of these types of operations to be optimized.
Partial object reading is a query designed to extract only the required information from a selected record in a database, rather than all of the information the record contains.
When using partial object reading, the object is not fully populated, so it cannot be cached. Consequently, the object cannot be edited. Because the primary key is required to re-query the object (so it can be edited for example), and because TopLink does not automatically include the primary key information in a partially populated object, the primary key must be explicitly specified as a partial attribute.
/* Read all the employees from the database, ask the user to choose one
   and return it. This must read in all the information for all of the
   employees. */
List list;

// Fetch data from database and add to list box.
Vector employees = (Vector) session.readAllObjects(Employee.class);
list.addAll(employees);

// Display list box.
...

// Get selected employee from list.
Employee selectedEmployee = (Employee) list.getSelectedItem();
return selectedEmployee;
/* Read all the employees from the database, ask the user to choose one and
   return it. This uses partial object reading to read just the last names of
   the employees. Note that TopLink does not automatically include the primary
   key of the object. If the primary key is needed to re-query the object (for
   example, to edit it), it must be specified as a partial attribute. */
List list;

// Fetch data from database and add to list box.
ReadAllQuery query = new ReadAllQuery(Employee.class);
query.addPartialAttribute("lastName");
// Add the primary key as a partial attribute so the object can be re-queried.
query.addPartialAttribute("id");
query.dontMaintainCache();
Vector employees = (Vector) session.executeQuery(query);
list.addAll(employees);

// Display list box.
...

// Get selected employee from list.
Employee selectedEmployee = (Employee) session.readObject(list.getSelectedItem());
return selectedEmployee;
/* Read all the employees from the database, ask the user to choose one and
   return it. This uses a report query to read just the last names of the
   employees. It then uses the primary key stored in the report query result
   to read the real object. */
List list;

// Fetch data from database and add to list box.
ExpressionBuilder builder = new ExpressionBuilder();
ReportQuery query = new ReportQuery(Employee.class, builder);
query.addAttribute("lastName");
query.retrievePrimaryKeys();
Vector reportRows = (Vector) session.executeQuery(query);
list.addAll(reportRows);

// Display list box.
...

// Get selected employee from list.
Employee selectedEmployee = (Employee) ((ReportResult) list.getSelectedItem()).readObject();
return selectedEmployee;
Although the differences between the two examples are slight, there is a substantial performance improvement by using partial objects and report query.
In the example called "No optimization," full employee objects are created even though only the employee's last name is displayed in the list. All of the data that makes up an employee object must be read.

In the example called "Optimization through partial object reading," partial object reading is used to read only the last name (and the primary key, if specified) of the employees. Employee objects are still created, but only the last name (and primary key) is set. The other employee attributes are left as null or as their constructor defaults. This reduces the amount of data read from the database.

In the report query example, the report query is used to read only the last name of the employees. This reduces the amount of data read from the database and avoids instantiating any employee instances.
Specifying fewer partial attributes and querying larger objects improves the overall performance gain of these optimizations.
The amount of data read by your application affects performance, but how that data is read also affects performance.
Reading a collection of rows from the database is significantly faster than reading each row individually. The most common performance problem is reading a collection of objects that have a one-to-one reference to another object. If this is done without optimizing how the objects are read, N + 1 database calls are required. That is, one read operation is required to read in all of the source rows, and one call for each target row is required in the one-to-one relationship.
The next three examples read the addresses of a set of employees: first individually, and then using TopLink's joining and batch reading query optimization features. The optimized reads access the database only once or twice, so they are significantly faster.
/* Read all the employees, and collect their addresses' cities.
   This takes N + 1 queries if not optimized. */

// Read all of the employees from the database. This requires 1 SQL call.
Vector employees = session.readAllObjects(Employee.class,
    new ExpressionBuilder().get("lastName").equal("Smith"));
// SQL: Select * from Employee where l_name = 'Smith'

// Iterate over employees and get their addresses.
// This requires N SQL calls.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
    // SQL: Select * from Address where address_id = 123, etc.
}
/* Read all the employees, and collect their addresses' cities. Although the
   code is almost identical, because joining optimization is used it takes
   only 1 query. */

// Read all of the employees from the database, using joining.
// This requires 1 SQL call.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.setSelectionCriteria(new ExpressionBuilder().get("lastName").equal("Smith"));
query.addJoinedAttribute("address");
Vector employees = (Vector) session.executeQuery(query);
// SQL: Select E.*, A.* from Employee E, Address A
//      where E.l_name = 'Smith' and E.address_id = A.address_id

// Iterate over employees and get their addresses.
// The previous SQL already read all of the addresses, so no SQL is required.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
}
/* Read all the employees, and collect their addresses' cities. Although the
   code is almost identical, because batch reading optimization is used it
   takes only 2 queries. */

// Read all of the employees from the database, using batch reading.
// This requires 1 SQL call; note that only the employees are read.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.setSelectionCriteria(new ExpressionBuilder().get("lastName").equal("Smith"));
query.addBatchReadAttribute("address");
Vector employees = (Vector) session.executeQuery(query);
// SQL: Select * from Employee where l_name = 'Smith'

// Iterate over employees and get their addresses. The first address accessed
// causes all of the addresses to be read in a single SQL call.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
    // SQL: Select distinct A.* from Employee E, Address A
    //      where E.l_name = 'Smith' and E.address_id = A.address_id
}
By using TopLink query optimization, a number of queries are reduced to a single query. This leads to much greater performance.
It may seem that because joining requires only a single query, batch reading would never be required. The advantage of batch reading is that it allows for delayed loading through value holders and performs much better where the target objects are shared. For example, if all of the employees lived at the same address, batch reading would read much less data than joining, because batch reading uses a SQL DISTINCT to filter duplicate data. Batch reading is also supported for one-to-many relationships, whereas joining is supported only for one-to-one relationships.
Although this technique is very efficient, it should only be used when all of the desired objects (such as addresses) are required. Otherwise the resources spent reading all of the objects could hurt performance.
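The DISTINCT filtering described above can be sketched in plain Java (the class and method names here are illustrative, not TopLink API): when many employees share an address, a join transfers one copy of the address data per employee row, while batch reading transfers only the distinct target rows.

```java
import java.util.*;

// Illustrative sketch: count the address rows transferred by each strategy.
class BatchVsJoinSketch {
    // A join repeats the address columns on every joined employee row.
    static int addressRowsViaJoin(int[] addressIdPerEmployee) {
        return addressIdPerEmployee.length;
    }

    // Batch reading issues a second query with SELECT DISTINCT A.*,
    // so each shared address row is transferred only once.
    static int addressRowsViaBatch(int[] addressIdPerEmployee) {
        Set<Integer> distinct = new HashSet<>();
        for (int id : addressIdPerEmployee) distinct.add(id);
        return distinct.size();
    }

    public static void main(String[] args) {
        int[] shared = new int[100]; // 100 employees, all with address_id 0
        System.out.println(addressRowsViaJoin(shared));  // 100
        System.out.println(addressRowsViaBatch(shared)); // 1
    }
}
```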
TopLink provides a high-level query mechanism. This query mechanism is powerful, but currently does not support everything possible through raw SQL. If you have a complex query required by your application, and the query must be done optimally, the best solution in many cases is to use raw SQL.
Some parts of an application may require information from a variety of objects rather than from just one object. This can be very difficult to implement and very performance intensive. In such situations, it may be advantageous to define a new read-only object to encapsulate this information and map it to a view on the database. Set the object to be read-only by using the addDefaultReadOnlyClass() API in the oracle.toplink.sessions.Project class.
/* Gather the information to report on an employee and return a summary of
   the information. In this situation a Hashtable is used to hold the report
   information. Notice that this reads many objects from the database, but
   uses very little of the information contained in them. This may take 5
   queries and read in a large number of objects. */
public Hashtable reportOnEmployee(String employeeName) {
    Vector projects, associations;
    Hashtable report = new Hashtable();

    // Retrieve employee from database.
    Employee employee = (Employee) session.readObject(Employee.class,
        new ExpressionBuilder().get("lastName").equal(employeeName));

    // Get all of the projects affiliated with the employee.
    projects = session.readAllObjects(Project.class,
        "SELECT P.* FROM PROJECT P, EMPLOYEE E " +
        "WHERE P.MEMBER_ID = E.EMP_ID AND E.L_NAME = '" + employeeName + "'");

    // Get all of the associations affiliated with the employee.
    associations = session.readAllObjects(Association.class,
        "SELECT A.* FROM ASSOC A, EMPLOYEE E " +
        "WHERE A.MEMBER_ID = E.EMP_ID AND E.L_NAME = '" + employeeName + "'");

    report.put("firstName", employee.getFirstName());
    report.put("lastName", employee.getLastName());
    report.put("manager", employee.getManager());
    report.put("city", employee.getAddress().getCity());
    report.put("projects", projects);
    report.put("associations", associations);
    return report;
}
First, define a view on the database that joins the required tables:

CREATE VIEW EMPLOYEE_VIEW AS
    (SELECT F_NAME = E.F_NAME, L_NAME = E.L_NAME, EMP_ID = E.EMP_ID,
            MANAGER_NAME = M.NAME, CITY = A.CITY, NAME = E.NAME
     FROM EMPLOYEE E, EMPLOYEE M, ADDRESS A
     WHERE E.MANAGER_ID = M.EMP_ID AND E.ADDRESS_ID = A.ADDRESS_ID)
Then, define a descriptor for the EmployeeReport class:

- Set the descriptor's tableName to EMPLOYEE_VIEW.
- For computed attributes such as numberOfProjects and associations, a transformation mapping can be used to get the required data.
Now, the report can be queried from the database like any other TopLink-enabled object.
/* Return the report for the employee. */
public EmployeeReport reportOnEmployee(String employeeName) {
    EmployeeReport report;
    report = (EmployeeReport) session.readObject(EmployeeReport.class,
        new ExpressionBuilder().get("lastName").equal(employeeName));
    return report;
}
TopLink also provides several write optimization features; Table 6-2 lists the write optimization features provided with TopLink. The most common write performance problem is a batch job that inserts a large volume of data into the database.
Consider a batch job that must load a large amount of data from one database and migrate it into another. Assume that the objects are simple employee objects that use generated sequence numbers as their primary key, and have an address that also uses a sequence number. The batch job must load 10,000 employees from the first database and insert them into the target database.
First, let's approach the problem naively: have the batch job read all of the employees from the source database, then acquire a unit of work from the target database, register all of the objects, and commit the unit of work.
/* Read all the employees, acquire a unit of work and register them. */

// Read all of the employees from the database. This requires 1 SQL call,
// but will be very memory intensive as 10,000 objects will be read.
Vector employees = sourceSession.readAllObjects(Employee.class);
// SQL: Select * from Employee

// Acquire a unit of work and register the employees.
UnitOfWork uow = targetSession.acquireUnitOfWork();
uow.registerAllObjects(employees);
uow.commit();
// SQL: Begin transaction
// SQL: Update Sequence set count = count + 1 where name = 'EMP'
// SQL: Select count from Sequence
// SQL: ... repeat this 10,000 times + 10,000 times for the addresses ...
// SQL: Commit transaction
// SQL: Begin transaction
// SQL: Insert into Address (...) values (...)
// SQL: ... repeat this 10,000 times
// SQL: Insert into Employee (...) values (...)
// SQL: ... repeat this 10,000 times
// SQL: Commit transaction
This batch job would have extremely poor performance, causing 60,000 SQL executions. It also reads a huge amount of data into memory, which can cause memory problems. A number of TopLink optimizations can be used to improve this batch job.
The first performance problem is that loading from the source database may cause memory problems. To address this, a cursored stream should be used to read the employees from the source database. Also, a cache identity map, not a full identity map, should be used in both the source and target sessions (a weak identity map could be used in JDK 1.2).
The cursor should be streamed in groups of 100, using the releasePrevious() method after each read. Each batch of 100 employees should be registered in a new unit of work and committed. Although this does not change the amount of SQL executed, it does fix the memory problems. You can usually spot a memory problem in a batch job when performance degrades over time and disk swapping occurs.
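The batching pattern described above can be sketched in plain Java, with an Iterator standing in for TopLink's CursoredStream (the class and method names are illustrative):

```java
import java.util.*;

// Illustrative sketch: drain a large result set in fixed-size batches, as a
// cursored stream would, so only one batch of objects is held in memory.
class CursorBatching {
    static int processInBatches(Iterator<?> cursor, int batchSize) {
        int batches = 0;
        List<Object> batch = new ArrayList<>();
        while (cursor.hasNext()) {
            batch.add(cursor.next());
            if (batch.size() == batchSize || !cursor.hasNext()) {
                // In the real job: register this batch in its own unit of work,
                // commit, then release the objects so they can be collected.
                batch.clear();
                batches++;
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) rows.add(i);
        System.out.println(processInBatches(rows.iterator(), 100)); // 100
    }
}
```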
SQL select calls are more expensive than SQL modify calls, so the biggest performance gain comes from reducing the number of select statements issued. In this example, selects are used for the sequence numbers. Using sequence number pre-allocation dramatically improves the performance.

In TopLink, the sequence pre-allocation size can be configured on the login; it defaults to 50. In the non-optimized example, we used a pre-allocation size of 1 to demonstrate this point. Because batches of 100 are used, a sequence pre-allocation size of 100 should also be used. Because both employees and addresses use sequence numbers, we can get even better pre-allocation by having them share the same sequence. In this case, we set the pre-allocation size to 200. This optimization reduces the number of SQL executions from 60,000 to 20,200.
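The arithmetic behind sequence pre-allocation can be sketched as follows (illustrative code, not TopLink API): each trip to the sequence table hands back a block of ids, so the number of trips shrinks in proportion to the pre-allocation size.

```java
// Illustrative sketch: count sequence-table round trips needed to allocate
// ids for 10,000 employees plus 10,000 addresses sharing one sequence.
class SequencePrealloc {
    // Each round trip (an UPDATE plus a SELECT against the sequence table)
    // hands back `preallocationSize` ids to the client.
    static int roundTrips(int idsNeeded, int preallocationSize) {
        return (idsNeeded + preallocationSize - 1) / preallocationSize;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(20_000, 1));   // 20000 trips, size 1
        System.out.println(roundTrips(20_000, 200)); // 100 trips, size 200
    }
}
```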
TopLink supports batch writing on batch compliant databases in JDK 1.1 and through batch compliant JDBC 2.0 drivers in JDK 1.2. Batch writing allows for a group of SQL statements to be batched together into a single statement and sent to the database as a single database execution. This reduces the communication time between the application and the server and can lead to huge performance increases.
Batch writing can be enabled on the login through the useBatchWriting() method. In our example, each batch of 100 employees can be combined into a single SQL execution. This reduces the number of SQL executions from 20,200 to 300.
TopLink supports parameterized SQL and prepared statement caching. Using parameterized SQL can improve write performance by avoiding the prepare cost of a SQL execution through reusing the same prepared statement for multiple executions.
Batch writing and parameterized SQL cannot be used together, because batch writing does not use individual statements. The performance benefits of batch writing are much greater than parameterized SQL; therefore, if batch writing is supported by your database, it is strongly suggested that you use batch writing and not use parameterized SQL.
Parameterized SQL avoids only the prepare part of the SQL execution, not the execute; therefore, it normally does not give a huge performance gain. However, if your database does not support batch writing, parameterized SQL can improve performance. In this example, the number of SQL executions is still 20,200, but the number of SQL prepares is reduced to 4.
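The effect on prepare counts can be sketched in plain Java (illustrative, not TopLink API): a statement cache prepares each distinct SQL shape once, so the prepare count depends on the number of statement shapes rather than the number of executions.

```java
import java.util.*;

// Illustrative sketch of prepared statement caching: each distinct SQL string
// is prepared once and then reused for every subsequent execution.
class StatementCacheSketch {
    private final Set<String> prepared = new HashSet<>();
    int prepares = 0;
    int executions = 0;

    void execute(String sql) {
        if (prepared.add(sql)) {
            prepares++;   // first time this SQL shape is seen: prepare it
        }
        executions++;     // every call still executes against the database
    }

    public static void main(String[] args) {
        StatementCacheSketch session = new StatementCacheSketch();
        for (int i = 0; i < 10_000; i++) session.execute("INSERT INTO EMPLOYEE (...) VALUES (?)");
        for (int i = 0; i < 10_000; i++) session.execute("INSERT INTO ADDRESS (...) VALUES (?)");
        for (int i = 0; i < 100; i++) {
            session.execute("UPDATE SEQUENCE SET COUNT = COUNT + 200 WHERE NAME = ?");
            session.execute("SELECT COUNT FROM SEQUENCE WHERE NAME = ?");
        }
        System.out.println(session.executions); // 20200
        System.out.println(session.prepares);   // 4
    }
}
```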
Multiple processes and even multiple machines can be used to split the batch job into several smaller jobs.
Splitting the batch job across ten threads leads to performance increases. In this case, the read from the cursored stream could be synchronized and parallel units of work could be used on a single machine.
Even if the machine has only a single processor, this can lead to a performance increase. During any SQL execution the thread must wait for a response from the server, but in this waiting time the other threads can be processing.
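The multithreaded split can be sketched in plain Java (illustrative; a real job would have each worker read its share from the shared cursored stream and commit its own units of work):

```java
import java.util.concurrent.*;

// Illustrative sketch: divide the batch job evenly across worker threads.
// While one thread waits on the server, the others keep processing.
class ParallelBatch {
    static int processAll(int items, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Integer> done = new ExecutorCompletionService<>(pool);
        int perThread = items / threads;
        for (int t = 0; t < threads; t++) {
            // Each worker would register and commit its own units of work;
            // here the "database work" is simulated by returning its count.
            done.submit(() -> perThread);
        }
        int processed = 0;
        try {
            for (int t = 0; t < threads; t++) processed += done.take().get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processAll(10_000, 10)); // 10000
    }
}
```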
The final optimized example does not show multi-processing as normally the other features are enough to improve the performance.
/* Read each batch of employees, acquire a unit of work and register them. */
targetSession.getLogin().useBatchWriting();
targetSession.getLogin().setSequencePreallocationSize(200);

// Read all of the employees from the database into a stream.
// This requires 1 SQL call, but none of the rows will be fetched yet.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.useCursoredStream();
CursoredStream stream;
stream = (CursoredStream) sourceSession.executeQuery(query);
// SQL: Select * from Employee

// Process each batch.
while (! stream.atEnd()) {
    Vector employees = stream.read(100);

    // Acquire a unit of work to register the employees.
    UnitOfWork uow = targetSession.acquireUnitOfWork();
    uow.registerAllObjects(employees);
    uow.commit();
}
// SQL: Begin transaction
// SQL: Update Sequence set count = count + 200 where name = 'SEQ'
// SQL: Select count from Sequence where name = 'SEQ'
// SQL: Commit transaction
// SQL: Begin transaction
// BEGIN BATCH SQL: Insert into Address (...) values (...)
// ... repeat this 100 times
// Insert into Employee (...) values (...)
// ... repeat this 100 times
// END BATCH SQL:
// SQL: Commit transaction

Java optimization
In most client-server database applications, most of the performance problems come from the communications between the client and the server. This means that optimizing Java code is normally not as important as optimizing database interactions. However, you should still try to write clean, optimized Java code, since very poorly optimized Java code does affect the performance of your application.
The following is a general checklist to keep in mind when developing Java applications:

- For heavy String manipulation, use a StringBuffer instead of the + operator for appending Strings.
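For example, a minimal sketch of the two appending styles:

```java
// Illustrative sketch: appending in a loop with + rebuilds the whole String
// on every pass, while a single StringBuffer grows in place.
class AppendSketch {
    static String withOperator(String[] parts) {
        String s = "";
        for (String p : parts) s = s + p; // allocates a new String each iteration
        return s;
    }

    static String withBuffer(String[] parts) {
        StringBuffer b = new StringBuffer();
        for (String p : parts) b.append(p); // appends into one growing buffer
        return b.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        // Both produce the same result; the buffer avoids the intermediate copies.
        System.out.println(withOperator(parts).equals(withBuffer(parts))); // true
    }
}
```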
When designing your database schema and object model, optimization is very important. The key element to remember in the design of your object model and database schema is to avoid complexity. The most common object-relational performance problem is when the database schema is derived directly from a complex object model. This normally produces an over-normalized database schema that can be slow and difficult to query.
Although it is best to design the object model and database schema together, there should not be a direct one-to-one mapping between the two.
A common schema optimization technique is to denormalize two tables into one. This can improve read and write performance by requiring only one database operation instead of two.
This technique is demonstrated through analyzing the ACME Member Location Tracking System.
| Elements | Details |
|---|---|
| Title | ACME Member Location Tracking System |
| Classes | Member, Address |
| Tables | MEMBER, ADDRESS |
| Relationships | Source: Member; Instance Variable: address; Mapping: one-to-one; Target: Address |
| Elements | Details |
|---|---|
| Classes | Member, Address |
| Tables | MEMBER |
| Relationships | Source: Member; Instance Variable: address; Mapping: aggregate; Target: Address |
In the ACME Member Location Tracking System, members and addresses are always looked up together.
Querying a member based on address information requires an expensive database join. Reading a member and its address requires two read statements. Writing a member requires two write statements. This unnecessarily adds complexity to the system and results in poor performance.
Since members are always read and written with their address information, considerable performance can be gained through combining the MEMBER and ADDRESS tables into a single table, and changing the one-to-one relationship to an aggregate relationship.
This allows all of the information to be read in a single operation, and doubles the speed of updates and inserts as only one row from one table is modified.
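A possible shape for the combined table (the column names and types here are assumed for illustration, following the document's SQL style):

```sql
-- Hypothetical denormalized table: the former ADDRESS columns are folded
-- into MEMBER, so one INSERT or SELECT covers a member and its address.
CREATE TABLE MEMBER (
    MEMBER_ID NUMBER PRIMARY KEY,
    F_NAME    VARCHAR2(40),
    L_NAME    VARCHAR2(40),
    -- former ADDRESS columns, mapped as an aggregate in the object model
    STREET    VARCHAR2(80),
    CITY      VARCHAR2(40),
    COUNTRY   VARCHAR2(40)
)
```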
This example demonstrates how a table schema can be further normalized to provide performance optimization.
Frequently, relational schemas can stuff too much data into a particular table. The table may contain a large number of columns, but only a small subset of those may be frequently used.
By splitting the large table into two or even several smaller tables, the amount of data traffic can be significantly reduced, improving the overall performance of the system.
This system is responsible for assigning employees to projects within an organization. The most-common operation is to read a set of employees and projects, assign some employees to different projects, and update the employees. Occasionally the employee's address or job classification is used to determine which project would be the best placement for the employee.
When a large volume of employees is read from the database at one time, their aggregate parts must also be read. Because of this, the system suffers from a general read performance problem. The only solution is to reduce the amount of data traffic to and from the server.
In this system, normalize the EMPLOYEE table into the EMPLOYEE, ADDRESS, PHONE, EMAIL, and JOB tables.
Since normally only the employee information is read, the amount of data transferred from the database to the client is reduced by splitting the table. This improves your read performance by reducing the amount of data traffic by 25%.
When models are designed in an object-oriented design and then transformed into a relational model, a common mistake is to make a large hierarchy of tables on the database. This makes it necessary to perform a large number of joins and makes querying difficult. Normally it is a good idea to collapse some of the levels in your inheritance hierarchy into a single table.
| Classes | Tables |
|---|---|
| Person | <none> |
| Employee | EMPLOYEE |
| SalesRep | EMPLOYEE |
| Staff | EMPLOYEE |
| Client | CLIENT |
| Contact | CLIENT |
In this system, the clients of the company are assigned to its sales force representatives. The managers track which sales representatives are under them.
The system suffers from over-complexity, which hinders the development and performance of the system. Large expensive joins are required to do almost anything, making every database operation expensive.
By collapsing the three-level table hierarchy into one, the complexity of the system is reduced. All of the expensive joins in the system are eliminated and simplified queries allow read performance to be further optimized leading to greatly improved system performance.
A common situation is for an object to have a collection of other objects where only one of the other objects in the collection is commonly used. In this situation, it is desirable to add an instance variable just for this special object. This way, the important object can be accessed and used without requiring the instantiation of all of the other objects in the collection.
This system is used by an international shipping company, which wants to be able to track the location of its packages as they travel from their source to their destination. When a package is moved from one location to another, a location is created in real-time on the database. The application normally receives a request for the current location of a particular package and displays this for the user.
A package could accumulate many locations as it travels to its destination, so reading all of these locations from the database is expensive.
By adding a specific instance variable for just the current location and a one-to-one mapping for the instance variable, the current location can be accessed without reading in all of the other locations. This drastically improves the performance of the system.
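The optimized model can be sketched in plain Java (class and field names are assumed for illustration): the parcel keeps a direct one-to-one reference to its current location, so the common "where is it now?" request never instantiates the full history.

```java
import java.util.*;

// Illustrative sketch of the optimized shipping model.
class Location {
    final String place;
    Location(String place) { this.place = place; }
}

class Parcel {
    // One-to-many history; in TopLink this would sit behind a value holder
    // so it is only instantiated when actually traversed.
    private final List<Location> history = new ArrayList<>();
    // Dedicated one-to-one shortcut to the most recent location.
    private Location currentLocation;

    void moveTo(Location l) {
        history.add(l);
        currentLocation = l; // keep the shortcut in sync on every move
    }

    Location getCurrentLocation() {
        return currentLocation; // answered without reading the whole history
    }
}

class TrackingDemo {
    public static void main(String[] args) {
        Parcel p = new Parcel();
        p.moveTo(new Location("Ottawa"));
        p.moveTo(new Location("Toronto"));
        System.out.println(p.getCurrentLocation().place); // Toronto
    }
}
```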
Copyright © 2002 Oracle Corporation. All Rights Reserved.