Oracle9iAS TopLink Foundation Library Guide Release 2 (9.0.3) Part Number B10064-01
Designing for peak efficiency ensures that your TopLink application is fast, smooth, and accurate. This chapter discusses how to optimize TopLink-enabled applications, covering read and write optimizations, Java code optimization, and database schema design.
Performance considerations should be factored into every part of the development cycle. This means that you should be aware of performance issues in your design and implementation. This does not mean, however, that you should try to optimize performance in the first iteration. Optimizations that complicate the design or implementation should be left until the final iteration of your application. However, you should plan for these performance optimizations from your first iteration to make it easier to integrate them later.
The single most important aspect of performance optimization is knowing what to optimize. To improve the performance of your application, you must fully understand exactly what areas of your application have performance problems. You must also fully understand the causes of performance problems.
TopLink provides a diverse set of features to optimize performance. Most of these features can be turned on or off in the descriptors and/or database session and result in a global system performance improvement, without any changes to application code.
When optimizing the performance of your application, you should first check to see if a TopLink feature can solve the optimization problem. If no such feature is present then you should consider more complex optimizations, such as those provided in the later sections of this chapter.
Certain read and write operations can be optimized through TopLink. Two key concepts are used to optimize reading: reducing the amount of data read from the database, and reducing the number of database calls required to read it.
Table 6-1 lists the read optimization features provided with TopLink.
An application often asks the user to choose a particular element from a list. The list displays only a subset of the information contained in the objects, and therefore it is wasteful to query all of the information for all of the objects from the database. It is possible to query only the information required to display in the list, and then, when the user chooses one, read only that object from the database.
TopLink has two features, partial object reading and report query, that allow the performance of these types of operations to be optimized.
Partial object reading is a query designed to extract only the required information from a selected record in a database, rather than all of the information the record contains.
When using partial object reading, the object is not fully populated, so it cannot be cached. Consequently, the object cannot be edited. Because the primary key is required to re-query the object (so it can be edited for example), and because TopLink does not automatically include the primary key information in a partially populated object, the primary key must be explicitly specified as a partial attribute.
/* Read all the employees from the database, ask the user to choose one
   and return it. This must read in all the information for all of the
   employees. */
List list;

// Fetch data from database and add to list box.
Vector employees = (Vector) session.readAllObjects(Employee.class);
list.addAll(employees);

// Display list box.
...

// Get selected employee from list.
Employee selectedEmployee = (Employee) list.getSelectedItem();
return selectedEmployee;
/* Read all the employees from the database, ask the user to choose one and
   return it. This uses partial object reading to read just the last names of
   the employees. Note that TopLink does not automatically include the primary
   key of the object. If the primary key is needed to re-query the object (for
   example, to edit it), it must be specified as a partial attribute. */
List list;

// Fetch data from database and add to list box.
ReadAllQuery query = new ReadAllQuery(Employee.class);
query.addPartialAttribute("lastName");
// Add the primary key as a partial attribute so the object can be re-queried.
query.addPartialAttribute("id");
query.dontMaintainCache();
Vector employees = (Vector) session.executeQuery(query);
list.addAll(employees);

// Display list box.
...

// Get selected employee from list.
Employee selectedEmployee = (Employee) session.readObject(list.getSelectedItem());
return selectedEmployee;
/* Read all the employees from the database, ask the user to choose one and
   return it. This uses a report query to read just the last names of the
   employees. It then uses the primary key stored in the report query result
   to read the real object. */
List list;

// Fetch data from database and add to list box.
ExpressionBuilder builder = new ExpressionBuilder();
ReportQuery query = new ReportQuery(Employee.class, builder);
query.addAttribute("lastName");
query.retrievePrimaryKeys();
Vector reportRows = (Vector) session.executeQuery(query);
list.addAll(reportRows);

// Display list box.
...

// Get selected employee from list.
Employee selectedEmployee = (Employee) ((ReportResult) list.getSelectedItem()).readObject();
return selectedEmployee;
Although the differences between the two examples are slight, there is a substantial performance improvement by using partial objects and report query.
In the example called "No optimization," full employee objects are created even though only the employee's last name is displayed in the list. All of the data that makes up an employee object must be read.

In the example called "Optimization through partial object reading," partial object reading is used to read only the last name (and the primary key, if specified) of the employees. Employee objects are still created, but only the last name (and primary key) is set. The other employee attributes are left as null or as their constructor defaults. This reduces the amount of data read from the database.

In the report query example, the report query is used to read only the last name of the employees. This reduces the amount of data read from the database and avoids instantiating any employee instances.
Specifying fewer partial attributes and querying larger objects improves the overall performance gain of these optimizations.
The amount of data read by your application affects performance, but how that data is read also affects performance.
Reading a collection of rows from the database is significantly faster than reading each row individually. The most common performance problem is reading a collection of objects that have a one-to-one reference to another object. If this is done without optimizing how the objects are read, N + 1 database calls are required. That is, one read operation is required to read in all of the source rows, and one call for each target row is required in the one-to-one relationship.
The next three examples read the addresses of a set of employees: first individually, and then using TopLink's joining and batch reading query optimization features. The optimized reads access the database only once or twice, so they are significantly faster.
/* Read all the employees, and collect their addresses' cities.
   This takes N + 1 queries if not optimized. */

// Read all of the employees from the database. This requires 1 SQL call.
Vector employees = session.readAllObjects(Employee.class,
    new ExpressionBuilder().get("lastName").equal("Smith"));
// SQL: Select * from Employee where l_name = 'Smith'

// Iterate over employees and get their addresses.
// This requires N SQL calls.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
    // SQL: Select * from Address where address_id = 123, etc.
}
/* Read all the employees, and collect their addresses' cities. Although the
   code is almost identical, because joining optimization is used it takes
   only 1 query. */

// Read all of the employees from the database, using joining.
// This requires 1 SQL call.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.setSelectionCriteria(new ExpressionBuilder().get("lastName").equal("Smith"));
query.addJoinedAttribute("address");
Vector employees = (Vector) session.executeQuery(query);
// SQL: Select E.*, A.* from Employee E, Address A
//      where E.l_name = 'Smith' and E.address_id = A.address_id

// Iterate over employees and get their addresses.
// The previous SQL already read all of the addresses, so no SQL is required.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
}
/* Read all the employees, and collect their addresses' cities. Although the
   code is almost identical, because batch reading optimization is used it
   takes only 2 queries. */

// Read all of the employees from the database, using batch reading.
// This requires 1 SQL call; note that only the employees are read.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.setSelectionCriteria(new ExpressionBuilder().get("lastName").equal("Smith"));
query.addBatchReadAttribute("address");
Vector employees = (Vector) session.executeQuery(query);
// SQL: Select * from Employee where l_name = 'Smith'

// Iterate over employees and get their addresses. The first address accessed
// causes all of the addresses to be read in a single SQL call.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
    // SQL: Select distinct A.* from Employee E, Address A
    //      where E.l_name = 'Smith' and E.address_id = A.address_id
}
By using TopLink query optimization, a number of queries are reduced to a single query. This leads to much greater performance.
It may seem that because joining requires only a single query, batch reading would never be required. The advantage of batch reading is that it allows for delayed loading through value holders and performs much better where the target objects are shared. For example, if all of the employees lived at the same address, batch reading would read much less data than joining, because batch reading uses a SQL DISTINCT to filter duplicate data. Batch reading is also supported for one-to-many relationships, whereas joining is supported only for one-to-one relationships.
Although this technique is very efficient, it should only be used when all of the desired objects (such as addresses) are required. Otherwise the resources spent reading all of the objects could hurt performance.
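The DISTINCT filtering described above can be sketched in plain Java (the class and method names here are illustrative, not TopLink API): when many employees share an address, a join transfers one copy of the address data per employee row, while batch reading transfers only the distinct target rows.

```java
import java.util.*;

// Illustrative sketch: count the address rows transferred by each strategy.
class BatchVsJoinSketch {
    // A join repeats the address columns on every joined employee row.
    static int addressRowsViaJoin(int[] addressIdPerEmployee) {
        return addressIdPerEmployee.length;
    }

    // Batch reading issues a second query with SELECT DISTINCT A.*,
    // so each shared address row is transferred only once.
    static int addressRowsViaBatch(int[] addressIdPerEmployee) {
        Set<Integer> distinct = new HashSet<>();
        for (int id : addressIdPerEmployee) distinct.add(id);
        return distinct.size();
    }

    public static void main(String[] args) {
        int[] shared = new int[100]; // 100 employees, all with address_id 0
        System.out.println(addressRowsViaJoin(shared));  // 100
        System.out.println(addressRowsViaBatch(shared)); // 1
    }
}
```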
TopLink provides a high-level query mechanism. This query mechanism is powerful, but currently does not support everything possible through raw SQL. If you have a complex query required by your application, and the query must be done optimally, the best solution in many cases is to use raw SQL.
Some parts of an application may require information from a variety of objects rather than from just one object. This can be very difficult to implement and very performance intensive. In such situations, it may be advantageous to define a new read-only object to encapsulate this information and map it to a view on the database. Set the object to be read-only by using the addDefaultReadOnlyClass() API in the oracle.toplink.sessions.Project class.
/* Gather the information to report on an employee and return a summary of
   the information. In this situation a Hashtable is used to hold the report
   information. Notice that this reads many objects from the database, but
   uses very little of the information contained in them. This may take 5
   queries and read in a large number of objects. */
public Hashtable reportOnEmployee(String employeeName) {
    Vector projects, associations;
    Hashtable report = new Hashtable();

    // Retrieve employee from database.
    Employee employee = (Employee) session.readObject(Employee.class,
        new ExpressionBuilder().get("lastName").equal(employeeName));

    // Get all of the projects affiliated with the employee.
    projects = session.readAllObjects(Project.class,
        "SELECT P.* FROM PROJECT P, EMPLOYEE E " +
        "WHERE P.MEMBER_ID = E.EMP_ID AND E.L_NAME = '" + employeeName + "'");

    // Get all of the associations affiliated with the employee.
    associations = session.readAllObjects(Association.class,
        "SELECT A.* FROM ASSOC A, EMPLOYEE E " +
        "WHERE A.MEMBER_ID = E.EMP_ID AND E.L_NAME = '" + employeeName + "'");

    report.put("firstName", employee.getFirstName());
    report.put("lastName", employee.getLastName());
    report.put("manager", employee.getManager());
    report.put("city", employee.getAddress().getCity());
    report.put("projects", projects);
    report.put("associations", associations);
    return report;
}
First, define a view on the database that joins the required tables:

CREATE VIEW EMPLOYEE_VIEW AS
    (SELECT F_NAME = E.F_NAME, L_NAME = E.L_NAME, EMP_ID = E.EMP_ID,
            MANAGER_NAME = M.NAME, CITY = A.CITY, NAME = E.NAME
     FROM EMPLOYEE E, EMPLOYEE M, ADDRESS A
     WHERE E.MANAGER_ID = M.EMP_ID AND E.ADDRESS_ID = A.ADDRESS_ID)
Then, define a descriptor for the EmployeeReport class:

- Set the descriptor's tableName to EMPLOYEE_VIEW.
- For computed attributes such as numberOfProjects and associations, a transformation mapping can be used to get the required data.
Now, the report can be queried from the database like any other TopLink-enabled object.
/* Return the report for the employee. */
public EmployeeReport reportOnEmployee(String employeeName) {
    EmployeeReport report;
    report = (EmployeeReport) session.readObject(EmployeeReport.class,
        new ExpressionBuilder().get("lastName").equal(employeeName));
    return report;
}
TopLink also provides several write optimization features; Table 6-2 lists the write optimization features provided with TopLink. The most common write performance problem is a batch job that inserts a large volume of data into the database.
Consider a batch job that must load a large amount of data from one database and migrate it into another. Assume that the objects are simple employee objects that use generated sequence numbers as their primary key, and have an address that also uses a sequence number. The batch job must load 10,000 employees from the first database and insert them into the target database.
First, let's approach the problem naively: have the batch job read all of the employees from the source database, then acquire a unit of work from the target database, register all of the objects, and commit the unit of work.
/* Read all the employees, acquire a unit of work and register them. */

// Read all of the employees from the database. This requires 1 SQL call,
// but will be very memory intensive as 10,000 objects will be read.
Vector employees = sourceSession.readAllObjects(Employee.class);
// SQL: Select * from Employee

// Acquire a unit of work and register the employees.
UnitOfWork uow = targetSession.acquireUnitOfWork();
uow.registerAllObjects(employees);
uow.commit();
// SQL: Begin transaction
// SQL: Update Sequence set count = count + 1 where name = 'EMP'
// SQL: Select count from Sequence
// SQL: ... repeat this 10,000 times + 10,000 times for the addresses ...
// SQL: Commit transaction
// SQL: Begin transaction
// SQL: Insert into Address (...) values (...)
// SQL: ... repeat this 10,000 times
// SQL: Insert into Employee (...) values (...)
// SQL: ... repeat this 10,000 times
// SQL: Commit transaction
This batch job would have extremely poor performance, causing 60,000 SQL executions. It also reads a huge amount of data into memory, which can cause memory problems. A number of TopLink optimizations can be used to improve this batch job.
The first performance problem is that loading from the source database may cause memory problems. To address this, a cursored stream should be used to read the employees from the source database. Also, a cache identity map, not a full identity map, should be used in both the source and target sessions (a weak identity map could be used in JDK 1.2).
The cursor should be streamed in groups of 100, using the releasePrevious() method after each read. Each batch of 100 employees should be registered in a new unit of work and committed. Although this does not change the amount of SQL executed, it does fix the memory problems. You can usually spot a memory problem in a batch job when performance degrades over time and disk swapping occurs.
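The batching pattern described above can be sketched in plain Java, with an Iterator standing in for TopLink's CursoredStream (the class and method names are illustrative):

```java
import java.util.*;

// Illustrative sketch: drain a large result set in fixed-size batches, as a
// cursored stream would, so only one batch of objects is held in memory.
class CursorBatching {
    static int processInBatches(Iterator<?> cursor, int batchSize) {
        int batches = 0;
        List<Object> batch = new ArrayList<>();
        while (cursor.hasNext()) {
            batch.add(cursor.next());
            if (batch.size() == batchSize || !cursor.hasNext()) {
                // In the real job: register this batch in its own unit of work,
                // commit, then release the objects so they can be collected.
                batch.clear();
                batches++;
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) rows.add(i);
        System.out.println(processInBatches(rows.iterator(), 100)); // 100
    }
}
```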
SQL select calls are more expensive than SQL modify calls, so the biggest performance gain comes from reducing the number of select statements issued. In this example, selects are used for the sequence numbers. Using sequence number pre-allocation dramatically improves the performance.

In TopLink, the sequence pre-allocation size can be configured on the login; it defaults to 50. In the non-optimized example, we used a pre-allocation size of 1 to demonstrate this point. Because batches of 100 are used, a sequence pre-allocation size of 100 should also be used. Because both employees and addresses use sequence numbers, we can get even better pre-allocation by having them share the same sequence. In this case, we set the pre-allocation size to 200. This optimization reduces the number of SQL executions from 60,000 to 20,200.
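The arithmetic behind sequence pre-allocation can be sketched as follows (illustrative code, not TopLink API): each trip to the sequence table hands back a block of ids, so the number of trips shrinks in proportion to the pre-allocation size.

```java
// Illustrative sketch: count sequence-table round trips needed to allocate
// ids for 10,000 employees plus 10,000 addresses sharing one sequence.
class SequencePrealloc {
    // Each round trip (an UPDATE plus a SELECT against the sequence table)
    // hands back `preallocationSize` ids to the client.
    static int roundTrips(int idsNeeded, int preallocationSize) {
        return (idsNeeded + preallocationSize - 1) / preallocationSize;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(20_000, 1));   // 20000 trips, size 1
        System.out.println(roundTrips(20_000, 200)); // 100 trips, size 200
    }
}
```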
TopLink supports batch writing on batch compliant databases in JDK 1.1 and through batch compliant JDBC 2.0 drivers in JDK 1.2. Batch writing allows for a group of SQL statements to be batched together into a single statement and sent to the database as a single database execution. This reduces the communication time between the application and the server and can lead to huge performance increases.
Batch writing can be enabled on the login through the useBatchWriting() method. In our example, each batch of 100 employees can be combined into a single SQL execution. This reduces the number of SQL executions from 20,200 to 300.
TopLink supports parameterized SQL and prepared statement caching. Using parameterized SQL can improve write performance by avoiding the prepare cost of a SQL execution through reusing the same prepared statement for multiple executions.
Batch writing and parameterized SQL cannot be used together, because batch writing does not use individual statements. The performance benefits of batch writing are much greater than parameterized SQL; therefore, if batch writing is supported by your database, it is strongly suggested that you use batch writing and not use parameterized SQL.
Parameterized SQL avoids only the prepare part of the SQL execution, not the execute; therefore, it normally does not give a huge performance gain. However, if your database does not support batch writing, parameterized SQL can improve performance. In this example, the number of SQL executions is still 20,200, but the number of SQL prepares is reduced to 4.
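The effect on prepare counts can be sketched in plain Java (illustrative, not TopLink API): a statement cache prepares each distinct SQL shape once, so the prepare count depends on the number of statement shapes rather than the number of executions.

```java
import java.util.*;

// Illustrative sketch of prepared statement caching: each distinct SQL string
// is prepared once and then reused for every subsequent execution.
class StatementCacheSketch {
    private final Set<String> prepared = new HashSet<>();
    int prepares = 0;
    int executions = 0;

    void execute(String sql) {
        if (prepared.add(sql)) {
            prepares++;   // first time this SQL shape is seen: prepare it
        }
        executions++;     // every call still executes against the database
    }

    public static void main(String[] args) {
        StatementCacheSketch session = new StatementCacheSketch();
        for (int i = 0; i < 10_000; i++) session.execute("INSERT INTO EMPLOYEE (...) VALUES (?)");
        for (int i = 0; i < 10_000; i++) session.execute("INSERT INTO ADDRESS (...) VALUES (?)");
        for (int i = 0; i < 100; i++) {
            session.execute("UPDATE SEQUENCE SET COUNT = COUNT + 200 WHERE NAME = ?");
            session.execute("SELECT COUNT FROM SEQUENCE WHERE NAME = ?");
        }
        System.out.println(session.executions); // 20200
        System.out.println(session.prepares);   // 4
    }
}
```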
Multiple processes and even multiple machines can be used to split the batch job into several smaller jobs.
Splitting the batch job across ten threads leads to performance increases. In this case, the read from the cursored stream could be synchronized and parallel units of work could be used on a single machine.
Even if the machine has only a single processor, this can lead to a performance increase. During any SQL execution the thread must wait for a response from the server, but in this waiting time the other threads can be processing.
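The multithreaded split can be sketched in plain Java (illustrative; a real job would have each worker read its share from the shared cursored stream and commit its own units of work):

```java
import java.util.concurrent.*;

// Illustrative sketch: divide the batch job evenly across worker threads.
// While one thread waits on the server, the others keep processing.
class ParallelBatch {
    static int processAll(int items, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Integer> done = new ExecutorCompletionService<>(pool);
        int perThread = items / threads;
        for (int t = 0; t < threads; t++) {
            // Each worker would register and commit its own units of work;
            // here the "database work" is simulated by returning its count.
            done.submit(() -> perThread);
        }
        int processed = 0;
        try {
            for (int t = 0; t < threads; t++) processed += done.take().get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processAll(10_000, 10)); // 10000
    }
}
```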
The final optimized example does not show multi-processing as normally the other features are enough to improve the performance.
/* Read each batch of employees, acquire a unit of work and register them. */
targetSession.getLogin().useBatchWriting();
targetSession.getLogin().setSequencePreallocationSize(200);

// Read all of the employees from the database into a stream.
// This requires 1 SQL call, but none of the rows will be fetched yet.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.useCursoredStream();
CursoredStream stream;
stream = (CursoredStream) sourceSession.executeQuery(query);
// SQL: Select * from Employee

// Process each batch.
while (! stream.atEnd()) {
    Vector employees = stream.read(100);

    // Acquire a unit of work to register the employees.
    UnitOfWork uow = targetSession.acquireUnitOfWork();
    uow.registerAllObjects(employees);
    uow.commit();
}
// SQL: Begin transaction
// SQL: Update Sequence set count = count + 200 where name = 'SEQ'
// SQL: Select count from Sequence where name = 'SEQ'
// SQL: Commit transaction
// SQL: Begin transaction
// BEGIN BATCH SQL: Insert into Address (...) values (...)
// ... repeat this 100 times
// Insert into Employee (...) values (...)
// ... repeat this 100 times
// END BATCH SQL:
// SQL: Commit transaction

Java optimization
In most client-server database applications, most of the performance problems come from the communications between the client and the server. This means that optimizing Java code is normally not as important as optimizing database interactions. However, you should still try to write clean, optimized Java code, since very poorly optimized Java code does affect the performance of your application.
The following is a general checklist to keep in mind when developing Java applications:

- For heavy String manipulation, use a StringBuffer instead of the + operator for appending Strings.
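For example, a minimal sketch of the two appending styles:

```java
// Illustrative sketch: appending in a loop with + rebuilds the whole String
// on every pass, while a single StringBuffer grows in place.
class AppendSketch {
    static String withOperator(String[] parts) {
        String s = "";
        for (String p : parts) s = s + p; // allocates a new String each iteration
        return s;
    }

    static String withBuffer(String[] parts) {
        StringBuffer b = new StringBuffer();
        for (String p : parts) b.append(p); // appends into one growing buffer
        return b.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        // Both produce the same result; the buffer avoids the intermediate copies.
        System.out.println(withOperator(parts).equals(withBuffer(parts))); // true
    }
}
```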
When designing your database schema and object model, optimization is very important. The key element to remember in the design of your object model and database schema is to avoid complexity. The most common object-relational performance problem is when the database schema is derived directly from a complex object model. This normally produces an over-normalized database schema that can be slow and difficult to query.
Although it is best to design the object model and database schema together, there should not be a direct one-to-one mapping between the two.
A common schema optimization technique is to denormalize two tables into one. This can improve read and write performance by requiring only one database operation instead of two.
This technique is demonstrated through analyzing the ACME Member Location Tracking System.
| Elements | Details |
|---|---|
| Title | ACME Member Location Tracking System |
| Classes | Member, Address |
| Tables | MEMBER, ADDRESS |
| Relationships | Source: Member; Instance Variable: address; Mapping: one-to-one; Target: Address |
| Elements | Details |
|---|---|
| Classes | Member, Address |
| Tables | MEMBER |
| Relationships | Source: Member; Instance Variable: address; Mapping: aggregate; Target: Address |
In the ACME Member Location Tracking System, members and addresses are always looked up together.
Querying a member based on address information requires an expensive database join. Reading a member and its address requires two read statements. Writing a member requires two write statements. This unnecessarily adds complexity to the system and results in poor performance.
Since members are always read and written with their address information, considerable performance can be gained through combining the MEMBER and ADDRESS tables into a single table, and changing the one-to-one relationship to an aggregate relationship.
This allows all of the information to be read in a single operation, and doubles the speed of updates and inserts as only one row from one table is modified.
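A possible shape for the combined table (the column names and types here are assumed for illustration, following the document's SQL style):

```sql
-- Hypothetical denormalized table: the former ADDRESS columns are folded
-- into MEMBER, so one INSERT or SELECT covers a member and its address.
CREATE TABLE MEMBER (
    MEMBER_ID NUMBER PRIMARY KEY,
    F_NAME    VARCHAR2(40),
    L_NAME    VARCHAR2(40),
    -- former ADDRESS columns, mapped as an aggregate in the object model
    STREET    VARCHAR2(80),
    CITY      VARCHAR2(40),
    COUNTRY   VARCHAR2(40)
)
```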
This example demonstrates how a table schema can be further normalized to provide performance optimization.
Frequently, relational schemas can stuff too much data into a particular table. The table may contain a large number of columns, but only a small subset of those may be frequently used.
By splitting the large table into two or even several smaller tables, the amount of data traffic can be significantly reduced, improving the overall performance of the system.
This system is responsible for assigning employees to projects within an organization. The most-common operation is to read a set of employees and projects, assign some employees to different projects, and update the employees. Occasionally the employee's address or job classification is used to determine which project would be the best placement for the employee.
When a large volume of employees is read from the database at one time, their aggregate parts must also be read. Because of this, the system suffers from a general read performance problem. The only solution is to reduce the amount of data traffic to and from the server.
In this system, normalize the EMPLOYEE table into the EMPLOYEE, ADDRESS, PHONE, EMAIL, and JOB tables.
Since normally only the employee information is read, the amount of data transferred from the database to the client is reduced by splitting the table. This improves your read performance by reducing the amount of data traffic by 25%.
When models are designed in an object-oriented design and then transformed into a relational model, a common mistake is to make a large hierarchy of tables on the database. This makes it necessary to perform a large number of joins and makes querying difficult. Normally it is a good idea to collapse some of the levels in your inheritance hierarchy into a single table.
| Classes | Tables |
|---|---|
| Person | <none> |
| Employee | EMPLOYEE |
| SalesRep | EMPLOYEE |
| Staff | EMPLOYEE |
| Client | CLIENT |
| Contact | CLIENT |
In this system, the clients of the company are assigned to its sales force representatives. The managers track which sales representatives are under them.
The system suffers from over-complexity, which hinders the development and performance of the system. Large expensive joins are required to do almost anything, making every database operation expensive.
By collapsing the three-level table hierarchy into one, the complexity of the system is reduced. All of the expensive joins in the system are eliminated and simplified queries allow read performance to be further optimized leading to greatly improved system performance.
A common situation is for an object to have a collection of other objects where only one of the other objects in the collection is commonly used. In this situation, it is desirable to add an instance variable just for this special object. This way, the important object can be accessed and used without requiring the instantiation of all of the other objects in the collection.
This system is used by an international shipping company, which wants to be able to track the location of its packages as they travel from their source to their destination. When a package is moved from one location to another, a location is created in real-time on the database. The application normally receives a request for the current location of a particular package and displays this for the user.
A package could accumulate many locations as it travels to its destination, so reading all of these locations from the database is expensive.
By adding a specific instance variable for just the current location and a one-to-one mapping for the instance variable, the current location can be accessed without reading in all of the other locations. This drastically improves the performance of the system.
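The optimized model can be sketched in plain Java (class and field names are assumed for illustration): the parcel keeps a direct one-to-one reference to its current location, so the common "where is it now?" request never instantiates the full history.

```java
import java.util.*;

// Illustrative sketch of the optimized shipping model.
class Location {
    final String place;
    Location(String place) { this.place = place; }
}

class Parcel {
    // One-to-many history; in TopLink this would sit behind a value holder
    // so it is only instantiated when actually traversed.
    private final List<Location> history = new ArrayList<>();
    // Dedicated one-to-one shortcut to the most recent location.
    private Location currentLocation;

    void moveTo(Location l) {
        history.add(l);
        currentLocation = l; // keep the shortcut in sync on every move
    }

    Location getCurrentLocation() {
        return currentLocation; // answered without reading the whole history
    }
}

class TrackingDemo {
    public static void main(String[] args) {
        Parcel p = new Parcel();
        p.moveTo(new Location("Ottawa"));
        p.moveTo(new Location("Toronto"));
        System.out.println(p.getCurrentLocation().place); // Toronto
    }
}
```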
Copyright © 2002 Oracle Corporation. All Rights Reserved.