Tuesday, March 23, 2010

Getting the Persistence Context Picture (Part I)

This article series deals with Hibernate's basic APIs and how Hibernate is used in Grails applications. The first part is meant as a general introduction to objects, persisting objects, and Hibernate as a persistence framework.
I have been on many projects where Hibernate and the persistence layer were treated as the application's holy grail: whenever an error was thrown, programmers did not try to understand the concrete problem but spent their time looking for workarounds. In my experience, that behavior was simply caused by a lack of knowledge about basic persistence context patterns. In this article I will try to explain the most fundamental patterns and concepts, which should already help in understanding how the core data structures of persistence frameworks work.

Objects, objects everywhere...

Let's start with object orientation. Implementing a persistence framework mainly involves the question of how to map objects from an object-oriented domain onto a relational data model, which is what most databases we deal with today use. In order to understand persistence mechanisms from the bottom up, we should revisit the basic concepts of objects and classes. The basic definition of objects is:




Whenever we talk about objects in an object-oriented context, we mean runtime instances of classes, where classes can be seen as construction plans used to create new instances while the program runs. A class consists of attributes and operations on those attributes. At runtime, an object holds the attribute values, which are tightly coupled to the class's operations on them. From a logical point of view, an object is represented by its state and operations.



Attributes may be of any data type available in the programming environment. Most programming languages distinguish between simple data types and class data types, where the latter include both custom and API classes. At runtime, attribute values are therefore either scalar values (e.g. a number, a string, a boolean value) or references to other objects. A reference either points to another object or is null.

Every object created during the execution of an object-oriented program has an object identity. Depending on the context, object identity has two meanings:

1. By reference: an object A is considered the same as another object B if their references are equal.




2. By state: an object A is considered equal to another object B if their attribute values are equal.



In object-oriented theory the first is called "object identity" and the second "object equality"; being equal is therefore not the same as being identical.

Unfortunately, in the Java environment being identical means the same as being equal by default, since java.lang.Object.equals() implements reference comparison with the == operator unless a class overrides it.
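The difference between identity and equality is easy to see in a few lines of Java. The following is only a minimal sketch: the Customer class and its attributes are made up for illustration and do not come from any particular application.

```java
import java.util.Objects;

// Hypothetical domain class, used only for illustration.
final class Customer {
    private final String name;
    private final String email;

    Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }

    // Equality by state: two customers are equal if all attribute values match.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Customer)) return false;
        Customer that = (Customer) other;
        return Objects.equals(name, that.name) && Objects.equals(email, that.email);
    }

    // equals() and hashCode() should always be overridden together.
    @Override
    public int hashCode() {
        return Objects.hash(name, email);
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        Customer a = new Customer("Ann", "ann@example.org");
        Customer b = new Customer("Ann", "ann@example.org");
        Customer c = a;

        System.out.println(a == b);      // false: two distinct instances
        System.out.println(a == c);      // true: same reference
        System.out.println(a.equals(b)); // true: same attribute values

        // Without an equals() override, Object.equals() falls back to ==.
        Object x = new Object();
        Object y = new Object();
        System.out.println(x.equals(y)); // false
    }
}
```

Only the overridden equals() gives us equality by state; the == operator always compares references.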

Let's change our view to a completely relational model. In a first, naive approach, tables would correspond to classes and columns to the class's attributes.



At runtime an object instance would be represented by a single database row filled with values for each of the available columns. In fact, this is how it is done most of the time when we are using persistence frameworks like Hibernate.
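To make the naive mapping a bit more tangible, here is a small sketch that turns an object into a "row", i.e. a map from column names to values, using plain Java reflection. This is not how Hibernate does it internally; the Book class and the NaiveRowMapper helper are hypothetical and only illustrate the idea of one column per attribute.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class; in the naive mapping it corresponds to a table "Book".
class Book {
    private final long id;
    private final String title;
    private final int pages;

    Book(long id, String title, int pages) {
        this.id = id;
        this.title = title;
        this.pages = pages;
    }
}

// Hypothetical helper, not part of any framework.
class NaiveRowMapper {
    // Builds one "row": column name -> value, one entry per instance attribute.
    static Map<String, Object> toRow(Object entity) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // only instance attributes become columns
            }
            try {
                field.setAccessible(true);
                row.put(field.getName(), field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Book book = new Book(1L, "Example Title", 100);
        // Table name = class name, one column per attribute.
        System.out.println(book.getClass().getSimpleName() + " " + toRow(book));
    }
}
```

A real persistence framework additionally needs type conversion, naming strategies and relationship handling, which is where the naive approach stops being sufficient.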

The examples above show very simply structured classes. In practice, persistence frameworks also need a way to map relationships (1:n, m:n, m:1, 1:1) between objects onto database tables. Usually this is done using foreign keys in the relational model, but imagine a collection with a large number of referenced objects: the persistence framework's APIs need to provide mechanisms for batch loading, cursor support, etc.

We have already seen object identity and its two characteristics. With database persistence a third identity comes into play: the object's primary key. The problem is that objects do not necessarily know their primary key until a key is explicitly requested from the database. It gets even trickier when you think of relationships between objects: how can a programmer ensure that related objects are written in the correct order, given the foreign key constraints between the objects' database tables?

Since programmers really should not have to deal with issues like object-relational mapping, object identity, batch loading for relationship traversal, etc., various persistence or object-relational mapping (ORM) frameworks have prospered. What they all have in common is that they persist objects of an object-oriented programming environment into some persistent store; objects persisted this way are called persistent objects.


The Persistence Context

For persistent objects there needs to be some explicitly defined context in which creation, modification and retrieval of persistent objects can happen. This context is known as the Persistence Context (or Persistent Closure). In fact, most persistence frameworks provide APIs that give access to a persistence context, even though the persistence context's functionality is often split across several APIs.

Let's take a look at the basic definition of persistence contexts ([0] "Persistence Context" pattern):



Notice that the term "business transaction" does not refer to database transactions. A business transaction is a logical transaction that might span several operations. For example, ordering a pizza is a single business transaction from the customer's point of view, but completing the request might involve several technical transactions.

To look up the current persistence context during execution, classes might use a Registry, which provides access to the current persistence context.

The persistence context deals with several problems caused by the object-oriented/relational mismatch. It tracks all operations on objects for the persistence context's lifetime, so that database transactions can be kept as short as possible. It handles the problem of object identity and ensures that, within a single persistence context, there are never multiple object instances with the same database primary key. It tracks associations and resolves them in an order that satisfies foreign key constraints. It implements a cache, which provides a certain level of transaction isolation (solving repeatable-read problems) and lowers the number of executed SQL statements. Overall, a persistence provider already embodies a lot of knowledge about database systems and ORM, so I would consider any decision to implement a custom persistence context highly risky.
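The identity guarantee and the caching effect described above can be sketched with a simple map keyed by primary key. This is a toy model under my own assumptions, not Hibernate code: Pizza, ToyPersistenceContext and the loader function (which stands in for a SQL SELECT) are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical entity with a database primary key.
class Pizza {
    final long id;
    final String topping;

    Pizza(long id, String topping) {
        this.id = id;
        this.topping = topping;
    }
}

// Toy persistence context: caches every loaded instance by primary key.
class ToyPersistenceContext {
    private final Map<Long, Pizza> identityMap = new HashMap<>();
    private final LongFunction<Pizza> loader; // stands in for a SQL SELECT
    private int loads = 0;

    ToyPersistenceContext(LongFunction<Pizza> loader) {
        this.loader = loader;
    }

    // Returns the one instance for this key; hits the "database" at most once.
    Pizza get(long id) {
        Pizza cached = identityMap.get(id);
        if (cached != null) {
            return cached;
        }
        loads++;
        Pizza loaded = loader.apply(id);
        if (loaded != null) {
            identityMap.put(id, loaded);
        }
        return loaded;
    }

    int loadCount() {
        return loads;
    }
}

class IdentityMapDemo {
    public static void main(String[] args) {
        ToyPersistenceContext context =
                new ToyPersistenceContext(id -> new Pizza(id, "Margherita"));
        Pizza first = context.get(7L);
        Pizza second = context.get(7L);
        System.out.println(first == second);     // true: same instance for the same key
        System.out.println(context.loadCount()); // 1: the second call was served from the cache
    }
}
```

Because every lookup for the same key returns the same instance, repeated reads within one context see the same state, and the second read costs no SQL statement at all.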

Hibernate‘s Persistence Context APIs

Let's take a look at how Hibernate implements the Persistence Context and related patterns. Overall, Hibernate's persistence context API mainly consists of the following interfaces:

org.hibernate.SessionFactory

The SessionFactory is the component that manages persistence contexts in Hibernate: the registry. This is the central place for global configuration and set-up. At runtime there will usually be only a single SessionFactory instance in your application's environment (except when working with multiple data sources, e.g. legacy databases, LDAP providers, etc.), which acts as a factory for persistence context objects.

org.hibernate.Session

A Session represents a component which tracks attachment, detachment, modification and retrieval of persistent objects. This is (finally) Hibernate's persistence context. Whenever you need a persistence context in your application, you look up a SessionFactory and create a new Session using the openSession() method. Be aware that Hibernate does not restrict how programmers handle sessions. If you decide to implement a long-running business transaction (aka conversation) with a single session instance, you are free to do so.

org.hibernate.Transaction

A Transaction is used to implement a Unit of Work within the application. A single session might span multiple transactions, and it is recommended that there is always an active transaction when working with a session. Note that how the transaction is actually handled on the database side is hidden behind Hibernate's implementation of that interface.

Still, if you use Hibernate without an explicit call to session.beginTransaction(), Hibernate will operate in auto-commit mode (be sure to specify connection.autocommit="true" in your configuration's XML).
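The Unit of Work idea behind a transaction can be sketched without any database: changes are only recorded while the work is in progress and are written in one go on commit. The ToyUnitOfWork class below is a hypothetical illustration, not Hibernate's Transaction implementation, and the strings it collects merely stand in for SQL statements.

```java
import java.util.ArrayList;
import java.util.List;

// Toy unit of work: collects changes and "executes" them only on commit.
class ToyUnitOfWork {
    private final List<String> pending = new ArrayList<>();
    private final List<String> executed = new ArrayList<>();

    void registerNew(String table, long id) {
        pending.add("INSERT " + table + "#" + id);
    }

    void registerDirty(String table, long id) {
        pending.add("UPDATE " + table + "#" + id);
    }

    void registerRemoved(String table, long id) {
        pending.add("DELETE " + table + "#" + id);
    }

    // All recorded changes are written together, in the order they were registered.
    void commit() {
        executed.addAll(pending);
        pending.clear();
    }

    // Discards everything recorded since the last commit.
    void rollback() {
        pending.clear();
    }

    // Returns a copy of the statements that were written so far.
    List<String> executed() {
        return new ArrayList<>(executed);
    }
}

class UnitOfWorkDemo {
    public static void main(String[] args) {
        ToyUnitOfWork unitOfWork = new ToyUnitOfWork();
        unitOfWork.registerNew("pizza", 1);
        unitOfWork.registerDirty("pizza", 1);
        System.out.println(unitOfWork.executed()); // still empty: nothing written yet
        unitOfWork.commit();
        System.out.println(unitOfWork.executed()); // both statements, in registration order
    }
}
```

The point of the design is that the "database" is only touched inside commit(), which keeps the technical transaction short even if the surrounding business logic takes a while.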

Lifecycle Management

So far we have heard of persistent objects as the kind of objects that have been persisted by the persistence context. But the persistent state is just a single station in the life cycle of objects managed by Hibernate as a persistence provider.

In fact, I assume that in most applications the domain model entities have to be kept in some persistent store. An entity instance will therefore run through several states between its creation and actually being stored in the persistent store of your choice:

  1. (De-) Attaching Instances
  2. Saving/Updating Instances
  3. Removing Instances

(De-) Attaching Instances



Imagine your application's first bootstrap. Chances are good that you have to create some persistent objects on startup to get the system working. Object instances that have never been persisted are called transient objects. These objects have not been connected to a persistence context in any way.

The process of letting the persistence context know of the existence of a transient object is called "attaching". Newly attached objects are therefore called either attached or persistent objects.

Conversely, the process of disconnecting persistent objects from the current persistence context is known as "detaching" or "evicting".

Saving or Updating Instances



Save or update operations can only be applied to persistent objects. If you need to save a transient or detached object, you as an application developer first have to attach that object instance to the current persistence context. Hibernate combines these two steps (attach and save), since it provides save/update methods which automatically attach transient or detached objects.

Removing Instances



Removal can only be applied to persistent objects. As with save/update operations, transient or detached objects first need to be attached to the current persistence context before they can be removed. Whenever a persistent object is removed, it is actually just marked as being in state "removed" by the underlying persistence mechanisms.
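The life-cycle states described in this section can be summarized as a small state machine. The following sketch is a simplified model under my own assumptions; EntityState and ToyLifecycleContext are hypothetical names and do not correspond to Hibernate classes.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// The states an entity instance can be in, as described in the text.
enum EntityState { TRANSIENT, PERSISTENT, DETACHED, REMOVED }

// Toy context that only tracks life-cycle states, keyed by object reference.
class ToyLifecycleContext {
    private final Map<Object, EntityState> states = new IdentityHashMap<>();

    // Objects the context has never seen are transient.
    EntityState stateOf(Object entity) {
        return states.getOrDefault(entity, EntityState.TRANSIENT);
    }

    // Attaching makes a transient or detached object persistent.
    void attach(Object entity) {
        states.put(entity, EntityState.PERSISTENT);
    }

    // Detaching is only possible for persistent objects.
    void detach(Object entity) {
        requirePersistent(entity);
        states.put(entity, EntityState.DETACHED);
    }

    // Removing only marks the object as removed; it must be persistent first.
    void remove(Object entity) {
        requirePersistent(entity);
        states.put(entity, EntityState.REMOVED);
    }

    private void requirePersistent(Object entity) {
        if (stateOf(entity) != EntityState.PERSISTENT) {
            throw new IllegalStateException("Object is " + stateOf(entity) + ", attach it first");
        }
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        ToyLifecycleContext context = new ToyLifecycleContext();
        Object entity = new Object();
        System.out.println(context.stateOf(entity)); // TRANSIENT
        context.attach(entity);
        System.out.println(context.stateOf(entity)); // PERSISTENT
        context.detach(entity);
        System.out.println(context.stateOf(entity)); // DETACHED
        context.attach(entity);
        context.remove(entity);
        System.out.println(context.stateOf(entity)); // REMOVED
    }
}
```

Unlike this strict toy model, Hibernate's save/update methods perform the attach step implicitly, as mentioned above.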

Summary

The knowledge gained about the Persistence Context pattern and the life cycle of persistent objects already gives us a solid basis for understanding how persistence frameworks like Hibernate operate. In the next part we will take a look at using Hibernate APIs in applications and how Grails utilizes Hibernate.

[0] Patterns of Enterprise Application Architecture, Martin Fowler

