Equals and HashCode
Java's collections and relational databases (and thus Hibernate) rely heavily on being able to distinguish objects in a uniform way. In a relational database this is done with primary keys; in Java we have the equals() and hashCode() methods on the objects. This page discusses the best strategies for implementing equals() and hashCode() in your persistent classes.
Why are equals() and hashCode() important?
Normally, most Java objects inherit the default equals() and hashCode() from java.lang.Object, which are based on the object's identity, so each object created with new is different from all others.
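As a small illustration of this default behaviour, the following sketch uses a hypothetical Person class (not part of Hibernate or of this page's examples) that does not override equals() or hashCode(). Two instances with identical data are still treated as different objects:

```java
import java.util.HashSet;
import java.util.Set;

class IdentityEqualityDemo {

    // Hypothetical class that keeps the identity-based equals()/hashCode()
    // inherited from java.lang.Object.
    static class Person {
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // Adds two instances with the same data and reports how many the set holds.
    static int distinctCount() {
        Set<Person> people = new HashSet<>();
        people.add(new Person("alice"));
        people.add(new Person("alice"));
        return people.size();
    }

    public static void main(String[] args) {
        Person a = new Person("alice");
        Person b = new Person("alice");
        System.out.println(a.equals(b));     // false: different instances
        System.out.println(a.equals(a));     // true: same instance
        System.out.println(distinctCount()); // 2
    }
}
```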
This is generally what you want in ordinary Java programming. And if all your objects are in memory, this is a fine model. Hibernate's whole job, of course, is to move your objects out of memory. But Hibernate works hard to prevent you from having to worry about this.
Hibernate uses the Hibernate session to manage this uniqueness. When you create an object with new(), and then save it into a session, Hibernate now knows that whenever you query for an object and find that particular object, Hibernate should return you that instance of the object. And Hibernate will do just that.
However, once you close the Hibernate session, all bets are off. If you keep holding onto an object that you either created or loaded in a Hibernate session that you have now closed, Hibernate has no way to know about those objects. So if you open another session and query for "the same" object, Hibernate will return you a new instance. Hence, if you keep collections of objects around between sessions, you will start to experience odd behavior (duplicate objects in collections, mainly).
The general contract is: if you want to store an object in a List, Map or Set, then it is a requirement that equals() and hashCode() are implemented so that they obey the standard contract as specified in the API documentation of java.lang.Object.
What is the problem after all?
So let's say you do want to keep objects around from session to session, e.g. in a Set related to a particular application user or some other scope that spans several Hibernate sessions.
The most natural idea that comes to mind is implementing equals() and hashCode() by comparing the property you mapped as a database identifier (i.e. the primary key attribute). This will cause problems, however, for newly created objects, because Hibernate sets the identifier value for you only after storing new objects. Until then, each new instance has the same identifier, null (or 0). For example, if you add some new objects to a Set:
// Suppose UserManager and User are Beans mapped with Hibernate
UserManager u = session.load(UserManager.class, id);
u.getUserSet().add(new User("newUsername1")); // adds a new Entity with id = null or id = 0
u.getUserSet().add(new User("newUsername2")); // has id = null, too, so it equals the first User and is NOT added
// u.getUserSet() now contains only the first User
As you can see, relying on database identifier comparison for persistent classes can get you into trouble if you use Hibernate-generated ids, because the identifier value won't be set before the object has been saved. The identifier value will be set when session.save() is called on your transient object, making it persistent.
If you use manually assigned ids (e.g. the "assigned" generator), you are not in trouble at all; you just have to make sure to set the identifier value before adding the object to the Set. This is, on the other hand, quite difficult to guarantee in most applications.
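The effect can be reproduced without a database. The sketch below is only an approximation: it uses a hypothetical stand-in for the mapped User class whose equals() and hashCode() look at the id alone, and a plain HashSet instead of a Hibernate-managed collection.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class IdEqualityDemo {

    // Hypothetical stand-in for a mapped entity. The id stays null until
    // the object would be saved by Hibernate.
    static class User {
        Long id;
        final String username;

        User(String username) {
            this.username = username;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof User)) return false;
            return Objects.equals(id, ((User) other).id); // id only
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id); // 0 while id is null
        }
    }

    static Set<User> addTwoNewUsers() {
        Set<User> users = new HashSet<>();
        users.add(new User("newUsername1"));
        users.add(new User("newUsername2")); // rejected: equal to the first (both ids are null)
        return users;
    }

    public static void main(String[] args) {
        Set<User> users = addTwoNewUsers();
        System.out.println(users.size());                     // 1
        System.out.println(users.iterator().next().username); // newUsername1
    }
}
```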
Separating object id and business key
To avoid this problem we recommend using the "semi"-unique attributes of your persistent class to implement equals() (and hashCode()). Basically you should think of your database identifier as not having business meaning at all (remember, surrogate identifier attributes and automatically generated values are recommended anyway). The database identifier property should only be an object identifier, and basically should be used by Hibernate only. Of course, you may also use the database identifier as a convenient read-only handle, e.g. to build links in web applications.
Instead of using the database identifier for the equality comparison, you should use a set of properties for equals() that identify your individual objects. For example, if you have an "Item" class and it has a "name" String and a "created" Date, you can use both to implement a good equals() method. There is no need to use the persistent identifier; the so-called "business key" is much better. It's a natural key, but this time there is nothing wrong in using it!
The combination of both fields is stable enough for the life duration of the Set containing your Items. It is not as good as a primary key, but it's certainly a candidate key. You can think of this as defining a "relational identity" for your object -- the key fields that would likely be your UNIQUE fields in your relational model, or at least immutable properties of your persistent class (the "created" Date never changes).
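A minimal sketch of such an Item class might look as follows. The field and accessor names are assumptions made for illustration, and both key properties are assumed to be non-null. The id is deliberately left out of equals() and hashCode().

```java
import java.util.Date;

class Item {
    private Long id;      // surrogate key, assigned on save; ignored below
    private String name;  // business key, part 1 (assumed non-null)
    private Date created; // business key, part 2 (assumed non-null)

    Item(String name, Date created) {
        this.name = name;
        this.created = new Date(created.getTime()); // defensive copy
    }

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
    String getName() { return name; }
    Date getCreated() { return new Date(created.getTime()); }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Item)) return false;
        Item that = (Item) other;
        // Compare via getters and by millisecond value.
        return name.equals(that.getName())
                && created.getTime() == that.getCreated().getTime();
    }

    @Override
    public int hashCode() {
        return 31 * name.hashCode() + Long.hashCode(created.getTime());
    }

    public static void main(String[] args) {
        Date now = new Date();
        Item transientItem = new Item("Chair", now);
        Item savedItem = new Item("Chair", now);
        savedItem.setId(42L); // simulates the id assigned on save
        System.out.println(transientItem.equals(savedItem));                  // true
        System.out.println(transientItem.hashCode() == savedItem.hashCode()); // true
    }
}
```

Two choices here are judgment calls rather than requirements. The dates are compared through getTime() because java.util.Date and java.sql.Timestamp do not compare symmetrically with equals(). The other object is checked with instanceof and read through its getters, which is the more forgiving option if it turns out to be a subclass or proxy instance.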
In the example above, you could probably use the "username" property.
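For the User example, a sketch along these lines shows the intended effect. The User class is again a hypothetical stand-in, and setting the id by hand only simulates what Hibernate would do on save. With the username as the business key, both new users fit into the set, and the set stays consistent once an id is assigned.

```java
import java.util.HashSet;
import java.util.Set;

class BusinessKeyUserDemo {

    // Hypothetical stand-in for the mapped User class; equality is based
    // on the username (assumed non-null and unique), never on the id.
    static class User {
        private Long id;
        private final String username;

        User(String username) {
            this.username = username;
        }

        void setId(Long id) { this.id = id; }
        String getUsername() { return username; }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof User)) return false;
            return username.equals(((User) other).getUsername());
        }

        @Override
        public int hashCode() {
            return username.hashCode();
        }
    }

    static boolean survivesIdAssignment() {
        Set<User> users = new HashSet<>();
        User first = new User("newUsername1");
        User second = new User("newUsername2");
        users.add(first);
        users.add(second);
        first.setId(1L); // simulates the id being assigned on save
        return users.size() == 2 && users.contains(first) && users.contains(second);
    }

    public static void main(String[] args) {
        System.out.println(survivesIdAssignment()); // true
    }
}
```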
Note that this is all that you have to know about equals()/hashCode() in most cases. If you read on, you might find solutions that don't work perfectly or suggestions that don't help you much. Use any of the following at your own risk.
Workaround by forcing a save/flush
If you really can't get around using the persistent id for equals() / hashCode(), and if you really have to keep objects around from session to session (and hence can't just use the default equals() / hashCode()), you can work around by forcing a save() / flush() after object creation and before insertion into the set:
// Suppose UserManager and User are Beans mapped with Hibernate
UserManager u = session.load(UserManager.class, id);
User newUser = new User("newUsername1");
// u.getUserSet().add(newUser); // DO NOT ADD TO SET YET!
session.save(newUser);
session.flush(); // The id is now assigned to the new User object
u.getUserSet().add(newUser); // Now OK to add to set.
newUser = new User("newUsername2");
session.save(newUser);
session.flush();
u.getUserSet().add(newUser); // Now userSet contains both users.
Note that this approach is highly inefficient and thus not recommended. Also note that it is fragile when using disconnected object graphs on a thin client:
// on client, let's assume the UserManager is empty:
UserManager u = userManagerSessionBean.load(UserManager.class, id);
User newUser = new User("newUsername1");
u.getUserSet().add(newUser); // have to add it to set now since client cannot save it
userManagerSessionBean.updateUserManager(u);
// on server:
public void updateUserManager(UserManager u) {
    // get the first user (this example assumes there's only one)
    User newUser = (User) u.getUserSet().iterator().next();
    session.saveOrUpdate(u);
    if (!u.getUserSet().contains(newUser)) System.err.println("User set corrupted.");
}
This will actually print "User set corrupted." since newUser's hash code changes when the saveOrUpdate call assigns its identifier.
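The underlying mechanism can be shown with a plain HashSet. This is a simulation only: the User class is a hypothetical stand-in, and assigning the id by hand takes the place of the saveOrUpdate call. The element is stored under the hash code it had when it was added, so a lookup with the new hash code no longer finds it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutatingHashDemo {

    // Hypothetical entity whose equals()/hashCode() depend on the id only.
    static class User {
        Long id;

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof User)) return false;
            return Objects.equals(id, ((User) other).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    static boolean foundAfterIdAssigned() {
        Set<User> users = new HashSet<>();
        User newUser = new User();
        users.add(newUser);  // stored under the hash code of a null id (0)
        newUser.id = 42L;    // simulates the id assigned during saveOrUpdate
        return users.contains(newUser); // looked up under the new hash code
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssigned()); // false
    }
}
```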
This is all frustrating because Java's object identity seems to map directly to Hibernate-assigned database identity, but in reality the two are different -- and the latter doesn't even exist until an object is saved. The object's identity shouldn't depend on whether it's been saved yet or not, but if your equals() and hashCode() methods use the Hibernate identity, then the object id does change when you save.
It's bothersome to write these methods; can't Hibernate help?
Well, the only "helping" hand Hibernate can provide is hbm2java.
hbm2java generates a default implementation of equals() and hashCode() which relies on id comparison. After all, hbm2java has no way to know your unique business key. Often, you will want plain old Java equality rather than persistent-id equality; there is a JIRA issue that you can vote for if you want to enable hbm2java to not generate its default implementation.
In the future we might provide a feature in hbm2java that allows you to mark which properties make up the unique identity of your persistent class; this would support the "candidate key" type of identity.
Summary
To sum all this stuff up, here is a listing of what will work or won't work with the different ways to handle equals/hashCode:
| | no eq/hC at all | eq/hC with the id property | eq/hC with business key |
| use in a composite-id | No | Yes | Yes |
| multiple new instances in set | Yes | No | Yes |
| equal to same object from other session | No | Yes | Yes |
| collections intact after saving | Yes | No | Yes |
Where the various problems are as follows:
use in a composite-id:
To use an object as a composite-id, it has to implement equals/hashCode in some way, == identity will not be enough in this case.
multiple new instances in set:
Will the following work or not:
HashSet someSet = new HashSet();
someSet.add(new PersistentClass());
someSet.add(new PersistentClass());
assert(someSet.size() == 2);
equal to same object from another session:
Will the following work or not:
PersistentClass p1 = sessionOne.load(PersistentClass.class, new Integer(1));
PersistentClass p2 = sessionTwo.load(PersistentClass.class, new Integer(1));
assert(p1.equals(p2));
collections intact after saving:
Will the following work or not:
HashSet set = new HashSet();
User u = new User();
set.add(u);
session.save(u);
assert(set.contains(u));
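For the composite-id case listed above, a value-based equals() and hashCode() over all key fields is what is needed. The class below is a hypothetical sketch: the name OrderLineId and its two fields are invented for illustration and are not taken from any mapping on this page.

```java
import java.io.Serializable;

class OrderLineId implements Serializable {
    private static final long serialVersionUID = 1L;

    private long orderId;   // hypothetical key column
    private int lineNumber; // hypothetical key column

    OrderLineId() {
        // no-argument constructor, assumed to be needed for a mapped class
    }

    OrderLineId(long orderId, int lineNumber) {
        this.orderId = orderId;
        this.lineNumber = lineNumber;
    }

    long getOrderId() { return orderId; }
    int getLineNumber() { return lineNumber; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof OrderLineId)) return false;
        OrderLineId that = (OrderLineId) other;
        return orderId == that.orderId && lineNumber == that.lineNumber;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(orderId) + lineNumber;
    }

    public static void main(String[] args) {
        OrderLineId a = new OrderLineId(10L, 1);
        OrderLineId b = new OrderLineId(10L, 1);
        System.out.println(a == b);      // false: two instances
        System.out.println(a.equals(b)); // true: same key values
    }
}
```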
Any best practices for equals and hashCode?
Read the links in 'Background material' and the API docs - they provide the gory details.
Furthermore I encourage anyone with information and tips about equals and hashcode implementations to come forward and show their "patterns" - I might even try to incorporate them inside hbm2java to make it even more helpful ;)
Background material:
Effective Java Programming Language Guide, sample chapter about equals() and hashCode()
Java theory and practice: Hashing it out, Article from IBM
Sam Pullara (BEA) comments on object identity: Blog comment
Article about how to implement equals and hashCode correctly by Manish Hatwalne: Equals and HashCode
Forum thread discussing implementation possibilities without defining a business identity: Equals and hashCode: Is there *any* non-broken approach?