Saturday, September 20, 2008

WeakHashMap is not a cache! Understanding WeakReference and SoftReference

If you ever found yourself in the need to implement a simple caching functionality in your Java programs, chances are that you at least considered using the WeakHashMap class as the cache.

It turns out that the WeakHashMap makes for a terrible cache, and for two reasons. The first reason is that it uses weak references as the underlying memory management mechanism. The second reason is that the weak references are used for the keys and not for the values, which is what you would want.

Reference classes and reachability

To understand what the WeakHashMap is good for, we need to understand the WeakReference and SoftReference classes and what is the difference between them. Both extend from the Reference class, which resides, along with its children, in the java.lang.ref package. The Reference classes are used to represent object references that are weaker than regular java references, which are called strong references Objects that can be reached by a chain of only strong references never get garbage collected. The weaker the references to an object, the more likely the object will be reclaimed by the garbage collector.

The stronger type of reference is the strong reference, like when you declare String name = "John Doe";. The name variable is a strong reference to the "John Doe" String object. SoftReferences are weaker than strong references, and WeakReferences are weaker than SoftReferences. There is also an even weaker type of reference, the PhantonReference, of which I'm not going to talk about here.

The type of references involved in the reference chain that starts from a local or a static variable and ends in an object defines the type of object's reachability. The Java API explains the different categories of object reachability this way:

  • An object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly-created object is strongly reachable by the thread that created it.

  • An object is softly reachable if it is not strongly reachable but can be reached by traversing a soft reference.

  • An object is weakly reachable if it is neither strongly nor softly reachable but can be reached by traversing a weak reference. When the weak references to a weakly-reachable object are cleared, the object becomes eligible for finalization.

  • An object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and some phantom reference refers to it.

  • Finally, an object is unreachable, and therefore eligible for reclamation, when it is not reachable in any of the above ways.


if the garbage collector determines that an object is strongly reachable, it will not reclaim the object. This is what we would expect. Nobody wants to have an object garbage collected when it can still be reached by a chain of strong references. Now, here is the important point that is not written in the explanation above: If the garbage collector determines that an object is softly reachable, it may clear atomically all soft references to the object, in the case that it finds that memory is running low, or at its own discretion. But if the garbage collector determines that an object is weakly reachable, it will clear atomically all weak references to the object. This is the major difference between weak and soft references and the reason that makes the WeakReference ill-suited for caching.

What is the WeakHashMap good for?

Now it is easy to understand why the WeakHashMap doesn't work for caching. First of all it wouldn't work anyway because it uses soft references for the keys and not for the map values. But additional to that, the garbage collector aggressively reclaims the memory that is referenced only by weak references. It means that once you lose the last strong reference to an object that is working as a key in a WeakHashMap, the garbage collector will soon reclaim that map entry.

If the WeakHashMap is no good for caching, then what is it good for? It is good to implement canonical maps. Lets say you want to associate some extra information to an object that you have a strong reference to. You put an entry in a WeakHashMap with the object as the key, and the extra information as the map value. Then, as long as you keep a strong reference to the object, you will be able to check the map to retrieve the extra information. And once you release the object, the map entry will be cleared and the memory used by the extra information will be released.

Can I just copy and paste the WeakHashMap class to write my cache?

No. Please, don't copy and paste the WeakHashMap source code replacing WeakReference with SoftReference. This won't be effective. To understand why, look at this example:
SoftHashMap cache = new SoftHashMap(); // A copy and paste from WeakHashMap
SomeExpensiveClass myReference1 = .... // get expensive class instance
cache.put(new Long(10), myReference1); // put the expensive object in the cache
... // do some stuff, but keep the myReference1 variable around!
SomeExpensiveClass myReference2 = cache.get(new Long(10)); // query the cache
if (myReference2 == null) {
// Uh-oh, the cache got rid of the object, even though I
// still had a reference to it in the myReference1 variable!
}

You would expect the cache to keep the reference to the object, since the myReference1 variable was still around. This may happen, but it may also happen that the cache will have been cleared. Why is this? Because at the "do some stuff" block, the garbage collector may have kicked in, noticed that the map's key was softly reachable, and chosen to garbage collect it. Remember, the WeakHashMap uses WeakReference for the keys, so your copy-paste implementation would use SoftReference for the keys. This reduces the effectiveness of the cache because you would expect the object to remain in the cache since the strong reference to it is still around. But the problem is that there is no strong reference to the key in the example above.

So how the hell do I implement a cache in Java?

My suggestion is to use one of the freely available Cache implementations, like JCS, OSCache and others. Those libraries provide better memory management with LRU and FIFO policies for instance, disk overflow, data expiration and many other optional advanced features.

If you still want to implement a cache class that takes advantage of SoftReferences yourself, implement your own Map class that extends AbstractMap and internally wrap the map values in SoftReferences. Then you need to implement an internal mechanism to remove the stale entries from the map. For this, you may want to use a ReferenceQueue, and poll the ReferenceQueue for collected entries. Then, for each collected entry, you will need to remove it from the map. Make sure you do this in an efficient way! Or you may end up with some nested loops and O(n2) operations and your cache performance will suck.

More Information

API documentation of the java.lang.ref package for an explanation of the Reference classes and object reachability.
API documentation of the WeakHashMap class for more information on the limitations and application of the SeakHashMap class.

12 comments:

meadandale said...

You CAN use a weak hashmap as a temporal cache and I've done this before.

In my case, I was implementing flyweight support for jfreechart, which tends to create lots of primitive wrapper objects. It was causing millions of objects to be created for every graph I created.

I implemented a wrapper object cache using a weak hashmap so that strings and primitives would map to shared instances of wrapper objects. It allowed charts to reuse the same wrapper objects and also allowed concurrently executing threads to also share the same values.

It RELIED on the fact that once the charts were created, the caches got purged by the GC.

It decreased the object churn (and associated performance problems we were seeing) by better than 3 orders of magnitude.

But, in general, you are correct in that weak hashmaps don't make good caches based on the common definition most people have for a cache.

Domingos Neto said...

Hi meadandale,

You application of the WeakHashMap is actually a very clever one! I wouldn't have thought of it myself and I am already thinking of places where I can apply it in my own applications :)

I do agree with you that what you described is not what is conventionally called a cache :) A cache is a memory location where you can store data that is otherwise expensive to obtain. I just couldn't find yet a better term for the kind of structure that you created, so, even though it can be called a cache for lack of a better name, it is not what is conventionally known by this name!

Dimitris Andreou said...

http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/ReferenceMap.html

Domingos Neto said...

Thanks for that Dimitris!

JodaStephen said...

"Please, don't copy and paste the WeakHashMap source code replacing WeakReference with SoftReference"

This would be against Sun's copyright. Please don't copy someone else's code.

The google collections ReferenceMap follows the ideas started in Apache Commons:
http://commons.apache.org/collections/api/org/apache/commons/collections/ReferenceMap.html

Note that the Apache Commons version is suitable for Java 1.4, and the Google one is generified.

Domingos Neto said...

jodastephen:

very good point :)

Ros said...

Interesting to know.

thaakshana said...

Hi Domingos,
Your article is informative.

However I wish to clarify one point.

In the copy pasted example code snippet you say..

// Uh-oh, the cache got rid of the object, even though I // still had a reference to it in the myReference1 variable!

But you really mean that the key could be GC'd and therefore you cannot get at the value (because there is no key!) which however is still being referenced by myReference1 right?

Bit confused here would be grateful if you could set the record straight.

Matthias said...

Great article, reflects my confusion about WeakHashMap being cited as a caching facility.

Also thanks for the pointer to Google's ReferenceMap, gotta use that instead.

arrun said...

Hi,
We can implement a caching system [LRU] using linkedhashmap. Please refer to the documentation in spec for details.

Anonymous said...

"This reduces the effectiveness of the cache because you would expect the object to remain in the cache since the strong reference to it is still around"

Ummm no. This is something you made up. Nobody would expect a cache to know that you are keeping a reference to an object. Even if the cache were strictly LRU, other threads could have queried the cache so many times that the object you retrieved would no longer be in the cache.

Swaraj Gupta said...

I think there is typo under "What is the WeakHashMap good for?" section.

The line saying ..."First of all it wouldn't work anyway because it uses soft references for the keys ..."

Here it should "...it uses weak references ..."