This blog is mainly about Java...

Monday, March 7, 2011

Creating an efficient memory based cache

If you find your self in the situation where you are creating global maps as cache, then you have to stop and rethink.
Global maps are prone to memory leaks

You should instead consider using soft reference and WeakHashMap or what I prefer, the MapMaker of Google Guava.

In this blog post, I will describe an efficient way of creating a map based cache. This can can either be stored in the session, application, or your existing cache.

Weak References
- What are weak references?

Weak reference basically means that the garbage collector can come and remove it when it is no longer in use. You have no guarantee that whatever you put in the map, will actually be around when you try to get it.
The reason why that is so useful is if you don't want to (or cannot afford to) retain an object indefinitely in memory.


Consider the following use case: You need to associate information with classes. Now, since you are running in an environment, where classes might get reloaded (say, a Tomcat, or OSGi environment), you want the garbage collector to be able to reclaim old versions of a class as soon as it deems safe to do so.

An initial attempt to implement this, might look like something like this:
 class ClassAssociation {  
   private final IdentityHashMap<Class<?>,MyMetaData> cache = new ...;  
 }  

The problem here is; this would keep all classes in the cache member forever (or at least, unless they are manually removed), forcing the garbage collector to retain them indefinitely, including everything referenced from the class (static member values, class loader information, etc).

By using weak references, the garbage collector can reclaim old version of the class as soon as no other references to it (usually instances) exist.
On the other hand, as long as such references exist, the value is guaranteed to also be reachable from the weak reference object, and thus, is a valid key in the cache table.

MapMaker FTW!

The thing about MapMaker is that there are many options for the kind of map you build, which enables those maps to serve many different purposes.
With the MapMaker you can choose between weak keys or weak values.

  • Soft values are useful for caching, as you can cache values in the map without worrying about running out of memory since the system is free to evict entries from the cache if it needs memory.
  • You can choose to have entries expire after a certain amount of time. This is also useful for caching, since you may want certain data cached for a specific period of time before doing an expensive operation to update it.
  • One of my favorite things is making a computing map. A computing map uses a Function to automatically retrieve the value associated with a given key if it isn't already in the map. This combines well with soft values and/or expiration times. After an entry is evicted by the map (due to memory demand or expiration), the next time the value associated with that key is requested it will automatically be retrieved and cached in the map once more.
Consider this example:
You have an expensive computation or query which you want to cache for performance gains. You store the value in a map with an id as key which you will use to retrieve your values.
Normally you would store these values in a regular HashMap and store the hashmap in the cache, session or application. Now we have seen that this is generally not a good idea, since it will consume a lot of memory.
It is in these situations the MapMaker shines!
Lets say you have a list of Tasks for each User. 
You would normally query the tasks like this:
Map<User,List<Task>> cache = new HashMap<User,List<Task>>(); //the global cache defined somewhere
 if(cache.get(user) == null) {
   List<Task> userTasks = getTasksForUser(user); // perform an intensive computation/query which we want to cache
   cache.put(user, userTasks);
 }
 return cache.get(user);

If you want to rewrite this to use a Computing MapMaker you would write like this:
ConcurrentMap<String, List<Task>> cache = ...// Get the cache
    if(cache != null) {
      //If the tasks have been garbage collected, the function is applied, and you get the tasks 
      return cache.get(user);
    } else {
      ConcurrentMap<String, List<Task>> cache = new MapMaker().softValues().expireAfterWrite(2L, TimeUnit.HOURS)
        .makeComputingMap(new Function<User, List<Task>>() {
        @Override
        public List<Task> apply(User user) {
          return getTasksForUser(user); // perform an intensive computation/query which we want to cache 
        }
      });
      
      cache.put(user, getTasksForUser(user));
      return cache.get(user);
    }

Here we have created a ConcurrentMap with weak values, which will be garbage collected in two hours. If the tasks have been garbage collected and the user is retrieving the tasks, the function is applied, and you get the tasks automatically, and put it back in the cache for another two hours.

Simple and great!

No comments:

Labels