Monday, May 24, 2010

Designing Cache

Caching is a very important concept in computer science, and one design technique that is likely to help you when your system is hampered by performance problems. Given this, it is highly likely that at some point in your life as a Software Guy you will have to design a cache. A cache generally employs multiple data structures for efficiency. A cache has limited space to store data, so depending on your application you might have to choose a replacement policy. Ideally you would want to throw out the element that will go unused, to make room for new elements. But our world is not perfect, and we have no way of knowing which element is likely to go unused in the future. The best we can do is make a guess based on our application, or maybe log usage statistics and fine-tune the cache. Based on the replacement policy, we can design caches in different ways.

LRU Cache

An LRU cache is the simple cache where the least recently used item is thrown out to make room for a new one. As a cache requires fast lookup, we can always rely on a hashtable. But how can we find the least recently used item in a hashtable? We can't. So we need some other data structure that does it efficiently. Why not use a queue? Whenever an element is accessed, put it at the tail; the front of the queue will then hold the least recently used one. Cool, but a queue only lets you remove from the front and insert at the back; we cannot touch elements in the middle. To be able to access an element anywhere, we can use a linked list instead. This solves the problem pretty effectively. So here is the simple design: create a Node class that holds the cache key and data. This node is a node of the linked list and is also inserted into the hashtable under its key; using pointers, both the hashtable and the linked list refer to the same node. A lookup goes through the hashtable. On a miss, if the cache is full, remove the first element of the linked list (and its hashtable entry), then fetch the new element and put it at the back of the list. On a hit, simply move the node to the back of the list. The operations are quite simple.
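Here is a minimal sketch of this design in Python. Python's `OrderedDict` is itself a hash map laid over a doubly linked list, so it plays both roles at once: `move_to_end` is the "move node to the back" step, and `popitem(last=False)` removes the front of the list. The class name and its `get`/`put` interface are my own choices for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: hashtable + linked list in one structure."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # front = least recent, back = most recent

    def get(self, key):
        if key not in self.entries:
            return None  # miss; the caller would fetch and put()
        # Hit: move the node to the back of the list (most recently used end).
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            # Cache full: evict the front of the list, the least recently used.
            self.entries.popitem(last=False)
        self.entries[key] = value
```

For example, in a cache of capacity 2, after inserting `a` and `b` and then reading `a`, inserting `c` evicts `b`, since `a` was used more recently.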

MRU Cache
The need for this kind of cache may not arise that often, but it can be implemented much like the one before: just remove the tail of the list to evict the most recently used item.
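The change from the LRU sketch is a single line: evict from the back of the list instead of the front. A minimal sketch, again using `OrderedDict` as the combined hashtable and linked list:

```python
from collections import OrderedDict

class MRUCache:
    """Most-recently-used eviction: same structure as LRU, evict the tail."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            # Evict the tail of the list: the most recently used entry.
            self.entries.popitem(last=True)
        self.entries[key] = value
```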

LFU cache
Least frequently used cache. This is quite difficult to design because we need to keep a count of accesses for each entry in the cache. Here is my idea: use a binary search tree with the access frequency as the key, plus a hashtable for fast lookup. On a lookup hit, take the entry's node, delete it from the BST, increase its frequency, and reinsert it into the BST. When replacing an entry, choose the leftmost element, i.e. the one with the minimum frequency.
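A sketch of this idea in Python, with one substitution: since the standard library has no BST, a min-heap stands in for it here (the heap's minimum plays the role of the leftmost BST node). Instead of deleting and reinserting on every access, this version pushes a fresh heap entry with the new count and lazily skips stale ones at eviction time. The class name and interface are mine for illustration.

```python
import heapq
import itertools

class LFUCache:
    """Least-frequently-used cache: hashtable + min-heap keyed by access count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}                  # key -> value (hashtable for fast lookup)
        self.freq = {}                  # key -> current access count
        self.heap = []                  # (count, tiebreak, key); may hold stale entries
        self.counter = itertools.count()  # tiebreak so heap never compares keys

    def _push(self, key):
        heapq.heappush(self.heap, (self.freq[key], next(self.counter), key))

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1
        self._push(key)  # "reinsert" with the increased frequency
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data[key] = value
            self.freq[key] += 1
            self._push(key)
            return
        if len(self.data) >= self.capacity:
            # Pop until the entry at the top is up to date: that key is the
            # least frequently used one, the "leftmost" element of the BST.
            while True:
                count, _, victim = heapq.heappop(self.heap)
                if victim in self.freq and self.freq[victim] == count:
                    del self.data[victim]
                    del self.freq[victim]
                    break
        self.data[key] = value
        self.freq[key] = 1
        self._push(key)
```

Note that ties on frequency are broken by the insertion counter, so among equally frequent entries the oldest heap entry is evicted first; a real design would decide that tie-breaking rule deliberately.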

There are various other schemes. I would also like to hear from readers about these schemes and their solutions.
