Wednesday, May 26, 2010

Number base conversion and arithmetic

Perhaps most of us first came across base conversion in high school. It's pretty easy to convert numbers to and from decimal if we remember 2 simple rules. To convert between decimal and a base b, we can use the following rules:

Conversion from decimal to base b

Divide the given decimal number by b and store the remainder in a string. Keep dividing the quotient by b until it becomes zero.

Let num be the given decimal number. Then

digit[i] = (num / b^i) % b

and the resulting number in base b reads digit[n-1] digit[n-2] ... digit[1] digit[0], i.e. the remainders written in reverse order.

Conversion from base b to decimal

Let num be the given number in base b and n be the number of digits. Then

decimal = digit[n-1] * b^(n-1) + ... + digit[i] * b^i + ... + digit[1] * b^1 + digit[0] * b^0


These 2 techniques are sufficient to convert between numbers in any bases.
Let's say you want to convert a number from base x into base y. To solve this you can convert from base x into decimal, and then convert the decimal into base y. These 2 techniques are easily transformed into programs. But it might not be a good idea to use them when the bases are powers of 2; in that case bit-manipulation techniques can be applied instead. For example, when converting to hex it's a lot faster to take 4 bits of the binary representation at a time and map them to one hexadecimal digit. If the given number is 35 (00100011), take the 4 rightmost bits (0011), which is 3, and then the next 4 bits (0010), which is 2. Thus the hex number is 23. This is likely to be faster than repeated division.

Arithmetic

How would you add, subtract, or do some other arithmetic operation on 2 numbers in base b? The simple idea is to convert both numbers into decimal, do the operation, and then convert the result back to base b. The operation can also be implemented in the given base itself, but that requires more effort than the conversion technique.

Tuesday, May 25, 2010

Web Application Performance Improvement

When a web application is accessed by a few users, performance may not be a huge problem, but as the number of users grows it might bring your application to a halt if it is not designed with scalability in mind. It's always good to start by profiling the application. Profiling gives you insight into which pieces of code are expensive. A piece of code that is expensive might not necessarily be the main culprit; always look for the code that is called extensively. For example, you might have one method that takes 2 seconds to run and another that takes 200 ms. The first intuition might be to make the first method efficient. But it might be the case that your application calls the first method 1 or 2 times, whereas it calls the second method thousands of times. In that case it's better to optimize the second method.

Connection Pooling

If your application accesses the database a lot, make sure you are using connection pooling. Also consider using stored procedures if you need to filter results or make multiple requests to the database in the same code path.

Caching

If results can be reused across multiple requests, it's always a good idea to employ a cache.

Compression

Make sure the server supports compression and uses it when sending results. This decreases communication time.


Clustering

Use a server cluster for your application so the load is spread across multiple machines.

Resource contention

Minimize resource contention: if a lot of requests are competing for the same resource, they degrade performance and hinder scalability. For example, a servlet is a shared resource, and having synchronized methods in servlets may cause scalability problems.

Memory Usage

Watch the memory usage of your application and try to reduce unnecessary memory usage. Using a large amount of memory might trigger the garbage collector frequently, which consumes precious CPU time that could otherwise be used for serving users.

Monday, May 24, 2010

Designing Cache

The cache is one very important concept in computer science. It is one design technique that is likely to help you when your system is thwarted by performance problems. Given this, it is highly likely that at some point in your life as a Software Guy you will have to design a cache. A cache generally employs multiple data structures for efficiency. A cache has limited space to store data, so depending on your application you might have to choose a replacement policy. Ideally you would want to throw out the element that will go unused, to make room for new elements. But our world is not perfect, and we have no way of knowing which element is likely to go unused in the future. The best we can do is make a guess based on our application, or maybe log usage statistics and fine-tune the cache. Based on the replacement policy we can design caches in different ways.

LRU Cache

The LRU cache is the simple cache where the least recently used item is thrown out of the cache to make room for a new one. As a cache requires fast lookup, we can rely on a hashtable. But how can we find the least recently used item in a hashtable? We can't, so we need some other data structure that does it effectively. Why not use a queue? Whenever an element is accessed, put it at the tail; the front of the queue will then be the least recently used one. Cool, but a queue is a data structure where you can only remove an element from the front and insert at the back — we cannot touch elements in the middle. To be able to access an element anywhere, we can use a linked list instead. This solves the problem pretty effectively. So here is the simple design: create a Node class that holds the cache key and the data. This node is a node of the linked list, and based on the key it is also inserted into the hashtable; using pointers, both the hashtable and the linked list point to the same node. A lookup checks the hashtable: if the data is not there and the cache is full, remove the first element of the linked list (and its hashtable entry), fetch the new element, and put it at the back of the list. If the data is there, simply move its node to the back of the list. The operations are quite simple.

MRU Cache
The need for this kind of cache may not arise that often, but it can be implemented much like the LRU cache; you just remove the tail of the list to evict the most recently used item.

LFU cache
Least frequently used cache. This is more difficult to design because we need to keep a count of the accesses of each entry in the cache. Here is my idea: use a binary search tree with the frequency of access as the key, plus a hashtable for fast lookup. On a lookup hit, take the entry's node, delete it from the BST, increase its frequency, and reinsert it into the BST. When replacing an entry, choose the leftmost element (the lowest frequency).

There are various other schemes. I would also like to hear from readers about these schemes and their solutions.

Monday, May 10, 2010

Reversing a Linked List

I heard this is one of the most frequently asked questions in an interview. It seems to come up quite a lot in phone screens, maybe because it's simple and gives a good idea of whether you have a grasp of pointers and of basic data structures (the linked list, of course). There are 2 ways to approach this problem. The first one is recursive. But you might not want to offer a recursive solution during an interview unless the interviewer explicitly directs you to do so. Why? Because recursion uses a stack frame per node, which is slower and takes up a lot of memory if the linked list is long. The second solution is the obvious iterative one: go through each node in the list and adjust the pointers. Before proceeding, it might be wise to ask the interviewer whether it's a singly or doubly linked list. Here we assume it's a singly linked list. The code is given below.



Node* reverse_list(Node *head) {
    if (head == NULL)
        return NULL;

    Node *current = head, *temp, *new_list = NULL;

    while (current != NULL) {
        temp = current->next;      /* save the rest of the list */
        current->next = new_list;  /* point the current node backwards */
        new_list = current;        /* current becomes the new head */
        current = temp;            /* advance */
    }

    /* return new head */
    return new_list;
}