Sunday, November 30, 2008

Java Performance

I remember when Java was on the rise 12 years ago and was seen as "too high level" and slow compared to C and C++. Those were the languages of choice when you wanted maximum performance.

Java, on the other hand, was seen back then as a high-overhead alternative because of features like compilation to bytecode, automatic memory management, references instead of pointers, and built-in array bounds checks. At the time, each of those features looked costly. The reasoning behind that point of view went more or less along these lines:

Java provides automatic array bounds verification, which means that it has to check the limits of the array at every access, therefore it has to be slower.


Java doesn't have pointers. There must be an overhead every time a reference is accessed, therefore Java is slower than C/C++.


Java has the garbage collector, therefore the virtual machine has to spend a long time finding unreachable objects, therefore Java is slower than C/C++.


Java compiles to bytecode, therefore it loses to C and C++ because those two languages compile to native code.


The reasoning above may have been true back then, in the pre-HotSpot era of Java 1.0 and 1.1, but since the introduction of the HotSpot virtual machine in Java 1.3 those problems have been greatly reduced, and in some cases Java now even beats C and C++ in terms of performance.

Memory management

The garbage collector overhead is often misunderstood. It is impossible to explain more than the basics of Java garbage collectors in a few paragraphs, so I'll just scratch the surface of this topic to give a very high level idea of the techniques the virtual machine uses to optimize both memory allocation and collection.

The HotSpot VM divides the heap into three different areas: the young generation, the old generation and the permanent generation. The idea behind this division is the weak generational hypothesis: most recently allocated objects don't remain reachable for a long time, and there are few references from older objects to newer ones. New objects are allocated in a region of the young generation called the Eden, and objects from the young generation that survive a few collection cycles are tenured, or promoted to the old generation area. The young generation is garbage collected more frequently than the old generation, preventing the garbage collector from processing the whole heap unless it is necessary.
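One way to see this division on a running JVM is to ask the standard java.lang.management API which memory pools and collectors it exposes. This is only a minimal sketch: the class name HeapGenerations is hypothetical, and the pool and collector names depend on the JVM version and on the collector in use, so no output is shown here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class HeapGenerations {

    // Names of the memory pools reported by the running JVM. On generational
    // collectors these typically include an Eden space and an old generation,
    // but the exact names vary between JVMs.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    // One line per collector: how many collections it has run so far and
    // the approximate accumulated time spent in them.
    static List<String> collectorSummaries() {
        List<String> summaries = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            summaries.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return summaries;
    }

    public static void main(String[] args) {
        poolNames().forEach(System.out::println);
        collectorSummaries().forEach(System.out::println);
    }
}
```

On a typical generational setup the young generation collector should report far more collections than the old generation one, which is the behavior described above.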

Because the Eden is always compacted, object allocation is very cheap in Java: according to Sun, it takes around 10 machine instructions to allocate a new object, through a technique called bump the pointer: just reserve the next n bytes for the object and increment the free space pointer by n. This is way cheaper than the free list management required by C/C++ malloc/free functions.
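To make the bump the pointer idea concrete, here is a toy model of it. It is not how HotSpot is implemented: the BumpAllocator class is hypothetical, and offsets into a byte array stand in for real addresses. It only shows why allocation in a compacted region needs little more than a bounds test and an addition.

```java
// Toy model of bump-the-pointer allocation over a fixed region.
// Hypothetical class for illustration; offsets stand in for addresses.
class BumpAllocator {
    private final byte[] region; // stands in for the Eden space
    private int top = 0;         // free space pointer: next unused offset

    BumpAllocator(int capacityBytes) {
        region = new byte[capacityBytes];
    }

    // Reserves n bytes and returns the offset of the new block, or -1 when
    // the region is full (a real VM would trigger a collection instead).
    int allocate(int n) {
        if (n < 0 || n > region.length - top) {
            return -1;
        }
        int start = top;
        top += n; // the "bump"
        return start;
    }

    // Once a collection has moved the survivors elsewhere, the whole
    // region is free again and allocation restarts from offset 0.
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }
}
```

Note that there is no free list to search and no per-block bookkeeping to update, which is where the cost difference with a general-purpose malloc/free comes from.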

In most of the HotSpot garbage collectors the old generation is also compacted, which means that Java doesn't suffer from the C/C++ heap fragmentation problem, which happens when the heap has enough free space in total, but that space is scattered in pieces too small to satisfy a request for one contiguous block.

Just in Time Compiler

The HotSpot virtual machine has a very sophisticated Just In Time compiler, or JIT, that employs several optimization techniques. The JIT detects hotspots, the areas of the code where the program spends most of its time, and focuses on compiling and optimizing only those, leaving the rest to be run by the interpreter. Although the hotspots account for most of the execution time, they cover only a fraction of the total application code. As a result, the JIT can apply expensive optimizations to those areas without the impact on program performance it would incur if it had to compile and optimize all of the application code to the same level.

Among many other optimizations, the JIT compiler's optimizer provides range check elimination, which consists of optimizing away most if not all of the array bounds verification code wherever it can prove that the index is always within bounds. The final result is that you get the benefit of never running past the limits of an array, without most of the overhead. If you think it is not a big deal, go talk to a C/C++ developer and ask him how much of his time is spent tracking down the cause of memory corruption.
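As an illustration, the two loops below differ in how much the compiler can know about the index. This is a sketch with hypothetical names, and whether a particular JVM actually removes the check in the first loop depends on its version and settings. The language semantics stay the same either way: an out of range index still throws.

```java
// Hypothetical example class, for illustration only.
class RangeCheckExample {

    // The index runs from 0 to data.length - 1, so it is provably in
    // range. This is the classic shape where a JIT can drop the
    // per-access bounds check.
    static long sum(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    // Here the index comes from another array, so in general the compiler
    // cannot prove it is in range and the check on data[...] has to stay.
    static long gather(int[] data, int[] indices) {
        long total = 0;
        for (int i = 0; i < indices.length; i++) {
            total += data[indices[i]];
        }
        return total;
    }
}
```

Calling gather with an index of 3 on a three-element array throws ArrayIndexOutOfBoundsException instead of silently reading past the end, which is exactly the safety net that C and C++ don't give you.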

The JIT is adaptive, which means that it monitors the program execution so that runtime information can be used to guide its optimizations. This is an advantage over languages that are compiled before execution, since at that point no runtime information is available.

Object References Implementation

Java doesn't use handles to implement object references. Instead, all object references are implemented as actual C++ pointers. The result is that there is no overhead when accessing a reference, at the expense of the garbage collector having to find and update object references whenever it moves objects around during garbage collection. Fortunately, this can be done very efficiently.
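The trade-off between the two schemes can be sketched with a toy model in which an int array plays the role of the heap and array indexes play the role of addresses. All names here are hypothetical and nothing below reflects actual HotSpot code. It only shows where the extra load goes in one scheme and where the extra collector work goes in the other.

```java
// Toy model: an int[] is the "heap", indexes are "addresses".
// Hypothetical class for illustration only.
class HandleVsDirect {

    // Handle scheme: a reference is an index into a handle table, which
    // holds the object's current address. Every access costs two loads.
    static int readViaHandle(int[] heap, int[] handleTable, int handle) {
        return heap[handleTable[handle]];
    }

    // Direct scheme: a reference is the address itself. One load.
    static int readDirect(int[] heap, int address) {
        return heap[address];
    }

    // Moving an object under the handle scheme: copy it and update a
    // single handle table slot. References don't change.
    static void moveWithHandles(int[] heap, int[] handleTable, int handle, int newAddress) {
        heap[newAddress] = heap[handleTable[handle]];
        handleTable[handle] = newAddress;
    }

    // Moving an object under the direct scheme: copy it, then find and
    // rewrite every reference that pointed at the old address.
    static void moveDirect(int[] heap, int[] references, int oldAddress, int newAddress) {
        heap[newAddress] = heap[oldAddress];
        for (int i = 0; i < references.length; i++) {
            if (references[i] == oldAddress) {
                references[i] = newAddress;
            }
        }
    }
}
```

Reads happen far more often than moves, so paying on the move (the direct scheme) instead of on every access is the better deal.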

Another advantage of using references instead of pointers is that it makes the life of the optimizer much easier. The existence of pointers in C++ is a barrier against more aggressive optimizations, because the compiler can't be sure about where a pointer has been copied to.

Benchmarks

Several benchmarks comparing C++ and Java exist on the web. The results are mixed: some show that Java is actually faster than C++, while most show that C++ is still faster than Java, but by a small margin. The purpose of this post is to talk about the theory behind the Java optimization techniques, so I didn't set out to create my own benchmarks, but there is nothing like hard data to prove a point. So here are the links to some benchmarks found on the web:
  • The Java is Faster than C++ and C++ Sucks Unbiased Benchmark: despite the name, this benchmark shows very similar results between Java and C++, with the occasional scenario where C++ beats Java hands down.
  • The 'Java Faster than C++' Benchmark Revisited: written by someone who didn't like the benchmark above and found different results, where C++ has a clearer lead. Even so, Java is still close, wins some benchmarks, and is clearly slower only in a handful of tests.
  • The Computer Language Benchmarks Game: compares a number of programming languages using different algorithms. GNU C++ and Java 6 are compared, and C++ wins most of the comparisons, but in most cases by a very close margin, and Java is the occasional winner in some of the tests.


Conclusion

The fears regarding Java performance may have been justified when the language was introduced, but the enhancements made to the Sun JVM since then have turned Java into a very fast platform. Java has reached a level where its performance is very close to C and C++ in most applications. Those languages survive only in certain niches these days, like the gaming industry and systems programming.

Fears of Java performance persist just because of a general lack of information on the optimization techniques that exist in the platform. Of course, no amount of performance optimization techniques will be sufficient if the developers don't do their part and pay attention to performance when writing their code.

More information

The Java HotSpot Performance Engine Architecture - provides a summary of the HotSpot performance optimization techniques.
Java Memory Management Whitepaper [pdf] - provides an overview of the Java Memory Management concepts and Garbage Collection implementation and techniques.

Sunday, November 2, 2008

Quotes from Randy Pausch

I've watched Randy Pausch's last lecture a few times and I always find I can learn something from it. It is a one hour session packed with many amazing life lessons. If you don't know who Randy Pausch is (which planet are you from??), take the time to read about his story and watch his presentation.

I chose three quotes that, even though they are not programming related, can still be applied to work on a team or within a company.

"When you're screwing up, and nobody is saying anything to you
anymore, that means they gave up."


Nobody likes to listen to criticism, but this is the best way to improve ourselves. Often the self image we have is totally different from the way other people see us, and this is why it is very important to hear what other people have to say about us so that we can get better and better.

"When you see yourself doing something badly and nobody's bothering to
tell you anymore, that's a very bad place to be. Your critics are the
ones telling you that they still love you and care"


Embrace criticism with an open mind. When you are failing at something, your enemies will not tell you anything. If somebody is taking the time to go through the embarrassment of talking to you about a flaw in your ways, it is because that person really cares about you and wants to see you get better.

"Experience is what you get when you didn't get what you wanted"


Love this one. When someone on my team is frustrated at a stand up meeting because they didn't make any progress on a problem, I tell them that they were able to eliminate all the dead ends they found, and this is already progress.

"Wait long enough and people will surprise and impress you. When you're pissed off at somebody and you're angry at them, you just haven't given them enough time".


I've seen this pattern many times in many places. I learned that good people want to do things right. It is especially frustrating when you know that someone has the potential but does not deliver. But if you give that person enough time, you will be pleasantly surprised.