Sunday, November 30, 2008

Java Performance

I remember when Java was on the rise, 12 years ago: it was seen as "too high level" and slow compared to C and C++. Those were the languages of choice when you wanted to achieve maximum performance.

Java, on the other hand, was seen back then as a high-overhead alternative because of features like compilation to bytecode, automatic memory management, references instead of pointers, and built-in array bounds checks. The reasoning behind that point of view went more or less along these lines:

  • Java provides automatic array bounds verification, which means that it has to check the limits of the array at every access, therefore it has to be slower.
  • Java doesn't have pointers. There must be an overhead every time a reference is accessed, therefore Java is slower than C/C++.
  • Java has the garbage collector, therefore the virtual machine has to spend a long time finding unreachable objects, therefore Java is slower than C/C++.
  • Java compiles to bytecode, therefore it loses to C and C++ because those two languages compile to native code.


The reasoning above may have been true back then, in the pre-HotSpot era of Java 1.0 and 1.1, but since the HotSpot virtual machine became the default in Java 1.3 those problems have been greatly reduced, and in some cases Java now even beats C and C++ in terms of performance.

Memory management

The garbage collector overhead is often misunderstood. It is impossible to explain more than the basics of Java garbage collectors in a few paragraphs, so I'll just scratch the surface of the topic to give a very high level idea of the techniques the virtual machine employs to optimize both memory allocation and collection.

The HotSpot VM divides the heap into three different areas: the young generation, the old generation and the permanent generation. The idea behind this division is the weak generational hypothesis: most recently allocated objects don't remain reachable for a long time, and there are few references from older objects to newer ones. New objects are allocated in a region of the young generation called the Eden, and objects from the young generation that survive a few collection cycles are tenured, or promoted to the old generation area. The young generation is garbage collected more frequently than the old generation, preventing the garbage collector from processing the whole heap unless it is necessary.
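As a rough illustration of the promotion policy described above, here is a toy model of a young and an old generation. It is only a sketch, not how HotSpot is implemented: the class name GenerationalSketch, the TENURING_THRESHOLD value and the use of a set of names to stand for "reachable" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class GenerationalSketch {
    // Hypothetical threshold: an object that survives this many young
    // collections is promoted (tenured) to the old generation.
    static final int TENURING_THRESHOLD = 2;

    static final class Obj {
        final String name;
        int age = 0; // number of young collections survived so far
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<Obj>();
    final List<Obj> old = new ArrayList<Obj>();

    // New objects always start in the young generation.
    void allocate(String name) {
        young.add(new Obj(name));
    }

    // Collects only the young generation; the old generation is not scanned.
    void youngCollection(Set<String> reachable) {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!reachable.contains(o.name)) {
                it.remove();          // unreachable: reclaimed
            } else if (++o.age >= TENURING_THRESHOLD) {
                it.remove();          // survived long enough: tenured
                old.add(o);
            }
        }
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch();
        Set<String> live = new HashSet<String>(Arrays.asList("cache"));

        heap.allocate("cache");
        heap.allocate("temp1");
        heap.youngCollection(live);   // temp1 reclaimed, cache survives (age 1)

        heap.allocate("temp2");
        heap.youngCollection(live);   // temp2 reclaimed, cache promoted (age 2)

        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());
    }
}
```

In this model the short-lived objects never leave the young generation, and the long-lived one stops being examined by young collections once it is promoted.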

Because Eden is always compacted, object allocation is very cheap in Java: according to Sun, it takes around 10 machine instructions to allocate a new object, through a technique called bump-the-pointer: just reserve the next n bytes for the object and increment the free space pointer by n. This is much cheaper than the free list management required by the C/C++ malloc/free functions.
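The bump-the-pointer idea can be sketched in a few lines. This is a simplified model under stated assumptions, not HotSpot code: the class BumpAllocator is hypothetical, a byte array stands in for Eden, and "addresses" are just offsets into that array.

```java
public class BumpAllocator {
    private final byte[] eden; // stands in for the Eden region
    private int top = 0;       // free space pointer: offset of the next unused byte

    public BumpAllocator(int capacityBytes) {
        this.eden = new byte[capacityBytes];
    }

    /** Reserves n bytes and returns their start offset, or -1 if they don't fit. */
    public int allocate(int n) {
        if (n < 0 || n > eden.length - top) {
            return -1; // a real VM would run a young collection at this point
        }
        int start = top;
        top += n;      // the "bump": allocation is one bounds test and one addition
        return start;
    }

    public int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator eden = new BumpAllocator(64);
        System.out.println(eden.allocate(16)); // 0
        System.out.println(eden.allocate(24)); // 16
        System.out.println(eden.allocate(32)); // -1, only 24 bytes left
        System.out.println(eden.used());       // 40
    }
}
```

There is no search for a suitable free block, which is why this only works when the free space is a single contiguous region.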

In most of the HotSpot garbage collectors the old generation is also compacted, which means that Java doesn't suffer from the C/C++ heap fragmentation problem, which occurs when the heap has enough free space in total, but that space is fragmented and not available as one contiguous block.

Just in Time Compiler

The HotSpot virtual machine has a very sophisticated Just In Time compiler, or JIT, that employs several optimization techniques. The JIT detects hotspots, the areas of the code that run most often, and focuses on compiling and optimizing only those, leaving the rest to the interpreter. Because the hotspots cover only a fraction of the total application code, the JIT can afford to apply expensive optimizations to them; applying the same level of optimization to all the application code would cost far more compilation time and hurt overall program performance.

Among many other optimizations, the JIT compiler provides range check elimination, which optimizes away most, if not all, of the array bounds verification code wherever it can prove that an index stays within the limits of the array. The result is that you get the benefit of never running past the limits of an array, without most of the overhead. If you think that is not a big deal, ask C/C++ developers how much of their time is spent tracking down the cause of memory corruption.
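To make this concrete, the sketch below contrasts a loop whose index is provably in range with one whose index depends on data. Whether a particular JIT actually removes the check in the first loop depends on the VM and its version, so take this as an illustration of the kind of code that is a candidate, not a guarantee. The class and method names are made up for the example.

```java
public class RangeCheckExample {
    // The loop condition proves that 0 <= i < values.length on every
    // iteration, so the per-access bounds check is redundant and a JIT
    // may remove it.
    static long sum(int[] values) {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Here the index into values comes from another array, so nothing in
    // the loop proves it is in range and the check generally has to stay.
    static long sumIndirect(int[] values, int[] indexes) {
        long total = 0;
        for (int i = 0; i < indexes.length; i++) {
            total += values[indexes[i]];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30};
        System.out.println(sum(values));                          // 60
        System.out.println(sumIndirect(values, new int[] {2, 0})); // 40
        try {
            sumIndirect(values, new int[] {3});
        } catch (ArrayIndexOutOfBoundsException e) {
            // The safety guarantee holds either way: a bad index throws
            // instead of silently corrupting memory.
            System.out.println("out of bounds caught");
        }
    }
}
```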

The JIT is adaptive, which means that it monitors program execution so that runtime information can guide its optimizations. This is an advantage over languages that are compiled before execution, since no runtime information is available at that time.

Object References Implementation

The HotSpot VM doesn't use handles to implement object references. Instead, all object references are implemented as direct pointers, just like pointers in C++. The result is that there is no extra indirection when accessing a reference, at the expense of the garbage collector having to find and update object references whenever it moves objects around during collection. Fortunately, this can be done very efficiently.
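The trade-off between the two schemes can be shown with a toy model. Everything here is an assumption for illustration: the class ReferenceSketch is hypothetical, an int array plays the role of the heap, and an "address" is just an index into it.

```java
public class ReferenceSketch {
    final int[] heap = new int[8];        // simulated heap; an address is an index
    final int[] handleTable = new int[4]; // handle scheme only: handle -> address

    // Handle scheme: two loads per access (table lookup, then the object).
    int readViaHandle(int handle) {
        return heap[handleTable[handle]];
    }

    // Direct scheme: the reference is the address, so one load per access.
    int readDirect(int address) {
        return heap[address];
    }

    // What a compacting collector does when it relocates an object.
    void move(int from, int to) {
        heap[to] = heap[from];
        heap[from] = 0;
    }

    public static void main(String[] args) {
        ReferenceSketch vm = new ReferenceSketch();
        vm.heap[5] = 42;          // an object living at address 5

        vm.handleTable[0] = 5;
        int handleRef = 0;        // a reference under the handle scheme
        int directRef = 5;        // a reference under the direct scheme

        vm.move(5, 1);            // the collector relocates the object

        vm.handleTable[0] = 1;    // handles: one table entry is updated
        directRef = 1;            // direct: every reference must be found and rewritten

        System.out.println(vm.readViaHandle(handleRef) + " " + vm.readDirect(directRef)); // 42 42
    }
}
```

Direct pointers make every access cheaper and push the extra work onto the collector, which only pays it when objects actually move.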

Another advantage of using references instead of pointers is that it makes the optimizer's life much easier. Pointers in C++ are a barrier to more aggressive optimizations, because the compiler often can't be sure where a pointer has been copied to, and therefore which memory a given write might affect.

Benchmarks

Several benchmarks comparing C++ and Java exist on the web. The results are mixed: some show that Java is actually faster than C++, while most show that C++ is still faster than Java, but by a small margin. The purpose of this post is to talk about the theory behind the Java optimization techniques, so I didn't set out to create benchmarks of my own, but there is nothing like hard data to prove a point. So here are links to some benchmarks found on the web:
  • The Java is Faster than C++ and C++ Sucks Unbiased Benchmark: despite the name, this benchmark shows very similar results between Java and C++, with the occasional scenario where C++ beats Java hands down.
  • The "Java Faster than C++" Benchmark Revisited: written by someone who didn't like the benchmark above and found different results, where C++ has a clearer lead. Even so, Java is still close, wins some benchmarks, and is clearly slower only in a handful of tests.
  • The Computer Language Benchmarks Game: compares a number of programming languages using different algorithms. GNU C++ and Java 6 are compared, and C++ wins most of the comparisons, though in most cases by a very close margin, and Java is the occasional winner in some of the tests.


Conclusion

The fears regarding Java performance may have been justified when the language was introduced, but the enhancements made to the Sun JVM since then have turned Java into a very fast platform. Java has reached a level where its performance is very close to that of C and C++ in most applications. Those languages survive only in certain niches these days, such as the gaming industry and systems programming.

Fears of Java performance persist just because of a general lack of information on the optimization techniques that exist in the platform. Of course, no amount of performance optimization techniques will be sufficient if the developers don't do their part and pay attention to performance when writing their code.

More information

The Java HotSpot Performance Engine Architecture - provides a summary of the HotSpot performance optimization techniques.
Java Memory Management Whitepaper [pdf] - provides an overview of the Java Memory Management concepts and Garbage Collection implementation and techniques.

14 comments:

Jason Whaley said...

Great and informative post! I'll use this as additional ammo the next time I get another "java is slow" troll.

Casper Bang said...

@Jason, there is perceived performance (how long it takes to get a response) and there's raw performance (calculations pr. sec. when JIT'ed).

It's my experience critical people refer to the first one, and it very much falls into desktop vs. server usage of Java.

If only Sun had not ignored the desktop from '98 to '05. That is, if Sun had started modularizing the JRE years ago, fixed the browser plugin and relied more on native peers (like SWT), the general opinion might be somewhat different.

That's the legacy of Java on the desktop whether you like it or not. Why do you suppose a company like Google would prefer to pour money into Wine for a cross-platform experience, than writing Picasa and Google Earth in Java?

rzei said...

What was missing from the JIT part is escape analysis. I'm not sure if it's in JRE 6u10 yet, but will be in 7 at least.

Escape analysis makes sure that for example defensive programming (returning cloned/copy-constructed instances of private fields) might not really return a copy, if the code used after the getter in question doesn't modify the instance at all.

Arpad said...

Object allocation is not costly, object initialization is. Even a simple object holding 3 float-s takes long to initialize to 0. Object reuse is (still) the solution.

The JIT is able to eliminate only the most obvious bounds checks in loops. Eliminating bounds checks has also its runtime costs. The solution may be a hint annotation to enable more aggressive bounds check elimination.

The problem with GC is not the GC itself, but the cost of object initialization, and that objects are laid out randomly in memory, so when sequentially accessing objects in an array, the CPU cannot benefit from the cache. Accessing an array of primitives is way faster than accessing objects in an array. There is no stack allocation in Java. Escape analysis may prove to be the solution for that.

The overhead of objects makes working with lots of small objects inefficient. There is an RFE to have a less costly "struct"-like construct in Java, but it's been there for years, without Sun implementing it, and Java conservatives don't like it.

Java still goes through JNI for some math calculations, while C++ can make use of the newest SSE instruction sets. Automatic vectorization of loops and use of new instruction sets would be the solution. I doubt that a matrix library written in Java can ever be as efficient as a hand-optimized C++ matrix library. Unless such a library becomes part of the JDK itself and is implemented in up-to-date native code.

Once, Java had better threading than C++, but now OpenMP is supported by most C++ compilers, while Java doesn't have it.

Java came close to C++ at one moment, but it's lagging behind again.

Martin Wildam said...

I used to develop in Visual Basic. I guess that the performance impact of Java may be less than that of VB.

From my experience with Java, the only performance impact that a normal user can see with the naked eye is the startup delay of a Java application, caused by the JVM loading in the background. However, it already got much better in Java 6 compared to Java 5, or so is my impression.

ECC said...

Last year, we had to develop a small application that transforms a "raw ticket" (paths in a graph in text format) generated by application A so that application B can use it.
I wrote it in C++ and the average ticket processing time was 200ms. A friend wrote it in Java to prove it could perform the same and after 3 optimization sessions, his high score was ... 1200ms !!!
It was only string processing ...
When performance is a hint, forget Java. Really.

Isaac Gouy said...

Domingos > The purpose of this post is to talk about the theory behind the Java optimization techniques... but nothing like hard data to prove a point. So here are the links to some benchmarks found on the web

You haven't tried to connect your specific comments on Java optimization techniques with specific example programs from those websites!

Have you actually looked at what those measured programs do?

Both the "The Java Faster than C++" websites are from 2004, and both are based on Doug Bagley's website which was last updated in 2001.

Back in 2004, The Computer Language Benchmarks Game started out with those same tests from Doug Bagley's website - but those tests have been replaced with new ones that do some work, and new implementations are still being contributed.


Domingos > ... C++ wins most of the comparisons, but in most of the cases by a very close margin ...

What does "very close margin" mean - 20%, 100%, 400% ?

Anonymous said...

It's not Java that has gotten so much faster. It's the hardware.

eCompositor said...

It's not java that is slow it is the people who make comments like the one above. Faster hardware makes both languages faster.

akira said...

Maybe a bit off-topic, but IMHO I always thought that Sun made a very bad move when it was releasing Java. Most people still remember Java from the times when those gray applet boxes used to freeze the whole browser. If Java had been released as a way to distribute open source desktop applications via webstart, for example, Sun could have avoided the negative psychological effect the old applets still have.

Anonymous said...

eCompositor > reduce the memory of the machines (to the need of a program written in C/C++) and use a Java program doing the same!
In most cases the Java-VM would not start at all!
So, if you have a big memory - use Java, in all other cases - use C/C++!

A big question for me is: why not compile the RT-packages of Java directly to binary and use them on starting faster? (first time start of the VM -> make a compile run and save the whole binary And use this binary as an input to the JIT)?

Casper Bang said...

"why not compile the RT-packages of Java directly to binary and use them on starting faster?"

Something along those lines actually happens today. The binary rt.jar is cached and can be copied into memory very fast without bytecode parsing and validation.
What I don't understand is why this mechanism isn't utilized more for libraries and 3rd-party applications as well. .NET can do something along those lines (compile to GAC) if instructed, as far as I remember.

Anonymous said...

In a technical discussion I like to see quantitative comparisons with carefully investigated numbers.

As a java and C/C++ user (a programmer since 25 years ago) I cannot trust your judgment about C++ and Java.

As a scientist (doing mostly numerical and simulation works) I have accepted the fact that development of a Java software is easier while C++ will produce faster software.

I hate to see people try to ignore facts.

dleskov said...

@Anonymous: Today's Java implementations can be as fast as C++ even on number crunching.

@Casper Bang: Ahead-Of-Time compilation can greatly improve the perceived performance of Java apps on the desktop.