I have been interested in Java performance issues for some time. This time we will look not so much at Java code performance as at the performance of the Garbage Collection process. I was inspired by a lecture at the Warsaw JUG about HotSpot in low-latency Java. The lecturer, Wojciech Kudla, prepared a small application that simulates an HFT system from the GC point of view. All the source code, together with the great GC Viewer plugin for JVisualVM, is available on his GitHub: heaptrasher.
In a nutshell, heaptrasher is an application that generates a lot of garbage for the GC to operate on. In addition, it contains code that builds a histogram with latency statistics. The statistics can be collected in two different ways; it is worth mentioning that the first one (array) allocates a lot of memory in the Old Generation space, while the second one does not. As I found out later, this has a considerable impact on GC performance.
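To give a rough idea of why the two modes behave so differently, here is a minimal sketch (my own illustration, not heaptrasher's actual code; the class names ArrayStats and DirectStats are made up): keeping every sample in one huge long-lived array means that the array survives young collections and ends up occupying the Old Generation, while folding each sample into a small fixed-size histogram leaves almost nothing long-lived behind.

```java
// Hypothetical sketch of the two statistics modes (not heaptrasher's code).
public class StatsModes {

    // "array" mode: every latency sample is appended to one huge, long-lived
    // array. The array outlives many young collections, gets promoted and
    // occupies a big chunk of the Old Generation.
    static final class ArrayStats {
        private final long[] samples = new long[50_000_000]; // ~400 MB, long-lived
        private int count;
        void record(long nanos) { samples[count++] = nanos; }
    }

    // "direct" mode: every sample is folded into a tiny fixed-size histogram
    // right away, so per-message allocations stay short-lived.
    static final class DirectStats {
        private final long[] buckets = new long[64]; // effectively static
        void record(long nanos) {
            int bucket = 64 - Long.numberOfLeadingZeros(Math.max(1L, nanos));
            buckets[Math.min(63, bucket)]++;
        }
    }

    public static void main(String[] args) {
        DirectStats stats = new DirectStats();
        for (int i = 0; i < 10_000_000; i++) {
            long start = System.nanoTime();
            byte[] payload = new byte[256]; // simulated message: short-lived garbage
            payload[0] = (byte) i;
            stats.record(System.nanoTime() - start);
        }
        System.out.println(java.util.Arrays.toString(stats.buckets));
    }
}
```

In the first mode the statistics buffer itself is the dominant long-lived data; in the second the live set stays tiny, which, as the tests below show, matters a lot to the collector.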
All charts below come from JVisualVM with the aforementioned GC Viewer and VisualGC plugins.
Test platform
- CPU: i7-3612QM
- RAM: 8GB
- OS: Windows 7 Ultimate
- JDK: 1.7.0_u21
A common feature of all the tests is the use of the parallel collector for Young Generation collections. The collector chosen for the Old Generation does not matter, because in this scenario an Old Generation collection never happens.
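A quick way to confirm which collectors the JVM actually picked, and to read the collection counts and accumulated times (the numbers below come from GC Viewer/VisualGC, but the standard JMX beans expose the same kind of data), is a snippet like this:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcBeans {
    public static void main(String[] args) {
        // With -XX:+UseParallelGC the beans are typically named
        // "PS Scavenge" (Young Generation) and "PS MarkSweep" (Old Generation).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-12s collections=%d totalTime=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```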
First test
Description
4GB heap, array as statistics data holder
Command line
java -XX:+UnlockDiagnosticVMOptions -Xmx4g -Xms4g -XX:+UseParallelGC Main array
Garbage Collector pause durations
Garbage Collection times
Message latency
50,0000% of message latency is less than: 1466 nanos
75,0000% of message latency is less than: 2443 nanos
90,0000% of message latency is less than: 3421 nanos
99,0000% of message latency is less than: 4888 nanos
99,9000% of message latency is less than: 11241 nanos
99,9900% of message latency is less than: 19549 nanos
99,9990% of message latency is less than: 75263 nanos
99,9999% of message latency is less than: 3480174 nanos
Summary
6.5 ms pauses? That is not low latency :(. The chart is also very irregular. 99.9% of messages were processed in under 11241 nanos.
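For reference, a percentile report like the one above can be produced from raw samples roughly as follows (an illustrative sketch only; heaptrasher's own reporting code may differ):

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

public class LatencyReport {
    static void report(long[] latenciesNanos) {
        long[] sorted = latenciesNanos.clone();
        Arrays.sort(sorted);
        for (double p : new double[] {50, 75, 90, 99, 99.9, 99.99, 99.999, 99.9999}) {
            // nearest-rank percentile: first sample at or above the p-th percentile
            int idx = (int) Math.min(sorted.length - 1L,
                    (long) Math.ceil(p / 100.0 * sorted.length) - 1);
            System.out.printf("%.4f%% of message latency is less than: %d nanos%n",
                    p, sorted[idx]);
        }
    }

    public static void main(String[] args) {
        long[] samples = new long[1_000_000];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = 1_000 + ThreadLocalRandom.current().nextLong(10_000); // fake data
        }
        report(samples);
    }
}
```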
Second test
Description
4GB heap, direct statistics data collection
Command line
java -XX:+UnlockDiagnosticVMOptions -Xmx4g -Xms4g -XX:+UseParallelGC Main direct
Garbage Collector pause durations
Garbage Collection times
Message latency
50,0000% of message latency is less than: 1466 nanos
75,0000% of message latency is less than: 2443 nanos
90,0000% of message latency is less than: 2933 nanos
99,0000% of message latency is less than: 4887 nanos
99,9000% of message latency is less than: 11240 nanos
99,9900% of message latency is less than: 19060 nanos
99,9990% of message latency is less than: 82104 nanos
99,9999% of message latency is less than: 762892 nanos
Summary
Hmmm, much better, isn't it? It appears that the Old Generation is a major enemy of low-latency applications... The chart is as irregular as in the first case, and the change did not affect the 99.9% message processing time.
Third test
Description
4GB heap, direct statistics data collection, 3GB NewSize
Command line
java -XX:+UnlockDiagnosticVMOptions -Xmx4g -Xms4g -XX:MaxNewSize=3g -XX:NewSize=3g -XX:+UseParallelGC Main direct
Garbage Collector pause durations
Garbage Collection times
Message latency
50,0000% of message latency is less than: 1466 nanos
75,0000% of message latency is less than: 2443 nanos
90,0000% of message latency is less than: 3421 nanos
99,0000% of message latency is less than: 4888 nanos
99,9000% of message latency is less than: 11241 nanos
99,9900% of message latency is less than: 20038 nanos
99,9990% of message latency is less than: 69887 nanos
99,9999% of message latency is less than: 652930 nanos
Summary
In general the change did not affect the duration of a single collection, but the total number of collections is more than two times lower. It is interesting that enlarging the Young Generation did not affect single collection times. This is because the application generates a lot of short-lived objects, which means the GC has a relatively small graph of live objects to trace. Once again, 99.9% of messages are processed with the same latency.
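That claim is easy to check with a toy program (again my own illustration, not part of heaptrasher): with the same allocation rate, young collections stay cheap as long as almost nothing survives, and get more expensive once a live set is retained. Run it with the flags from the test above plus -verbose:gc and compare the default mode with -Dretain=true.

```java
import java.util.ArrayList;
import java.util.List;

public class LiveSetDemo {
    // Pass -Dretain=true to keep a growing set of survivors; by default all
    // allocations die young, so each young collection has very little to copy.
    static final boolean RETAIN = Boolean.getBoolean("retain");
    static final List<byte[]> survivors = new ArrayList<>();

    public static void main(String[] args) {
        for (long i = 0; i < 100_000_000L; i++) {
            byte[] chunk = new byte[1024];       // short-lived garbage
            chunk[(int) (i & 1023)] = (byte) i;  // keep the allocation observable
            if (RETAIN && i % 200 == 0) {
                survivors.add(chunk);            // grow the live set
                if (survivors.size() > 300_000) {
                    survivors.clear();           // cap it at roughly 300 MB
                }
            }
        }
        System.out.println("retained " + survivors.size() + " chunks");
    }
}
```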
Fourth test
Description
4GB heap, direct statistics data collection, 3GB NewSize, number of GC threads = number of CPUs - 1
Command line
java -XX:+UnlockDiagnosticVMOptions -Xmx4g -Xms4g -XX:MaxNewSize=3g -XX:NewSize=3g -XX:ParallelGCThreads=7 -XX:+UseGCTaskAffinity -XX:+BindGCTaskThreadsToCPUs Main direct
Garbage Collector pause durations
Garbage Collection times
Message latency
50,0000% of message latency is less than: 1466 nanos
75,0000% of message latency is less than: 2443 nanos
90,0000% of message latency is less than: 3421 nanos
99,0000% of message latency is less than: 5375 nanos
99,9000% of message latency is less than: 11729 nanos
99,9900% of message latency is less than: 21015 nanos
99,9990% of message latency is less than: 70865 nanos
99,9999% of message latency is less than: 612855 nanos
Summary
There is not much change in GC pause times, though I must admit the chart is more regular. We also see, for the first time in a while, a change in the 99.9% message processing time.
Fifth test
Description
4GB heap, direct statistics data collection, 3GB NewSize, number of GC threads = number of CPUs - 1, ParGCCardsPerStrideChunk set
Command line
java -XX:+UnlockDiagnosticVMOptions -Xmx4g -Xms4g -XX:MaxNewSize=3g -XX:NewSize=3g -XX:ParallelGCThreads=7 -XX:+UseGCTaskAffinity -XX:+BindGCTaskThreadsToCPUs -XX:ParGCCardsPerStrideChunk=32768 Main direct
Garbage Collector pause durations
Garbage Collection times
Message latency
50,0000% of message latency is less than: 1466 nanos
75,0000% of message latency is less than: 2443 nanos
90,0000% of message latency is less than: 3421 nanos
99,0000% of message latency is less than: 5375 nanos
99,9000% of message latency is less than: 10752 nanos
99,9900% of message latency is less than: 21503 nanos
99,9990% of message latency is less than: 76729 nanos
99,9999% of message latency is less than: 629960 nanos
Summary
It looks almost the same as last time, but we get significantly better 99.9% message processing time.
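With this many switches stacked up it is easy to lose track of what the JVM actually ended up with. -XX:+PrintFlagsFinal lists everything at startup, and the effective values can also be read at runtime through the HotSpot diagnostic MXBean, for example like this (diagnostic flags such as ParGCCardsPerStrideChunk are only visible when -XX:+UnlockDiagnosticVMOptions is set, which the command line above already does):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowGcFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : new String[] {
                "UseParallelGC", "NewSize", "MaxNewSize",
                "ParallelGCThreads", "ParGCCardsPerStrideChunk"}) {
            try {
                System.out.println(flag + " = " + hotspot.getVMOption(flag).getValue());
            } catch (IllegalArgumentException e) {
                System.out.println(flag + " is not available on this JVM");
            }
        }
    }
}
```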
Summary
During the presentation the lecturer used a Linux-based platform. On Linux it looks like the individual switches have a bigger impact on collection times. A short summary:
Test number | Number of collections | Total collection time | Avg. collection time | 99.9% latency below | 99.99% latency below
---|---|---|---|---|---
first | 126 | 706.342 ms | 5.60 ms | 11241 nanos | 19549 nanos
second | 126 | 92.002 ms | 0.73 ms | 11240 nanos | 19560 nanos
fourth | 57 | 53.738 ms | 0.94 ms | 11729 nanos | 21015 nanos
fifth | 57 | 46.508 ms | 0.81 ms | 10752 nanos | 21503 nanos
Comments
Does your app intend to run for any extended period of time? If so, ignoring the old gen and calling it GC tuning is like getting a girl's phone number and saying you are engaged. If your app isn't going to run for a long time, just give it a heap so humongous that it never needs to do any GC (young or otherwise) and you will have no GC pauses.
This is excellent information. I was using Oracle JRockit in my trading app; it has very nice GC behavior with low pauses. Unfortunately, it crashes a lot. I am testing the JVM switches that you shared.