Project Valhalla Space-Time Continuum


Project Valhalla Space-Time Continuum



Background

Currently, for all its talents, Java at the lowest level cannot be as efficient in space and time measurements as a language such as C or C++ due to the simple fact it can only use object references and not linearize objects. This significantly affects applications with large data-sets where the overhead of object headers and references adds up. Project Valhalla will enable far more efficient storage (space) and processing (time) of large data-sets with the introduction of value-types.

For example, an array of 10M `PricePoint` objects (each of which has two 64-bit primitive fields) will take 320MB in Java but only 160MB in C++ (these numbers will be justified below). Furthermore processing every object in that array will likely cause many, many CPU cache misses with all the resultant slow down as main-memory is accessed to load the references.

Disclaimers

I hope to demonstrate equivalent Java and C++ code and run some benchmarks to back up these claims in a subsequent article.

I am only using C++ as a counterpoint to Java to demonstrate where Java falls down, and where Project Valhalla will improve the situation. I am not writing this article in praise of C++ over Java: I am far more productive in Java than I ever was in C++!

Example: Finding the maximum price of an array of PricePoint objects

Imagine a trading application that has many (in this example 10M) price changes and there is a need to find the maximum price of these prices. The pseudo-Java to achieve this follows. It uses old-school arrays and “for” loops to keep it and the C++ version as similar as possible. Any use of Java 8 streams (parallel or otherwise) or collection classes in either language wouldn't change the underlying point of this article.

class PricePoint {
   
double price; // A BigDecimal would be more correct for money
   
long timestamp; // e.g. millis since 1970-JAN-01

   
PricePoint(double price, long timestamp) {
       
this.price = price;
       
this.timestamp = timestamp;
    }
   
   
// .. constructor, accessors, etc, etc
}

class Test {
   
private static final int TEN_MILLION = 10_000_000;

   
static double findMaximumPrice(PricePoint[] pricePoints) {
       
double maxPrice = Double.MIN_VALUE;
       
for(int i=0; i < TEN_MILLION; i++) {
           
if (pricePoints[i].price > maxPrice) {
                maxPrice = pricePoints[i].
price;
            }
        }

       
return maxPrice;
    }

   
public static void main(String [] args) {
        PricePoint[] prices =
new PricePoint[TEN_MILLION];
       
for (int i=0; i < prices.length; i++) {
            prices[i] =
new PricePoint((double)i, (long)i);
        }

       
double maxPrice = findMaximumPrice(prices);

        System.
out.println("Max price = " + maxPrice);
    }
}

Analysis of Example Written in Java

Without going into every single detail of object layout, every single object (e.g. non-primitive) allocated in a JVM has the following structure taking memory: -
Header
32-bit VM = 8 bytes
64-bit JVM with < 32GB heap = 12 bytes
64-bit JVM with > 32GB heap = 16 bytes
Payload
For PricePoint = 16 bytes (2 * 64 bit primitives)

So each PricePoint takes 12 + 16 = 28 bytes (assuming a 64-bit JVM with < 32 GB heap).

An array `PricePoint[10,000,000]` would take: -
1.       10M * 4B = 40MB for the array itself.
2.       10M * 28B = 280MB for the PricePoint instances.
3.       Total = 320MB

In this case there is as much memory taken in overhead as for the raw data (160MB). This efficiency improves with classes that have more fields.

Heap Layout of Java Array
Heap Layout of Java Array


Execution of Java `findMaximumPrice`

As each PricePoint is not in a contiguous block with the array, each access of a PricePoint requires

  1. Getting the reference from the array for the index.
  2. Getting the PricePoint – this will likely cause a cache-miss and expensive fetch from main memory. 
  3.  Finally access the price field of the fetched PricePoint.

Optimisation of Java Version

An obvious optimisation is to denormalise the single array of PricePoint’s into separate double arrays for price and timestamp – arrays of primitives are laid out linerarly and won’t suffer from the dereferencing causing cache-miss and subsequent fetch.

This will improve performance but at the cost of breaking encapsulation and the semantic meaning of what a PricePoint is. As Donald Knuth said, “Premature optimisation is the root of all evil”!

Analysis of Example Written in C++

Note that in C++, the `findMaximumPrice` function would change to take the array by-reference: a direct translation as above would pass the entire array by-value, putting all 10M PricePoints onto the stack - not what we want at all!

With a plain-vanilla C++ array there is zero per-instance overhead, there is just the payload, i.e. each PricePoint just takes 16 bytes total for the price and timestamp.

A simple `PricePoint[] priceHistory` in C++ will be laid out as a single contiguous block of memory. There is no scattering of objects all over the heap as in Java: -

0
1
2
3
9,999,999
PricePoint
PricePoint
PricePoint
PricePoint
PricePoint

Execution of C++ `findMaximumPrice`

As the array is laid out linearly in one block, the C++ can be compiled using simple offset calculations.

It is likely that the CPU will fetch multiple PricePoints into the CPU cache, therefore the number of cache-hits will be higher than the Java version which will improve execution time.

Summary



Space
For this example, the C++ version takes half the memory of the Java version
Time
Because the C++ array arranges all the items in a contiguous block of memory, it is cache-efficient, especially on sequential access such as in the `findMaximumPrice` function.

Because Java has to store references and each referent will require a dereference, `findMaximumPrice` will cause lots of cache misses which will trigger expensive visits to main RAM for the data.

Conclusion

Java is inefficient accessing large data-sets. Project Valhalla will address the above issues allowing for performance equivalent to the C++ version.

In the second part of this article, I will explain why Project Valhalla can make Java as optimal as the C++ version.


Comments

Popular posts from this blog

Sleep, Lord's Prayer, Now and the Future

A Bit of Background