Project Valhalla Space-Time Continuum
Project
Valhalla Space-Time Continuum
Background
Currently, for all its talents, Java at the lowest level cannot
be as efficient in space and time measurements as a language such as C or C++
due to the simple fact it can only use object references and not linearize
objects. This significantly affects applications with large data-sets where the
overhead of object headers and references adds up. Project Valhalla will
enable far more efficient storage (space) and processing (time) of large
data-sets with the introduction of value-types.
For example, an array of 10M `PricePoint` objects (each of which has two
64-bit primitive fields) will take 320MB in Java but only 160MB in C++ (these
numbers will be justified below). Furthermore processing every object in that
array will likely cause many, many CPU cache misses with all the resultant slow
down as main-memory is accessed to load the references.
Disclaimers
I hope to demonstrate equivalent Java and C++ code and run
some benchmarks to back up these claims in a subsequent article.
I am only using C++ as a counterpoint to Java to demonstrate
where Java falls down, and where Project Valhalla will improve the situation. I
am not writing this article in praise of C++ over Java: I am far more
productive in Java than I ever was in C++!
Example: Finding the maximum price of an array of PricePoint objects
Imagine a trading application that has many (in this example
10M) price changes and there is a need to find the maximum price of these
prices. The pseudo-Java to achieve this follows. It uses old-school arrays and “for”
loops to keep it and the C++ version as similar as possible. Any use of Java 8
streams (parallel or otherwise) or collection classes in either language
wouldn't change the underlying point of this article.
class PricePoint {
double price; // A BigDecimal would be more correct for money
long timestamp; // e.g. millis since 1970-JAN-01
PricePoint(double price, long timestamp) {
this.price = price;
this.timestamp = timestamp;
}
// .. constructor, accessors, etc, etc
}
class Test {
private static final int TEN_MILLION = 10_000_000;
static double findMaximumPrice(PricePoint[] pricePoints) {
double maxPrice = Double.MIN_VALUE;
for(int i=0; i < TEN_MILLION; i++) {
if (pricePoints[i].price > maxPrice) {
maxPrice = pricePoints[i].price;
}
}
return maxPrice;
}
public static void main(String [] args) {
PricePoint[] prices = new PricePoint[TEN_MILLION];
for (int i=0; i < prices.length; i++) {
prices[i] = new PricePoint((double)i, (long)i);
}
double maxPrice = findMaximumPrice(prices);
System.out.println("Max price = " + maxPrice);
}
}
double price; // A BigDecimal would be more correct for money
long timestamp; // e.g. millis since 1970-JAN-01
PricePoint(double price, long timestamp) {
this.price = price;
this.timestamp = timestamp;
}
// .. constructor, accessors, etc, etc
}
class Test {
private static final int TEN_MILLION = 10_000_000;
static double findMaximumPrice(PricePoint[] pricePoints) {
double maxPrice = Double.MIN_VALUE;
for(int i=0; i < TEN_MILLION; i++) {
if (pricePoints[i].price > maxPrice) {
maxPrice = pricePoints[i].price;
}
}
return maxPrice;
}
public static void main(String [] args) {
PricePoint[] prices = new PricePoint[TEN_MILLION];
for (int i=0; i < prices.length; i++) {
prices[i] = new PricePoint((double)i, (long)i);
}
double maxPrice = findMaximumPrice(prices);
System.out.println("Max price = " + maxPrice);
}
}
Analysis of Example Written in Java
Without going into
every single detail of object layout, every single object (e.g. non-primitive)
allocated in a JVM has the following structure taking memory: -
Header
|
32-bit VM = 8
bytes
64-bit JVM with
< 32GB heap = 12 bytes
64-bit JVM with
> 32GB heap = 16 bytes
|
Payload
|
For PricePoint = 16
bytes (2 * 64 bit primitives)
|
So each PricePoint
takes 12 + 16 = 28 bytes (assuming a 64-bit JVM with < 32 GB heap).
An array
`PricePoint[10,000,000]` would take: -
1.
10M * 4B = 40MB for the array itself.
2.
10M * 28B = 280MB for the PricePoint instances.
3.
Total = 320MB
In
this case there is as much memory taken in overhead as for the raw data (160MB).
This efficiency improves with classes that have more fields.
As each PricePoint is not in a contiguous block with the
array, each access of a PricePoint requires
- Getting the reference from the array for the index.
- Getting the PricePoint – this will likely cause a cache-miss and expensive fetch from main memory.
- Finally access the price field of the fetched PricePoint.
Optimisation of Java Version
An obvious optimisation is to denormalise the single array of
PricePoint’s into separate double arrays for price and timestamp – arrays of primitives
are laid out linerarly and won’t suffer from the dereferencing causing cache-miss
and subsequent fetch.
This will improve performance but at the cost of breaking
encapsulation and the semantic meaning of what a PricePoint is. As Donald Knuth
said, “Premature optimisation is the root of all evil”!
Analysis of Example Written in C++
Note that in C++,
the `findMaximumPrice` function would change to take the array by-reference: a
direct translation as above would pass the entire array by-value, putting all
10M PricePoints onto the stack - not what we want at all!
With a plain-vanilla C++ array there is
zero per-instance overhead, there is just the payload, i.e. each PricePoint
just takes 16 bytes total for the price and timestamp.
A simple
`PricePoint[] priceHistory` in C++ will be laid out as a single contiguous
block of memory. There is no scattering of objects all over the heap as in
Java: -
0
|
1
|
2
|
3
|
…
|
9,999,999
|
PricePoint
|
PricePoint
|
PricePoint
|
PricePoint
|
PricePoint
|
Execution of C++ `findMaximumPrice`
As the array is laid out linearly in one block, the C++ can be
compiled using simple offset calculations.
It is likely that the CPU will fetch multiple PricePoints into
the CPU cache, therefore the number of cache-hits will be higher than the Java
version which will improve execution time.
Summary
Space
|
For this example, the C++ version takes
half the memory of the Java version
|
Time
|
Because the C++ array arranges all the
items in a contiguous block of memory, it is cache-efficient,
especially on sequential access such as in the `findMaximumPrice` function.
Because Java has to store references and
each referent will require a dereference, `findMaximumPrice` will cause lots
of cache misses which will trigger expensive visits to main RAM for the data.
|
Conclusion
Java is inefficient accessing large
data-sets. Project Valhalla will address the above issues allowing for
performance equivalent to the C++ version.
In the second part of this article, I will
explain why Project Valhalla can make Java as optimal as the C++ version.
Comments
Post a Comment