How to Prioritize Hot Data Selection to Maximize RAM Usage

In this post we will address the growing need for a faster loading process for your most important data and the benefits of customizing Hot Data Selection for your applications. We will also review a real world example of an application of GigaSpaces MemoryXtend, and how it can help you achieve optimum business results.


With growing market demands for faster processing and constantly growing datasets, more and more companies are turning to in-memory computing in order to achieve real-time processing and benefit from application scalability and high availability. In-memory computing is RAM-based, and since RAM can be expensive, in-memory computing may seem costly to many organizations.

Thus, IT architects and decision-makers face a performance dilemma between memory and storage. Businesses need a way to serve applications while data is being analyzed, without impacting performance or maintaining multiple infrastructures. The answer is GigaSpaces InsightEdge Platform, powered by Intel® architecture.

Combining InsightEdge Platform, Intel® Xeon® Scalable processors, and Intel® Optane™ solid state drives (SSDs) with integrated BigDL makes it easy to innovate real-time analytics and AI apps with low risk, low total cost of ownership (TCO), and high agility—now and into the future.

The ability to define which of your organization’s priority data is kept in RAM, combined with the latest technologies such as Intel Optane, opens up exciting opportunities for in-memory computing solutions.

High-latency reads are a problem every developer faces, with every application. A key-value approach is perfectly fine when you don’t have too much data or when your data is simple. However, if you need to cache large amounts of data, you need a faster solution that can handle that volume and offers easy query capabilities.

How to Maximize RAM and Control TCO

There’s a better way to make the most of your RAM and control TCO: make sure that valuable RAM space is used for valuable data. With customized hot data selection, you take control and use RAM for the data that you deem critical, while the remaining data is evicted to multi-tiered data storage such as SSD and storage-class memory (3D XPoint).

Simple Customized Hot Data Selection with MemoryXtend

With the new GigaSpaces MemoryXtend Customized Hot Data Selection feature, you can define exactly which data you need loaded quickly.

MemoryXtend for Flash/SSD delivers built-in high-speed persistence leveraging local or attached SSD devices and all-flash-arrays (AFA). It delivers low latency write and read performance, as well as fast data recovery.

MemoryXtend is based on RocksDB, which is optimized for fast storage environments.

In the MemoryXtend architecture, data is stored across two tiers: a space partition instance (managed JVM heap) and an embedded key/value store (the blob store). MemoryXtend comes with a built-in blob store cache. This cache is part of the space partition tier, and stores objects in their native form.

The space partition instance acts as an LRU cache against the underlying blob store. The space stores the data blueprint (indices, class metadata, etc.). Upon a space read operation:

  1. If the object exists in the JVM heap (including the blob store cache), it will be immediately returned to the client. This is known as a cache hit.
  2. Otherwise, the space will load it from the underlying blob store and place it on the JVM heap. This is known as a cache miss.
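To make the two read steps above concrete, here is a minimal plain-Java sketch of an LRU heap cache sitting in front of a slower key/value store. It does not use the GigaSpaces API: the class name `TwoTierReadSketch`, its fields, and the use of `LinkedHashMap` are all assumptions made for illustration, not a description of how MemoryXtend is implemented internally.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier read path: an LRU heap cache in front of a slower key/value store. */
class TwoTierReadSketch {
    // Stands in for the embedded blob store (lower tier).
    private final Map<String, String> blobStore = new HashMap<>();
    // Stands in for the JVM-heap tier, bounded and evicted in LRU order.
    private final Map<String, String> heapCache;
    long hits = 0;
    long misses = 0;

    TwoTierReadSketch(final int cacheCapacity) {
        // accessOrder = true keeps entries ordered from least to most recently used,
        // so removeEldestEntry drops the least recently used entry when the cache is full.
        this.heapCache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheCapacity;
            }
        };
    }

    /** Every object is persisted in the lower tier. */
    void write(String key, String value) {
        blobStore.put(key, value);
    }

    String read(String key) {
        String cached = heapCache.get(key);
        if (cached != null) {
            hits++;                      // step 1: cache hit, served from the heap
            return cached;
        }
        misses++;                        // step 2: cache miss, load from the blob store
        String loaded = blobStore.get(key);
        if (loaded != null) {
            heapCache.put(key, loaded);  // place it on the heap for later reads
        }
        return loaded;
    }

    public static void main(String[] args) {
        TwoTierReadSketch space = new TwoTierReadSketch(2);
        space.write("AAPL", "stock");
        space.read("AAPL"); // first read: cache miss
        space.read("AAPL"); // second read: cache hit
        System.out.println("hits=" + space.hits + " misses=" + space.misses);
    }
}
```

With a capacity of 2, reading a third distinct key evicts whichever key was used least recently, which is the behavior that customized hot data selection is meant to keep under control.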

MemoryXtend users can customize which data is stored in the blob store cache. Once “hot data” criteria are defined, only data that matches the criteria is cached. Until now, this customization applied only to the initial load stage: when an application started or restarted, the persisted hot data was loaded into the cache.

We have now expanded this customization to apply throughout the application’s operation. This means that at any given time, only the custom hot data is stored in the cache.

What is a Customized Hot Data Selection?

In most cases, a data grid caches the results of recent queries, on the assumption that these queries will be repeated in the near future and that they are all equally important. For example, the grid may hold critical price data for 3,000 stocks in RAM when an application issues a query such as ‘Get me all inactive users’, whose results are not in RAM. If you have not defined a customized Hot Data Selection, such a query flushes the critical data from RAM and replaces it with the less important ‘inactive user’ data from the latest query.

When customizing your Hot Data Selection, important queries and data are predefined and prioritized to ensure fast and predictable results according to your business goals.
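The flushing problem described above can be reproduced with a small plain-Java experiment. The sketch below compares a plain LRU cache with one that admits only entries matching a "hot data" predicate. Everything in it is hypothetical and chosen for illustration (the class `HotDataCacheSketch`, the `offer` method, and the convention that hot values start with `stock:`); it is not the MemoryXtend implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical LRU cache that admits only entries matching a "hot data" predicate. */
class HotDataCacheSketch<V> {
    private final Map<String, V> cache;
    private final Predicate<V> isHot;

    HotDataCacheSketch(final int capacity, Predicate<V> isHot) {
        this.isHot = isHot;
        // Access-ordered LinkedHashMap: the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Called whenever an object is read or written; cold objects bypass the cache. */
    void offer(String key, V value) {
        if (isHot.test(value)) {
            cache.put(key, value);
        }
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }

    /** Loads three stock entries, then a burst of ten 'inactive user' entries. */
    static void runScenario(HotDataCacheSketch<String> c) {
        for (int i = 0; i < 3; i++) {
            c.offer("stock-" + i, "stock:" + i);
        }
        for (int i = 0; i < 10; i++) {
            c.offer("user-" + i, "inactive-user:" + i);
        }
    }

    public static void main(String[] args) {
        // Assumption for this sketch: values prefixed with "stock:" are hot, all others are cold.
        HotDataCacheSketch<String> filtered =
                new HotDataCacheSketch<String>(3, v -> v.startsWith("stock:"));
        HotDataCacheSketch<String> plainLru =
                new HotDataCacheSketch<String>(3, v -> true);

        runScenario(filtered);
        runScenario(plainLru);

        System.out.println("filtered keeps stock-0: " + filtered.isCached("stock-0"));
        System.out.println("plain LRU keeps stock-0: " + plainLru.isCached("stock-0"));
    }
}
```

In the plain LRU cache, the burst of user entries pushes every stock entry out. In the filtered cache, the user entries never enter, so the stock entries stay put.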

Four Top Benefits of MemoryXtend Customized Hot Data Selection

  1. Reduction of the loading process time
  2. Partitioning and replication of data for consistency, high availability, stability, and partition tolerance
  3. Creation of a real hierarchy of data, even for super complex data. MemoryXtend allows you to index your data in RAM, thus keeping the structure of the objects and nested data
  4. Expansion of the initial load feature, allowing you to define exactly which hot data will be cached in RAM

Customizing the Blob Store Cache: A Financial Use Case

A typical scenario for Customized Hot Data Selection can be seen in the eCommerce (Black Friday, for example), transportation, and financial industries. In this post we will focus on a trading use case to help explain the use and benefits of customized caching. TradeWallStreet is a hypothetical online NYSE trading platform. Millions of users buy and sell stocks, stock options, and bonds based on their current price. The platform generates and consumes multiple data types, from users’ personal information to stock analysis. Not all data is created equal. Certain data, like stock data, will be in much higher demand. The platform designers decide to keep this high-demand data in fast storage.

In our example, “hot data” fits any of the following criteria:

  1. Stocks with trade volumes above $100M
  2. Stocks with a market value above $1B
  3. Bonds with names that start with “super”

The POJOs (getters and setters are omitted for brevity):

class Stock {
    String name;
    double price;
    double volume;
    double marketValue;
}

class Bond {
    String name;
    double price;
    double volume;
    String rating;
}

TradeWallStreet can customize which data will be stored in the blob store cache. This is done by defining a set of SQL queries:

BlobStoreStorageHandler rocksDbStorageHandler = new RocksDBBlobStoreConfigurer()

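BlobStoreDataCachePolicy blobStorePolicy = new BlobStoreDataCachePolicy()
    // The thresholds assume that volume and marketValue are expressed in millions of dollars
    .addCacheQuery(new SQLQuery(Stock.class, "volume >= 100"))
    .addCacheQuery(new SQLQuery(Stock.class, "marketValue >= 1000"))
    // The % wildcard makes LIKE match names that start with "super"
    .addCacheQuery(new SQLQuery(Bond.class, "name LIKE 'super%'"));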

GigaSpace gigaSpace = new GigaSpaceConfigurer(new EmbeddedSpaceConfigurer("mySpace")


Objects that fit one of the queries will be stored in the blob store cache.
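To check which objects would qualify, the three criteria can be mirrored as plain-Java predicates. This sketch does not call the GigaSpaces API. It assumes that `volume` and `marketValue` are expressed in millions of dollars (so that 100 and 1000 correspond to $100M and $1B), and it follows the prose criterion that bond names start with “super”. The wrapper class `HotDataCriteriaSketch` and the constructors are hypothetical additions for illustration.

```java
import java.util.function.Predicate;

/** Plain-Java mirror of the three hot data criteria; it does not use the GigaSpaces API. */
class HotDataCriteriaSketch {
    static class Stock {
        final String name;
        final double price;
        final double volume;
        final double marketValue;

        Stock(String name, double price, double volume, double marketValue) {
            this.name = name;
            this.price = price;
            this.volume = volume;
            this.marketValue = marketValue;
        }
    }

    static class Bond {
        final String name;
        final double price;
        final double volume;
        final String rating;

        Bond(String name, double price, double volume, String rating) {
            this.name = name;
            this.price = price;
            this.volume = volume;
            this.rating = rating;
        }
    }

    // Assumption: volume and marketValue are in millions of dollars.
    static final Predicate<Stock> HIGH_VOLUME = s -> s.volume >= 100;
    static final Predicate<Stock> HIGH_MARKET_VALUE = s -> s.marketValue >= 1000;
    static final Predicate<Bond> SUPER_BOND = b -> b.name.startsWith("super");

    /** A stock is hot if it matches either stock criterion. */
    static boolean isHot(Stock s) {
        return HIGH_VOLUME.or(HIGH_MARKET_VALUE).test(s);
    }

    /** A bond is hot if its name starts with "super". */
    static boolean isHot(Bond b) {
        return SUPER_BOND.test(b);
    }

    public static void main(String[] args) {
        Stock busy = new Stock("BUSY", 42.0, 250.0, 500.0);   // high volume only
        Stock quiet = new Stock("QUIET", 3.0, 5.0, 50.0);     // matches neither criterion
        Bond superBond = new Bond("superbond", 100.0, 1.0, "AAA");

        System.out.println("BUSY hot: " + isHot(busy));
        System.out.println("QUIET hot: " + isHot(quiet));
        System.out.println("superbond hot: " + isHot(superBond));
    }
}
```

Because the criteria are combined with OR, an object only needs to match one of them to be treated as hot, just as an object that fits one of the cache queries is stored in the blob store cache.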

Monitoring the Blob Store Cache

In read operations, different scenarios might occur:

  1. Reading hot data that is stored in RAM – a cache hit
  2. Reading cold data – a cold data cache miss
  3. Reading hot data that is NOT stored in RAM – a hot data cache miss. This scenario might occur when the RAM is full.

InsightEdge Platform and XAP expose the following metrics to monitor the blob store cache:

space_blobstore_cache-size – number of objects stored in the cache

space_blobstore_cache-hit – number of cache hits

space_blobstore_cache-miss – total number of cache misses (hot data misses + cold data misses)

space_blobstore_hot-data-cache-miss – number of hot data cache misses, a subset of the total cache misses
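Once these four values are collected, a few derived numbers are useful for judging whether RAM is sized well for the hot set. The sketch below shows the arithmetic in plain Java. The class `BlobStoreCacheStatsSketch` and its methods are hypothetical, the sample values in `main` are made up, and how the metric values are actually retrieved depends on your metrics configuration, which is not shown here.

```java
/** Hypothetical holder for the four blob store cache metrics, with derived ratios. */
class BlobStoreCacheStatsSketch {
    final long cacheSize;        // space_blobstore_cache-size
    final long cacheHit;         // space_blobstore_cache-hit
    final long cacheMiss;        // space_blobstore_cache-miss
    final long hotDataCacheMiss; // space_blobstore_hot-data-cache-miss

    BlobStoreCacheStatsSketch(long cacheSize, long cacheHit, long cacheMiss, long hotDataCacheMiss) {
        this.cacheSize = cacheSize;
        this.cacheHit = cacheHit;
        this.cacheMiss = cacheMiss;
        this.hotDataCacheMiss = hotDataCacheMiss;
    }

    /** Total misses are hot misses plus cold misses, so cold misses are the difference. */
    long coldDataCacheMiss() {
        return cacheMiss - hotDataCacheMiss;
    }

    /** Fraction of reads served from RAM; 0.0 when there were no reads. */
    double hitRatio() {
        long reads = cacheHit + cacheMiss;
        return reads == 0 ? 0.0 : (double) cacheHit / reads;
    }

    /** Fraction of misses that involved hot data; 0.0 when there were no misses. */
    double hotMissShare() {
        return cacheMiss == 0 ? 0.0 : (double) hotDataCacheMiss / cacheMiss;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        BlobStoreCacheStatsSketch stats = new BlobStoreCacheStatsSketch(5000, 900, 100, 20);
        System.out.println("cold misses:    " + stats.coldDataCacheMiss());
        System.out.println("hit ratio:      " + stats.hitRatio());
        System.out.println("hot miss share: " + stats.hotMissShare());
    }
}
```

Since hot data cache misses might occur when RAM is full, a growing hot miss share is a reasonable signal to revisit either the cache size or the hot data criteria.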

For directions on how to configure these metrics follow this link.

Final Thoughts

InsightEdge and XAP support datasets of tens and even hundreds of terabytes and enable storing, processing, and analyzing data at speeds comparable to RAM, with significantly reduced TCO and a decreased server footprint.

Combined with the unification of transactional and analytical processing (HTAP), your data engineers, application developers, and scientists can operate on the same data set, right when it’s born, and at massive scale – paving the way for your organization to become insight-driven.

Each industry and organization has different needs for its data. You have to find the right fit for you.

In GigaSpaces version 12.3 we introduced several significant new MemoryXtend™ features. Explore the many benefits that GigaSpaces MemoryXtend Customized Hot Data Selection has to offer by downloading InsightEdge Platform 12.3, and try it out for free here.