“We’ve come a long, long way together,” sang Fatboy Slim in ‘Praise You’, one of my favorite songs. This phrase perfectly describes the evolution of software and hardware solutions that are bridging the gap between performance and cost for organizations that require low-latency processing and advanced analytics on their growing data sets in today’s millisecond economy.
As data continues to grow exponentially and organizations begin to understand that this data is among their greatest assets, new methods of extracting the highest value as soon as the data is born are being implemented and integrated into business operations.
We’ve Definitely Come a Long, Long Way Together
In this article we will discuss the requirements and challenges faced by enterprises today as they grapple with increasing amounts of data. We will outline how smarter, faster insights and actions can be achieved via machine learning models that run on a more complex feature vector and provide more accurate inferencing and scoring, adding historical context to the real-time decision/business logic. The solution becomes more compelling because enterprises can save on hardware costs and reduce their hardware footprint while maintaining desired performance levels.
All of this can be made possible by combining the latest Intel® Optane™ DC Persistent Memory with GigaSpaces’ InsightEdge* application, providing customers with the required performance at an affordable cost, for uncompromised business results.
The Power of In-Memory Computing
Let’s start with a little background on in-memory computing.
“The digitalization of businesses generates an inexhaustible demand for faster performance, greater scalability and deeper real-time insight. This is boosting adoption of in-memory computing (IMC)-enabling technology.”
– Predicts 2018: In-Memory Computing Technologies Remain Pervasive as Adoption Grows, Gartner, 2017
More and more organizations are indeed leveraging in-memory computing to achieve real-time processing of their data and to obtain application scalability and high availability. In-memory computing is based on random access memory (RAM) and provides ultra-low latency and high performance for mission-critical, time-sensitive applications. But since RAM is more expensive than disk storage, in-memory computing entails costs that may be prohibitive to many organizations.
Additionally, using AI and machine learning models for instant insights and actions, across use cases as diverse as fraud detection, location-based advertising, dynamic pricing, predictive maintenance and more, requires access to both real-time and historical data from various sources (structured, semi-structured and unstructured) to yield smarter insights.
The ability to ingest, aggregate, process and instantly analyze the various types of data, both historical and real-time, at millisecond speeds will empower enterprises to deliver the best and most timely experiences to their customers.
Three Strategic Questions
With the growing volume and variety of data, along with velocity requirements for real-time processing and insights, organizations are seeking answers to the following strategic questions:
- How can we achieve speed and performance in data processing, with high throughput and low latency?
- What is the best and easiest way to extract the greatest value from data – gaining real-time, actionable business insights?
- Where should we store data and how do we manage the data to enable the above in the most cost-efficient way?
Hardware Evolution to Meet Customers’ Requirements
Over the years, computer data has been managed and used by applications in two tiers: memory and storage. SSD technology, along with smart control of multi-tier data storage, has helped close the gap between performance and cost. With Intel Optane DC persistent memory, a new, exciting and disruptive storage tier has been introduced to the market.
These persistent memory modules are based on Intel® 3D XPoint™ technology and differ from the already-available Intel Optane SSDs (which are based on the same memory media technology).
Persistent memory technologies allow development of products with the attributes of both storage and memory. The products are persistent, like storage, meaning they hold their content across power cycles, and they are byte-addressable, like memory, meaning programs can access data structures in-place. What really makes Intel persistent memory technology shine is that it is fast enough for the processor to access directly without stopping to do the block I/O required for traditional storage.
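To make the "byte-addressable and persistent" idea concrete, here is a minimal sketch in Python. It is an illustration only: an ordinary memory-mapped temporary file stands in for a persistent memory region, and the record layout and the function names (`open_region`, `update_in_place`, `read_record`) are assumptions made for this example, not part of any Intel or GigaSpaces API. On real hardware the mapped file would typically sit on a filesystem backed by persistent memory.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout for this sketch: int64 update counter + float64 balance.
RECORD = struct.Struct("<qd")


def open_region(path, n_records):
    """Create (if needed) and memory-map a file holding n_records fixed-size records."""
    size = RECORD.size * n_records
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)  # new bytes are zero-filled
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping keeps its own handle to the file


def update_in_place(region, index, delta):
    """Modify one record directly at its byte offset; no block read/write calls."""
    offset = index * RECORD.size
    counter, balance = RECORD.unpack_from(region, offset)
    RECORD.pack_into(region, offset, counter + 1, balance + delta)


def read_record(region, index):
    """Return (counter, balance) for the record at the given index."""
    return RECORD.unpack_from(region, index * RECORD.size)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "region.bin")
    region = open_region(path, 4)
    update_in_place(region, 2, 10.5)
    update_in_place(region, 2, 1.5)
    region.flush()
    region.close()
    # Re-open the region, as an application would after a restart.
    region = open_region(path, 4)
    result = read_record(region, 2)
    region.close()
    return result


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the access pattern: the program updates a data structure where it lives, and the content is still there when the region is opened again.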
The Right Information in the Right Layer at the Right Time
Unlike traditional DRAM, Intel Optane DC persistent memory offers the unprecedented combination of high capacity, affordability and persistence. By expanding affordable system memory capacities (greater than 3TB per CPU socket), end customers can use systems enabled with this new class of memory to better optimize their workloads by moving and maintaining larger amounts of data closer to the processor and minimizing the higher latency of fetching data from system storage.
Instant Insights to Actions – Combining InsightEdge with Intel Optane DC Persistent Memory
From a business value perspective, in order to capture the art of machine learning, a lot has to happen in real time. To become insight driven or insight-centric, an organization must get from data to analytics to action with sub-second latency in the pipeline. Ultimately, if a firm wants to continue to grow and be competitive, it must turn to unifying analytical and transactional processing – that is, to run real-time analytics on hot data as it is born, and enrich it with historical context.
Traditional databases work well for business intelligence analytics, which are mostly historical or retrospective in nature. Businesses run reports, which are usually predefined questions, against their data. How many sales did we have over the past 30 days? What are the sources of our revenue? What zip code/region/state/country generated the most revenue or the most profit? However, the analytics world moves much faster than this. It’s no longer sufficient to derive business insights only in retrospect.
In contrast to the traditional computing paradigm of moving data to a separate database, doing some processing, and then saving it back to the data store, with in-memory computing everything is put in the in-memory data grid and distributed across a horizontally scalable architecture. This is accomplished at low latency, because you’ve taken out the disk I/O that prevents mixed, heterogeneous workloads from running in real time.
When running machine learning and deep learning models on data, accurate insights require access to hot, warm and cold data (real-time, recent and historical, respectively). A multi-tiered data storage approach, managed with InsightEdge, an intelligent software platform that can seamlessly connect these tiers, meets the needs of enterprises looking to incorporate ML into their operations.
The latest Intel Optane DC persistent memory is an ideal way to meet this need. Layering this memory technology between SSD and DRAM provides fast yet affordable data access.
This additional layer keeps more data closer to the processor, making it more readily available for AI initiatives. It also offers the ability to perform significant write operations while concurrently reading data.
InsightEdge – Intelligent In-Memory Software Platform for Optimized Results
DRAM has an upper limit on the amount of data it can store cost-effectively, so complementary enablers must be considered. The GigaSpaces MemoryXtend module for InsightEdge* enables a multi-tiered, hybrid data storage architecture: the relevant (hot) data, as defined by the application, is stored in RAM; warm data resides on Intel Optane DC persistent memory and Intel Optane SSD; and cold data resides in external data lakes (e.g. Amazon* S3/HDFS). This provides an optimal way to manage large amounts of data according to the needs of each business application, with data automatically moved to RAM in real time as conditions change.
Unlike traditional databases, InsightEdge can handle massive workloads and processing tasks while asynchronously replicating data in the background to a big-data store (e.g. Hadoop). Multiple petabytes can sit in cold storage and still be seamlessly accessible for enriching real-time data when running machine learning models.
In order to preserve performance, MemoryXtend holds the indexes of the data objects in on-heap memory, off-heap memory or on the high performance Intel Optane DC persistent memory. Since this memory technology will be available in larger capacities than DRAM is today, and is expected to be lower priced, we estimate a significant reduction in server footprint, while retaining in-memory performance. Functioning as an extension to the in-memory data grid, MemoryXtend provides the same InsightEdge application-level data abstraction with a variety of data access APIs (Object/SQL, JDBC, JCache, JMS, and NoSQL).*
Users can define custom queries that determine what data should be loaded into RAM upon initialization, to ensure that ‘hot’ data is always available at the fastest possible access speed. Data objects that don’t match the custom queries (‘cold data’) are held in the less costly storage tiers, where they can still be accessed quickly.
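The hot/cold split described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the MemoryXtend API: the `TieredStore` class, the `is_hot` predicate and the `initial_load` helper are hypothetical names invented for this illustration, and two plain dictionaries stand in for RAM and for the cheaper storage tier. An index kept in memory records which tier holds each key, so lookups do not need to search every tier.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]


@dataclass
class TieredStore:
    """Hypothetical two-tier store driven by a user-defined 'hot data' query."""
    is_hot: Callable[[Record], bool]                          # the custom query
    hot: Dict[str, Record] = field(default_factory=dict)      # stands in for RAM
    cold: Dict[str, Record] = field(default_factory=dict)     # stands in for a cheaper tier
    index: Dict[str, str] = field(default_factory=dict)       # key -> tier name, kept in memory

    def write(self, key: str, record: Record) -> None:
        # Drop any earlier copy so a record lives in exactly one tier.
        self.hot.pop(key, None)
        self.cold.pop(key, None)
        tier = "hot" if self.is_hot(record) else "cold"
        (self.hot if tier == "hot" else self.cold)[key] = record
        self.index[key] = tier

    def read(self, key: str) -> Optional[Record]:
        tier = self.index.get(key)
        if tier is None:
            return None
        return (self.hot if tier == "hot" else self.cold)[key]

    def tier_of(self, key: str) -> Optional[str]:
        return self.index.get(key)


def initial_load(records: Dict[str, Record],
                 is_hot: Callable[[Record], bool]) -> TieredStore:
    """Place each record in a tier at initialization, according to the query."""
    store = TieredStore(is_hot)
    for key, record in records.items():
        store.write(key, record)
    return store


if __name__ == "__main__":
    orders = {
        "o1": {"status": "OPEN", "amount": 120},
        "o2": {"status": "CLOSED", "amount": 80},
    }
    store = initial_load(orders, lambda r: r["status"] == "OPEN")
    print(store.tier_of("o1"), store.tier_of("o2"))
    store.write("o2", {"status": "OPEN", "amount": 80})  # conditions changed
    print(store.tier_of("o2"))
```

In this sketch a record changes tier only when it is rewritten; a real platform would apply its own policies for promotion and eviction.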
By utilizing advanced off-heap persistence and customizable, business-driven configurations of “hot data” per application, the grid can scale out into the multi-terabyte range at relatively low cost while leveraging the petabytes of historical data on data lakes. At the same time, customers can maintain the performance required per application.
Cross Industry Examples of Customized Multi-tier Data Storage Preferences
Customers across different industries can leverage customized multi-tier data storage for different applications. Here are a few examples.
Transportation organizations need to be able to continually analyze their data and make changes on the fly for strategic applications that forecast routes, predict delays, manage personnel schedules, personalize offers and optimize pricing policies. In order to be agile and smarter in their operations and services, all relevant data must be accessed and analyzed in real-time to be able to instantly act upon insights.
For example, machine learning models can run in real time on the latest data coming in (weather, locations of planes, availability of staff, operational data and more). The models can access historical data to understand how weather changes, as well as other parameters such as personnel sickness and vacation patterns. The data that is required is automatically made available in the faster data layers. These insights can then influence business applications and services instantly.
Think of how this optimized control can work in other industries such as finance, insurance and retail. For example, in finance, a trading application can benefit from the ability to prioritize for RAM the data that meets defined parameters or metrics, such as stock price or volume, or data from social or news feeds. Access to historical information shows how the stock has performed over the last 30 days, so it can be correlated with the live data. The data that meets defined values is automatically moved to the fastest layer so that it can be analyzed at sub-second latency.
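As a rough illustration of enriching a real-time event with historical context, the Python sketch below builds a feature vector for a single stock tick from the tick itself plus statistics over up to 30 days of closing prices. The function name `build_feature_vector`, the input shapes and the choice of features are assumptions made for this example; they are not taken from InsightEdge or from any particular trading model.

```python
from statistics import mean, pstdev


def build_feature_vector(tick, history):
    """Combine a live tick with historical context.

    tick:    dict with 'price' and 'volume' (assumed shape)
    history: non-empty list of daily closing prices, oldest first (assumed shape)
    """
    window = history[-30:]            # at most the last 30 days
    avg = mean(window)
    spread = pstdev(window)           # population standard deviation
    # How unusual is the live price relative to the recent past?
    z_score = (tick["price"] - avg) / spread if spread > 0 else 0.0
    # Relative change versus the oldest close in the window.
    change = (tick["price"] - window[0]) / window[0]
    return [tick["price"], tick["volume"], avg, z_score, change]


if __name__ == "__main__":
    closes = [100.0, 102.0, 98.0, 100.0]
    live = {"price": 103.0, "volume": 5000}
    print(build_feature_vector(live, closes))
```

The first two features come from the hot, real-time tier and the remaining three from historical data, which is the kind of richer feature vector the tiered approach is meant to make affordable at low latency.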
How Far Have We Come Together?
GigaSpaces and Intel have a longstanding collaboration that has benefited their customers. Our latest partnership around Intel Optane DC persistent memory and the GigaSpaces InsightEdge application can provide customers with the required performance and cost optimization for uncompromised business results:
- Affordable, in-memory extreme performance
- Anticipated reduction in required server hardware due to higher available memory capacities, which decreases footprint, power, maintenance, software licenses, network and other overhead costs.
- Optimized manageability via less networking, data movement and security configuration.
- Smarter, faster insights and actions allowing machine learning models to run on a more complex feature vector and provide more accurate inferencing and scoring, including historical context to the real-time decision/business logic.
- Faster access to historical data lakes (Hadoop*/Amazon S3) via smart indexing leveraging the Intel Optane DC persistent memory layer.
- More customization opportunities and flexibility according to application priorities are available with InsightEdge’s business driven intelligent tiered storage module.
- Minutes-to-seconds recovery time as there is no need for an additional persistence layer.
Don’t Be Left Behind
AI boasts transformative abilities, but there are hurdles that must be overcome to move beyond theoretical value and into tangible results. Enterprise architectures today are not yet optimally equipped to meet the data intensive needs of true AI solutions.
However, new storage-class memory components like Intel Optane DC persistent memory, combined with intelligent in-memory software platforms like GigaSpaces InsightEdge, are closing the gap.
As we’ve seen so many times, businesses that are too slow to adopt new technologies often fall behind the curve.
*Other brands and names may be claimed by others